"If the search engine maintains a dynamic index that allows updates (e.g., document insertions/deletions), then it may even be possible to carry out the updates in a distributed fashion, in which each node takes care of the updates that pertain to its part of the overall index. This approach eliminates the need for a complicated centralized index construction/maintenance process that involves the whole index. However, it is applicable only if documents may be assumed to be independent of each other, not if inter-document information, such as hyperlinks and anchor text, is part of the index."
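To make the idea in the quote concrete, here is a minimal sketch of a document-partitioned index in which each node applies only the updates for its own documents. The class names (`IndexNode`, `PartitionedIndex`), the integer document ids, and the modulo routing rule are all assumptions made for illustration, not something the quoted text prescribes. The sketch also assumes documents are independent, which is exactly the condition the quote requires.

```python
from collections import defaultdict


class IndexNode:
    """One node holding the inverted index for its own subset of documents."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of local documents
        self.docs = {}                    # doc id -> set of terms in that doc

    def insert(self, doc_id, text):
        # Re-inserting an existing id replaces the old version.
        self.delete(doc_id)
        terms = set(text.lower().split())
        self.docs[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        # Only this node's postings are touched; no other node is involved.
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, term):
        return set(self.postings.get(term.lower(), set()))


class PartitionedIndex:
    """Routes each update to the single node that owns the document."""

    def __init__(self, num_nodes):
        self.nodes = [IndexNode() for _ in range(num_nodes)]

    def _node_for(self, doc_id):
        # Hypothetical routing rule: integer doc ids, assigned by modulo.
        return self.nodes[doc_id % len(self.nodes)]

    def insert(self, doc_id, text):
        self._node_for(doc_id).insert(doc_id, text)

    def delete(self, doc_id):
        self._node_for(doc_id).delete(doc_id)

    def search(self, term):
        # A query is sent to every node and the partial results are merged.
        results = set()
        for node in self.nodes:
            results |= node.search(term)
        return results
```

Because every update touches exactly one node, no centralized rebuild is needed. If the index also stored anchor text or link information, inserting one document could change the entries of documents on other nodes, and this simple routing would no longer be enough.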
Indexing documents is more complicated than we first thought. There are many kinds of documents, and each kind may call for a different index. Indexing methods therefore have to be considered from many angles so that they cover as many kinds of documents as possible, and different kinds of documents may need different indexing methods. For instance, how to avoid broken links, how to extract words from foreign-language text, etc.