Anatomy of a Search Engine

Query Interface

The query interface is the page that users see when they navigate to a search engine to enter a search term. At its most basic, the interface is a simple page with a search box and a button to start the search. Some search engines, Google among them, also offer users a way to customize that interface.

Crawlers, Spiders, and Robots

These are programs that crawl the Web, cataloging data so that it can be searched. In the most basic sense, all three programs (crawlers, spiders, and robots) are essentially the same: each one collects information about every web URL it visits.
This information is then cataloged according to the URL at which it is located and stored in a database. When a user later uses a search engine to locate something on the Web, the references in the database are searched and the matching results are returned.
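The collect, catalog, and search cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine works: the TOY_WEB dictionary, the example URLs, and the crawl and search functions are all hypothetical, and a real crawler would fetch pages over HTTP rather than read them from memory.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# This stands in for real pages fetched over the network.
TOY_WEB = {
    "http://a.example/": ("search engines index the web",
                          ["http://b.example/"]),
    "http://b.example/": ("crawlers catalog web pages",
                          ["http://a.example/", "http://c.example/"]),
    "http://c.example/": ("robots follow links", []),
}


def crawl(start_url, web):
    """Breadth-first crawl; returns a catalog mapping URL -> set of words."""
    catalog = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in catalog or url not in web:
            continue  # already cataloged, or not reachable in this toy web
        text, links = web[url]
        catalog[url] = set(text.lower().split())  # catalog the data by URL
        queue.extend(links)                       # follow links to new pages
    return catalog


def search(catalog, term):
    """Return, in sorted order, the URLs whose cataloged words include term."""
    term = term.lower()
    return sorted(url for url, words in catalog.items() if term in words)


catalog = crawl("http://a.example/", TOY_WEB)
print(search(catalog, "web"))
# prints ['http://a.example/', 'http://b.example/']
```

Note that the search never touches the pages themselves; it only consults the catalog the crawler built earlier.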

Database

Every search engine contains or is connected to a system of databases, where data about each URL on the Web (collected by crawlers, spiders, or robots) is stored. These databases are massive storage areas that contain multiple data points about each URL. The data might be arranged in any number of different ways, and will be ranked according to a method of ranking and retrieval that is usually proprietary to the company that owns the search engine.
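As a rough sketch of what "multiple data points about each URL" might look like, the example below stores a few fields per page in an in-memory SQLite database. The table layout, column names, and the store_page and urls_for_term helpers are assumptions made for illustration; real search engine databases are far larger and their schemas are not public.

```python
import sqlite3
from collections import Counter

# In-memory database for illustration only; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    "url TEXT PRIMARY KEY, title TEXT, word_count INTEGER, last_crawled TEXT)"
)
conn.execute("CREATE TABLE terms (url TEXT, term TEXT, occurrences INTEGER)")


def store_page(conn, url, title, text, crawled_on):
    """Store several data points about one URL, replacing any older copy."""
    words = text.lower().split()
    conn.execute(
        "INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)",
        (url, title, len(words), crawled_on),
    )
    conn.execute("DELETE FROM terms WHERE url = ?", (url,))
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?, ?)",
        [(url, term, count) for term, count in Counter(words).items()],
    )


def urls_for_term(conn, term):
    """Return (url, occurrences) rows for a term, ordered by URL."""
    rows = conn.execute(
        "SELECT url, occurrences FROM terms WHERE term = ? ORDER BY url",
        (term.lower(),),
    )
    return rows.fetchall()


store_page(conn, "http://a.example/", "Page A",
           "web search and web crawling", "2024-01-01")
store_page(conn, "http://b.example/", "Page B", "search tips", "2024-01-02")

print(urls_for_term(conn, "web"))
# prints [('http://a.example/', 2)]
```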

Ranking and Retrieval

For a web search engine, retrieving data is the combined work of the crawler (or spider or robot), the database, and the search algorithm. Those three elements work in concert to retrieve results that match the word or phrase a user enters into the search engine’s user interface, and the search algorithm determines the order in which those results are ranked. To create the best possible SEO for your site, it’s necessary to understand how page rankings are made for the search engines you plan to target. Those factors can then be taken into consideration and used to your advantage when it’s time to create, change, or update the web site that you want to optimize.
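Because real ranking methods are proprietary, the sketch below substitutes a deliberately naive stand-in: it scores each page by the share of its words that match the query and sorts the results by that score. The PAGES data and the score and rank functions are hypothetical, and this is not the algorithm of any actual search engine.

```python
# Hypothetical cataloged pages: URL -> page text.
PAGES = {
    "http://a.example/": "seo tips for web sites",
    "http://b.example/": "seo seo seo",
    "http://c.example/": "cooking recipes",
}


def score(query, text):
    """Fraction of the page's words that match any query term."""
    words = text.lower().split()
    if not words:
        return 0.0
    terms = query.lower().split()
    return sum(words.count(term) for term in terms) / len(words)


def rank(query, pages):
    """Return matching URLs, highest score first (ties broken by URL)."""
    scored = [(score(query, text), url) for url, text in pages.items()]
    hits = [(s, url) for s, url in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in hits]


print(rank("seo tips", PAGES))
# prints ['http://b.example/', 'http://a.example/']
```

The page that merely repeats the keyword ranks first here, which shows why a simple frequency score is not enough on its own and why the factors an engine actually weighs matter for optimization.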

Types of Search Engines

Search engines can be classified in more than one way. Beyond the distinction between primary, secondary, and targeted search engines, they are also classified by how information is entered into the index or catalog that’s used to return search results. By that measure, the three types of search engines are:

Crawler-based engines: The search engines discussed to this point fall largely into this category. A crawler-based search engine (like Google) uses an automated software agent (called a crawler) to visit, read, and index web sites. All the information collected by the crawler is returned to a central repository, a process called indexing, and search engine results are pulled from the resulting index. Crawler-based search engines revisit web pages periodically, on a schedule determined by the search engine administrator.

Human-powered engines: Human-powered search engines rely on people to submit the information that is indexed and later returned as search results. Human-powered search engines are sometimes called directories. Yahoo! is a good example of what, at one time, was a human-powered search engine. Yahoo! started as a favorites list belonging to two people who needed an easier way to share their favorite web sites. Over time, Yahoo! took on a life of its own, and it’s no longer completely human-controlled. A newer search engine called Mahalo (www.mahalo.com) is entirely human-powered, however, and it’s creating a buzz on the Web.

Hybrid engines: A hybrid search engine is populated neither entirely by a web crawler nor entirely by human submission; it is a combination of the two. In a hybrid engine, people can manually submit their web sites for inclusion in search results, but there is also a web crawler that monitors the Web for sites to include. Most search engines today fall into the hybrid category to at least some degree. Many are populated mostly by crawlers, while others also provide some method by which people can enter their web site information.
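The three types differ mainly in how entries reach the index, which the sketch below tries to make concrete. It models a hybrid index that accepts both human submissions and crawler visits, and it reports which pages are due for a periodic revisit. The HybridIndex class, its method names, and the seven-day REVISIT_INTERVAL are all assumptions chosen for illustration.

```python
from datetime import date, timedelta

# Assumed revisit schedule; in practice the engine's administrator sets this.
REVISIT_INTERVAL = timedelta(days=7)


class HybridIndex:
    """Toy index populated by human submissions and by a crawler."""

    def __init__(self):
        # url -> {"sources": set of str, "last_visited": date or None}
        self.entries = {}

    def _entry(self, url):
        return self.entries.setdefault(
            url, {"sources": set(), "last_visited": None}
        )

    def submit(self, url):
        """Record a manual (human) submission of a web site."""
        self._entry(url)["sources"].add("human")

    def record_crawl(self, url, visited_on):
        """Record that the crawler visited the URL on the given date."""
        entry = self._entry(url)
        entry["sources"].add("crawler")
        entry["last_visited"] = visited_on

    def due_for_revisit(self, today):
        """URLs never crawled, or last crawled at least one interval ago."""
        return sorted(
            url
            for url, entry in self.entries.items()
            if entry["last_visited"] is None
            or today - entry["last_visited"] >= REVISIT_INTERVAL
        )


index = HybridIndex()
index.submit("http://a.example/")                          # human submission
index.record_crawl("http://a.example/", date(2024, 1, 1))  # later crawled too
index.record_crawl("http://b.example/", date(2024, 1, 6))  # crawler only

print(index.due_for_revisit(date(2024, 1, 8)))
# prints ['http://a.example/']
```

An index that only ever called record_crawl would correspond to a crawler-based engine, and one that only ever called submit would correspond to a human-powered directory.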
