Web Development, Internet Marketing, Search Engine Optimisation, Graphic Designing, Website Designing
+353 1 2910115
info@etrix.ie

A web search engine is a program designed to search for information on the World Wide Web. That information may consist of web pages, images and other types of files, including documents and PDFs. Search engines operate either algorithmically or through a mixture of algorithmic and human input.

Major Search Engines

Some popular major search engines are listed below:

  1. AllTheWeb http://www.alltheweb.com/
  2. AltaVista http://www.altavista.com/
  3. Ask.com http://www.ask.com/
  4. Google http://www.google.com
  5. Live Search (MSN) http://www.live.com
  6. Yahoo! Search http://search.yahoo.com

Search Robots

A web crawler (also known as a web spider or a web robot) is a program or automated script which browses the World Wide Web in a methodical, automated manner. This process of browsing websites is called web crawling or spidering. These search engine robots, or bots, seek content on the Internet, moving from one web page to another and following the links within each page to gather information.

Many search engines use these robots to locate web pages for indexing. Web crawlers are a type of software agent: they start with a list of URLs to visit, identify all the hyperlinks in the pages they visit and add those hyperlinks to the list of URLs to visit. They are used to create copies of visited pages for later indexing so that searches are fast, to automate maintenance tasks such as checking links, and to gather specific types of information such as e-mail addresses. In other words, these robots do not only work through a pre-defined list of web pages; they also follow the links they find on the pages they crawl.
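The "list of URLs to visit" idea described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine works: the `crawl` function, the `LinkExtractor` class and the three-page `SITE` dictionary are all made up for the example, and the pages are held in memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: take a URL from the list, queue the links it contains.

    `fetch` is a caller-supplied function that maps a URL to its HTML,
    or returns None when the page is unavailable.
    """
    frontier = deque(seed_urls)   # the list of URLs still to visit
    seen = set(seed_urls)         # every URL ever queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Hypothetical three-page site, stored in memory instead of fetched over HTTP.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://example.test/a.html": '<a href="/b.html">B</a>',
    "http://example.test/b.html": '<a href="/">Home</a>',
}

order = crawl(["http://example.test/"], SITE.get)
print(order)
```

Starting from the home page alone, the sketch ends up visiting all three pages because each newly discovered link is added to the queue. A real crawler would also need politeness delays, error handling and a check of each site's robots.txt file.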

Unfortunately, search engine robots are not especially powerful. These spiders only absorb what they can see. They take information from the available data, such as page titles, Meta Tags and textual content, and pass it on to be included in the search engine’s index or database. These bots generally don’t understand frames, Flash movies, images or JavaScript, which can be quite problematic when a site relies on them for its content or navigation.
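To make "only absorb what they can see" more concrete, here is a rough Python sketch of a parser that keeps the page title, the Meta Tags and the plain text, and ignores anything inside script or style tags. The `VisibleContent` class and the sample `PAGE` are invented for this illustration; real search engine robots are far more elaborate.

```python
from html.parser import HTMLParser


class VisibleContent(HTMLParser):
    """Keeps the title, named meta tags and plain text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # script and style content is not read as text
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())


# Hypothetical page: the menu text only exists inside JavaScript,
# and the banner is an image with no text alternative.
PAGE = """<html><head><title>Example page</title>
<meta name="description" content="Notes on search engines">
<script>document.write("menu built by JavaScript");</script>
</head><body><h1>Search Robots</h1>
<img src="banner.png"><p>Crawlers read text.</p></body></html>"""

seen = VisibleContent()
seen.feed(PAGE)
seen.close()
print(seen.title, seen.meta, seen.text)
```

In this sketch the heading and the paragraph are picked up, while the JavaScript-generated menu and the banner image contribute nothing, which is the situation the paragraph above warns about.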

How Search Engines Operate

Most search engines have robots which crawl the web and collect information to add to their index. Search engines follow a broadly similar process, which is outlined below.

Web crawling

Web search engines work by storing information about a very large number of web pages, which they retrieve from the World Wide Web itself through a web crawler, an automated web browser which follows every link it sees. Once a search engine knows that a page exists, its crawler has to be able to reach that page, navigate the website, pick out the information on it and work out what the website is about.

Once a web page has been accessed, its contents are analyzed to determine how the page should be indexed (for example, keywords are extracted from the titles, headings, or Meta Tags). Many factors determine which content is considered valuable. Each search engine has its own set of rules and standards for evaluating and processing the information.

Indexing

Web indexing allows the storage of collected data and information to facilitate fast and accurate information retrieval. Compiling all the data that the bots have retrieved is part of building the search engine index, or database.

The data is first analyzed by a web spider or a search robot and is then stored in an index database for use in later queries. Some search engines, such as Google, store all or part of the source page as well as information about it, whereas others, such as AltaVista, store every word of every page they find. Googlebot is the name given to Google’s web spider, which crawls websites to add them to the Google index.

Once a website is in the search engine database, bots will visit it on a regular basis to pick up changes and to ensure the availability of the most current data.

Without an index, search engines would require considerable time and computing power to scan every document for each query. For example, an index of (say) 10,000 documents can be queried within milliseconds, whereas a sequential scan of every word in the same 10,000 documents could take hours if the documents are large.
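The difference between querying an index and scanning every document can be illustrated with a small inverted index, which maps each word to the set of documents containing it. The sketch below is a toy under simple assumptions: the function names and the three sample documents are made up, and real search engine indexes store far more than word-to-document mappings.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build the index once: word -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def index_lookup(index, word):
    """Answer a query with a single dictionary lookup."""
    return index.get(word, set())


def sequential_scan(documents, word):
    """Answer the same query by reading every document again."""
    return {doc_id for doc_id, text in documents.items()
            if word in tokenize(text)}


# Hypothetical documents standing in for crawled pages.
DOCS = {
    1: "Search engines crawl the web",
    2: "Robots follow links between web pages",
    3: "An index makes search fast",
}

index = build_index(DOCS)
print(index_lookup(index, "web"))
print(sequential_scan(DOCS, "web"))
```

Both functions return the same documents, but the scan has to re-read all the text for every query, while the index does that work once and then answers each query with one lookup.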

Preventing pages from being indexed by Search Engines

You may have a few pages on your site that you don’t want in the search engines’ index, for example, a directory that contains internal logs, or news articles that require payment to access. You can ask web crawlers to stay away from such pages by creating a text file called robots.txt in the root directory of your site. The robots.txt file lists the pages and directories that search engines shouldn’t access. Creating a robots.txt file is straightforward and gives you a certain degree of control over how search engines access your web site, although it relies on the robots choosing to respect it.
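As a rough illustration, the Python sketch below holds a small robots.txt file as a string and checks it with the standard library's `urllib.robotparser` module, roughly as a well-behaved crawler might before fetching a page. The directory names `/logs/` and `/premium-articles/`, the domain `example.test` and the robot name `ExampleBot` are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot ("*") is asked to stay out of
# the logs directory and the paid articles directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /premium-articles/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

logs_allowed = rules.can_fetch("ExampleBot", "http://example.test/logs/access.log")
paid_allowed = rules.can_fetch("ExampleBot", "http://example.test/premium-articles/story.html")
home_allowed = rules.can_fetch("ExampleBot", "http://example.test/index.html")

print(logs_allowed, paid_allowed, home_allowed)
```

With these rules the two excluded directories are reported as off limits and the home page as allowed. On a live site the file would be served from the site root rather than held in a string.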

Searching

Once a website has been added to the search engine database, its information is available to users who search for it through the various search options. When a user enters a search query (usually a few keywords), the search engine performs a variety of steps to ensure that it delivers the best and most relevant response, and returns a list of the best-matching web pages. Millions of web pages may include a particular word or phrase, but some are more relevant or popular than others because of their content or use of keywords. Search engines have their own algorithms which they follow to rank websites and web pages in the search results.
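The ranking algorithms used by real search engines are their own and are far more sophisticated, but the basic idea of ordering matching pages by relevance can be sketched with a very simple stand-in: count how often the query's keywords appear on each page and sort by that count. The `rank` function and the sample `PAGES` below are purely illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents, query):
    """Order matching documents by how often the query words occur in them.

    Keyword counting is only a toy relevance measure chosen for this
    example; it is not the algorithm of any actual search engine.
    """
    terms = tokenize(query)
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score:  # pages with no matching word are left out
            scores[doc_id] = score
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical pages standing in for the search engine database.
PAGES = {
    "page-a": "Search engines index web pages. Search is fast.",
    "page-b": "Web crawlers follow links.",
    "page-c": "A search engine ranks web pages for each search query, "
              "and search results follow.",
}

results = rank(PAGES, "web search")
print(results)
```

All three pages contain at least one of the keywords, yet they come back in a definite order because some use the keywords more than others.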

Search engines should be able to continuously follow leads, or links, from one website to the next so that they can keep collecting the information available on the Internet. A search engine has a good, valuable database, or index, if it delivers the most relevant results to a visitor’s query quickly while keeping its information up to date.
