How Search Engines Worked (10 Years Ago)

Search engines are complex beasts, and the following information is deliberately simplified to communicate our main points.

The first thing you need to know about Search Engines is what a “spider” is and how it works. Also known as a robot, bot or crawler, a spider is a software program used by a search engine to explore the ever-changing World Wide Web.

There are many different types of spiders on the Internet, but the one we are concerned with automatically collects (fetches) websites and web pages. The spider loads that content into the search engine's database, and the search engine then indexes the information the spider has collected, ready for retrieval.

The spider also collects inbound and outbound links for the database and later uses those links to return to other websites and web pages to collect further information.

In short, a spider is a type of software that builds a list of URLs to visit. It crawls those pages (fetching them), indexes them (analyzing them and breaking them down into words for the index), and adds all the hyperlinks it finds to a list of URLs to visit and index later.
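To make that loop concrete, here is a minimal sketch of a crawler in Python. The structure (a frontier of URLs to visit, a fetch step, a stand-in index, and link extraction) follows the description above; the function names and the in-memory "index" dictionary are illustrative stand-ins, not any real search engine's internals.

```python
# A minimal sketch of the crawl -> fetch -> index -> enqueue-links loop.
from urllib.request import urlopen
from urllib.parse import urljoin
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    frontier = [seed_url]          # list of URLs still to visit
    visited = set()
    index = {}                     # url -> raw HTML; stands in for the index database
    while frontier and len(visited) < max_pages:
        url = frontier.pop(0)
        if url in visited:
            continue
        visited.add(url)
        try:
            html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        except OSError:
            continue               # skip pages that fail to fetch
        index[url] = html          # "load content onto the database"
        parser = LinkParser()
        parser.feed(html)
        # add every hyperlink to the list of URLs to visit and index later
        frontier.extend(urljoin(url, link) for link in parser.links)
    return index
```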

What does this tell you?

Firstly, the code both on your page and off your page must help the search engine properly index your web pages. With this process in mind, the time delay between your URLs being collected, indexed, and finally appearing in keyword and key-phrase searches, and the importance of both on-page and off-page code, should now seem clearer.

How Search Engines Find Web Pages

The most common way spiders find websites is by following links from other websites. Pages discovered this way are known as 'found pages'.

Some search engines have a system for submitting your web page address. The best method is to submit the home page for your website (such as https://echovme.in) and the spiders will subsequently find other pages after indexing the main URL.

Submitting multiple pages of the same website, or submitting a website multiple times, are techniques frowned upon by search engines. We were told recently that some search engines penalize submitted pages compared with those found naturally by their spiders.

We are not so sure that is right, but we have found much better ways of submitting pages so that they appear in search engine results faster.

However, please be cautious about orchestrating an overly aggressive search engine submission program. Also, beware of using automated tools to submit your site to the search engines.

So what is indexing all about?

When a spider finds your web page, the indexer stores the full text of that page in the search engine's index database. The terms on the page are then sorted alphabetically and cataloged by where each one appears within the text.
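As a rough illustration of that cataloging step, the toy inverted index below records, for each term, the pages and word positions where it occurs, with the vocabulary kept in alphabetical order. The data structure is a simplified stand-in for a real index database, and the example page is invented.

```python
# A toy inverted index built from already-fetched page text.
import re
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping URL -> plain text of that page."""
    index = defaultdict(list)      # term -> list of (url, word position)
    for url, text in pages.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].append((url, pos))
    return dict(sorted(index.items()))   # vocabulary sorted alphabetically

pages = {"https://example.com/": "Spiders crawl the web and index the web"}
for term, postings in build_index(pages).items():
    print(term, postings)
# "web" is cataloged at word positions 3 and 7 on the example page
```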

Later in this guide we explore keywords and key phrases and their use when optimizing your web pages. The collection and analysis of the data, and the correct indexing of information using an algorithm (again explained later in greater detail), ensure that search engine users get rapid access to topical web pages that contain their search terms or queries.

To further speed up searches, common words such as is, at, on, of, or, how, and why are ignored, as are certain single digits and single letters. These words and characters are known as "stop words".

Stop words are so common that they do little to narrow a user’s search, and are therefore discarded from the indexing process. In this way the overall performance of search engines is effectively increased.
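Here is a short sketch of how stop words, single letters, and single digits might be discarded before indexing. The stop list is a small illustrative sample, not any engine's actual list.

```python
# Illustrative stop-word removal prior to indexing.
STOP_WORDS = {"is", "at", "on", "of", "or", "how", "why", "the", "a", "and"}

def remove_stop_words(text):
    words = text.lower().split()
    # drop stop words, plus any single letter or single digit
    return [w for w in words
            if w not in STOP_WORDS and not (len(w) == 1 and w.isalnum())]

print(remove_stop_words("How is a spider at work on the web"))
# ['spider', 'work', 'web']
```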

Additionally, images, scripts and some rich media may also be ignored by some search engines in their indexing process. However, much of that is changing, and some search engines can now index text from Flash and other formats that were previously ignored.

In addition to the text of your website, some search engines store the content of your META tags. There is more information on META Tags in Chapter 7, but in brief, META tags are used in the header information on a web page and include information about the website, products, services or ideas.

These META Tags are not viewable by the users of a website, but the spiders pick them up and include them when indexing a web page.
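To show what the spider picks up, the sketch below reads META tags out of a page's header using Python's standard HTML parser. The sample page and its tag contents are invented for illustration.

```python
# A small sketch of how an indexer might collect META tags from a page header.
from html.parser import HTMLParser

class MetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if "name" in d and "content" in d:
                self.meta[d["name"]] = d["content"]

html = """<html><head>
<meta name="description" content="Digital marketing guide">
<meta name="keywords" content="seo, search engines, spiders">
</head><body>Visible page text</body></html>"""

parser = MetaParser()
parser.feed(html)
print(parser.meta)
# {'description': 'Digital marketing guide', 'keywords': 'seo, search engines, spiders'}
```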

Also, search engines index the hyperlinks on each page. Since the pages linked from a website typically contain information related to that web page, the search engines can use those links to assist in indexing the page and identifying its theme.

This also assists the search engines in determining the link popularity of a web page and in measuring the 'significance' of the pages each site links to.

At Google, the 'significance' of a web page is measured by PageRank, again explained in greater detail in later chapters.
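For intuition, here is a simplified PageRank power iteration. It captures the core idea that a page's significance flows from the significance of the pages linking to it; Google's production algorithm is far more involved, and the damping value and link graph below are illustrative only.

```python
# A simplified PageRank power iteration, for intuition only.
# 'links' maps each page to the pages it links out to.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share   # each outlink passes on rank
        rank = new_rank
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(links))   # 'c', with the most inbound links, ranks highest
```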
