calsoft enterprise solutions

White Papers

Web Site Performance Analysis

Introduction

One of the primary ways Internet users find web sites is through search engines, which is why a web site with a good search engine listing may see a dramatic increase in traffic. A search engine query often turns up hundreds or thousands of matching web pages, and in most cases only the ten most "relevant" matches are displayed first.

When someone queries a search engine for a keyword related to your site's products or services, does your page appear in the top 10 matches? Anyone who runs a web site naturally wants to see it listed among the "top ten" results. If you are listed, but not within the first two or three pages of results, you lose, no matter how many engines you submitted your site to.

There are two obstacles to solving this problem. First, you’ll have to know the various techniques that will help you move into the top 10. Then you’ll have to monitor your progress, a crucial step that normally consumes a lot of your precious time.

Product Overview

WeSSAT

Where does WeSSAT come into the picture? A web developer who wants to know the ranking of a page for different sets of keywords has to go to each search engine, submit each keyword, and manually find the position of the web site by traversing a number of result pages. Just imagine the amount of time involved if the page is on the tenth page of results. The same process then has to be repeated for every other search engine.

This is exactly where WeSSAT comes into the picture. WeSSAT (Web Site Situation Analysis Tool) automates the process of finding the ranking of your page in the search engines. You can also find the ranking of your competitors. What if you don’t know who your competitors are? WeSSAT also provides options for finding your unknown competitors.

Features

Multiple Keyword Searches

WeSSAT supports multiple keyword searching. The user may input a number of keywords for simultaneous searching, and WeSSAT submits these keywords to the selected search engines.

Multiple Search Engines

WeSSAT supports simultaneous searching in nine popular search engines: Altavista, Excite, HotBot, Infoseek, Linkstar, Lycos, PlanetSearch, WebCrawler and Yahoo. By default all the search engines are involved in the search process. The user may select any subset of them to search.

Multiple Domain Search

WeSSAT can search for several of your domains simultaneously. These are the domains whose positions in the respective search engines are to be found. The user may type in either the complete URL or just the domain whose ranking has to be found.

Competitors Domain Search

WeSSAT also finds the position of competitor domains in the same way as for the user's own domains.

Number of Hits

The user can specify the number of hits to be retrieved, between 10 and 200. WeSSAT looks for the domains within the number of hits specified by the user.

Number of Threads

The user can select the number of threads, which determines how many simultaneous searches WeSSAT performs. By default WeSSAT uses 6 threads; the allowed range is 1 to 15. If the user has multiple keywords to look for, setting the number of threads to the maximum will speed up the search process.
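The limits quoted for hits and threads can be captured in a small validated settings object. The following is a minimal Python sketch for illustration only (WeSSAT itself is an MFC application); the class and field names are hypothetical, and only the numeric limits come from the text.

```python
from dataclasses import dataclass

# Limits quoted in the text; every name below is hypothetical.
MIN_HITS, MAX_HITS = 10, 200
MIN_THREADS, MAX_THREADS, DEFAULT_THREADS = 1, 15, 6


@dataclass
class SearchOptions:
    hits: int                       # number of hits to retrieve per search
    threads: int = DEFAULT_THREADS  # number of simultaneous searches

    def __post_init__(self):
        # Reject values outside the documented ranges.
        if not MIN_HITS <= self.hits <= MAX_HITS:
            raise ValueError(f"hits must be between {MIN_HITS} and {MAX_HITS}")
        if not MIN_THREADS <= self.threads <= MAX_THREADS:
            raise ValueError(
                f"threads must be between {MIN_THREADS} and {MAX_THREADS}"
            )


# Many keywords to search: use the maximum number of threads.
options = SearchOptions(hits=50, threads=MAX_THREADS)
print(options)
```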

Identifying Hot Competitors

WeSSAT also lists the hot competitors for the keywords you have submitted. This is done by checking the “Find unknown competitors” option. WeSSAT uses a comprehensive ranking scheme and lists the top competitors according to the weightage given to each search engine. The user may assign a weightage to each search engine and select the number of competitors to be listed. The user may also list domains to be excluded from the competitors (such as .com, .gov and .net).
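The exact ranking formula is not given in this paper. As an illustration only, one plausible weighted scheme scores each domain by the engine weightage divided by the position at which the domain appears, then sums across engines. The Python sketch below assumes that formula; the function name, the numeric weights and the example URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rank_competitors(results, weights, excluded_suffixes=(), top_n=10):
    """Rank domains found in the results of several search engines.

    results: {engine: [url, ...]} in the order each engine returned them.
    weights: {engine: numeric weightage}; unlisted engines count as 1.0.
    The score (weight / position, summed over engines) is an assumption,
    not the scheme WeSSAT actually uses.
    """
    excluded = tuple(excluded_suffixes)
    scores = defaultdict(float)
    for engine, urls in results.items():
        weight = weights.get(engine, 1.0)
        for position, url in enumerate(urls, start=1):
            domain = urlparse(url).hostname or ""
            if not domain or domain.endswith(excluded):
                continue  # skip unparsable URLs and excluded domains
            scores[domain] += weight / position
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:top_n]


results = {
    "Altavista": ["http://a.example.com/x", "http://b.example.net/"],
    "Lycos": ["http://b.example.net/", "http://a.example.com/"],
}
weights = {"Altavista": 3.0, "Lycos": 2.0}
print(rank_competitors(results, weights, excluded_suffixes=(".gov",)))
```

Dividing by position rewards a domain that appears near the top of a heavily weighted engine; any other monotonically decreasing function of position would serve the same purpose.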

Saving Search Queries

WeSSAT provides options to save your search queries so that the same query can be reused for future searches. Details such as the keywords, domains and competitors’ domains are stored in the search query.

Report

The search result is a report generated in simple HTML format. It provides all the information about the search and can be viewed in the default browser.

Architecture

The basic architecture of WeSSAT consists of ViewManager, AppManager, Analyser and WebQueryEngine modules.

ViewManager

The basic function of this module is to handle user input through an appropriate UI. It then passes the user input to AppManager.

AppManager

The main module is the AppManager, which initializes and coordinates the activities of all the other modules. Depending upon the input provided by the ViewManager, it initializes the WebQueryEngine component for searching. The result retrieved from WebQueryEngine is passed on to the Analyser component for analysis. The analyzed result is then retrieved from the Analyser and output.

WebQueryEngine

This is the module that queries a search engine for the result URLs at a specific position in the listing. This component encapsulates all the functionality required for connecting to a search engine, downloading the page and retrieving the results.

Analyser

This component analyzes the result obtained from WebQueryEngine.

Proposed Technique

WeSSAT has been developed using MFC classes. The UI has been developed using the CPropertySheet and CPropertyPage classes. For searching, a connection has to be established to the server, that is, the search engine. This is done using WinInet classes such as CInternetSession and CHttpConnection. As connecting to a search engine and downloading a page may take time, multithreading has been used. Each thread takes care of connecting to the search engine and downloading a page.

The basic algorithm for searching is as follows:

  • The user inputs all the values for the search, such as keywords, search engines, domains, hits and threads, which are passed on to AppManager.
  • The AppManager then schedules the searches, creates search items, initializes WebQueryEngine with the search items and starts the threads.
  • The WebQueryEngine takes care of posting to the particular search engine, downloading the page, scanning through the page and retrieving the result URLs.
  • The results are passed on by AppManager to AnalyserReporter for analysis.
  • The AnalyserReporter analyses the results and passes them to the AppManager, which schedules further searches depending upon the results retrieved.

Algorithms

Scheduling And Coordinating The Search

AppManager is the main module, which coordinates the searches and retrieves the results. AppManager first creates search items, each of which consists of the keyword, search engine, pageno and depth. These search items are placed in a queue. Then the AppManager schedules and initializes the threads.

  • Initialize SearchItems queue to null
  • When to add: If remaining_items < (numberOfThreads+1)/2 then add numberOfThreads items to list using AddItem.
  • Repeat
  • Allot item to waiting thread. Request thread to begin searching.
  • When thread returns information, first remove items in the SearchItems queue that are no longer valid.
  • If remaining_items <(numberOfThreads+1)/2 then add numberOfThreads items to list.
  • Until no more items to be searched.
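The scheduling loop above can be sketched in Python with a thread pool. This is an illustrative sketch, not the MFC implementation: the function name and the three hooks (`candidates`, `search`, `is_still_valid`) are hypothetical stand-ins for AddItem, the per-thread search, and the validity check, and the refill threshold is the one stated in the steps.

```python
from collections import deque
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_searches(candidates, search, is_still_valid, number_of_threads=6):
    """candidates: iterable of search items, already in AddItem order.
    search(item) -> result; is_still_valid(item, results) -> bool."""
    candidates = iter(candidates)
    queue = deque()  # the SearchItems queue, initially empty
    results = []
    exhausted = False

    def refill():
        nonlocal exhausted
        # Top up only when fewer than (numberOfThreads + 1) / 2 items remain.
        if exhausted or len(queue) >= (number_of_threads + 1) / 2:
            return
        added = 0
        while added < number_of_threads:
            item = next(candidates, None)
            if item is None:
                exhausted = True
                return
            if is_still_valid(item, results):
                queue.append(item)
                added += 1

    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        running = set()
        refill()
        while queue or running:
            # Allot queued items to waiting threads.
            while queue and len(running) < number_of_threads:
                running.add(pool.submit(search, queue.popleft()))
            done, running = wait(running, return_when=FIRST_COMPLETED)
            results.extend(future.result() for future in done)
            # Remove queued items that the new results made unnecessary.
            still_valid = [i for i in queue if is_still_valid(i, results)]
            queue.clear()
            queue.extend(still_valid)
            refill()
    return results


# Example: stop searching deeper pages for a keyword once it has been found.
items = [("a", 1), ("b", 1), ("a", 2), ("b", 2), ("a", 3), ("b", 3)]
found_on = {("a", 2)}
print(run_searches(
    items,
    search=lambda item: (item[0], item[1], item in found_on),
    is_still_valid=lambda item, res: not any(r[0] == item[0] and r[2] for r in res),
    number_of_threads=1,
))
```

All queue manipulation happens on the coordinating thread, so the queue itself needs no locking; only the searches run in worker threads.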

AddItem

Algorithm used to add items to SearchItems queue

  • First Criterion: Select page having lowest depth.
  • Second Criterion: Select keyword phrase highest from the start of the keywordphrase_searchEngine_status stack.
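The two criteria can be read as a lexicographic ordering: lowest depth first, then the keyword phrase nearest the start of the status stack. The Python sketch below assumes that reading; the `SearchItem` fields follow the search item description earlier in this section, while the function name and the representation of the stack as a list of (keyword, engine) pairs are hypothetical.

```python
from typing import NamedTuple


class SearchItem(NamedTuple):
    keyword: str
    engine: str
    page_no: int
    depth: int


def pick_next(pending, status_stack):
    """Return the pending item to add next, or None if nothing is pending.

    status_stack is assumed to be a list of (keyword, engine) pairs,
    with index 0 being the start of the stack.
    """
    if not pending:
        return None
    order = {pair: index for index, pair in enumerate(status_stack)}
    # First criterion: lowest depth. Second: earliest entry in the stack.
    return min(
        pending,
        key=lambda item: (
            item.depth,
            order.get((item.keyword, item.engine), len(order)),
        ),
    )


stack = [("ejb container", "Yahoo"), ("ejb container", "Lycos")]
pending = [
    SearchItem("ejb container", "Lycos", page_no=1, depth=0),
    SearchItem("ejb container", "Yahoo", page_no=2, depth=1),
    SearchItem("ejb container", "Yahoo", page_no=1, depth=0),
]
print(pick_next(pending, stack))
```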

Querying Search Engine

WebQueryEngine does most of the search-engine-specific work, so it has to take into account the major differences between the search engines. The three major areas where the search engines differ are:

  • The URL: For each search engine the posting URL has to be created, depending upon the keyword and the page to be retrieved. For all search engines except Yahoo this can be done easily by keeping the keyword and the page or hit number as parameters. In the case of Yahoo there is a major difference, because its URL has parameters for the number of categories and sites, which depend upon the search result. So for Yahoo the next-page URL is scanned from the downloaded page and then stored.
  • Posting: Posting to a particular search engine can use either the GET method or the POST method. For all the search engines supported by WeSSAT except Linkstar, posting is done by the GET method. For Linkstar a POST is done with default form values.
  • Scanning: There are major differences in scanning for URLs in the downloaded page of each search engine. The scanning and parsing for URLs is done according to the page layout of each search engine.

All these differences are taken care of by WebQueryEngine.
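One way to isolate the URL and posting differences is a per-engine profile that records the request method and parameter names. The Python sketch below is illustrative only: the class, the parameter names and the `example.com` URLs are hypothetical and do not describe any real search engine's interface. The Yahoo case, where the next-page URL is scanned from the downloaded page, is not covered.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class EngineProfile:
    name: str
    base_url: str                    # hypothetical query endpoint
    method: str = "GET"              # "GET" or "POST"
    keyword_param: str = "q"         # hypothetical parameter names
    offset_param: str = "start"
    form_defaults: dict = field(default_factory=dict)  # used for POST only


def build_request(profile, keyword, first_hit):
    """Return (method, url, body) for one page of results."""
    params = {profile.keyword_param: keyword, profile.offset_param: first_hit}
    if profile.method == "GET":
        # Keyword and hit number travel in the query string.
        return "GET", f"{profile.base_url}?{urlencode(params)}", None
    # POST: send the default form values together with the query.
    body = urlencode({**profile.form_defaults, **params})
    return "POST", profile.base_url, body


get_engine = EngineProfile("GetEngine", "http://search.example.com/query")
post_engine = EngineProfile(
    "PostEngine", "http://forms.example.com/search",
    method="POST", form_defaults={"lang": "en"},
)
print(build_request(get_engine, "enterprise javabeans", 11))
print(build_request(post_engine, "enterprise javabeans", 11))
```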

Analyzing and Ranking

Analyser analyses the results to find the position of the domain, which is done by scanning through the URLs returned. It also analyses the results to find the hot competitors and rank them. The ranking scheme is done as follows.
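The position scan described in this paragraph (not the ranking scheme) can be sketched in Python as follows. The function names are hypothetical, and matching on the host name, including subdomains, is an assumption; the paper does not say how a complete URL entered by the user is compared against result URLs.

```python
from urllib.parse import urlparse


def host_of(text):
    """Extract the lower-case host from a complete URL or a bare domain."""
    if "://" not in text:
        text = "http://" + text
    return (urlparse(text).hostname or "").lower()


def find_position(result_urls, target):
    """Return the 1-based position of the first result on the target
    domain, or None if it is not among the results scanned."""
    target_host = host_of(target)
    if not target_host:
        return None
    for position, url in enumerate(result_urls, start=1):
        host = host_of(url)
        # Match the domain itself or any of its subdomains (assumption).
        if host == target_host or host.endswith("." + target_host):
            return position
    return None


urls = [
    "http://other.example.org/",
    "http://www.example.com/products.html",
]
print(find_position(urls, "example.com"))
print(find_position(urls, "missing.example"))
```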

Algorithm For Ranking

Report Generation

After analysis the results are passed on to the AppManager, which generates a simple HTML report. The report contains details of the positions of the domains found in the search engines. It also lists the top competitors for the keywords.
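A report of this kind can be produced with a few lines of string formatting. The Python sketch below is an illustration, not WeSSAT's report generator: the function name, the table layout and the "-" marker for a domain that was not found are all assumptions.

```python
from html import escape

NOT_FOUND = "-"  # assumed marker for "not found within the requested hits"


def render_report(phrase, positions):
    """positions: {domain: {engine: position or None}} -> HTML string."""
    engines = sorted({engine for row in positions.values() for engine in row})
    head = "".join(f"<th>{escape(engine)}</th>" for engine in engines)
    body = []
    for domain, row in positions.items():
        cells = "".join(
            f"<td>{row[engine] if row.get(engine) is not None else NOT_FOUND}</td>"
            for engine in engines
        )
        body.append(f"<tr><td>{escape(domain)}</td>{cells}</tr>")
    return (
        "<html><body>\n"
        f"<h2>Search phrase: {escape(phrase)}</h2>\n"
        f"<table border='1'>\n<tr><th>Domain</th>{head}</tr>\n"
        + "\n".join(body)
        + "\n</table>\n</body></html>\n"
    )


report = render_report(
    "Enterprise Javabeans Container",
    {"example.com": {"Lycos": 3, "Yahoo": None}},
)
print(report)
```

The resulting string can be written to a file and opened in the default browser, for example with Python's standard `webbrowser.open`.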

Conclusion

WeSSAT is a step towards automating the process of finding the position of a domain in search engines. Many extensions are possible that would increase its performance as well as its usability. The possible extensions may be listed as follows:

  • Store past analysis data and compare/track rankings of web pages over a period of time.
  • Add new search engines and remove existing ones.
  • Ability to modify the tool if a search engine modifies the format for accepting data.

Appendix

Web Site Situation Analysis Tool (ver 1.0)
Report generated on 06 August 1999

Search phrase : Enterprise Javabeans Container for transaction management

Key: implies URL wasn't found within the specified number of results
E : implies Error connecting to or retrieving results from the search engine
Y : implies domain was found in the search engine

Note: To arrive at the top competitors, WeSSAT uses a comprehensive ranking scheme that takes into account the position of each returned result in the search engine and the weightage attached to that search engine.

Search Engine Weightage

Altavista: Very High
Yahoo: Very High
Excite: High
HotBot: High
Lycos: High
Infoseek: High
Planetsearch: Medium
WebCrawler: Medium
Linkstar: Medium

Excluded domain: .net, .gov