A Prolog Meta-Search Engine

for the World Wide Web

Emanuele Bolognesi and Antonio Brogi

Department of Computer Science
University of Pisa, Italy


Abstract. Increasing attention is being paid to exploring the actual usability of logic programming for developing Internet applications. We describe the design and implementation of a meta-search engine which has been entirely developed in Prolog. The experiment confirms that logic programming can fruitfully contribute to the development of WWW applications. In particular, the experiment shows how the high-level declarative programming style of logic programming facilitates the development of simple and concise Internet programming solutions.
Contents
1. Introduction
2. Web Searching
3. Design of a Prolog Meta-Search Engine
4. Implementation of PrologCrawler
5. Concluding Remarks
    References



1. Introduction

The continuous expansion of the World Wide Web (WWW) is one of the most impressive phenomena of our days. The number of WWW sites and the amount of information accessible through the WWW are constantly growing. Recent estimates put the number of WWW pages at around 550 million in May 1999 [10].

The incredible amount of information available calls for effective ways of searching for specific information on the WWW. Developing and maintaining efficient search tools for the WWW is a challenging problem of economic relevance. Many search engines and directories are available, such as Altavista [1], Excite [9], HotBot [12], Infoseek [14], Inktomi [15], Lycos [20], Northern Light [23], LookSmart [19], or Yahoo [32].

While available search engines are massively used by so-called net surfers, they present a number of problems and limitations. As we shall discuss further in section 2, the most visible problems are partial coverage of the available Web pages, erroneous answers (e.g., links to pages that are no longer available), and redundancy of results.

So-called meta-search engines have been developed to overcome (some of) the limitations of search engines and directories. Roughly speaking, meta-search engines perform Web searches by combining the results of Web searches performed by existing search engines. All available meta-search engines – such as DogPile [8], MetaCrawler [21] or MetaFind [22] – share the same basic behaviour, though they feature somewhat different functionalities. We shall analyse some advantages and limitations of meta-search engines in section 2.

The ultimate objective of our work was to assess experimentally the usability of logic programming for developing WWW applications. More precisely, as a case study, we considered the problem of developing a meta-search engine in the Prolog programming language.

The paper describes the design and implementation of the PrologCrawler system, which has been developed in SICStus Prolog [28], using the PiLLoW library [5] for accessing the WWW. Section 2 presents a quick overview of state-of-the-art tools supporting Web searching. The main choices taken in the design of the PrologCrawler system are discussed in section 3, while section 4 presents the implementation of the system. Finally, section 5 contains an assessment of the experiment and some concluding remarks.
 
 

2. Web Searching

As we already mentioned in the introduction, many tools supporting Web searching are available on the net. We shall here briefly present the main features and limitations of search engines and meta-search engines.
 
 

2.1 Search engines

The basic behaviour of search engines can be illustrated as follows. The user employs a browser to open the Web page where the engine is located, and then she types one or more keywords describing the document she would like to find in the WWW. After a while, the user’s browser receives and displays an HTML page containing a list of Web pages that are related to the given keywords.

Search engines perform a matching between the specified keywords and the content of Web pages available on the net. In fact search engines perform their search locally, on previously built indexes. These indexes are automatically generated by spiders that constantly scan the Web in order to suitably index existing pages. Typically search engines offer a simple interface for standard queries and a more structured interface for "advanced" queries, where the user may employ logical connectives to compose keywords and/or customize the format and number of results.

While available search engines and directories are massively used by net surfers, they present a number of problems and limitations. One such problem concerns the percentage of pages effectively indexed by search engines. An estimate from May 1999 [30] indicates that available search engines do not cover more than one fourth of existing pages (see figure 1). It is worth noting that the above percentage refers to the total number of indexed pages, i.e., of the pages indexed by at least one search engine, and hence the ratio w.r.t. the total number of Web pages is actually even lower [16]. Moreover, as pointed out in [24], the overlap between different search engines is quite limited.

Another well-known limitation of existing search engines is erroneous answers. Namely, some of the results returned by search engines are links to pages that are no longer available, or in some cases to pages that are not related to the given keywords. The estimates reported in [17,25] and illustrated in figure 2 point out the difficulty of keeping indexes up to date.

It is fair to say that none of the above estimates can be used to compare different engines, as they refer to a specific time interval. A fair comparison should be based on data covering wider time intervals, in order to take into account also the ability of the search engines to update their indexes.

The redundancy of results is another problem of available search engines. It is not easy however to measure this problem, as the estimations on the quality of the results may be less objective than in the previous cases. For instance, the criteria employed in [27,29] include the percentage of links effectively visited or the relevance score associated by the engine itself.

2.2 Meta-search engines

So-called meta-search engines have been developed to overcome (some of) the limitations of search engines. A meta-search engine features a user interface that is similar to the user interface of a standard search engine. Rather than performing searches on locally built indexes, meta-search engines (also called meta-crawlers) directly send the user query to several search engines and then collect and display the obtained results.

Available meta-search engines, such as DogPile [8], Inference Find [13], MetaCrawler [21], or MetaFind [22], partially solve the problem of the limited coverage of existing Web pages. Indeed, combining the results obtained from different search engines increases the percentage of Web pages considered, since the overlap between the indexes used by the different search engines is quite limited [24].

Meta-search engines differ from one another in the functionalities they offer, in particular in the way they transmit the user query to the search engines and in the way they collect and present the obtained results [2,26]. For instance, some meta-search engines simply send the (same) user query to the search engines without analysing the syntax of the query. Since different search engines employ different syntaxes for queries, those meta-search engines do not correctly support advanced queries (e.g., queries containing logical connectives). Other meta-search engines (e.g., DogPile) are able to adapt the user query to the syntax of different search engines, though none of them correctly supports the syntax for advanced queries of Altavista.

Meta-search engines also differ in the way they collect and present the results obtained from the search engines. For instance, some meta-search engines simply append the obtained results without performing any processing on those results. Some of them directly present parts of the pages returned by the search engines. More sophisticated meta-search engines are able to display the results in a uniform format as well as to perform some processing on the set of results obtained, for instance by removing duplicates (e.g., MetaFind). However, currently no meta-search engine supports both a full syntactic conversion of queries and a sophisticated processing of the results.

Summing up, while existing meta-search engines are able to cover a larger percentage of Web pages, they do not completely solve the other two problems of search engines, that is, erroneous answers and redundancy of results.
 
 

3. Design of a Prolog Meta-Search Engine

The objective of this work was to develop a meta-search engine in Prolog and to make it available to standard browsers on the Web. The development of a meta-search engine obviously involves a number of design choices. This section discusses the main design choices taken in the development of the PrologCrawler system.
 
 

3.1 Functionalities

The functionalities featured by a meta-search engine concern, on the one hand, the way in which user queries are dealt with and, on the other hand, the type of processing performed on the results collected from the underlying search engines.

As far as user queries are concerned, PrologCrawler supports the use of logical operators (AND, OR, NOT) as well as of quoted sentences in the query string. Once the user has inserted her query, PrologCrawler suitably converts it into the syntax of the different underlying search engines so as to run the searches in a uniform way. Figure 3 shows an example of conversion of a query string from the PrologCrawler syntax into the Excite and Yahoo syntaxes. It is worth noting that a sequence of one or more words is interpreted by default as an AND combination of the words and it is coherently passed to the underlying search engines, thus establishing a top-level uniform default format.
 

Specified query:       mickey AND mouse NOT goofy
Query sent to Excite:  mickey AND mouse AND NOT goofy
Query sent to Yahoo:   + mickey + mouse - goofy

Figure 3. Syntax of queries.
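
The conversion shown in Figure 3 can be sketched in Prolog as follows. This is only an illustrative sketch under stated assumptions, not the PrologCrawler code: it assumes that the query has already been split into a list of atoms, it handles only AND, NOT and the default AND combination (OR and quoted sentences are left out), and the predicate names convert_tokens/3, parse_query/2, excite_query/2 and yahoo_query/2 are hypothetical.

```prolog
% Hypothetical sketch: convert a tokenised query into engine-specific tokens.

% convert_tokens(+Engine, +Tokens, -EngineTokens)
convert_tokens(excite, Tokens, Out) :-
    parse_query(Tokens, Terms),
    excite_query(Terms, Out).
convert_tokens(yahoo, Tokens, Out) :-
    parse_query(Tokens, Terms),
    yahoo_query(Terms, Out).

% parse_query(+Tokens, -Terms): Terms is a list of plus(Word) / minus(Word).
% A bare sequence of words is read as an AND combination (the default).
parse_query([], []).
parse_query(['AND'|Ts], Terms) :- !,
    parse_query(Ts, Terms).
parse_query(['NOT',W|Ts], [minus(W)|Terms]) :- !,
    parse_query(Ts, Terms).
parse_query([W|Ts], [plus(W)|Terms]) :-
    parse_query(Ts, Terms).

% excite_query(+Terms, -Tokens): explicit AND / AND NOT connectives.
% This sketch requires the query to start with a positive word.
excite_query([plus(W)|Terms], [W|Out]) :-
    excite_rest(Terms, Out).

excite_rest([], []).
excite_rest([plus(W)|Ts], ['AND',W|Out]) :-
    excite_rest(Ts, Out).
excite_rest([minus(W)|Ts], ['AND','NOT',W|Out]) :-
    excite_rest(Ts, Out).

% yahoo_query(+Terms, -Tokens): '+' before required words, '-' before excluded ones.
yahoo_query([], []).
yahoo_query([plus(W)|Ts], ['+',W|Out]) :-
    yahoo_query(Ts, Out).
yahoo_query([minus(W)|Ts], ['-',W|Out]) :-
    yahoo_query(Ts, Out).
```

With these definitions, the goal convert_tokens(excite, [mickey,'AND',mouse,'NOT',goofy], Q) binds Q to [mickey,'AND',mouse,'AND','NOT',goofy], and the same token list converted for yahoo yields ['+',mickey,'+',mouse,'-',goofy]. Going through an intermediate list of plus/minus terms keeps the engine-specific clauses independent of the input syntax, so that a further engine would only need one more rendering predicate.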

Moreover, in contrast with most existing meta-search engines, PrologCrawler allows users to specify the maximum number of results to be returned, and it correspondingly adapts the number of results to be requested from the underlying search engines (from 10 to 50 results).

The type of processing to be performed on the results obtained from the search engines is obviously critical for the overall performance of the system, especially if we consider processing that requires further accesses to the net. The processing functionalities of PrologCrawler have hence been divided into two classes: simple processing (which does not require further accesses to the net) and advanced processing (which requires further accesses to the net).

The first type of functionalities includes sorting the results (by alphabetical order, title, address, search engine, or relevance), removing duplicated addresses, and removing "secondary pages", i.e., (possibly) redundant results. For instance, suppose that the list of results contains a "primary" page such as "www.di.unipi.it/~brogi/book/" or "www.di.unipi.it/~brogi/book/index.html". Then all Web pages like "www.di.unipi.it/~brogi/book/chap-1.html" are considered "secondary" and are eliminated.
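
The removal of duplicated and secondary addresses described above can be sketched as follows. The sketch is not taken from PrologCrawler: it assumes that results are represented simply as a list of address atoms, that a page is "primary" when its address ends in "/" or in "/index.html", and that any other address below the directory of a primary page is "secondary". All predicate names (clean_results/2, drop_duplicates/2, drop_secondary/2, primary_base/2) are hypothetical.

```prolog
:- use_module(library(lists)).   % member/2, memberchk/2

% clean_results(+Addresses, -Clean): remove duplicates, then secondary pages.
clean_results(Addresses, Clean) :-
    drop_duplicates(Addresses, Unique),
    drop_secondary(Unique, Clean).

% drop_duplicates(+Addresses, -Unique): keep the first occurrence of each address.
drop_duplicates(As, Us) :-
    drop_duplicates(As, [], Us).

drop_duplicates([], _, []).
drop_duplicates([A|As], Seen, Us) :-
    (   memberchk(A, Seen)
    ->  Us = Us1
    ;   Us = [A|Us1]
    ),
    drop_duplicates(As, [A|Seen], Us1).

% primary_base(+Address, -Base): Address is a primary page whose directory is Base.
primary_base(P, Base) :-
    atom_concat(Base, 'index.html', P),
    atom_concat(_, '/', Base), !.
primary_base(P, P) :-
    atom_concat(_, '/', P).

% secondary(+Address, +All): Address lies below the directory of some
% other primary page in All (assumed definition of "secondary").
secondary(A, All) :-
    member(P, All),
    P \== A,
    primary_base(P, Base),
    atom_concat(Base, Rest, A),
    Rest \== '',
    Rest \== 'index.html'.

% drop_secondary(+Addresses, -Kept): keep only the non-secondary addresses.
drop_secondary(Addresses, Kept) :-
    drop_secondary(Addresses, Addresses, Kept).

drop_secondary([], _, []).
drop_secondary([A|As], All, Kept) :-
    (   secondary(A, All)
    ->  Kept = Rest
    ;   Kept = [A|Rest]
    ),
    drop_secondary(As, All, Rest).
```

For example, given the list ['www.di.unipi.it/~brogi/book/index.html', 'www.di.unipi.it/~brogi/book/chap-1.html', 'www.sics.se/sicstus.html', 'www.di.unipi.it/~brogi/book/index.html'], clean_results/2 returns ['www.di.unipi.it/~brogi/book/index.html', 'www.sics.se/sicstus.html']. Note that, under the assumed definition, the two forms of the same primary page ("book/" and "book/index.html") are both kept if both occur.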

The second type of functionalities includes a check on the existence of the links returned by the search engines, so as to eliminate erroneous links (i.e., links to pages that are no longer available). Notice that selecting this option perceptibly lowers the system performance. This is why the check is optional and is offered only in the advanced query interface. The check on the existence of a Web page also yields the date on which the page was last modified, and such dates can be exploited to sort the results.

Finally, PrologCrawler presents the final results of a meta-search in a uniform HTML format, rather than by simply pasting together parts of the pages returned by the single search engines.
 
 

3.2 Search engines employed

The choice of which search engines to employ is obviously of primary importance in the design of a meta-search engine. Ideally, the chosen search engines should have a homogeneous interface and ensure good coverage of the set of existing Web pages. For these reasons, we have chosen to ground PrologCrawler on the following five engines: AltaVista, Excite, Infoseek, Northern Light, and Inktomi (the index behind Yahoo's Web search).

We decided not to include directories in order to work on search results as homogeneous as possible. For this reason, rather than using the Yahoo directory (like DogPile) we directly included the Inktomi index via which Yahoo searches Web pages. Moreover the co-presence of Inktomi and Northern Light is one of the innovative aspects of PrologCrawler, as only a few meta-search engines use Inktomi and no meta-search engine uses Northern Light (which contains one of the largest indexes).
 
 

3.3 Interfacing with available search engines

One problem to face in the design of a meta-search engine is how to interface the system with the existing search engines.

Typically the user opens the Web page of the search engine with her browser, types in some keywords and gets the result in a new page returned by the search engine to the user’s browser. The problem of interfacing a meta-search engine with available search engines is therefore twofold: (1) how to query a search engine and (2) how to extract the results from the Web page returned by the search engine.

The first problem can be solved by exploiting the behaviour of the Common Gateway Interface (CGI) used by the existing search engines. Simply stated, a user query generates a URL address [3] which actually launches a search program running on the server. For instance, typing the keyword "prolog" in the Altavista interface produces a new page whose address appears to be:

http://www.altavista.com/cgi-bin/query?pg=q&kl=XX&q=prolog

which actually launches the program query with its parameter q bound to the string "prolog". The syntax employed by the CGI interface to specify the various parameters (such as logical operators or the number of results) can then be exploited to invoke a search engine without the need for human interaction.

The other problem is how to extract the results of a search from the HTML page returned by the search engine. Some meta-search engines (e.g., DogPile) "blindly" copy a part of the page returned by the search engines and include it "as is" into their HTML page of results. Other meta-search engines (e.g., MetaFind) instead analyse the HTML source code of the page returned by the search engines. This approach makes it possible to filter the information returned by the search engines, for instance by eliminating internal links or links to commercial promotions. PrologCrawler analyses the HTML source code of the page returned by the search engines in order to filter the links and to perform some other processing.
 
 

3.4 Efficiency issues

In a meta-search engine the time spent on networking is much greater than the computation time. Indeed, careless handling of networking operations can dramatically affect the performance of the system, and even make the system unusable in practice.

The fact that our system prototype was going to run on a 200 MHz Pentium PC was one of the reasons for choosing only five underlying search engines. Moreover, the system was designed so that queries to the five search engines are run in parallel. This means that the overall networking time needed to fetch the results from the search engines is approximately equal to the time needed to fetch the results from the slowest search engine employed.

Running the searches in parallel also enhances the system reliability, in that the meta-search does not get stuck because of the possible stall or failure of one search instance. Namely even if some search engine does not return its answer, the system is able to give the user its meta-answer by combining the results obtained from the other search engines.
 
 

3.5 WWW interface

Because of its very nature, a meta-search engine must obviously be available on the WWW and it must be accessible by means of standard browsers. We therefore made our PrologCrawler meta-search engine accessible from a standard HTML page via a CGI script written in Perl.
 
 
 
 

4. Implementation of PrologCrawler

The PrologCrawler system has been implemented on a 200 MHz Pentium PC with the Red Hat Linux 5.0 operating system and the Apache 1.3.3 Web server. The system was written in SICStus Prolog 3.7.1 using the PiLLoW library for accessing the WWW.

The overall behaviour of PrologCrawler is sketched in figure 4, which illustrates the basic way in which the user interacts with the system via a browser. Namely, the user employs a browser to open the PrologCrawler home page and to submit her query. The CGI script then launches the Prolog program, which performs the meta-search and produces a new HTML page containing its results; this page is then returned to and displayed by the user’s browser.

The internal behaviour of PrologCrawler is sketched in figure 5. The system consists of several modules that interact with one another. As a whole, the system receives the search parameters and produces an HTML page containing the results of the meta-search.

Module Main inputs the search parameters and coordinates the execution of all other modules. After receiving the search parameters, Main activates module Search, which is in charge of sending the query to the search engines and of collecting the obtained results. Module Search in turn activates multiple instances of the Queryengine module (one for each search engine), which are executed in parallel. The results produced by the Queryengine instances are collected by the Search module, which returns the entire set of results to the Main module.

The Main module then activates the Process module which is in charge of elaborating the entire set of results obtained by the available search engines. Depending on the given search parameters, module Process performs a simple processing - by removing duplicated and secondary links - or an advanced processing of the results - by checking also the existence of links (see section 3.1).

Module Sorting then sorts the obtained list of results according to the criterion chosen by the user (e.g., by alphabetical order, title, address, search engine, relevance, or date). Finally module BuildHTML suitably formats the results into an HTML page to be returned to the user’s browser.
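
Sorting by a user-chosen criterion, as performed by module Sorting, can be sketched with the standard keysort/2 predicate. The sketch below is only illustrative and makes several assumptions that are not in the original system description: each result is represented by a hypothetical term result(Title, Address, Engine, Position), the position assigned by the search engine is used as a stand-in for relevance, and the predicate names sort_results/3 and result_key/3 are invented for the example.

```prolog
% result_key(+Criterion, +Result, -Key): the field used as sorting key.
% The result/4 representation is an assumption made for this sketch.
result_key(title,     result(Title, _, _, _),    Title).
result_key(address,   result(_, Address, _, _),  Address).
result_key(engine,    result(_, _, Engine, _),   Engine).
result_key(relevance, result(_, _, _, Position), Position).

% sort_results(+Criterion, +Results, -Sorted):
% pair every result with its key, sort the pairs, then drop the keys.
% keysort/2 is stable and compares keys in the standard order of terms.
sort_results(Criterion, Results, Sorted) :-
    add_keys(Results, Criterion, Keyed),
    keysort(Keyed, SortedKeyed),
    strip_keys(SortedKeyed, Sorted).

add_keys([], _, []).
add_keys([R|Rs], Criterion, [Key-R|KRs]) :-
    result_key(Criterion, R, Key),
    add_keys(Rs, Criterion, KRs).

strip_keys([], []).
strip_keys([_-R|KRs], [R|Rs]) :-
    strip_keys(KRs, Rs).
```

Since keysort/2 is stable, results with equal keys keep their relative order, so sorting first by one criterion and then by another gives a simple form of secondary ordering.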

One of the advantages of using Prolog to develop the system is the resulting high-level and compact programming style. For instance, the top-level predicate run/2 of module Main defines the behaviour illustrated in figure 5 as follows:

run(simple(Query,Nres,Ord), ResultsPage) :-
   s_search(Query,Nres,Results),
   s_process(Nres,Results,FilteredResults),
   sort(FilteredResults,Ord,SortedResults),
   build_html(SortedResults,ResultsPage).

run(adv(Query,Nres,Ord,MaxTime,E_check), ResultsPage) :-
   a_search(Query,Nres,MaxTime,Results),
   a_process(Nres,E_check,Results,FilteredResults),
   sort(FilteredResults,Ord,SortedResults),
   build_html(SortedResults,ResultsPage).

The first argument of run/2 is an input argument which contains the search parameters. The second argument is an output argument which will be instantiated to a Prolog term representing the HTML page to be returned to the user’s browser. The constructors simple and adv are used to distinguish simple and advanced meta-searches. The parameters of a simple search include the Query string, the number of results to be returned (Nres) and the criterion for sorting the results (Ord). Advanced searches have two additional parameters: the maximum response time (MaxTime) and a boolean flag denoting the request for the existence check on results (E_check). Notice that the Search and Process modules are invoked by means of two different top-level predicates (s_search and a_search, s_process and a_process) depending on whether the requested meta-search is simple or advanced.

It is worth describing the behaviour of the Queryengine module in a little more detail, in order to better illustrate the interaction between Prolog and the WWW. As illustrated in figure 5, module Search activates several instances of Queryengine which are run in parallel. Each instance is in charge of sending the query to one search engine as well as of extracting the obtained results. Search activates each instance of Queryengine by invoking an exec system call where the needed parameters are passed in a command-line fashion. (exec is an operating system utility featured by SICStus Prolog to pass a command to a new shell process for execution.) When an instance of Queryengine terminates, it creates a file containing the results obtained. The Search module hence checks for the existence of such files in order to wait for the termination of the various instances of Queryengine and to collect the produced results. Each file is identified by the process id of its creator, so that multiple sessions of the meta-search engine can work simultaneously.

As we already observed, different search engines employ different syntaxes for queries and different formats for presenting the results. In order to comply with these differences, module Queryengine is structured into three sub-modules, BuildURL, Fetch and Parse, as illustrated in figure 6.

Module BuildURL is in charge of building the URL needed to launch a search engine on a given query. The top-level predicate build_url/4, given the name of the engine (Engine), the query (Query) and the number of desired results (Nres), returns the URL needed for performing the specified query on the given search engine.

build_url(Engine,Query,Nres,URL) :-
   convert_query(Query,Engine,NewQuery),
   engine_data(Engine,Engine_URL,Str1,Str2),
   compose(Engine_URL,Str1,NewQuery,Str2,Nres,URL).

First the Query string is transformed into the syntax of the specific engine (by convert_query). Then the final URL is obtained by suitably combining the transformed query NewQuery with other parameters that are peculiar to the search engine (Engine_URL, Str1, and Str2). For instance the call:

build_url(nlight,'prolog+cgi',20,URL)

will instantiate the output argument URL to the term:

http('northernlight.com',80,'/nlquery.fcg?qr=prolog+AND+cgi&us=20')

which represents the URL:

http://northernlight.com/nlquery.fcg?qr=prolog+AND+cgi&us=20

Predicate engine_data/4 is defined by a set of unit clauses containing syntax information on the way in which query strings are to be formed for the various engines used:

engine_data(altavista, 'www.altavista.com', '/cgi-bin/query?pg=aq&q=', []).
engine_data(excite, 'search.excite.com', '/search.gw?c=web&search=', '&perPage=').
engine_data(nlight, 'northernlight.com', '/nlquery.fcg?qr=', '&us=').
...
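
The definition of compose/6 is not shown here; a minimal sketch of how such a predicate could assemble the http/3 term from the engine_data/4 facts is given below. It is an assumption of this sketch (not stated in the description of the system) that an empty list in the last argument of engine_data/4 means that no parameter for the number of results is appended, and that port 80 is always used.

```prolog
% compose(+Host, +Prefix, +Query, +Suffix, +Nres, -URL)
% Hypothetical sketch: URL = http(Host, 80, Path), where Path is
% Prefix ++ Query ++ Suffix ++ Nres (the last two only if Suffix is not []).
compose(Host, Prefix, Query, [], _Nres, http(Host, 80, Path)) :-
    atom_concat(Prefix, Query, Path).
compose(Host, Prefix, Query, Suffix, Nres, http(Host, 80, Path)) :-
    Suffix \== [],
    number_codes(Nres, Codes),          % turn the integer into an atom
    atom_codes(NresAtom, Codes),
    atom_concat(Prefix, Query, Path1),
    atom_concat(Path1, Suffix, Path2),
    atom_concat(Path2, NresAtom, Path).
```

For instance, the goal compose('northernlight.com', '/nlquery.fcg?qr=', 'prolog+AND+cgi', '&us=', 20, URL) binds URL to http('northernlight.com',80,'/nlquery.fcg?qr=prolog+AND+cgi&us=20'), which is the term given above for the Northern Light example.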

Module Fetch exploits the primitives offered by the PiLLoW library for accessing WWW pages. For instance, predicate get_page/3, given a URL and an integer denoting the TimeOut for the connection, returns the content of the corresponding Web page in the form of a Prolog term. The actual Web page is accessed by using PiLLoW’s fetch_url/3 primitive, which returns the HTML code (Answer) of the given URL along with other information. The second (input) argument of fetch_url/3 can be used to specify the TimeOut for the connection and to identify the agent sending the HTTP request. After checking that the page is really an HTML page, get_page/3 exploits PiLLoW’s html2terms/2 primitive to convert the actual content of the HTML page into a list of Prolog terms, which can then be manipulated as any other standard Prolog term:

get_page(URL,TimeOut,PageTerm) :-
   fetch_url(URL,[timeout(TimeOut),from('pcrawl@di.unipi.it')], Answer),
   member(content_type(text,html,_),Answer),
   member(content(Page),Answer),
   html2terms(Page,PageTerm).

The conversion performed by html2terms/2 has proved particularly useful for developing module Parse (figure 6). Parse is in charge of analysing the HTML page produced by Fetch in order to extract the list of results returned by the search engine, which are formatted according to the convention adopted by the search engine itself.

Indeed html2terms/2 converts an HTML tag of the form:

<tagname attributes> text </tagname>

into the term:

env(tagname,attributes,text)

and the conversion is performed recursively so that html2terms/2 produces a Prolog term that preserves the structure of the original HTML page. Module Parse can then directly exploit Prolog’s unification mechanism to recursively extract the list of results from the corresponding tree-structured term. We found that the possibility of exploiting unification (and recursion) notably simplifies the development of a reliable content analyser of HTML pages – e.g. if compared with standard string-based matching techniques.
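
To give a flavour of this style of analysis, the following sketch collects the targets of all anchors occurring in a tree of env/3 terms. It is not the Parse module of PrologCrawler: the predicate names page_links/2 and term_links/2 are hypothetical, the sketch assumes that attributes are given as a list of Name=Value pairs and that the content of an element is a list of terms, and hand-written terms of this shape (with atoms for text and attribute values) are used in place of the actual output of html2terms/2.

```prolog
:- use_module(library(lists)).   % member/2, append/3

% page_links(+Terms, -Links): Links are the href values of all anchors
% found, at any depth, in a list of HTML terms (document order).
page_links([], []).
page_links([T|Ts], Links) :-
    term_links(T, Links1),
    page_links(Ts, Links2),
    append(Links1, Links2, Links).

% term_links(+Term, -Links): links contributed by a single HTML term.
term_links(env(a, Atts, _Content), [Href]) :-
    member(href=Href, Atts), !.        % an anchor: unify out its target
term_links(env(_Tag, _Atts, Content), Links) :- !,
    page_links(Content, Links).        % any other element: recurse
term_links(_Other, []).                % plain text and anything else
```

For example, with Page = [env(html,[],[env(body,[],[env(h1,[],['Results']), env(a,[href='http://a.example/'],['A']), env(p,[],[env(a,[href='http://b.example/x.html'],['B'])])])])], the goal page_links(Page, Links) binds Links to ['http://a.example/','http://b.example/x.html']. A parser for a specific search engine would replace the first clause of term_links/2 with patterns matching the markup that the engine uses around each result.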
 
 

5. Concluding Remarks

We have described the design and implementation of a meta-search engine for the WWW which has been entirely developed in Prolog. The development of PrologCrawler is part of a larger project aimed at investigating experimentally the usability of logic programming for developing WWW applications. In [4] we have presented a recommender system that relies on the integration of Prolog with WWW technology. The system extracts and analyses information on computer science experts that is (implicitly) available on the WWW in order to suggest referees for a computer science article.

For both experiments we have used SICStus Prolog [28] and the PiLLoW library [5] for accessing the WWW. Indeed, the PiLLoW library offers useful support both for accessing the WWW and for creating HTML pages. For instance, in section 4 we have discussed how PiLLoW’s primitives fetch_url/3 and html2terms/2 provide a useful interface for accessing and manipulating the content of Web pages as Prolog terms.

The development of PrologCrawler was initiated as a case study for assessing the usability of logic programming for WWW applications. The experiment has produced a meta-search engine which can be used on the WWW and which is comparable with state-of-the-art meta-search engines. Figure 7 summarizes the functionalities supported by PrologCrawler and compares them with those of some of the best-known available meta-search engines.
 
| Feature | PrologCrawler | DogPile | Inference Find | MetaCrawler | MetaFind |
|---|---|---|---|---|---|
| Employed engines | AltaVista, Excite, Infoseek, Northern Light, Yahoo-Inktomi | AltaVista, Excite, GoTo.com, Infoseek, Lycos, Magellan, The Mining Co., PlanetSearch, Thunderstone, WebCrawler, Yahoo! | AltaVista, Excite, Infoseek, Lycos, WebCrawler, Yahoo! | AltaVista, Excite, Infoseek, LookSmart, Lycos, The Mining Co., WebCrawler, Yahoo! | AltaVista, Excite, Infoseek, PlanetSearch, WebCrawler |
| Conversion of query syntax | YES | YES | NO | Only "all terms" or "any terms" | YES |
| Correct query conversion for AltaVista | YES | NO | NO | NO | NO |
| Consistent default search | YES | YES | NO | NO | YES |
| Possibility of specifying maximum number of results | YES | NO | NO | YES | NO |
| Possibility of specifying a timeout for the connections | YES | NO | YES | YES | YES |
| Data displayed for each result of the meta-search | Title, Address, Description, Engine, Position, Date | As in the underlying search engines | Title | Title, Description, Engine | Title, Address, Description, Engine, Position |
| Uniform presentation of results | YES | NO | YES | YES | YES |
| Possibility of choosing the sorting criterion with which results will be presented | YES | NO | NO | NO | YES |
| Removal of duplicated results | YES | NO | YES | YES | YES |
| Removal of secondary results | YES | NO | NO | NO | NO |
| Optional check on the existence of the results of the search | YES | NO | NO | NO | NO |

Figure 7. Comparison with other meta-search engines.

Finally, it is interesting to comment on the adequacy of Prolog for developing WWW applications. For instance, one obvious question concerns the advantages and disadvantages of using Prolog (rather than Java, for instance) for developing a meta-search engine like PrologCrawler. We believe that the high-level declarative programming style is a major advantage of logic programming languages. Programming by logically formulating executable specifications notably eases the development of simple and concise programming solutions. For instance, the entire code of PrologCrawler is less than 500 lines long (comments excluded).

A quite common criticism of logic programming is that available implementations of logic programming languages are not efficient enough. While advances in this direction are perhaps sometimes underestimated, the impact of Prolog implementation efficiency is quite limited for systems that massively perform networking operations. Indeed, for systems like PrologCrawler, the time needed for performing networking operations is far larger than the time needed to perform Prolog computations.

In conclusion, the development of PrologCrawler suggests that logic programming can indeed contribute to the development of Internet applications, in the line of the promising results already obtained in this emerging area, e.g. [6,7,11,18,31].
 
 

References
 
[1] Altavista™, copyright © 1995-99 Digital Corporation Inc., www.altavista.com
[2] J. Barker, "What are Meta-Search Engines? When to use and not use them?", UC Berkeley Teaching Library, www.lib.berkeley.edu (January 1999)
[3] T. Berners-Lee, L. Masinter, and M. McCahill, "Uniform Resource Locators", ds.internic.net/rfc/rfc1738.txt (December 1994)
[4] A. Brogi and G. Marongiu. "ExpertFinder: A Prolog Recommender System Integrated with the WWW." In M.C. Meo and M. Vilares (editors), Proceedings of 1999 Joint Conference on Declarative Programming (AGP99), L'Aquila (Italy), September 1999.
[5] D. Cabeza, M. Hermenegildo, and S. Varma. "The PiLLoW/CIAO Library for Internet/WWW Programming using Computational Logic Systems." In Proc. 1st Workshop on Logic Programming Tools for INTERNET Applications, JICSLP 96.
[6] K. Clark and V. S. Lazarou. "Distributed Information Retrieval using a Multi-Agent System and the role of Logic Programming." In [7].
[7] K. De Bosschere, M. Hermenegildo and P. Tarau (editors). "Proceedings of the Second Workshop on Logic Programming Tools for INTERNET Applications", 1997. clement.info.umoncton.ca/~lpnet/iclp97
[8] Dogpile, www.dogpile.com
[9] Excite, copyright © 1995-99 Excite Inc., www.excite.com
[10] Forrester Research, www.forrester.com
[11] Y. Han, S.W. Loke and L. Sterling. "Agents for Citation Finding on the World Wide Web." In Proceedings of the 2nd Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology, 1997. 
[12] HotBot, copyright © 1996-99 Wired Digital Inc., www.hotbot.com
[13] Inference Find, 1996-99, www.ifind.com
[14] Infoseek, copyright © 1995-99 Infoseek Corporation, www.infoseek.com
[15] Inktomi, www.inktomi.com
[16] S. Lawrence and C. Lee Giles, "Searching the World Wide Web", Science - vol. 280, April 1998, pp. 98.
[17] S. Lawrence and C. Lee Giles, "How big is the Web? How much of the Web do the search engines index? How up to date are the search engines?", NEC Research Institute, www.neci.nj.nec.com/homepages/lawrence/websize.html (April 1998)
[18] S.W. Loke and A. Davison "A Two-level World Wide Web Model with Logic Programming Links", Proceedings of the 2nd Workshop on "Logic Programming Tools for INTERNET Applications", July 1997, pp. 41-54
[19] Looksmart, Copyright © 1996-99 by LookSmart Limited, www.looksmart.com
[20] Lycos, copyright © 1994-99 Lycos Inc., www.lycos.com
[21] MetaCrawler, copyright © 1996-1999 Go2Net Inc, www.go2net.com
[22] MetaFind, www.metafind.com
[23] Northern Light, 1997-99, www.northernlight.com
[24] G. R. Notess, "Toward More Comprehensive Web Searching: Single Searching Versus Megasearching", ONLINE, March 1998
[25] G. R. Notess, "Search Engine Statistics", Search Engine Showdown, www.notess.com/search/stats/ (March 1999)
[26] G. R. Notess, "Multiple Search Engines", Search Engine Showdown, www.notess.com/search/multi/ (1999)
[27] M. Pedram, "Introduction to Search Engines", Kansas City Public Library, www.kcpl.lib.mo.us/srchengines.htm (February 1999)
[28] SICStus Prolog User's Manual. www.sics.se/is1/sicstus.html
[29] D. Sullivan, "Media Matrix Search Engine Ratings", Search Engines Watch, www.searchenginewatch.com/reports/mediamatrix.html (December 1998)
[30] D. Sullivan, "Search Engines Sizes", Search Engines Watch, www.searchenginewatch.com/reports/sizes.html (February 1999)
[31] P. Tarau, A. Davison, K. De Bosschere and M. Hermenegildo (editors). "Proc. of the First Workshop on Logic Programming Tools for INTERNET Applications", 1996. clement.info.umoncton.ca/~lpnet/jicslp96
[32] Yahoo!, copyright © 1994-99 by Yahoo Inc, www.yahoo.com