Search Engine Terms A-D
Home
A B
C D E
F G H I J
K L M N O
P Q R S T
U V W X Y
Z
- Adjacency
- A property of the relationship between words in a search engine (or
directory) query. Search engines often allow users to specify that words
should be next to one another or somewhere near one another in the web
pages searched.
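-
- As a rough illustration of the idea (not any particular engine's
implementation), an adjacency or proximity test on a page's text might be
sketched as follows. The function name words_near and the max_gap
parameter are assumptions made for this example.

```python
import re


def words_near(text, first, second, max_gap=0):
    """Return True if `second` follows `first` with at most `max_gap`
    other words in between (max_gap=0 means directly adjacent)."""
    words = re.findall(r"\w+", text.lower())
    first, second = first.lower(), second.lower()
    for i, word in enumerate(words):
        if word != first:
            continue
        # Words that sit within the allowed distance after `first`.
        window = words[i + 1 : i + 2 + max_gap]
        if second in window:
            return True
    return False
```

With max_gap=0 the words must be next to one another; a larger value
models a "somewhere near" query.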
-
- Agent Name Delivery
- The process of sending search engine spiders to a tailored page, yet
directing your visitors to what you want them to see. This is done using
server side includes (or other dynamic content techniques). SSI, for example,
can be used to deliver different content to the client depending on the
value of HTTP_USER_AGENT. Most normal browser software packages have a
user agent string which starts with "Mozilla" (coined from Mosaic
and Godzilla). Most search engine spiders have specific agent names, such
as "Gulliver", "Infoseek sidewinder", "Lycos spider"
and "Scooter".
-
- By switching on the value of HTTP_USER_AGENT (a process known as agent
detection), different pages can be presented at the same URL, so that
normal visitors will never see the page submitted to search engines (and
vice versa).
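-
- A minimal sketch of agent detection, assuming a server-side script that
receives the agent string and returns the name of the page to serve. The
spider names come from this entry; the file names visitor.html and
spider.html and the function choose_page are hypothetical.

```python
# Spider names quoted in this glossary entry; real spiders may send
# longer strings, so a case-insensitive substring match is assumed here.
SPIDER_NAMES = ("gulliver", "infoseek sidewinder", "lycos spider", "scooter")


def choose_page(user_agent, visitor_page="visitor.html", spider_page="spider.html"):
    """Pick the page to serve based on the value of HTTP_USER_AGENT."""
    agent = (user_agent or "").lower()
    if any(name in agent for name in SPIDER_NAMES):
        return spider_page
    # Anything else, including "Mozilla..." browsers, gets the normal page.
    return visitor_page
```

In a CGI setting the argument would typically be read with
os.environ.get("HTTP_USER_AGENT").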
-
- In practice this approach is somewhat simplistic. Some search engine
spiders identify themselves as plain "Mozilla" browsers to prevent the use
of agent name delivery, so the technique can be very difficult to use
effectively and may not work at all.
-
- How do you spot agent name delivery at work? This is quite difficult,
as the owners of web pages using agent name delivery can control what you
see! You may be able to guess that a page is using this technique if it
appears to be indexed incorrectly or if the title or description doesn't
match the page you see, but the same effect could have been achieved by
switching pages after the relevant search engine indexed the original. If
you really want to see the search engines' tailored version of a page,
write a program (e.g. a Perl script) to retrieve the URL with the
User-Agent request header (which the server sees as HTTP_USER_AGENT) set
to each of the strings used by the search engine spiders. If agent name
delivery is in use, one or more of the retrieved pages will differ from
the others!
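-
- The check described here could be sketched in Python rather than Perl
along these lines. The agent strings are the short names from this entry
(real spiders may send longer strings), and the function names and the
fetcher parameter are assumptions made for the example.

```python
import hashlib
import urllib.request

# A plain browser string plus the spider names quoted in this entry.
AGENT_STRINGS = ["Mozilla/4.0", "Gulliver", "Infoseek sidewinder",
                 "Lycos spider", "Scooter"]


def fetch(url, user_agent):
    """Retrieve `url`, sending `user_agent` in the User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def compare_agents(url, agents=AGENT_STRINGS, fetcher=fetch):
    """Group agent strings by the content each one receives.

    More than one group means at least one agent was served a
    different page from the others.
    """
    groups = {}
    for agent in agents:
        digest = hashlib.sha256(fetcher(url, agent)).hexdigest()
        groups.setdefault(digest, []).append(agent)
    return list(groups.values())
```

A page whose content changes on every request (a counter or a rotating
advert, say) will also produce several groups, so a difference is a hint
rather than proof.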
-
- See also hidden text and IP delivery.
-
- Altavista
- A popular search engine with the largest database on the web, indexing
more than 140 million pages. Its main URL is http://www.altavista.com.
Until 1998, this search engine provided the search facility for Yahoo.
Altavista indexes all the words in a web page, and new pages are normally
added to the database fairly quickly, within a couple of working days.
You are asked to submit just the main page of your site. The Altavista
spider will then explore your site and index a representative sample of
the pages. Some problems with spamming have been noticed. The use of
keyword meta tags is penalised. Altavista places various alternative
options before its search results, including suggested questions (using
the Ask Jeeves service) and RealNames. Paid entries are beginning to
appear at the start of the search results.
-
- AOL Netfind
- The default search engine for users of the AOL internet service provider,
and hence a busy site. Its URL is http://www.netfind.com.
It is essentially the same engine as Excite.
-
- Applet
- A small program, often written in Java, which usually runs in a web
browser as part of a web page. It is possible that the use of such a
program may cause spiders and robots to stop indexing a page.
-
- ArchitextSpider
- The name of the Excite search engine's spider.
-
- Ask Jeeves
- A meta search engine which can be asked questions in English. This
service is also in use at Altavista. Its URL is http://www.askjeeves.com.
-
- Bait-and-Switch
- The provision of one page for a search engine or directory and a different
page for other user agents at the same URL. Various methods can be used,
e.g. Agent Name Delivery or IP Delivery.
-
- Bridge Page
- See Gateway Page.
-
- CGI
- Common Gateway Interface - a standard interface between web server
software and other programs running on the same machine.
-
- CGI Program
- Strictly, any program which handles its input and output data according
to the CGI standard. In practice, CGI programs are used to handle forms
and database queries on web pages, and to produce non-static web page content.
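-
- For illustration, a very small CGI program might look like the sketch
below. It assumes the web server passes form data in the QUERY_STRING
environment variable and expects a header block followed by the page on
standard output, as the CGI standard describes; the "name" form field and
the function build_response are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Build a complete CGI response (header, blank line, HTML body)."""
    params = parse_qs(query_string)
    # "name" is a made-up form field used only for this example.
    name = params.get("name", ["visitor"])[0]
    body = "<html><body><p>Hello, {}!</p></body></html>".format(escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server supplies the query string through the environment.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

The page is generated afresh on every request, which is what makes the
content non-static.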
-
- Channels, Channel listings
- Lists of links to selected (and usually popular) web sites. The links
are maintained by search engines and directories and are sorted into categories
or channels. Sites are picked by a channel editor, often because of a site's
already high ranking with the search engines. Some search engines and directories
allow visitors to nominate sites for inclusion in their channels.
-
- Client
- A computer, program or process which makes requests for information
from another computer, program or process. Web browsers are client programs.
Search engine spiders are (or can be said to behave as) clients.
-
- Click through
- The process of clicking on a link in a search engine output page to
visit an indexed site.
-
- This is an important link in the process of receiving visitors to a
site via search engines. Good ranking may be useless if visitors do not
click on the link which leads to the indexed site. The secret here is to
provide a good descriptive title and an accurate and interesting description.
-
- Cloaking
- The hiding of page content. Normally carried out to stop page thieves
stealing optimized pages. See also Bait-and-Switch.
-
- Clustering
- The listing of only one page from each web site in a search engine
or directory's list of search results. This avoids occupation of all the
top results by a small number of web sites and makes the list of results
clearer and more useful to the user.
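-
- One simple way to picture clustering is to keep only the
highest-ranked page from each host name. This is a sketch under that
assumption, not a description of how any particular engine does it; the
function name cluster_by_site is made up for the example.

```python
from urllib.parse import urlsplit


def cluster_by_site(ranked_urls):
    """Keep only the first (highest-ranked) URL from each host.

    `ranked_urls` is assumed to be ordered best match first.
    """
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered
```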
-
- Comment
- The HTML comment delimiters <!-- and --> are used to hide text from
browsers. Some search engines ignore text between these delimiters, but
others index such text as if the delimiters were not there. Comments are
often used to hide JavaScript code from non-compliant browsers, and
sometimes (notably on Excite) to provide invisible keywords to some
search engines.
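-
- To see which text a page hides in comments, and so what a
comment-indexing engine might pick up, something like the following sketch
could be used. It relies on Python's standard html.parser module; the
class and function names are assumptions made for the example.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every <!-- ... --> comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the text between the comment delimiters.
        self.comments.append(data.strip())


def extract_comments(html):
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments
```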
-
- Crawler
- See Spider.
-
- Dead Link
- An internet link which doesn't lead to a page or site, probably because
the server is down or the page has moved or no longer exists. Most search
engines have techniques for removing such pages from their listings automatically,
but as the internet continues to increase in size, it becomes more and
more difficult for a search engine to check all the pages in the index
regularly. Reporting of dead links helps to keep the indexes clean and
accurate, and this can usually be done by submitting the dead link to the
search engine.
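-
- A dead-link check can be sketched as below. This is only an
approximation: it treats any HTTP error status of 400 or above, and any
failure to connect, as "dead", although a server that is briefly down may
come back. The function name is_dead_link and the opener parameter are
assumptions made for the example.

```python
import urllib.error
import urllib.request


def is_dead_link(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if `url` cannot be retrieved."""
    try:
        with opener(url, timeout=timeout):
            return False  # the page was retrieved successfully
    except urllib.error.HTTPError as err:
        # The server answered, but with an error such as 404 Not Found.
        return err.code >= 400
    except (urllib.error.URLError, OSError):
        # No answer at all: unknown host, refused connection, timeout.
        return True
```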
-
- De-listing
- The removal of pages from a search engine's index.
Removal can occur for various reasons, including unreliability of the machine
that hosts a site or because of perceived attempts at spamdexing.
-
- Description
- Descriptive text associated with a web page and displayed, usually
with the page title and URL, when the page appears in a list of pages generated
by a search engine or directory as a result of a query. Some search engines
take this description from the DESCRIPTION Meta tag; others generate their
own from the text in the page. Directories often use text provided at
registration.
-
- Direct Hit
- A system which monitors search engine users' selections from search
engine results, counting which results are clicked on most and how long
visitors spend at each site, so as to improve relevancy. Used by HotBot
and as a plug-in to Apple's new Sherlock search system. See
www.directhit.com.
-
- Directory
- A server or a collection of servers dedicated to indexing internet
web pages and returning lists of pages which match particular queries.
Directories (also known as Indexes) are normally compiled manually,
by user submission (such as at whatsnew.com), and often involve an
editorial selection and/or categorization process (such as at LookSmart
and Yahoo).
-
- Dogpile
- A meta search engine. Found at http://www.dogpile.com.
-
- Domain
- A sub-set of internet addresses. Domains are hierarchical, and lower-level
domains often refer to particular web sites within a top-level domain.
The most significant part of the address comes at the end - typical top-level
domains are .com, .edu, .gov, .org (which sub-divide addresses into areas
of use). There are also various geographic top-level domains (e.g. .ar,
.ca, .fr, .ro etc.) referring to particular countries.
-
- The relevance to search engine terminology is that web sites which
have their own domain name (e.g. http://www.nativetongues.com) will often
achieve better positioning than web sites which exist as a sub-directory
of another organisation's domain (e.g. http://ourworld.compuserve.com/homepages/tijana/).
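-
- The hierarchy can be made concrete with a short sketch that lists the
domains a URL's host belongs to, starting from the top-level domain. The
function name domain_hierarchy is an assumption made for this example.

```python
from urllib.parse import urlsplit


def domain_hierarchy(url):
    """List the domains a host belongs to, top-level domain first."""
    host = urlsplit(url).hostname or ""
    if not host:
        return []
    labels = host.split(".")
    # Build progressively longer names, working from the right-hand end.
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]
```

For http://ourworld.compuserve.com/homepages/tijana/ this gives com,
compuserve.com and ourworld.compuserve.com; the /homepages/tijana/ part is
a sub-directory rather than a domain.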
-
- Doorway Page
- See Gateway Page.
-
- Dynamic content
- Information on web pages which changes or is changed automatically,
e.g. based on database content or user information. Sometimes it's possible
to spot that this technique is being used, e.g. if the URL ends with .asp,
.cfm, .cgi or .shtml. It is possible to serve dynamic
content using standard (normally static) .htm or .html type
pages, though. Search engines will currently index dynamic content in a
similar fashion to static content, although they will not usually index
URLs which contain the ? character.
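-
- The URL clues mentioned here can be turned into a rough heuristic, as
in the sketch below. It only inspects the URL, so it cannot detect dynamic
content served from .htm or .html pages; the function name describe_url
and the keys of the returned dictionary are assumptions made for the
example.

```python
import posixpath
from urllib.parse import urlsplit

# File extensions this entry lists as signs of dynamic content.
DYNAMIC_EXTENSIONS = {".asp", ".cfm", ".cgi", ".shtml"}


def describe_url(url):
    """Report whether a URL looks dynamic and whether it contains '?'."""
    parts = urlsplit(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    return {
        "looks_dynamic": extension in DYNAMIC_EXTENSIONS or bool(parts.query),
        # Spiders will not usually index URLs containing this character.
        "has_question_mark": "?" in url,
    }
```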
-