INTERNET DRAFT		EXPIRES AUGUST 1998	INTERNET DRAFT

American University in Bulgaria
Peter Lazarov Lakov


		The Keyword Protocol (KP)
         	<draft-rfced-exp-lakov-00.txt>


Status of This Memo

This document is an Internet-Draft.  Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups.  Note that other groups may also
distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time.  It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as
"work in progress."

To learn the current status of any Internet-Draft, please check
the "1id-abstracts.txt" listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa),  
ftp.nordu.net (Europe), munnari.oz.au (Pacific Rim),
ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.



i) 	Summary
ii) 	Introduction
iii) 	Overview of the job done by the KP server
iv) 	Communication with the server.
iv.i) 	Communication with remote computers (wanderers, robots)
iv.ii) 	Communication with local computers (users updating the 
information about their pages)
v) 	Actions taken for each command
vi) 	Messages


i) Summary

This document proposes a new Internet protocol intended to 
improve the relevance of the results of search requests for 
information in the WWW.  It contains a description of the basic 
model and the minimum set of commands necessary for it to run 
properly.  It is not meant to be, and should not be regarded as, a 
definitive version, but rather a suggestion of a new way of 
looking at the relations between the search engines on one hand 
and hosts running WEB servers on the other.


ii) Introduction 
	
The purpose of this protocol is to improve the results of search 
requests sent to search engines like Alta-Vista, Lycos, Infoseek,  
etc.

Whenever a user browsing the WEB sends a search request to any of 
the search engines, there is a high probability that the result 
will contain many irrelevant entries.  This happens because of the 
way WEB wanderers, robots and spiders fill in and update their 
databases.  Currently their algorithm for finding information is 
to visit different sites, browse each file and take out the 
keywords which will be used in subsequent queries.  The keywords 
taken out will depend on the algorithms of the robots, but 
generally they will be either too few, or too many.  If robots 
parse only the title of a file, then surely many keywords, useful 
for finding the file, will not be available, for it is not 
possible to include them all in the title.  If, on the other hand, 
the whole file is parsed, surely many keywords, which have nothing 
to do with the topic of the file, will be included, thus enabling 
this file to appear in search results for completely different 
topics.  In any case, the owner of the file (or the WEB server 
supervisor) - the main person interested in assuring that the 
information will reach the target audience - has no power in 
assisting this process.  His/her actions are reduced to the very 
passive role of only providing access to the data.

The HTML tags “contents” and “keywords” can only partially 
alleviate this problem, because many of the HTML files, linked in 
a document already parsed by a robot, do not contain those tags.  
In this case the robot should decide whether keywords should be 
extracted from those files, or whether they should be disregarded. 
A search engine may extract keywords from a file which was not 
intended to provide any - in this case it will fill its database 
with worthless data.  Similarly, it may neglect files with some 
important data.  In either case, the owner of the files has very 
little power to communicate precise information to the 
robots.

The KP protocol will give the WEB server supervisor and each 
single user an opportunity for very close control of the 
information they provide for public access.  Each user will be 
able to edit the exact keywords necessary to describe his/her 
files.  The cornerstone of the suggestion is that all descriptions 
of the files will be handled by a central and well-known server, 
which will both increase accuracy and decrease the time necessary 
to browse a WEB server.


iii) Overview of the job done by the KP server.

The KP server operates on the client-server paradigm through a 
reliable TCP/IP byte stream using the ASCII character set.  The 
server performs a listen on a well-known port and when a client 
requests a connection to that port, the server accepts the 
connection.  Once it is created, the client starts sending 
commands to the server, which performs the action and returns a 
response and (when applicable) data.  The response of the server 
may consist of either a predictable or an unpredictable number of 
lines.  In the case of an answer with an unpredictable number of 
lines, the last line contains only a full stop. Commands and 
replies are terminated by a new line character (more on the 
command syntax in section iv.)
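
The multi-line reply rule above can be sketched in Python as follows.  
This is only an illustration under stated assumptions: the function 
name read_multiline is hypothetical, the draft does not define how a 
data line consisting of a single full stop would be escaped (so none 
is handled here), and the sample lines are placeholders, not a record 
format defined by this document.

```python
import io


def read_multiline(stream):
    """Collect reply lines until a line containing only a full stop.

    `stream` is any text stream yielding lines (hypothetical helper;
    the draft does not define dot-escaping, so none is applied here).
    """
    lines = []
    for raw in stream:
        line = raw.rstrip("\r\n")      # replies end with a new line
        if line == ".":                # terminator: full stop alone
            return lines
        lines.append(line)
    raise EOFError("connection closed before the terminating full stop")


# Placeholder reply standing in for what a server might send.
demo = io.StringIO("first record\r\nsecond record\r\n.\r\n")
print(read_multiline(demo))
```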

From the point of view of the server, there are two types of 
clients which can contact the server - users and robots.  (From 
now on, until the end of the document, "robot" refers to a 
program written only to contact the KP server and update the 
databases of search engines.  It should not be confused with the 
previously mentioned robots, WEB wanderers and crawlers.)  There 
are two differences between them.  First, users need to supply a 
password, while a robot does not.  In fact, a robot does not 
present any identification whatsoever, so any person without a 
user account could log in as a robot.  No harm is done by this, 
because the information is meant for public use anyway.  Second, 
users have permission to edit some of the information, while 
robots have read-only permission.


The KP server needs to handle a single copy of each of the 
following files with the following proposed fields: 

File: 		DATA 
Fields:		<id>, <user>, <file>, <keywords>


File: 		PASSWORD
Fields:		<user>, <password>


File: 		PATCH
Fields:		<patch_N>, <patch_N+1>


File: 		USED_PATCH
Fields:		<patch_file>


and any number of the following patch files:


File: 		P_1, P_2, P_3, …, P_N
Fields:		<id>, <action>, <file>, <keywords>


where the meaning of the fields for each table is as follows:

DATA
	<id>			a unique record identifier (same for the 
				other tables)
	<user>			the name of the user, to whom the file belongs
	<file>			the actual name of the file
	<keywords>		a string with keywords, separated by commas



PASSWORD
	<user>			same as in table DATA
	<password>		the password for <user>, used at login time.



PATCH
	<patch_N>		the name of a patch file (DATA in the first 
				record, before any patch exists)
	<patch_N+1>		the name of the patch file that follows it



USED_PATCH
	<patch_file>		the name of a patch file already sent 
				to robots.



P_1, P_2, …, P_N
	<id>		same as <id> in DATA
	<action>	action to be taken when the patch file is 
			merged into the database. N is for new (add a 
			new record with <id>, <file> and <keywords> 
			equal to those in the current record), D is for 
			delete (delete the record with <id> equal to the 
			<id> of the current record).
	<file>		same as <file> in DATA
	<keywords>	same as <keywords> in DATA
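
As an illustration of the <action> field, the following Python sketch 
merges patch records into a robot's local copy of DATA.  It is a 
minimal sketch, not part of the proposal: the function name 
apply_patch and the in-memory layout (a dictionary keyed by <id>, 
records as 4-tuples) are assumptions made only for this example.

```python
def apply_patch(database, patch_records):
    """Merge patch records into `database` (dict: id -> (file, keywords)).

    Each record is a tuple (id, action, file, keywords), mirroring the
    fields of a P_N file.  The tuple layout is an assumption of this
    sketch; the draft does not fix a storage or wire format.
    """
    for rec_id, action, filename, keywords in patch_records:
        if action == "N":
            # N: add a new record (or overwrite one with the same id)
            database[rec_id] = (filename, keywords)
        elif action == "D":
            # D: delete the record with this id, if it is present
            database.pop(rec_id, None)
        else:
            raise ValueError(f"unknown action {action!r}")
    return database


local_copy = {}
apply_patch(local_copy, [
    (1, "N", "index.html", "home,welcome"),
    (2, "N", "cv.html", "resume"),
    (1, "D", "index.html", ""),
])
print(local_copy)
```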




File DATA contains information concerning all available files and 
their respective keywords.  When a robot contacts the KP server for 
the first time, it should first download the file DATA using the 
GETALL command (which sends back all the records of file DATA).  
Then, the robot can send the command NEXTPATCH a number of times 
until it has recorded all the changes made to file DATA.  The rule for 
generating a new patch file is simple: whenever a robot visits the 
last patch file, create a new patch file and use it to store all 
changes thereafter.  
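
The chain of patch files that a robot walks with NEXTPATCH can be 
sketched as follows.  This is an illustration only: the helper names 
are hypothetical, the PATCH table is modelled as a list of 
(<patch_N>, <patch_N+1>) pairs whose first pair starts at DATA (as in 
the ADDFILE steps of section v), and returning None when no further 
patch exists is an assumption, since the draft does not say what the 
server answers in that case.

```python
def next_patch(patch_table, current=""):
    """Return the patch that follows `current`, or None if there is none.

    An empty `current` asks for the first patch of the whole database,
    which this sketch looks up as the successor of "DATA".
    """
    key = current or "DATA"
    for previous, following in patch_table:
        if previous == key:
            return following
    return None


def patches_to_fetch(patch_table, last_seen=""):
    """List, in order, every patch a robot has not merged yet."""
    names = []
    name = next_patch(patch_table, last_seen)
    while name is not None:
        names.append(name)
        name = next_patch(patch_table, name)
    return names


table = [("DATA", "P_1"), ("P_1", "P_2"), ("P_2", "P_3")]
print(patches_to_fetch(table))          # a robot that has only run GETALL
print(patches_to_fetch(table, "P_2"))   # a robot that already merged P_2
```

A robot would then issue GETPATCH for each returned name and merge the 
records it receives.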

Changes are made only by users (see above), and only with the 
commands ADDFILE and DELETEFILE. Whenever one of these two 
commands is used, the action taken is stored in the last patch 
file, provided robots have not yet visited it. Each user can 
change only those files whose <user> field in DATA holds his/her 
username.




iv) Communication with the server.

Each command should be terminated with CRLF characters. The blank space
between a command and its parameters should be treated as white
space.  CRLF characters and white space are not shown explicitly in
the descriptions of the commands, to keep them readable.
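
A command line could be split into its command word and parameter 
string along these lines.  This is a hedged sketch: parse_command is a 
hypothetical helper, and the draft does not say whether command words 
are case-sensitive, so the command word is returned exactly as 
received.

```python
def parse_command(raw):
    """Split one received line into (command, parameters).

    The trailing CRLF is removed and the first run of white space
    separates the command word from the rest of the line.
    """
    line = raw.rstrip("\r\n")
    parts = line.split(None, 1)         # split on the first white space
    if not parts:
        return "", ""
    command = parts[0]
    parameters = parts[1].strip() if len(parts) > 1 else ""
    return command, parameters


print(parse_command("ADDFILE index.html, home, welcome\r\n"))
print(parse_command("QUIT\r\n"))
```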


iv.i) Communication with remote computers (wanderers, robots.)

	GETALL			the server sends to the client all the 
				records with the files and keywords. 
				ACTION is N for all the records. 

	GETPATCH <patchname>	the server sends to the client only 
				the records from the file <patchname>. 

	NEXTPATCH <patchname>	the server sends to the client only the 
				name of the next patch file. No records 
				from the patch are actually transferred.  
				If <patchname> is empty, then the return 
				value is the first patch of the whole 
				database. 


iv.ii) Communication with local computers (those updating the files.)

	USER <username>		the client sends to the server the 
				username of the person who wants to 
				update the database. Username robot is 
				reserved for robots, WEB wanderers and 
				staff.

	PASS <password>		the client sends to the server the 
				password of the user.

	ADDFILE <filename, info>	the client sends to the server 
				a line containing a filename (possibly 
				the URL) and the keywords which should 
				get in the search engines’ databases for 
				that file. In case there is already an 
				entry in the server’s database for that 
				file the keywords should be replaced with 
				the new ones. 

	DELETEFILE <filename>	the server deletes the entry for 
				this file from its database. A user would 
				typically want to do this operation if 
				the file is deleted or moved to a new 
				position. If the last patch file has been 
				sent to at least one robot/wanderer (or 
				there are no patch files yet), the server 
				should create a new patch file and add 
				the entry to it. 

	LISTLIKE <pattern>	the server sends to the client a 
				list of files matching the specified 
				condition. If the <pattern> is empty, the 
				server sends all the files.

	LISTMINE		the server sends to the client only the 
				files belonging to the user currently 
				logged in.

	EXACT			switch exact string comparison ON/OFF. 
				When exact mode is ON, a string is equal 
				to another only when they have the same 
				sequence of characters. When exact is 
				OFF, a string is equal to another when it 
				is a sub-string of the second. All 
				comparison is case-sensitive. When exact 
				mode is OFF, 

	HELP <command>		the server sends a short help message to 
				the client about the command specified. 
				If no command is specified, the server 
				sends the list of all the commands.

	QUIT			request that the connection with the 
				server be terminated.
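
The effect of the EXACT switch on LISTLIKE can be illustrated with the 
following sketch.  It rests on assumptions the draft leaves open: that 
the pattern is compared against file names (not keywords), and that 
with exact mode OFF the pattern is the string that must occur inside 
the file name.  The function names are hypothetical.

```python
def matches(pattern, candidate, exact):
    """Case-sensitive comparison as described for the EXACT command."""
    if exact:
        return pattern == candidate     # same sequence of characters
    return pattern in candidate         # pattern is a sub-string


def list_like(files, pattern, exact=False):
    """Return the file names selected by LISTLIKE <pattern>."""
    if pattern == "":
        return list(files)              # empty pattern: all the files
    return [name for name in files if matches(pattern, name, exact)]


names = ["index.html", "cv.html", "Index.txt"]
print(list_like(names, "index"))
print(list_like(names, "cv.html", exact=True))
```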





v) Actions taken for each command.

USER <username>

1) Check if username is "robot".  If yes, then this is a robot.  
Let it in without asking for a password and accept only the 
commands for robots.  If it enters other commands, send message 
205.

2) If the name is not "robot", look it up in the table 
PASSWORD.  If the name is not found, send message 210.  Otherwise 
send message 101.


PASS <password>

1) Check whether the user has already logged in.  If yes, send 
message 204.
2) If the user hasn’t logged in yet, check the password sent 
against the one stored in file PASSWORD for that user.  If they 
differ, send message 207.  Otherwise send message 102.
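
The USER and PASS steps above can be sketched as a small session 
object that returns message numbers.  This is only an illustration: 
the class and method names are hypothetical, and two behaviours are 
assumptions not stated in the draft, namely that a robot is greeted 
with message 102 and that PASS received before USER yields message 
202.

```python
class Session:
    """Login state for one connection (illustrative sketch only)."""

    def __init__(self, passwords):
        self.passwords = passwords      # stands in for file PASSWORD
        self.pending_user = None        # name given by USER, not verified
        self.user = None                # name of the logged-in user
        self.is_robot = False

    def user_cmd(self, name):
        if name == "robot":
            self.is_robot = True        # robots need no password
            return 102                  # assumption: greet with 102
        if name not in self.passwords:
            return 210                  # user unknown
        self.pending_user = name
        return 101                      # ask for the password

    def pass_cmd(self, password):
        if self.user is not None or self.is_robot:
            return 204                  # already logged in
        if self.pending_user is None:
            return 202                  # assumption: USER must come first
        if self.passwords[self.pending_user] != password:
            return 207                  # password incorrect
        self.user = self.pending_user
        return 102                      # welcome


session = Session({"peter": "secret"})
print(session.user_cmd("peter"), session.pass_cmd("secret"))
```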


ADDFILE <filename, info>

1) Add the record to table DATA.
2) Send message 103.
3) Check if there is already a patch file. 
	* If no patch file exists yet, 
		* Add a record to table PATCH with fields: DATA, P_1. 
		* Create patch file P_1 and add the record to it. 
	* If a patch file exists, 
		* Locate the last patch.
		* If it has been sent to robots, 
			* Add a record to table PATCH with: P<prev>, P<prev + 1>
			* Create patch file P<prev+1> and add the record to it.
		* If it hasn’t been sent to robots:
			* Add the record to the last patch file.
4) Send confirmation message 103
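
Step 3 above, which is shared by ADDFILE and DELETEFILE, can be 
sketched as follows.  The function name record_change and the 
in-memory structures (a list for table PATCH, a set for USED_PATCH, a 
dictionary of patch files) are assumptions made for this example; the 
draft does not prescribe how these files are stored.

```python
def record_change(patch_table, used_patches, patches, entry):
    """Store `entry` in the right patch file and return that file's name.

    patch_table  -- list of (previous, next) pairs, i.e. table PATCH
    used_patches -- set of patch names already sent to robots
    patches      -- dict mapping a patch name to its list of records
    """
    if not patch_table:
        # No patch file exists yet: link DATA -> P_1 and create P_1.
        name = "P_1"
        patch_table.append(("DATA", name))
        patches[name] = [entry]
        return name
    last = patch_table[-1][1]           # locate the last patch
    if last in used_patches:
        # The last patch was sent to robots: open P<prev + 1>.
        name = f"P_{len(patch_table) + 1}"
        patch_table.append((last, name))
        patches[name] = [entry]
        return name
    patches[last].append(entry)         # still unvisited: reuse it
    return last


table, used, files = [], set(), {}
print(record_change(table, used, files, (1, "N", "index.html", "home")))
used.add("P_1")                         # a robot fetched P_1
print(record_change(table, used, files, (1, "D", "index.html", "")))
print(table)
```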


DELETEFILE <filename>

1) Locate the file and check that it belongs to the user.  If the 
file is not present in the database, send message 208.  If the 
file does not belong to the user, send message 209.  In either of 
the two cases, goto step 6)
2) Delete the record from table DATA.
3) Send message 104.
4) Follow the same steps as in ADDFILE step 3.
5) Send confirmation message.
6) End of DELETEFILE command.


LISTLIKE <pattern> 

1) Send the user the files matching the pattern.


LISTMINE

1) Send the user only the files belonging to him/her.


EXACT

1) Change the comparison mode.
2) Send the user the message 105 or 106.


HELP <command>

1) Send the user a help message for the command. If <command> is 
empty, send a list of all available commands.


GETALL

1) Send all records of file DATA.


GETPATCH <patchname>

1) Send all records of file <patchname>.


NEXTPATCH <patchname>

2) Locate the next patch file in the PATCH table.
3) Send message 107.
4) Send name of next patch file.








vi) Messages.

101		+OK Enter password.
102		+OK Welcome to KP version 1.0.
103		+OK Your file has been added to the database.
104		+OK The file has been deleted from the database.
105		+OK Exact mode is ON.
106		+OK Exact mode is OFF.
107		+OK The next patch file is:
201		-ERR Unknown command.
202		-ERR Command USER expected.
203		-ERR Command PASS expected.
204		-ERR You have already logged in.
205		-ERR Command not allowed for your class.
206		-ERR No patch file with this name.
207		-ERR Password incorrect. Try again.
208		-ERR File ID not found.
209		-ERR You have no write permission for this file.
210		-ERR User unknown.
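
For an implementation, the message list above could be kept in a 
single table, as in this sketch.  The draft does not state whether the 
numeric code is transmitted along with the text, so the reply layout 
produced by format_reply (code, space, text, CRLF) is an assumption, 
as are both helper names.

```python
MESSAGES = {
    101: "+OK Enter password.",
    102: "+OK Welcome to KP version 1.0.",
    103: "+OK Your file has been added to the database.",
    104: "+OK The file has been deleted from the database.",
    105: "+OK Exact mode is ON.",
    106: "+OK Exact mode is OFF.",
    107: "+OK The next patch file is:",
    201: "-ERR Unknown command.",
    202: "-ERR Command USER expected.",
    203: "-ERR Command PASS expected.",
    204: "-ERR You have already logged in.",
    205: "-ERR Command not allowed for your class.",
    206: "-ERR No patch file with this name.",
    207: "-ERR Password incorrect. Try again.",
    208: "-ERR File ID not found.",
    209: "-ERR You have no write permission for this file.",
    210: "-ERR User unknown.",
}


def format_reply(code):
    """Build one reply line (assumed layout: code, space, text, CRLF)."""
    return f"{code} {MESSAGES[code]}\r\n"


def is_success(code):
    """True for the +OK messages, False for the -ERR messages."""
    return MESSAGES[code].startswith("+OK")


print(format_reply(101), end="")
print(is_success(208))
```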


Author's Contact Information

Peter Lakov
lakov@wizcom.bg
