


                                                                        
   Internet Draft                                            Wang Liang 
   Document: draft-liang-irpdl-00.txt                              hust 
   Expires: March 2004                                   September 2003
    
    
            Information Retrieval Protocol for Digital Library 
                       draft-liang-irpdl-00.txt  
    
Status of this Memo 
    
   This document is an Internet-Draft and is in full conformance with 
   all provisions of Section 10 of RFC2026.  
    
   Internet-Drafts are working documents of the Internet Engineering 
   Task Force (IETF), its areas, and its working groups.  Note that      
   other groups may also distribute working documents as Internet-Drafts. 
    
   Internet-Drafts are draft documents valid for a maximum of six months 
   and may be updated, replaced, or obsoleted by other documents at any 
   time.  It is inappropriate to use Internet-Drafts as reference 
   material or to cite them other than as "work in progress." 
    
   The list of current Internet-Drafts can be accessed at 
   http://www.ietf.org/ietf/1id-abstracts.txt. 
    
   The list of Internet-Draft Shadow Directories can be accessed at 
   http://www.ietf.org/shadow.html. 
    
   This Internet-Draft will expire on March 10, 2004. 
 
Copyright Notice  
     
   Copyright (C) The Internet Society (2003).  All Rights Reserved. 
    
    
Abstract 
    
   This document specifies an information retrieval protocol for digital
   libraries. The protocol has two parts: a standard search Webservice
   for heterogeneous databases, and a method to find and select such
   search Webservices. Using this protocol, any database, including web
   page databases, digital issue databases, and video databases, can
   publish a uniform search Webservice, even though these databases may
   have different metadata standards and architectures. Search systems
   can then easily find and call these Webservices. The protocol allows
   users to obtain all kinds of information on the Internet through a
   single search engine, instead of visiting many different search
   engines one by one.
 
 
WANG                   Expires - March 10,2004               [Page 1] 
Internet Draft             Information RPDL            September 2003 
 
 
 
1. Introduction 
 
   Many libraries have introduced hundreds of databases, and there are
   many more free information resources on the Internet. Finding
   complete and precise results for a query across so many databases
   has become a kind of acrobatics. Users want to obtain all kinds of
   information, such as web pages and videos, from one search engine,
   without caring where the information is stored. Webservice [1]
   offers a good way to achieve this. As long as these databases
   provide Webservices, it is easy to integrate all kinds of
   information resources in one search engine. Google [2] and some
   other databases already provide search Webservices, but no standard
   protocol for such search Webservices exists. Even the Webservices of
   different web search engines use distinct formats for queries and
   search results, not to mention the Webservices of many other kinds
   of databases. Thus, a uniform Webservice applicable to all
   information resources, and an efficient method to find such
   Webservices, should be established. This document addresses both
   goals. The protocol comprises two interacting parts: the Standard
   Search Webservice (SSW), which can be applied to all databases, and
   Search Webservice Description, Discovery and Integration (SDDI),
   which provides an efficient way to find the appropriate search
   Webservice.
 
2 Standard Search Webservice

   The Information Retrieval Protocol defines a standard search
   Webservice with its classes and functions. Most databases can
   publish a uniform search Webservice by using this definition.

2.1 Data encoding

   To support searching documents in multiple languages, all requests
   and responses should be encoded in UTF-8.
    
2.2 The format of query words

   Query words express the search request. They can include the
   following logical operators:

   Boolean OR: "OR"
   A OR B: A or B (or both) must appear in the results
   Boolean AND: "AND"
   A AND B: both A and B must appear in the results
   Boolean NOT: "NOT"
   NOT A: the word A must not appear in the results

   The logical operators can be combined.
 
 
   For example, (A AND B) AND (NOT C) means that the results must
   include A and B but must not include C.
   The characters "(" and ")" in the query words are treated as word
   separators.
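
   The query syntax above can be illustrated with a small evaluator.
   The following Python sketch is not part of the protocol. It assumes
   the conventional precedence NOT, then AND, then OR (this document
   does not define one), recognizes operators in upper case only, and
   matches words case-insensitively against a set of lower-case words
   taken from a record. The names tokenize and matches are
   hypothetical.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}

def tokenize(query):
    """Split a query into words, operators and parentheses."""
    # Parentheses act as separators, so "(A" yields "(" and "A".
    return re.findall(r"[()]|[^\s()]+", query)

def matches(query, words):
    """Return True if the set of lower-case words satisfies the query."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            right = parse_and()  # always parse the right-hand side
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_term()

    def parse_term():
        token = peek()
        if token is None or token in OPERATORS or token == ")":
            raise ValueError("unexpected token: %r" % (token,))
        take()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        return token.lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: %r" % (tokens[pos],))
    return result
```

   With this sketch, matches("(A AND B) AND (NOT C)", {"a", "b"}) is
   true, while the same query against {"a", "b", "c"} is false.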
 
2.3 Classes and functions

   The standard Webservice, with its class structures and functions, is
   detailed here and also presented in the form of WSDL [3].
   A search Webservice has three functional components:

   1. Receive the query words and return results.
   2. Analyze and explain the results.
   3. Describe every record in the results.

   These functions are implemented by three Webservice classes.
    
    
2.3.1 Class Search 
    
   The main function of this class is to submit a query string and a
   set of parameters to the search service and to receive a set of
   search results in return.

   This class offers three levels of search function: basic search,
   advanced search, and full search.
    
   a. Basic search 
    
   basicSearch(title,start,maxResults)

   title: query words; represents the basic description of a record.
   The field "title" is available in all metadata standards and
   databases. Book, video, and web page databases can all provide
   search on "title".

   start: Zero-based index of the first desired result.

   maxResults: Number of results desired per query. The maximum value
   per query is 100 and the minimum is 10. If a query has few matching
   items, the actual number of results returned may be smaller than
   the number requested.
    
   b. Advanced search 
    
   advanceSearch(title,keywords,author,abstract, 
   dateStart,dateEnd,start,maxResults,order) 
    
   The meanings of title, start, and maxResults are the same as in
   basic search. The keywords, author, and date fields are chosen as
   search fields because they are normally available in most databases
   and metadata standards.
    
 
 
   keywords: query words; keywords of the record.

   author: query words; author of the record.

   abstract: the basic description of the record.

   dateStart, dateEnd: the date range. Use these parameters to limit
   the results to documents published within a specific date range.

   order: the sort order of the results. It can be "date", which means
   sorting by date, "correlativity", which means sorting by relevance,
   or some other order.
    
   c. Full search 
 
   fullSearch(" ", start, maxResults) 
 
   Full search provides all the query formats supported by a database.
   These are defined by the database owner.
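
   All three search functions take start and maxResults. A client that
   wants more than one page of results can derive successive
   (start, maxResults) pairs as sketched below in Python. Clamping an
   out-of-range maxResults into the 10..100 range is an assumption;
   this document does not say how a service treats such values. The
   names clamp_max_results and page_requests are hypothetical.

```python
MIN_RESULTS = 10   # minimum maxResults per query, from the text above
MAX_RESULTS = 100  # maximum maxResults per query, from the text above

def clamp_max_results(requested):
    """Force a requested page size into the allowed 10..100 range."""
    return max(MIN_RESULTS, min(MAX_RESULTS, requested))

def page_requests(total_wanted, page_size):
    """Return (start, maxResults) pairs covering total_wanted results.

    start is zero-based, as defined for basicSearch.
    """
    page_size = clamp_max_results(page_size)
    return [(start, page_size) for start in range(0, total_wanted, page_size)]
```

   For example, asking for 250 results with a requested page size of
   500 yields three requests of 100 results each, starting at offsets
   0, 100, and 200.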
 
2.3.2 Class SearchResponse 
 
   Each time a search request is issued to the search service, a
   response is returned. This class describes the meaning of the
   returned values. Its attributes are as follows.

   TotalResultsCount: The estimated total number of results that exist
   for the query.

   resultElements: An array of "resultElement" items. This corresponds
   to the actual list of search results.

   startIndex: The index (1-based) of the first search result in
   "resultElements".

   endIndex: The index (1-based) of the last search result in
   "resultElements".

   searchTime: A floating-point number giving the total server time
   taken to return the search results, measured in seconds.
    
2.3.3 Class ResultElement 
 
   This class describes each record in the returned results. It has
   four attributes, as follows.
    
   Sourcename: name of the information source.  
 
 
    
   Title: title of the record.

   URL: The URL of the record, returned as text, with an absolute URL
   path.

   Otherinformation: additional information such as a snippet of a web
   page or the author of the record. Each search Webservice defines
   the content of this attribute.
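
   The two response classes can be mirrored on the client side. The
   Python sketch below is illustrative only: the snake_case attribute
   names, the make_response helper, and the choice of 0 for both
   indexes of an empty page are assumptions, not part of this
   protocol. It shows how the zero-based "start" of a request relates
   to the 1-based startIndex and endIndex of the response.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ResultElement:
    sourcename: str             # name of the information source
    title: str                  # title of the record
    url: str                    # absolute URL of the record
    otherinformation: str = ""  # service-specific extra text

@dataclass
class SearchResponse:
    total_results_count: int    # estimated total number of matches
    result_elements: List[ResultElement] = field(default_factory=list)
    start_index: int = 0        # 1-based index of the first result
    end_index: int = 0          # 1-based index of the last result
    search_time: float = 0.0    # server time in seconds

def make_response(total, start, elements, search_time):
    """Build a response for a request with zero-based offset `start`."""
    elements = list(elements)
    if elements:
        start_index = start + 1            # convert to 1-based
        end_index = start + len(elements)  # inclusive, 1-based
    else:
        start_index = end_index = 0        # assumption: empty page
    return SearchResponse(total, elements, start_index, end_index,
                          search_time)
```

   A request with start=10 that returns three records therefore
   produces startIndex 11 and endIndex 13.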
    
2.3.4 WSDL of standard search Webservice 
 
   <definitions name="search"
   targetNamespace="databaseSearch"
   xmlns:typens="databaseSearch"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
   xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" 
   xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" 
   xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" 
   xmlns="http://schemas.xmlsoap.org/wsdl/"> 
   <types> 
    
   <xsd:schema
   xmlns="http://www.w3.org/2001/XMLSchema"
   targetNamespace="databaseSearch">
   <xsd:complexType name="SearchResult"> 
   <xsd:all> 
   <xsd:element name="ResultsCount" type="xsd:int" />  
   <xsd:element name="resultElements" type="typens:ResultElementArray"/>  
   <xsd:element name="startIndex" type="xsd:int" />  
   <xsd:element name="endIndex" type="xsd:int" />  
   <xsd:element name="searchTime" type="xsd:double" />  
   </xsd:all> 
   </xsd:complexType> 
    
   <xsd:complexType name="ResultElement"> 
   <xsd:all> 
   <xsd:element name="Sourcename" type="xsd:string" />
   <xsd:element name="title" type="xsd:string" />
   <xsd:element name="URL" type="xsd:string" />
   <xsd:element name="otherInformation" type="xsd:string" />
   </xsd:all> 
   </xsd:complexType> 
    
   <xsd:complexType name="ResultElementArray"> 
   <xsd:complexContent> 
   <xsd:restriction base="soapenc:Array"> 
   <xsd:attribute ref="soapenc:arrayType" 
   wsdl:arrayType="typens:ResultElement[]" />  
 
 
   </xsd:restriction> 
   </xsd:complexContent> 
   </xsd:complexType> 
   </xsd:schema> 
   </types> 
    
   <message name="basicSearch"> 
   <part name="title" type="xsd:string" />  
   <part name="start" type="xsd:int" />  
   <part name="maxResults" type="xsd:int" />  
   </message> 
    
   <message name="advanceSearch"> 
   <part name="title" type="xsd:string" />  
   <part name="keywords" type="xsd:string" />
   <part name="author" type="xsd:string" />
   <part name="abstract" type="xsd:string" />
   <part name="dateStart" type="xsd:date" />
   <part name="dateEnd" type="xsd:date" />
   <part name="start" type="xsd:int" />
   <part name="maxResults" type="xsd:int" />
   <part name="order" type="xsd:string" />
   </message> 
    
   <message name="fullSearch"> 
   ...... 
   <part name="start" type="xsd:int" />  
   <part name="maxResults" type="xsd:int" />  
   </message> 
    
   <message name="searchResponse"> 
   <part name="return" type="typens:SearchResult" />  
   </message> 
    
   <portType name="SearchPort"> 
   <operation name="basicSearch"> 
   <input message="typens:basicSearch" />  
   <output message="typens:searchResponse" />  
   </operation> 
    
   <operation name="advanceSearch"> 
   <input message="typens:advanceSearch" />
   <output message="typens:searchResponse" />  
   </operation> 
    
   <operation name="fullSearch"> 
   <input message="typens:fullSearch" />  
   <output message="typens:searchResponse" />  
   </operation> 
 
 
    
   </portType> 
    
   <binding name="SearchBinding" type="typens:SearchPort"> 
   <soap:binding style="rpc" 
   transport="http://schemas.xmlsoap.org/soap/http" />  
   <operation name="basicSearch">
   <soap:operation soapAction="searchAction" />
   <input>
   <soap:body use="encoded" namespace="databaseSearch"
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
   </input>
   <output>
   <soap:body use="encoded" namespace="databaseSearch"
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
   </output>
   </operation>
    
   <operation name="advanceSearch"> 
   <soap:operation soapAction="searchAction" />
   <input> 
   <soap:body use="encoded" namespace="databaseSearch" 
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />  
   </input> 
   <output> 
   <soap:body use="encoded" namespace="databaseSearch" 
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />  
   </output> 
   </operation> 
    
   <operation name="fullSearch">
   <soap:operation soapAction="searchAction" />
   <input>
   <soap:body use="encoded" namespace="databaseSearch"
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
   </input>
   <output>
   <soap:body use="encoded" namespace="databaseSearch"
   encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" />
   </output>
   </operation>
    
   </binding> 
   <service name="SearchService"> 
   <port name="SearchPort" binding="typens:SearchBinding">
   <soap:address location=" " />  
   </port> 
   </service> 
   </definitions> 
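
   A request body for the advanceSearch operation described by this
   WSDL could be assembled as in the following Python sketch, which
   uses only the standard library and performs no network access. It
   is a simplified illustration: the function name
   build_advance_search is hypothetical, leaving out parts that the
   caller does not supply is an assumption, and the encodingStyle
   attribute and xsi:type annotations that an rpc/encoded service may
   expect are left out for brevity. The output is UTF-8, as required
   by Section 2.1.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SERVICE_NS = "databaseSearch"  # namespace used by the soap:body elements

# Part names of the advanceSearch message, in WSDL order.
ADVANCED_FIELDS = ("title", "keywords", "author", "abstract",
                   "dateStart", "dateEnd", "start", "maxResults", "order")

def build_advance_search(**params):
    """Return a UTF-8 encoded SOAP envelope for advanceSearch."""
    unknown = set(params) - set(ADVANCED_FIELDS)
    if unknown:
        raise ValueError("unknown parameters: %s" % sorted(unknown))
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}advanceSearch" % SERVICE_NS)
    for name in ADVANCED_FIELDS:
        if name in params:  # assumption: unsupplied parts are omitted
            part = ET.SubElement(call, name)
            part.text = str(params[name])
    return ET.tostring(envelope, encoding="utf-8")
```

   The resulting bytes would be sent in an HTTP POST to the address
   given in the soap:address element of the service.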
    

 
 
3 The description of the search Webservice

   To describe the search Webservice, we draw on UDDI [4]. This section
   proposes Search Webservice Description, Discovery and Integration
   (SDDI), which helps a search system find and select the appropriate
   data sources.
    
3.1 The XML schema of the SDDI 
 
   We use the Dublin Core (DC) [5] standard to describe the
   characteristics of the search Webservice. Nine of the 15 DC elements
   are selected and divided into three groups. Other basic information
   for a web service is also added to SDDI. Because all search services
   use the uniform standard Webservice, the businessService,
   bindingTemplate, and tModel structures of UDDI are unnecessary in
   SDDI. Information like that in the businessEntity structure of UDDI
   is enough to identify a search Webservice.

   The elements and attributes that describe a search Webservice are
   as follows.
      
   1 content 
     
   Title: A name given to the resource. 
    
   Description: An account of the content of the resource. 
    
   Language: A language of the intellectual content of the resource. 
    
   2 copyright 
    
   Creator: An entity primarily responsible for making the content of 
   the resource. 
    
   Publisher: An entity responsible for making the resource available 
    
   Rights: Information about rights held in and over the resource.  
    
   3 characters  
    
   Date: A date of an event in the lifecycle of the resource 
    
   Format: The physical or digital manifestation of the resource. 
    
   Identifier: An unambiguous reference to the resource within a given 
   context.  
    
   4 UDDI: Key of the UDDI entry for this Webservice, if available.
    

 
 
   5 categorybag: An optional list of name-value pairs used to tag a
   search Webservice with specific taxonomy information. Subject
   classification schemes can be adopted, such as the Chinese Library
   Classification (CLC) or the Library of Congress Classification
   (LCC).

   6 accesspoint: URL of this Webservice.

   The XML schema of SDDI is as follows.
    
   <element name = "sourceEntity"> 
   <complexType> 
   <sequence> 
    
    
   <element name= "content" > 
   <complexType> 
   <sequence> 
   <element name = "title" type="string"/>
   <element name = "description" type="string"/> 
   <element name = "language" type="language" minOccurs="1" 
   maxOccurs="unbounded"/> 
   </sequence> 
   </complexType> 
   </element> 
    
   <element name = "character" > 
   <complexType> 
   <sequence> 
   <element name = "date" type="date" /> 
   <element name = "format" type="string" minOccurs="1"
   maxOccurs="unbounded"/>
   <element name = "identifier" type="string"/>
   </sequence> 
   </complexType> 
   </element> 
    
   <element name = "copyright" > 
   <complexType> 
   <sequence> 
   <element name = "creator" type="string" minOccurs="1" 
   maxOccurs="unbounded"/> 
   <element name = "publisher" type="string" /> 
   <element name = "rights" type="string"/> 
   </sequence> 
   </complexType> 
   </element> 
    
   <element name = "categorybag" minOccurs="0" maxOccurs="unbounded">
   <complexType>
   <attribute name= "keyName"/>
   <attribute name= "keyValue" use = "required"/>
   </complexType>
   </element>

   <element ref = "UDDI" minOccurs="0" maxOccurs="1"/>
   </sequence> 
   <attribute name= "accesspoint" type="anyURI" use="required"/>
    
   </complexType> 
   </element> 
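
   To make the structure concrete, the Python sketch below embeds a
   hypothetical SDDI description and reads a few fields from it with
   the standard library. All values (the example.org URL, names, dates,
   and the CLC class) are invented for illustration, the element names
   follow the list given earlier in this section in lower case, and
   the summarize function is not part of this protocol.

```python
import xml.etree.ElementTree as ET

# Hypothetical SDDI description of one search Webservice.
SAMPLE_SDDI = """\
<sourceEntity accesspoint="http://example.org/search">
  <content>
    <title>ACM</title>
    <description>Hypothetical bibliographic database</description>
    <language>en</language>
  </content>
  <character>
    <date>2003-09-10</date>
    <format>text/html</format>
    <identifier>example-db-001</identifier>
  </character>
  <copyright>
    <creator>Example Creator</creator>
    <publisher>Example Publisher</publisher>
    <rights>Subscription required</rights>
  </copyright>
  <categorybag keyName="CLC" keyValue="TP"/>
</sourceEntity>
"""

def summarize(sddi_xml):
    """Extract the fields a search system needs to select a source."""
    root = ET.fromstring(sddi_xml)
    return {
        "accesspoint": root.get("accesspoint"),
        "title": root.findtext("content/title"),
        "languages": [e.text for e in root.findall("content/language")],
        "categories": {e.get("keyName"): e.get("keyValue")
                       for e in root.findall("categorybag")},
    }
```

   A search system could use such a summary to decide which sources
   match a user's language or subject before calling their search
   Webservices.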
    
3.2 The API of SDDI 
 
3.2.1 Publish 
 
   When a library purchases a database, an SDDI description of that
   database is authorized at the same time and saved on the library's
   local servers. The library can revise the SDDI description
   according to its own needs.
    
3.2.2 Inquiry API 
 
   The Inquiry API provides two simple functions that help a search
   engine find the appropriate Webservice matching the user's
   requirements. Element names are those defined in the SDDI schema.
   An element can have a complex structure.
    
   1 find(element,value) 
    
   Returns the accesspoint of each search Webservice whose element has
   the given value. For example:
    
   find(copyright, "Tom", "publisher","right") 
   find(title, "ACM") 
    
   2 get(element1,value,element2) 
    
   Returns the value of element2 for each search Webservice whose
   element1 has the given value. For example:

   get(title, "ACM", character)
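
   The behaviour of the two inquiry functions can be sketched over an
   in-memory registry. The Python code below is an illustration under
   several assumptions that this document does not state: SDDI
   descriptions are flattened into dictionaries, matching is exact
   string equality, and both functions return a list because several
   Webservices may match. The registry entries are invented.

```python
# Hypothetical registry: each entry is a flattened SDDI description.
REGISTRY = [
    {
        "accesspoint": "http://example.org/acm/search",
        "title": "ACM",
        "publisher": "Example Press",
        "character": {"date": "2003-09-10", "format": "text/html",
                      "identifier": "acm-001"},
    },
    {
        "accesspoint": "http://example.org/video/search",
        "title": "Video Archive",
        "publisher": "Example Press",
        "character": {"date": "2003-01-15", "format": "video/mpeg",
                      "identifier": "video-001"},
    },
]

def find(element, value, registry=REGISTRY):
    """Return the accesspoints of entries whose element equals value."""
    return [entry["accesspoint"] for entry in registry
            if entry.get(element) == value]

def get(element1, value, element2, registry=REGISTRY):
    """Return element2 of every entry whose element1 equals value."""
    return [entry.get(element2) for entry in registry
            if entry.get(element1) == value]
```

   Here find("title", "ACM") returns the single accesspoint of the ACM
   source, and get("title", "ACM", "character") returns its
   "character" group as a nested structure.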
    
    
Security Considerations 
    
   The security considerations of Webservice apply to this protocol.
   This protocol introduces no additional security concerns.
    
 
 
References
                     
    
   [1] Webservice, http://www.w3.org/2002/ws/
   [2] The web service of Google, http://www.google.com/apis/
   [3] WSDL, http://www.w3.org/TR/wsdl
   [4] UDDI, http://www.uddi.org
   [5] S. Weibel, J. Kunze, "Dublin Core Metadata for Resource
       Discovery", RFC 2413, September 1998.
 
    
Authors' Addresses

      Wang Liang
      HUST 
      WUHAN 430074 
      China 
      Phone: 86-27-87553494 
      Email:wangliang_f@163.com 
    
    
      Guo YiPing    
      HUST 
      WUHAN 430074 
      China 
      Email:wangliang_f@163.com 
    
    
      Fang Ming   
      HUST 
      WUHAN 430074 
      China 
      Email:fangming_w@263.net 
    
      Xu Yuedong   
      HUST 
      WUHAN 430074 
      China 
      Email: xuyaodong2000@yahoo.com.cn 
     
 
 
 
 
 
 
 
 

 
 
Full Copyright Statement  
        
   Copyright (C) The Internet Society (2003).  All Rights Reserved.  
        
   This document and translations of it may be copied and furnished to 
   others, and derivative works that comment on or otherwise explain it  
   or assist in its implementation may be prepared, copied, published 
   and distributed, in whole or in part, without restriction of any  
   kind, provided that the above copyright notice and this paragraph are 
   included on all such copies and derivative works.  However, this 
   document itself may not be modified in any way, such as by removing 
   the copyright notice or references to the Internet Society or other  
   Internet organizations, except as needed for the purpose of 
   developing Internet standards in which case the procedures for 
   copyrights defined in the Internet Standards process must be followed, 
   or as required to translate it into languages other than English.  
        
   The limited permissions granted above are perpetual and will not be 
   revoked by the Internet Society or its successors or assigns.  
        
   This document and the information contained herein is provided on an  
   "AS IS" basis and THE INTERNET SOCIETY, THE INTERNET ENGINEERING  
   TASK FORCE, THE AUTHOR AND THE AUTHOR'S EMPLOYER DISCLAIM ALL
   WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY  
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE  
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 
   FOR A PARTICULAR PURPOSE.  
    




















 
 