Using Semantic Web Technologies to Build an Educational Knowledge Utility and Digital Library

 

Harry N. Keeling, Ph.D.

Department of Systems and Computer Science

Howard University, Washington, DC 20059

2300 6th St., NW

Washington, DC 20059

hkeeling@scs.howard.edu

 


Abstract. Over the past few years the Semantic Web has grown into a useful tool for Web-based learning and teaching technology. Many believe that if the information on the Web were harnessed and processed into useful knowledge, its benefits to the educational field would be unlimited. This paper describes a research effort aimed at doing this. We have developed an advanced software system that utilizes Semantic Web technologies. This paper discusses our project and the knowledge-based system we constructed to convert the Freedmen’s Bureau’s paper-based records into an educational knowledge base accessible to teachers, students and researchers via the Web.

 

1. Introduction

 

The Semantic Web [2] was first introduced by Tim Berners-Lee, inventor of the World Wide Web (WWW). His vision is “The Semantic Web is an extension of the current Web in which information (on Web pages) is given well-defined meaning. . . creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users. In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”[1]

 

The WWW is currently being used to enhance knowledge delivery to both students and educators [7, 8]. Semantic Web technologies, though still very much in their infancy, are a growing extension to the WWW, in which the human-readable information currently stored on Web pages is extended and represented in such a way as to make it easily processable by computers. As the word “semantic” might imply, the Semantic Web seeks to extend Web pages by encoding a representation of the “meaning” of the information stored on them. Awareness of the educational potential of the Semantic Web grew out of the understanding that the Web has become the largest repository of knowledge on earth.

 

The objective of this paper is to highlight the progress made in an ongoing research project, called ‘The Freedmen’s Bureau Project.’ Currently this effort is supported by Howard University, the National Archives and Records Administration (NARA) and Microsoft. Its goal is to create an educational knowledge utility and digital library by transforming and publishing the knowledge contained in the decaying paper records collected during the activity of the Freedmen’s Bureau. This interdisciplinary research effort seeks to meet two primary objectives:

  • To preserve and disseminate the documents of the Freedmen’s Bureau using .NET technology and the Semantic Web
  • To design, create, and maintain an information infrastructure that facilitates the representation, interpretation and use of the knowledge contained in these documents

 

To achieve these objectives, we are developing a Web-based, educational knowledge utility, the Advanced Knowledge Acquisition and Dissemination System (AKADS). In the AKADS system, coded information is stored in various locations in several forms. AKADS is comprised of:

  1. an innovative methodology for the acquisition, representation and dissemination of information from historical artifacts;
  2. a set of Web-based software tools that support the building and maintenance of this knowledge;
  3. a number of intelligent agents that facilitate user access and distribution of each artifact and its related knowledge; and
  4. a knowledge base for students, teachers and researchers.

 

An enormous amount of historical knowledge was collected during the tenure of the Freedmen’s Bureau. This knowledge is being preserved and disseminated through the operation and ongoing development of AKADS.

 

 

 

2. History of the Freedmen’s Bureau

In 1865, Congress established the Bureau of Refugees, Freedmen, and Abandoned Lands, commonly referred to as the “Freedmen’s Bureau,” to supervise and manage all matters relating to the newly emancipated slaves, and to supervise abandoned and confiscated property. In the years following the Civil War, the Freedmen’s Bureau provided assistance to tens of thousands of former slaves making the transition from slavery to freedom. The bureau issued food and clothing, operated hospitals and refugee camps, established schools, and helped freedmen legalize marriages. It also supervised labor contracts and worked with African American soldiers and sailors, and their heirs, to secure back pay, bounty payments, and pensions. The resulting paper-based records are a rich source of documentation of the Black experience in America.

The records of the Freedmen’s Bureau are a vital source of information for the educational community. These records contain a wide range of data about the African-American experience during slavery and freedom. All of these records are originals and, because they are deteriorating, require immediate attention. These records are an important link for African-Americans to their slave and African ancestors. Preserving the records of the Freedmen’s Bureau is a high priority for educators and students interested in the history of the Civil War and post-Civil War eras.

 

The Freedmen’s Bureau Records Preservation Act of 2000

On September 12, 2000, in recognition of the value of this national treasure,  Ms. Millender-McDonald (for herself and Mr. Watts of Oklahoma) introduced The Freedmen’s Bureau Records Preservation Act of 2000 (H.R. 5157) in the House of Representatives. This bill set aside approximately 3 million dollars to be used in support of this effort.  The Act stated that the records of the Freedmen’s Bureau shall be preserved by using:

  1. available technology for restoration of the documents comprising these records so that they can be maintained for future generations; and
  2. innovative imaging and indexing technologies to make these records easily accessible to the public, including educators, students, historians, and genealogists.

 

 

Problems with Documents and Current Preservation Efforts

The Freedmen’s Bureau documents, nearly a century and a half old, rest in the narrow stacks at the National Archives Building in Washington, DC. Over the years, educators, historians, social scientists, and genealogists have used these increasingly fragile records to study and document the social and economic experiences of Blacks in America, as well as the federal government’s policies toward them following the Civil War. In recent years, however, more and more researchers, with a great deal of frustration and varying degrees of success, have attempted to access these records. To do so, they had to come to Washington. This limited access has added to researcher aggravation, and with frequent handling, the original records have become even more fragile. Some are torn, others crumbling. Another problem with the Freedmen’s Bureau records has been the lack of personal name, place, or event indexes among the bureau’s files that would allow easy retrieval of these records. In the absence of such indexes, researchers can spend countless hours searching through records that may or may not contain information they seek concerning this period of American history. Moreover, during the period when these records were created there was a shortage of paper. Consequently, many of these records contain writing on both sides of the paper and in the margins; some of the writing is upside-down or written between lines.

The sheer volume of information stored in the records is daunting. The Bureau records hold hundreds of thousands of original documents, certificates, and letters that are stored in different states. This is a major hindrance to anyone trying to use the records. Not only are there numerous individual records to be considered, but also the records are all hand-written. This causes another problem because it is difficult to positively discern the text. One additional problem associated with the records is the illiteracy of the freed slaves and their inability to give the spelling of their names, thereby leaving the responsibility of determining the spelling of the person’s name up to the recorder. This causes the traversal of the information to be even more difficult as there might exist various spellings or variants of the same name, e.g. Susan, Susanne and Suzan. The process of extracting knowledge from these documents has proven to be extremely challenging.

Though the current microfilming and indexing of the Freedmen’s Bureau records by NARA and others will make some strides toward preserving this valuable national asset,  these efforts will do little toward “making these records easily accessible to the public” as The Freedmen’s Bureau Records Preservation Act of 2000 calls for.  Further, these efforts will not make these records available to a wide variety of possible users. 

 

3. The Freedmen’s Bureau Project

 

To meet the objectives outlined in The Freedmen’s Bureau Records Preservation Act of 2000, we are creating AKADS, an online Web-based information resource that not only stores and preserves the digitized images of these records but also offers a variety of search and retrieval services over the Web. We employ the cutting-edge technology of the Semantic Web to give users the ability to:

  • Search document images using a “smart” search engine that employs modern artificial intelligence methods that take into consideration the semantic content or “meaning” of the data contained in the records
  • Retrieve the images and related information about these images in a variety of ways
  • Annotate these records with their particular interpretation and perspective, thus adding new information about these records to the repository
  • Perform a number of functions, including name/date/place matching, to relate disparate facts about events, individuals and families
  • Provide a Web portal to other related sites and resources
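
Name matching of the kind listed above has to cope with variant spellings of the same name, such as Susan, Susanne and Suzan. One well-known way to group such variants is a phonetic key. The sketch below uses American Soundex purely as an illustration; this paper does not state which matching method AKADS uses, and the function names `soundex` and `group_variants` are hypothetical.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex key for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated digit is kept
    return (key + "000")[:4]


def group_variants(names):
    """Group names that share a Soundex key (illustrative helper)."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)
```

With this sketch, `group_variants(["Susan", "Susanne", "Suzan"])` places all three spellings under the single key `S250`. A phonetic key is coarse, so in practice it would serve only to propose candidate matches for a human or a later rule to confirm.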

 

 

AKADS Architecture

 

AKADS is comprised of three distributed components, each performing a key function in the operation of the system. Figure 1 below illustrates the architecture of AKADS.

Figure 1: Advanced Knowledge Acquisition and Dissemination System Architecture

 

System Components

 

The Knowledge Acquisition component interfaces with knowledge engineers, information architects, and library scientists through a set of Web-based tools. These tools allow our research team to transcribe and encode the knowledge contained in the Freedmen’s Bureau records. This knowledge is being represented using the Extensible Markup Language (XML), Web Ontology Language (OWL) ontologies (vocabularies), and representation formalisms like the DARPA Agent Markup Language (DAML) and the Resource Description Framework (RDF). The knowledge is being preserved in the knowledge base of AKADS. The AKADS Knowledge Dissemination component is comprised of intelligent agents that receive information queries entered by users and then infer the appropriate retrieval from the AKADS knowledge base. On receipt of a user query, the Knowledge Representation/Preservation component responds with the appropriate digital objects from the AKADS knowledge base. This system component allows users to conduct research in a more comprehensive and intuitive manner than is afforded by today’s traditional search engines. Search criteria can be either selected from drop-down lists in the interface or entered as a keyword search. Currently, we are adding smart aspects to our search method that will “infer” auxiliary searches that may be related to the user’s indicated criteria. To this end, an inference engine from previous research is being modified and retrofitted [5, 6].
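
The two ways of entering search criteria described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the record fields (`doc_type`, `place`, `text`), the sample records, and the `search` function are invented here and are not taken from the AKADS knowledge base or its interface.

```python
# Hypothetical transcribed records; field names and values are invented
# for illustration and are not taken from the AKADS knowledge base.
RECORDS = [
    {"id": 1, "doc_type": "letter", "place": "Virginia",
     "text": "Request for rations for the school"},
    {"id": 2, "doc_type": "marriage certificate", "place": "Virginia",
     "text": "Certificate of marriage issued by the bureau agent"},
    {"id": 3, "doc_type": "letter", "place": "Georgia",
     "text": "Report on labor contracts signed this month"},
]


def search(records, criteria=None, keyword=None):
    """Return records matching all selected criteria and the keyword."""
    criteria = criteria or {}
    results = []
    for record in records:
        # Drop-down style criteria: every selected field must match exactly.
        if any(record.get(field) != value for field, value in criteria.items()):
            continue
        # Keyword search: case-insensitive substring match on the text.
        if keyword and keyword.lower() not in record["text"].lower():
            continue
        results.append(record)
    return results
```

Here `search(RECORDS, criteria={"doc_type": "letter"})` mimics a drop-down selection, `search(RECORDS, keyword="marriage")` mimics a keyword search, and the two can be combined. The inference of auxiliary searches mentioned above is not modeled in this sketch.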

 

 

4. Active Research Areas

To achieve the objectives of the Freedmen’s Bureau project several emerging technologies are being examined. There are an enormous number of documents to be transformed and converted.  The size of the subsequent knowledge acquisition effort is daunting. Clearly, new approaches, methods, and software solutions will be required. To meet this challenge we are focusing our research on the areas discussed in the following paragraphs.

 

Converting Cursive Writing to Text

Because the records are all handwritten in cursive, the knowledge acquisition process would be easier if some of the content of these documents could be converted to text automatically. In this area we are testing several software solutions as well as the new “inking technologies” like those employed on tablet PCs. Our project members are researching the possibility that some portion of the writing contained in the letters, marriage certificates, bank records, and other Freedmen’s Bureau documents may be converted to ASCII text.

Figure 2: AKADS subsystem for handwriting recognition

Automating Knowledge Acquisition

We are investigating new methods for natural language recognition. Using the text from letters in the Freedmen’s Bureau documents, we have developed a working prototype, illustrated in Figure 3. We have written a period-specific grammar and a Natural Language Processor (NLP). The NLP uses the WordNet ontology to generate Prolog statements. Next, the knowledge represented in the generated Prolog statements is processed and converted into a form suitable for the Semantic Web. The resulting knowledge base is then accessible via user-generated natural language queries.

Figure 3: Extracting knowledge from text
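
To make the idea of turning sentences into Prolog statements concrete, the following is a deliberately minimal sketch. It is not the prototype described above: a single regular expression stands in for the period-specific grammar, a three-entry verb table stands in for WordNet, and the function name `sentence_to_fact` and the sample sentence are invented for illustration.

```python
import re

# A small fixed verb table standing in for a lexical resource (assumption).
_VERBS = {"received": "receive", "requested": "request", "married": "marry"}

# One hand-written pattern standing in for a full grammar:
# "<Capitalized Name> <verb> <object>." with the verbs listed above.
_PATTERN = re.compile(
    r"^(?P<subj>[A-Z][a-z]+(?: [A-Z][a-z]+)*) "
    r"(?P<verb>" + "|".join(_VERBS) + r") "
    r"(?P<obj>.+?)\.?$"
)


def _atom(text):
    """Turn free text into a lowercase Prolog-style atom such as back_pay."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def sentence_to_fact(sentence):
    """Return a Prolog-style fact string, or None if the pattern fails."""
    match = _PATTERN.match(sentence.strip())
    if match is None:
        return None
    predicate = _VERBS[match.group("verb")]
    return "{}({}, {}).".format(
        predicate, _atom(match.group("subj")), _atom(match.group("obj")))
```

For the invented sentence "Mary Smith requested back pay." the sketch yields `request(mary_smith, back_pay).`, while sentences that do not fit the pattern return `None`. A grammar-based parser would cover far more sentence forms than this single pattern.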

 

 

Building Web Services Using .NET Technology

To create the entire AKADS knowledge repository, the effort of many individuals will be needed. Clearly, the task of extracting the knowledge contained in the Freedmen’s Bureau records must be shared with as many knowledge engineers and domain experts as possible. To facilitate the knowledge base building effort, a distributed, Web-accessible Knowledge Acquisition component has been constructed. This system component is the subject of continuing research. We continue to develop Web services to support interaction with intelligent agents operating on our Lisp server. This distributed architecture allows knowledge engineering to be done from all parts of the country and the world.

Figure 4: Web services and .NET architecture

Building Ontologies and Intelligent Agents                   

Both the Knowledge Acquisition and Dissemination components require the use of ontologies. Ontologies are machine-readable dictionaries that define the shared vocabulary that system components use to communicate in AKADS and in the Semantic Web. To facilitate the ontology and agent building effort, we have expanded earlier research [4] to create a hybrid set of tools. Our team has developed a number of tools for building ontologies, including the ontology editor shown in Figure 5. The building of intelligent agents for the Knowledge Dissemination component of AKADS has begun using tools previously developed [3].

Figure 5: AKADS ontology editor tool
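
The role of an ontology as a shared vocabulary can be sketched in a few lines. The example below is a toy under stated assumptions: the class names, property names, and helper functions are hypothetical and do not reproduce the AKADS ontologies or the OWL language; they only show how a class hierarchy lets components agree on which terms and properties are valid.

```python
# A toy ontology: each class names its parent and the properties it allows.
# Class and property names are hypothetical, not the AKADS vocabulary.
ONTOLOGY = {
    "Document": {"parent": None, "properties": {"date", "place"}},
    "Letter": {"parent": "Document", "properties": {"sender", "recipient"}},
    "MarriageCertificate": {"parent": "Document",
                            "properties": {"spouse_1", "spouse_2"}},
}


def ancestors(cls):
    """Return the class and all of its superclasses, nearest first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = ONTOLOGY[cls]["parent"]
    return chain


def is_a(cls, other):
    """True if cls is other or a (transitive) subclass of it."""
    return other in ancestors(cls)


def allowed_properties(cls):
    """Collect properties declared on the class and inherited from parents."""
    props = set()
    for name in ancestors(cls):
        props |= ONTOLOGY[name]["properties"]
    return props


def make_instance(cls, **values):
    """Create an instance, rejecting properties the ontology does not define."""
    unknown = set(values) - allowed_properties(cls)
    if unknown:
        raise ValueError("unknown properties: " + ", ".join(sorted(unknown)))
    return {"type": cls, **values}
```

In this sketch a `Letter` inherits `date` and `place` from `Document`, so `make_instance("Letter", place="Richmond")` succeeds, whereas giving a letter a `spouse_1` property raises a `ValueError`. Generating "instances" from an ontology follows the same idea at larger scale.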

HTML Markup Tools

The “digital objects” returned as a result of the user queries shown in Figure 1 are comprised of images of the Freedmen’s Bureau documents and Web pages written in HTML and XML. The meaning of the content of the images is represented in XML and added to the HTML file using markup tools. We have developed a Web scraper tool to assist in gathering content knowledge from the HTML. Also, we are employing several tools to generate “instances” from various ontologies. These tools include Protégé 2000, RIC, Photostuff, and SMORES, in collaboration with the MINDSWAP Lab at the University of Maryland.

Figure 6: HTML markup tools
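
As a rough picture of what gathering content knowledge from HTML involves, the sketch below pulls labeled fragments out of a page using Python's standard `html.parser` module. It is not the scraper developed in this project: the convention of marking content with `<span class="...">` elements, the class name `AnnotationScraper`, and the function `scrape` are assumptions made for the example.

```python
from html.parser import HTMLParser


class AnnotationScraper(HTMLParser):
    """Collect (label, text) pairs from <span class="..."> elements.

    Nested spans are not handled; this is a sketch, not a robust scraper.
    """

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._label = None   # class of the span currently open, if any
        self._parts = []     # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        if tag == "span" and self._label is None:
            label = dict(attrs).get("class")
            if label:
                self._label = label
                self._parts = []

    def handle_data(self, data):
        if self._label is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._label is not None:
            # Join the fragments and normalize runs of whitespace.
            text = " ".join("".join(self._parts).split())
            self.annotations.append((self._label, text))
            self._label = None


def scrape(html):
    """Return the list of (label, text) annotations found in the HTML."""
    scraper = AnnotationScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.annotations
```

Given a fragment such as `<p>Letter sent from <span class="place">Richmond</span>.</p>`, `scrape` returns `[("place", "Richmond")]`. The extracted pairs could then be turned into ontology instances by a separate step.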

 

 

5. Semantic Web Technologies

 

The development of the Extensible Markup Language (XML) has made a fundamental contribution towards the development of the Semantic Web because it works as a data language that both machines and humans can easily understand. XML is also the language of Web Services, applications that run on the Internet. The Semantic Web combined with Web Services offers new possibilities for communication and interaction. XML provides the ability to define your own tags and attributes, which HTML does not allow. The focus is on ‘what’ the content is rather than ‘how’ it is presented. XML is designed for storing, delivering, and exchanging information among interactive Web applications across the Internet.
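
The point about user-defined tags can be illustrated with a short sketch. The element names, attribute names, and values in the sample record are invented for this example and are not the AKADS schema; the parsing uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Element names, attribute names and values are invented for illustration.
RECORD_XML = """
<record id="r1" type="letter">
  <date>1866-03-12</date>
  <place>Richmond, Virginia</place>
  <subject>Request for school supplies</subject>
</record>
"""


def record_to_dict(xml_text):
    """Read a record's attributes and child elements into a dictionary."""
    root = ET.fromstring(xml_text.strip())
    data = dict(root.attrib)
    for child in root:
        # The tag itself says what the content is, e.g. "date" or "place".
        data[child.tag] = (child.text or "").strip()
    return data
```

Because the tags name what each value is, a program can read `record_to_dict(RECORD_XML)["place"]` directly, with no knowledge of how the record would be displayed.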

 

The components of AKADS are hosted on various servers and provide information and services to customers and users around the world. Web Services are a new breed of Web application. They are self-contained, self-describing, modular applications that can be published, located, and invoked across the Web. One of the reasons that we chose to develop Web services was to offer a higher level of platform independence and interoperability. By platform-independent we mean that XML Web services built and running on one platform (e.g., Unix) can be called by applications running on an unrelated platform, for example Microsoft Windows NT. There must, however, be a common messaging protocol for applications to send and receive XML data. The Simple Object Access Protocol (SOAP) is that messaging protocol. XML and SOAP are the base technologies of Web Services. SOAP provides a mechanism for communication between an XML Web service and its clients. It represents all communications as XML, is platform independent, and supports encoding of information. SOAP messages can be sent via HTTP. We are working towards developing a search engine based on the Web services model.
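
A SOAP message is simply an XML envelope wrapped around the call being made. The sketch below builds and reads back such an envelope with Python's standard library; the envelope namespace is the one defined for SOAP 1.1, while the service namespace, the operation name, and the parameter name are hypothetical and do not describe the actual AKADS services. No network transport is shown.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
SERVICE_NS = "http://example.org/search"               # hypothetical service

ET.register_namespace("soap", SOAP_NS)


def build_request(operation, params):
    """Return a SOAP 1.1 request envelope as a string."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}%s" % (SERVICE_NS, operation))
    for name, value in params.items():
        ET.SubElement(call, "{%s}%s" % (SERVICE_NS, name)).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def _local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[-1]


def read_request(xml_text):
    """Parse an envelope back into (operation, params)."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find("{%s}Body" % SOAP_NS)
    call = body[0]  # the first child of Body is the operation element
    return _local_name(call.tag), {_local_name(c.tag): c.text for c in call}
```

Calling `build_request("SearchRecords", {"keyword": "marriage"})` produces the XML a client would send in the body of an HTTP POST, and `read_request` shows that a receiver on any platform can recover the operation and its parameters from the text alone.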

 

For the Semantic Web to perform meaningful functions, computers must have access to information that is structured according to specific rules. Intelligent agents require this structure so they can perform automated reasoning. At present, the Resource Description Framework (RDF) provides this structure. Meaning on the Semantic Web is expressed in RDF as triples, each consisting of a subject, a property (or relation), and an object. The subject and object are each identified by a Uniform Resource Identifier (URI). These triples tie related things together and thus connect information elements on the Semantic Web.
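
The triple structure described above can be sketched with plain Python tuples. This is a minimal illustration, not an RDF implementation: the URIs under `http://example.org/` are placeholders, the property names are invented, and the helper functions `match` and `connected_to` are hypothetical.

```python
# URIs under example.org are placeholders, not identifiers used by AKADS.
EX = "http://example.org/"

# Each triple is (subject, property, object).
TRIPLES = {
    (EX + "letter42", EX + "writtenBy", EX + "person7"),
    (EX + "letter42", EX + "sentFrom", EX + "richmond"),
    (EX + "person7", EX + "livedIn", EX + "richmond"),
}


def match(triples, subject=None, prop=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (obj is None or t[2] == obj)
    )


def connected_to(triples, resource):
    """Resources linked to the given one in either direction."""
    linked = {o for s, _, o in triples if s == resource}
    linked |= {s for s, _, o in triples if o == resource}
    return linked
```

Because every statement has the same three-part shape, a query such as `match(TRIPLES, obj=EX + "richmond")` finds both the letter sent from that place and the person who lived there, even though the two facts were recorded separately. This is the sense in which triples connect information elements.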

 

6. Conclusions

 

Our project team has developed a technological infrastructure where we represent, encode, store, and publicize knowledge about a significant number of historical artifacts in a way consistent with contemporary Semantic Web technology. This technology includes abstract representation of data compatible with the WWW and based on defined representation standards. The resulting repository for historical knowledge, supported by highly accessible knowledge acquisition and distribution tools, is a noteworthy educational resource for students and educators. The AKADS system and its related Semantic Web technologies show great promise for continued contributions to the field of education.  Over the next few years this system will be expanded and the technologies it uses will be advanced. 

 

7. Bibliography

 

 

  1. Berners-Lee, T., Hendler, J., and Lassila, O. “The Semantic Web”, Scientific American, May 2001.
  2. Fensel, D., Hendler, J., Lieberman, H., and Wahlster, W., eds. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, MIT Press, Cambridge, Massachusetts, and London, England, 2003.
  3. Keeling, H. “Developing an Intelligent Educational Agent with Disciple”. In The International Journal of Artificial Intelligence in Education (IJAIED), Vol. 10.2, May 1999.
  4. Keeling, H. “A Methodology for Building Intelligent Educational Agents”. In Proceedings of the 9th International Conference on Artificial Intelligence in Education (AI-ED 99), Le Mans, France, July 1999.
  5. Keeling, H. “Developing Intelligent Educational Agents with the Disciple Learning Agent Shell”. In Proceedings of the 4th International Conference, ITS '98, San Antonio, Texas, Springer Verlag, Aug. 1998.
  6. Keeling, H. Contributing writer to “Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies”, Academic Press, 1998.
  7. McLemee, S. “Internet Studies 1.0: a Discipline Is Born”, The Chronicle of Higher Education, March 30, 2001, Washington, DC.
  8. Wolpers, M. “Towards P2P-based information systems for E-Learning using semantic Web technologies”. In Proceedings of the First International Workshop on the Semantic Web for Web-based Learning, June 16-20, 2003, Klagenfurt/Velden, Austria.