Using Semantic Web Technologies to Build an Educational Knowledge Utility and
Digital Library
Department of Systems and
Computer Science
hkeeling@scs.howard.edu
Abstract. Over the past few years the Semantic
Web has grown into a useful tool for Web-based learning and teaching
technology. Many believe that if the information on the Web were harnessed and
processed into useful knowledge, its benefits to the educational field would be
unlimited. This paper describes a
research effort aimed at doing this. We have developed an advanced software
system that utilizes Semantic Web technologies. This paper discusses our project
and the knowledge-based system we constructed to convert the Freedmen’s
Bureau’s paper-based records into an educational knowledge base accessible to
teachers, students, and researchers via the Web.
1. Introduction
The Semantic Web [2] was first
introduced by Tim Berners-Lee, inventor of the World Wide Web (WWW). His vision
is “The Semantic Web is an extension of the current Web in which information
(on Web pages) is given well-defined meaning. . . creating an environment where
software agents roaming from page to page can readily carry out sophisticated
tasks for users. In the near future, these developments will usher in
significant new functionality as machines become much better able to process
and ‘understand’ the data that they merely display at present.”[1]
The WWW is currently being used to
enhance knowledge delivery to both students and educators [7, 8]. Semantic Web
technologies, though still very much in their infancy, are a growing
extension to the WWW, where the human-readable information that is currently stored
on Web pages is extended and represented in such a way as to make it easily processable by computers.
As the word “semantic” might imply, the Semantic Web seeks to extend Web
pages by encoding a representation of the “meaning” of the information stored
on Web pages. An awareness of the educational potential of the Semantic Web
found its birth in the understanding that the Web has become the largest
repository of knowledge on earth.
The objective of this paper is to
highlight the progress made in an ongoing research project, called ‘The
Freedmen’s Bureau Project.’ Currently this effort is supported by
To achieve these objectives, we are
developing a Web-based, educational knowledge utility, the Advanced Knowledge
Acquisition and Dissemination System (AKADS). In the AKADS system, coded
information is stored in various locations in several forms. AKADS is comprised
of:
The enormous amount of historical
knowledge recorded during the tenure of the Freedmen’s Bureau has been collected.
This knowledge is being preserved and disseminated through the operation and
ongoing development of AKADS.
2. History of the Freedmen’s Bureau
In 1865, Congress established the Bureau of Refugees, Freedmen,
and
The records of the Freedmen’s Bureau are
a vital source of information for the educational community. These records
contain a wide range of data about the African-American experience during
slavery and freedom. All of these records are originals and, because they are
deteriorating, require immediate attention. These records are an important link
for African-Americans to their slave and African ancestors. Preserving the
records of the Freedmen’s Bureau is a high priority for educators and students
interested in the history of the Civil War and post-Civil War eras.
The Freedmen’s Bureau documents, nearly a century and a half old,
rest in the narrow stacks at the
Though the current microfilming and indexing of the Freedmen’s Bureau
records by NARA and others will make some strides toward preserving this
valuable national asset, these efforts
will do little toward “making these records easily accessible to the public” as
The Freedmen’s Bureau
Records Preservation Act of 2000 calls for. Further, these efforts will not make these
records available to a wide variety of possible users.
3. The Freedmen’s Bureau Project
To meet the objectives outlined in The Freedmen’s Bureau Records
Preservation Act of 2000, we are creating AKADS, an online Web-based
information resource that not only stores and preserves the digitized images of
these records, but also offers a variety of search and retrieval services over
the Web. We employ the cutting-edge technology of the Semantic Web to provide users
the ability to:
· Search document images using a “smart” search engine that
employs modern artificial intelligence methods that take into consideration
the semantic content or “meaning” of the data contained in the records.
· Retrieve the images and related information about these
images in a variety of ways.
· Annotate these records with their particular interpretation
and perspective, thus adding new information about these records to the
repository.
· Perform a number of functions, including name/date/place matching,
to relate disparate facts about events, individuals, and families.
· Provide a Web portal to other related sites and resources.
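The name/date/place matching mentioned in the list above can be pictured with a minimal sketch. It assumes that each transcribed fact has already been reduced to an identifier, a name, a place, and a year; the function names and the record layout are hypothetical and are not taken from AKADS, and exact matching on normalized text stands in for whatever matching method the system actually uses.

```python
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different spellings compare equal."""
    return " ".join(text.lower().split())


def group_matching_facts(facts):
    """Group fact ids that share the same normalized (name, place, year) key.

    `facts` is an iterable of (fact_id, name, place, year) tuples; this layout
    is a hypothetical one chosen for illustration.
    """
    groups = defaultdict(list)
    for fact_id, name, place, year in facts:
        groups[(normalize(name), normalize(place), year)].append(fact_id)
    # Keep only the keys that relate two or more facts.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Facts drawn from different documents that agree on all three fields end up in the same group, which is one simple way to relate otherwise disparate entries.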
AKADS is
comprised of three distributed components, each performing a key function in
the operation of the system. Figure 1
below illustrates the architecture of AKADS.
Figure 1: Advanced Knowledge Acquisition and Dissemination System Architecture
The Knowledge Acquisition component interfaces with knowledge engineers,
information architects, and library scientists through a set of Web-based
tools. These tools allow our research team to encode and transcribe the
knowledge contained in the Freedmen’s Bureau records. This knowledge is being
represented using the Extensible Markup Language (XML), Web Ontology Language
(OWL) ontologies (vocabularies), and representation formalisms like the DARPA
Agent Markup Language (DAML) and the Resource Description Framework (RDF). The
knowledge is being preserved in the knowledge base of AKADS. The AKADS
Knowledge Dissemination component is comprised of intelligent agents that
receive information queries entered by users and then infer the appropriate
retrieval from the AKADS knowledge base. On receipt of a user query, the
Knowledge Representation/Preservation component responds with the appropriate
digital objects from the AKADS knowledge base. This system component allows
users to conduct research in a more comprehensive and intuitive manner than is
currently afforded by today’s traditional search engines.
Search criteria can be entered directly into the interface either via
selection from drop-down lists or as a keyword search. Currently, we are
adding smart aspects to our search method that will “infer” auxiliary searches
that may be related to the user’s indicated criteria. Building on previous
research, an inference engine is being modified and retrofitted [5, 6].
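The idea of “inferring” auxiliary searches can be illustrated with a minimal sketch. It assumes a small hand-built table of related terms; the table, the function name expand_query, and the sample terms are hypothetical and do not describe the inference engine used in AKADS.

```python
# Hypothetical related-term table; in practice such links might come from an ontology.
RELATED_TERMS = {
    "marriage": ["marriage certificate", "spouse"],
    "bank": ["bank record", "deposit"],
}


def expand_query(keywords):
    """Return the original keywords followed by any related auxiliary terms."""
    expanded = list(keywords)
    for keyword in keywords:
        for related in RELATED_TERMS.get(keyword.lower(), []):
            if related not in expanded:
                expanded.append(related)
    return expanded


if __name__ == "__main__":
    print(expand_query(["marriage", "Virginia"]))
```

The user's own criteria are kept first, so the auxiliary searches only add to the result set rather than replacing what was asked for.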
4. Active Research Areas
To achieve the
objectives of the Freedmen’s Bureau project, several emerging technologies are
being examined. There are an enormous number of documents to be transformed and
converted. The size of the subsequent
knowledge acquisition effort is daunting. Clearly, new approaches, methods, and
software solutions will be required. To meet this challenge we are focusing our
research on the areas discussed in the following paragraphs.
Converting Cursive Writing to Text
Because the records are all handwritten in cursive, it would facilitate the
knowledge acquisition process if some of the content of these documents could
be converted to text. In this area we are testing several software solutions
as well as the new “inking technologies” like those employed on tablet PCs.
Our project members are researching the possibility that some portion of the
writing contained in the letters, marriage certificates, bank records, and
other Freedmen’s Bureau documents may be converted to ASCII text.

Figure 2: AKADS subsystem for handwriting recognition
Automating Knowledge Acquisition
We are investigating new methods for natural language recognition. Using the
text from letters in the Freedmen’s Bureau documents, we have developed a
working prototype, illustrated in Figure 3. We have written a period-specific
grammar and a Natural Language Processor (NLP). The NLP uses the WordNet
ontology to generate Prolog statements. Next, the knowledge represented in the
generated Prolog statements is processed and converted into a form suitable
for the Semantic Web. The resulting knowledge base is then accessible via
user-generated natural language queries.

Figure 3: Extracting knowledge from text
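The step from a sentence to a Prolog statement can be sketched in a few lines. This is only an illustration under strong assumptions: a single regular expression stands in for the period-specific grammar and the WordNet lookup, and the sentence pattern, the function names, and the sample sentence are all hypothetical.

```python
import re

# Hypothetical pattern: "<Name> <verb> <Name>", e.g. "John Smith married Mary Jones."
PATTERN = re.compile(
    r"^(?P<subj>[A-Z]\w*(?: [A-Z]\w*)*) "
    r"(?P<verb>[a-z]+) "
    r"(?P<obj>[A-Z]\w*(?: [A-Z]\w*)*)\.?$"
)


def to_atom(name):
    """Turn a proper name into a Prolog atom: lowercase, underscores for spaces."""
    return name.lower().replace(" ", "_")


def sentence_to_fact(sentence):
    """Return a Prolog fact string for a matching sentence, or None otherwise."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    return "{}({}, {}).".format(
        match.group("verb"),
        to_atom(match.group("subj")),
        to_atom(match.group("obj")),
    )
```

Sentences that do not fit the pattern return None, which leaves them for a human knowledge engineer rather than guessing at their meaning.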
Building Web Services Using .NET Technology
To create the
entire AKADS knowledge repository, the manpower of many individuals will be
needed. Clearly, the task of extracting
the knowledge contained in Freedmen’s Bureau records must be shared with as
many knowledge engineers and domain experts as possible. To facilitate the knowledge base building
effort, a distributed, Web-accessible Knowledge Acquisition component has been
constructed. This system component is
the subject of continuing research. We continue to develop Web services to
support interaction with intelligent agents operating on our Lisp server. This distributed architecture allows knowledge
engineering to be done from all parts of the country and the world.

Figure 4: Web services and .NET architecture
Building Ontologies and Intelligent Agents
Both the Knowledge Acquisition and Dissemination components require the use of
ontologies. Ontologies are machine-readable dictionaries that define the
shared vocabulary that system components use to communicate in AKADS and in
the Semantic Web. To facilitate the ontology and agent building effort, we
have expanded earlier research [4] to create a hybrid set of tools. Our team
has developed a number of tools for building ontologies, including the one
shown in Figure 5. The building of intelligent agents for the Knowledge
Dissemination component of AKADS has begun using tools previously developed
[3].
Figure 5: AKADS ontology editor tool
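One way to picture the shared vocabulary that an ontology provides is as a hierarchy of classes over which simple inferences can be drawn. The sketch below is a toy illustration, not the AKADS ontology: the class names and the is_a helper are hypothetical, and a plain dictionary stands in for an OWL document.

```python
# Hypothetical class hierarchy: each class maps to its direct parent class.
SUBCLASS_OF = {
    "MarriageCertificate": "Record",
    "BankRecord": "Record",
    "Letter": "Record",
    "Record": "Document",
}


def is_a(cls, ancestor):
    """Return True if cls equals ancestor or is a direct or indirect subclass of it."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = SUBCLASS_OF.get(current)
    return False
```

With such a hierarchy, an agent asked for every "Document" can also return letters and marriage certificates without the user having to name each type.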
HTML Markup Tools
The “digital objects” returned as a result of the user queries shown in
Figure 1 are comprised of images of the Freedmen’s Bureau documents and Web
pages written in HTML and XML. The meaning of the content of the images is
represented in XML and added to the HTML file using markup tools. We have
developed a Web scraper tool to assist in gathering content knowledge from the
HTML. Also, we are employing several tools to
generate “instances” from various ontologies. These
tools include
Protégé 2000, RIC, Photostuff, and SMORES, in
collaboration with the MINDSWAP Lab at the

Figure 6: HTML markup tools
5. Semantic Web Technologies
The development of Extensible Markup Language (XML)
has made a fundamental contribution towards the development of the Semantic Web
because it works as a data language that both machines and humans can easily
understand. XML is also the language of Web Services – applications that run on
the Internet. The Semantic Web combined with Web Services offers new
possibilities for communication and interaction. XML provides the ability to
define your own tags and attributes, which HTML does not allow. The focus is on
‘what’ the content is versus ‘how’ it is presented. It is designed for storing,
delivering, and exchanging information among interactive Web applications
across the Internet.
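The point that XML describes ‘what’ the content is can be shown with a short sketch. The tag names, attribute names, and sample values below are invented for illustration and are not the markup used in AKADS; the parsing relies only on the xml.etree.ElementTree module from the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tags, attributes, and values are invented for illustration.
RECORD_XML = """
<record type="marriage_certificate">
  <person role="groom">John Smith</person>
  <person role="bride">Mary Jones</person>
  <place>Virginia</place>
  <year>1866</year>
</record>
"""


def people_by_role(xml_text):
    """Map each person's role attribute to the name stored in the element."""
    root = ET.fromstring(xml_text.strip())
    return {person.get("role"): person.text for person in root.findall("person")}
```

Because the tags name the content rather than its appearance, a program can pick out the bride or groom directly instead of guessing from page layout.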
The components of AKADS are hosted on various
servers and provide information and services to customers and users around the
world. Web Services are a new breed of Web application. They are
self-contained, self-describing, modular applications that can be published,
located, and invoked across the Web. One of the reasons that we chose to
develop Web services was to offer a higher level of platform independence and
interoperability. By platform-independent we mean that XML Web services built
and running on one platform (e.g., Unix) can be called by applications running
on an unrelated platform, for example Microsoft Windows NT. There must,
however, be a common messaging protocol for applications to send and receive
XML data. The Simple Object Access Protocol (SOAP) is that messaging protocol.
XML and SOAP are the base technologies of Web Services. SOAP provides a
mechanism for communication between an XML Web service and its clients. It
represents all communications as XML, is platform independent, and supports
encoding of information. Because SOAP messages can be sent via HTTP, we are
working towards a search engine based on the Web services model.
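To make the role of SOAP concrete, the following sketch builds a SOAP 1.1 request envelope as plain XML. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the SearchRecords operation, and the keyword parameter are hypothetical and do not describe the actual AKADS Web services.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/akads-search"  # hypothetical service namespace


def build_search_request(keyword):
    """Return a SOAP envelope string wrapping a hypothetical SearchRecords call."""
    envelope = ET.Element("{%s}Envelope" % SOAP_NS)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_NS)
    call = ET.SubElement(body, "{%s}SearchRecords" % SERVICE_NS)
    ET.SubElement(call, "{%s}keyword" % SERVICE_NS).text = keyword
    return ET.tostring(envelope, encoding="unicode")
```

A client on any platform could send this text in the body of an HTTP POST, and the service would reply with another envelope of the same shape, which is what makes the exchange platform independent.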
For the Semantic Web to perform meaningful
functions, computers must have access to information that is structured
according to specific rules. Intelligent agents require this structure so they
can perform automated reasoning. At present, the Resource Description Framework
(RDF) provides this structure. Meaning on the Semantic Web is expressed in RDF.
This meaning is expressed as a triple (subject, property/relation, and object).
The subject and object each have an identifier provided by a Uniform Resource
Identifier (URI). These triples tie related things together and thus connect information
elements on the Semantic Web.
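The triple structure described above can be sketched as a small in-memory store with pattern matching. The URIs, property names, and sample statements are hypothetical and invented for illustration, and a plain list of tuples stands in for a real RDF store.

```python
# Hypothetical URIs and property names, invented for illustration.
BASE = "http://example.org/freedmen/"

TRIPLES = [
    (BASE + "person/john_smith", BASE + "prop/spouseOf", BASE + "person/mary_jones"),
    (BASE + "person/john_smith", BASE + "prop/residedIn", BASE + "place/virginia"),
    (BASE + "person/mary_jones", BASE + "prop/residedIn", BASE + "place/virginia"),
]


def match(triples, subject=None, prop=None, obj=None):
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (obj is None or o == obj)
    ]
```

Because both people point at the same place URI, a query on that object returns both statements, which is how triples connect otherwise separate pieces of information.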
6. Conclusions
Our project team has developed a
technological infrastructure where we represent, encode, store, and publicize
knowledge about a significant number of historical artifacts in a way
consistent with contemporary Semantic Web technology. This technology includes
abstract representation of data compatible with the WWW and based on defined
representation standards. The resulting repository for historical knowledge,
supported by highly accessible knowledge acquisition and distribution tools, is
a noteworthy educational resource for students and educators. The AKADS system and its
related Semantic Web technologies show great promise for continued
contributions to the field of education.
Over the next few years this system will be expanded and the
technologies it uses will be advanced.
7. Bibliography