Recent years have witnessed a lively debate in the historical computing community as to
whether the relational database management systems that dominate the worlds of
commerce and administration are appropriate to historical research. Closely related to that
debate, and no less vigorously disputed, is the pragmatic question of whether historians
who use computers in their research should produce their own software or leave it in the
hands of commercial manufacturers. Currently it is considered sensible for historians to
describe their problems to the developers, leaving the former to do their research and the
latter to deliver a product.
One database management system has been created by an historian for historians. Since
1978 Dr Manfred Thaller of the Max-Planck-Institut für Geschichte in Göttingen has been
developing and enhancing Kleio. In its original form (known as CLIO), this system could
only be used on the UNIVAC 1100 series using a Latin command language and
documentation that was available only in German. These features did not help to make it
widely available. By 1987 a PC version was released, but with the same constraints of
command language and documentation. It is therefore unsurprising, though regrettable,
that while Kleio has been widely used for over a decade in the German-speaking
historical world, among non-German speakers it is known only by reputation, if at all (see footnote 1). The
interest in the system as described in various English-language publications has been
matched only by the frustration of those who are unable to try it out.
The English version of Kleio that is presented here is the product of an initiative to make Kleio more widely available at last. It has been made possible by the generosity of the Royal Historical Society, the British Academy, the Max-Planck-Institut für Geschichte at Göttingen, the Committee for Advanced Studies of the University of Southampton and the Faculty of Arts of Queen Mary & Westfield College, University of London. It is hoped that it will at least now be possible for historians to assess Kleio for themselves, and to make more informed judgements on the controversial issues raised above.
The principles of Kleio
Kleio, pronounced to rhyme with Ohio, is a complex and versatile system that has been
designed specifically to cater for the computing needs of historians, particularly in those
respects in which commercial software does not cater for them. As Kleio has been under
continuous development since the late 1970s, the range of features offered and areas of
historical computing covered have naturally expanded, and the implementation of new
features has to an extent altered the direction of the project, which since 1989 has been
known as `The Historical Workstation Project'. The underlying principles that motivated
Kleio's development and have guided its implementation have, however, remained largely
unchanged. They can be listed as follows:
1. A source-oriented approach
Kleio allows the historian to enter historical sources into the computer in a form which is
as close as possible to that of the original material, preserving features within the data
where conflicting interpretations are possible. At the most obvious level this means that,
for example, original spelling can be retained, as can original currency. There are two
implications of this principle. First, that a source-oriented approach to data processing
should be followed, i.e. a minimal amount of coding or mark-up should be performed on
a source before analysis. Second, that the methods of analysis of a particular source need
not, indeed should not, be chosen before analysis. For example, Kleio allows the user to
make decisions after data input about possible semantic differences in the written
description of an individual's occupation due to spatial or temporal factors. Similarly,
Kleio provides the facility to input unbroken strings of text so that information can be
"automatically" abstracted from that text, so that the method of analysis does not depend
on the method of entering the data.
In other words, Kleio allows the user to enter data in a format that reflects the source
rather than the user's intentions for that source. This is made possible by allowing the
user to input the data in a flexible format within a structure based on that source, rather
than one designed with the principles of formal database design in mind.
In effect this means that Kleio has the ability to accept all forms of historical source material in a format that relates to the source. Historical sources can present information in forms that are very hard to reconcile with the conventions of a traditional database. On the simplest level, a census return may contain an entry containing two distinct occupations. This can be described as a multi-value variable. In Kleio these entries can be combined in such a way that they remain in context but can be made logically equivalent for processing. Elements can contain any number of entries, which in turn can have different aspects (e.g. the original spelling, or the editor's comments, can be stored
alongside the main version to be processed), views (e.g. Latin and vernacular equivalents
of the same data could be stored as alternative views) and visibility (a quantitative estimate
of the value or reliability of the information). These features allow fuzzy data (for example,
a surname which might also represent an `occupation', such as Fletcher) to be defined as
such, the user then having to choose at the information retrieval stage how to interpret
such data. Elements are contained in groups of information (not unlike records in relational
database design) which are related to each other in a hierarchical fashion, i.e. are logically
subordinate or superordinate to each other. However, these groups can contain elements
with the same name, so that Kleio can `implicitly join' these related elements where `the
system suspects that the user might eventually be interested to view the two of them as
being one and the same type of reality' (see footnote 2). The size of a database is limited only by the
capacity of the machine being used; the complexity of a database is virtually unlimited
(elements can contain up to 2 million characters; there can be up to 32,000 different
element names, and 32,000 different groups).
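The structure described above can be pictured with a small sketch. It is written in Python, not in Kleio's own command language, and every class, field and sample value in it (`Entry`, `Group`, `core`, `original`, `comment`, the household data) is a hypothetical name chosen for illustration only. It shows an element holding several entries, each entry carrying optional aspects, inside groups that nest hierarchically.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One entry of an element, with optional aspects (illustrative names)."""
    core: str                        # main version, used for processing
    original: Optional[str] = None   # aspect: original spelling in the source
    comment: Optional[str] = None    # aspect: editor's comment


@dataclass
class Group:
    """A group of elements; groups are related hierarchically."""
    name: str
    elements: dict = field(default_factory=dict)   # element name -> list of Entry
    children: list = field(default_factory=list)   # subordinate groups

    def add(self, element: str, entry: Entry) -> None:
        # An element may hold any number of entries (a multi-value variable).
        self.elements.setdefault(element, []).append(entry)

    def values(self, element: str) -> list:
        # Return the core versions of all entries of one element.
        return [e.core for e in self.elements.get(element, [])]


# Invented sample data: one person with two distinct occupations.
household = Group("household")
person = Group("person")
person.add("surname", Entry("Fletcher", comment="may denote an occupation"))
person.add("occupation", Entry("weaver", original="wever"))
person.add("occupation", Entry("innkeeper"))
household.children.append(person)

print(person.values("occupation"))
```

Both occupations stay attached to the same person, so neither is lost or forced into a separate record, and the original spelling travels with the entry it belongs to.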
All these concepts add up to Kleio's data model which has been described as a `semantic
network tempered by hierarchical considerations' (see footnote 3). This model seems most suited for the
representation of complex historical data. Moving from the technical computing metaphor
to the historical, it also displays close parallels with the concept of scholarly editing. It is
no accident that recent literature on Kleio has stressed the concept of `The Database as
Edition' (see footnote 4).
2. A logical environment
Kleio provides a working environment where tools which implement solutions for history-specific problems can be developed and used. Examples of such problems might be the
recognition of names with variant spellings or the solution of chronological problems
encountered in historical data. Kleio contains a number of basic algorithms which allow
it to handle unprocessed historical data. If we input data in a form as close to the source
as possible, it will raise certain historical problems which, while they could be untangled
or solved by the historical researcher, would more efficiently and effectively be solved by
computer. Kleio provides these tools in the form of algorithms which can be altered by the
researcher to cope with the problems inherent in their particular form of data. For nominal
record linkage, these tools include a variant of the well-known Soundex algorithm, an
algorithm for the pre-treatment (i.e. before Soundex) of phonologically or orthographically
similar names, and the Guth algorithm, which quantifies the degree of similarity between
two character strings in numerical terms. There are also algorithms to assist in the
conversion of different calendar systems to a pre-defined format, coping with Roman,
Mosaic, Islamic and Byzantine calendars as well as dates in the format <Period> <Feast
Day> <Year> (e.g. `4 days before Maundy Thursday 1853'), and algorithms to cope with
complex numbers, mainly to assist in the interpretation of different currency systems.
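For readers unfamiliar with Soundex, the following Python sketch shows the standard (American) form of the algorithm. It is not Kleio's variant, whose details are not given here, and the function name `soundex` is simply a label chosen for this illustration. The point is only to show how names with variant spellings can be reduced to a common code.

```python
# Standard Soundex letter classes; vowels, Y, H and W carry no code.
CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character standard Soundex code of a name."""
    # Only unaccented A-Z are considered; other characters are dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                    # the first letter is kept as it is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code                 # new consonant class: record it
        previous = code                    # a vowel resets the comparison
    return (result + "000")[:4]            # pad or truncate to four characters


print(soundex("Meyer"), soundex("Meier"))      # same code for both spellings
print(soundex("Robert"), soundex("Rupert"))    # same code for both names
```

Two names that share a code are candidates for being the same name, nothing more; the decision whether they refer to the same person remains with the researcher.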
Some of these algorithms reside in the system; others have to be more fully defined so that
they can become an integral part of the database, affecting data only when required. This
means that assumptions about historical data can be administered within a database but
entirely independently of the data itself. These algorithms, which together amount to
Kleio's logical environment, can be seen as a form of expert system, which can be developed
to make full use of context-sensitive historical data (see footnote 5).
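The feast-day date format mentioned above gives a convenient illustration of such an algorithm. The Python sketch below is not Kleio's implementation. It assumes the Gregorian calendar and the Western date of Easter, uses the widely published computus sometimes called the anonymous Gregorian algorithm, and the names `gregorian_easter`, `FEAST_OFFSETS` and `resolve` are hypothetical. It turns a phrase of the form <Period> <Feast Day> <Year> into a calendar date.

```python
from datetime import date, timedelta


def gregorian_easter(year: int) -> date:
    """Date of Western Easter Sunday in the Gregorian calendar."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


# Movable feasts expressed as offsets in days from Easter Sunday.
# Only a few are listed; the table is an illustrative assumption.
FEAST_OFFSETS = {
    "Maundy Thursday": -3,
    "Good Friday": -2,
    "Easter Sunday": 0,
    "Whit Sunday": 49,
}


def resolve(days_before: int, feast: str, year: int) -> date:
    """Resolve '<n> days before <feast> <year>' to a calendar date."""
    feast_day = gregorian_easter(year) + timedelta(days=FEAST_OFFSETS[feast])
    return feast_day - timedelta(days=days_before)


# '4 days before Maundy Thursday 1853'
print(resolve(4, "Maundy Thursday", 1853).isoformat())
```

Because the rule is held apart from the data, the source text can keep its original wording while the computed date is produced only when a query needs it.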
3. Functionality
Kleio provides a full set of basic database operations, such as information retrieval and
report generation, accessing the whole structure of a database which is based on an
historical source. Kleio administers sources effectively and efficiently within databases,
allowing the user to navigate easily through highly complex structures, and producing
results to complex and unwieldy queries. Different databases can be linked (for example,
two different databases can be accessed with a single query) or joined (in such a way as
to allow data to be retrieved from either or both); queries can be framed in such a way as
to produce output in a variety of formats.
4. Integration
Kleio aims to provide, where appropriate, simple integrated versions of applications which
would usually have to be realised within different software modules. This goal has been
achieved in two main areas, full-text analysis and mapping. Kleio can perform a limited
number of operations relating to full-text material, including a system of embedded
classifications within a text, as well as searching facilities which allow the user to integrate
full-text retrieval along with more structured material. Choropleth and distribution maps
can be produced, provided that digitised coordinates are prepared for Kleio to administer.
5. Compatibility
Kleio provides interfaces to other general-purpose packages. For example, it is possible to
extract material from structures of source-material into statistical cases, which can be
immediately understood by statistical processing software such as SPSS and SAS. Kleio
also transforms data within complex structures into a format that statistical applications
can understand, using flexible information retrieval and report generation facilities and its
ability to combine information from different subsets of a database.
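What the extraction of statistical cases involves can be sketched in Python. The example is not Kleio's export facility: the nested sample data, the field names and the functions `to_cases` and `write_csv` are all invented for illustration, and the CSV layout is only one of many that a statistical package could read. The sketch flattens a hierarchy of households and persons into one rectangular case per person.

```python
import csv
import io

# Invented sample data: households containing persons, some of whom
# have more than one occupation and some none.
households = [
    {"id": "H1", "parish": "St Mary", "persons": [
        {"surname": "Fletcher", "age": 34, "occupations": ["weaver", "innkeeper"]},
        {"surname": "Fletcher", "age": 31, "occupations": []},
    ]},
    {"id": "H2", "parish": "St Mary", "persons": [
        {"surname": "Smith", "age": 52, "occupations": ["blacksmith"]},
    ]},
]


def to_cases(data, max_occupations=2):
    """Flatten the hierarchy into one row (case) per person."""
    rows = []
    for household in data:
        for person in household["persons"]:
            # Pad or cut the multi-value field to a fixed number of columns.
            padded = (person["occupations"] + [""] * max_occupations)[:max_occupations]
            row = {
                "household": household["id"],   # inherited from the parent group
                "parish": household["parish"],
                "surname": person["surname"],
                "age": person["age"],
            }
            for n, value in enumerate(padded, start=1):
                row[f"occupation{n}"] = value
            rows.append(row)
    return rows


def write_csv(rows) -> str:
    """Serialise the cases as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


cases = to_cases(households)
print(write_csv(cases))
```

The fixed number of occupation columns is a choice forced by the rectangular format, which is exactly the kind of decision a source-oriented system postpones until the moment of export.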
The Historical Workstation Project
As the Kleio project broadened in scope, fanned out and developed, it took on the aspect
of a collaborative venture, inspired and led from Göttingen by its creator, Manfred Thaller,
but augmented by research contributions from elsewhere. The result is that a number of
refinements, additional algorithms or related packages have been developed. This is not
the place to trace the history of all these developments, several of which are still in
progress. (Those interested will be able to find further information in the bibliography
of English-language publications relating to Kleio at the end of this book.) Below are
described those initiatives which have led to software releases which are already available.
Lemmatisation
The full-text system of Kleio has been enhanced by the integration of a Latin
lemmatisation program, developed in Rome, which is available on request (see footnote 6).
StanFEP
Alongside Kleio, Manfred Thaller developed a program in the 1980s called the Standard
Format Exchange Program. The purpose of this is to enable historians (and, of course,
others) to mark up electronic documents in a way which keeps integrated the various
versions that might be deemed necessary for the various tasks that the scholar might wish
to perform (e.g. diplomatic transcription, pre-edition, coding for use in a database system,
final edition). The current version of StanFEP is included on disk with Kleio. The software
uses English commands and conventions, though unfortunately the manual and tutorial
still await translation from German (see footnote 7). A second stage of development is currently in
progress (a collaborative project of the Max-Planck-Institut für Geschichte, Göttingen, and
the Historical Informatics Laboratory, Lomonossov University, Moscow).
Kleio Image Analysis System (Kleio IAS)
The most radical addition to the original scope of the software has been a program which
applies the principles of Kleio to the processing of images. In collaboration with the Max-Planck-Institut für Geschichte, Göttingen, the Institut für Realienkunde des Mittelalters und
der Frühen Neuzeit, Krems, has developed Kleio IAS. In this system the textual
description of an image can be bound to the image itself (rather like hypertext) and
displayed simultaneously. A variety of tools for digital image analysis is also
provided for the enhancement of images, for immediate image retrieval and, most recently,
pattern recognition (see footnote 8).
Current versions of Kleio
The principles of image processing are not so dissimilar to those for other types of data as
to make it necessary for Kleio IAS to be a separate program; on the other hand image
processing obviously makes much heavier demands of computing resources. As a
consequence, Kleio currently runs in two versions.
* Version 5.1.1 comprises the command language and the non-graphical menu
  systems and is machine independent. Because of this machine independence, if
  a database should become too large for a PC, a user can move to a more
  powerful one or to a mainframe to continue work.
* Version 6.1.1 (Kleio IAS) is identical to Version 5.1.1 but limited in use to a
  number of UNIX platforms, and it also provides a graphical user interface
  which is geared to the handling and processing of images. The manual for
  Kleio IAS is however included with Version 5.1.1, because it contains a high
  proportion of material relating to the present version of the system, and because
  we believe that all Kleio users may find it helpful to see how Kleio handles
  image data. We also include this volume as a taster of the Windows NT version
  of Kleio which should be released during 1994, which will not just handle
  images but also provide a graphical interface for the whole of the Kleio
  software.
It should also be pointed out that in both 5.1.1 and 6.1.1 the Latin/German and the English
versions are integrated; that is, the user is offered a choice of language at installation. It
is expected that the Latin/German version will be supported for a period of three years
only.
How to use this book
It will already be apparent to anyone who has browsed through the pages of this book that
Kleio is both rich in what it offers and demanding of the user. Kleio cannot be `picked up'
in an afternoon of dexterous exploration of pull-down menus and help screens. Some of
the underlying concepts are complex, and there are no short cuts to learning the command
language, or at least those parts of it which are relevant to the work of the researcher.
Some of the most successful Kleio users have been those who have taken one of the many Kleio courses that are run in various parts of Europe. The Kleio Support Team (see below, p. 345) will gladly provide details of such courses; increasingly they are being offered in
English, and are using the English data sets that have been prepared in connection with
this volume and the current release of the software. This book is however intended as a
practical guide for all those wishing to learn how to use Kleio, whether by taking a course
or on their own. Each of the three parts of the book discusses separate aspects of Kleio.
* Part I, "Getting Started", introduces the basic concepts and terminology of Kleio
  and the simple accessing of a Kleio database, and shows how simple queries
  can be effected.
* Part II, "Kleio Basics", introduces the most frequently used features of Kleio.
  All the concepts described in this part are likely to be used on a regular basis.
  These include methods of creating a database, more complicated query facilities,
  the integration of knowledge within databases, the processing of textual
  material, the creation of `look-up tables' for the coding and classification of data,
  and also some features to produce more sophisticated output.
* Part III, "Specialised Features", introduces more complicated features of Kleio,
  including nominal record linkage and automated cartography. It also introduces
  techniques for family reconstitution. Also included in this part are more
  advanced concepts of database design and an alternative method of constructing
  databases.
In each section the points made and the techniques introduced are illustrated by examples
and exercises. The practice databases and exercise files are an integral part of the tutorial and
should be installed with the software. The installation notes provided with the software
will explain how to install the software and the tutorial files. Answers to the exercises can
be found at the end of the book.
Finally, it is worth stressing that this book should be used in conjunction with the Reference Manual (see footnote 9), the full formal description of Kleio, which provides a fuller description of the many features of Kleio, including many which there was not space to describe here. The present volume provides frequent cross-references to the Reference Manual.