Introduction



The principles of Kleio

  1. A source-oriented approach

  2. A logical environment

  3. Functionality

  4. Integration

  5. Compatibility

The Historical Workstation Project

  Lemmatisation

  StanFEP

  Kleio Image Analysis System (Kleio IAS)

Current versions of Kleio

How to use this book


Recent years have witnessed a lively debate in the historical computing community as to whether the relational database management systems that dominate the worlds of commerce and administration are appropriate to historical research. Closely related to that debate, and no less vigorously disputed, is the pragmatic question of whether historians who use computers in their research should produce their own software or leave it in the hands of commercial manufacturers. Currently it is considered sensible for historians to describe their problems to the developers, leaving the former to do their research and the latter to deliver a product.

One database management system has been created by an historian for historians. Since 1978 Dr Manfred Thaller of the Max-Planck-Institut für Geschichte in Göttingen has been developing and enhancing Kleio. In its original form (known as CLIO), this system could only be used on the UNIVAC 1100 series using a Latin command language and documentation that was available only in German. These features did not help to make it widely available. By 1987 a PC version was released, but with the same constraints of command language and documentation. It is therefore unsurprising, though regrettable, that while Kleio has been widely used for over a decade in the German-speaking historical world, among non-German speakers it is known only by reputation if at all (see footnote 1). The interest in the system as described in various English-language publications has been matched only by the frustration of those who are unable to try it out.

The English version of Kleio that is presented here is the product of an initiative to make Kleio more widely available at last. It has been made possible by the generosity of the Royal Historical Society, the British Academy, the Max-Planck-Institut für Geschichte at Göttingen, the Committee for Advanced Studies of the University of Southampton and the Faculty of Arts of Queen Mary & Westfield College, University of London. It is hoped that it will at least now be possible for historians to assess Kleio for themselves, and to make more informed judgements on the controversial issues raised above.



The principles of Kleio

Kleio, pronounced to rhyme with Ohio, is a complex and versatile system that has been designed specifically to cater for the computing needs of historians, particularly in those respects in which commercial software does not cater for them. As Kleio has been under continuous development since the late 1970s, the range of features offered and areas of historical computing covered have naturally expanded, and the implementation of new features has to an extent altered the direction of the project, which since 1989 has been known as `The Historical Workstation Project'. The underlying principles that motivated Kleio's development and have guided its implementation have, however, remained largely unchanged. They can be listed as follows:


1. A source-oriented approach

Kleio allows the historian to enter historical sources into the computer in a form which is as close as possible to that of the original material, preserving features within the data where conflicting interpretations are possible. At the most obvious level this means that, for example, original spelling can be retained, as can original currency. There are two implications of this principle. First, a source-oriented approach to data processing should be followed, i.e. a minimal amount of coding or mark-up should be performed on a source before analysis. Second, the methods of analysis of a particular source need not, indeed should not, be chosen before the data are entered. For example, Kleio allows the user to make decisions after data input about possible semantic differences in the written description of an individual's occupation due to spatial or temporal factors. Similarly, Kleio provides the facility to input unbroken strings of text from which information can be "automatically" abstracted, so that the method of analysis does not depend on the method of entering the data.

In other words, Kleio allows the user to enter data in a format that reflects the source rather than the user's intentions with that source. This is made possible by allowing the user to input the data in a flexible format within a structure based on that source, rather than one designed with the principles of formal database design in mind.

In effect this means that Kleio has the ability to accept all forms of historical source material in a format that relates to the source. Historical sources can present information in forms that are very hard to reconcile with the conventions of a traditional database. On the simplest level, a census return may contain an entry containing two distinct occupations. This can be described as a multi-value variable. In Kleio these entries can be combined in such a way that they remain in context but can be made logically equivalent for processing. Elements can contain any number of entries, which in turn can have different aspects (e.g. the original spelling, or the editor's comments, can be stored alongside the main version to be processed), views (e.g. Latin and vernacular equivalents of the same data could be stored as alternative views) and visibility (a quantitative estimate of the value or reliability of the information). These features allow fuzzy data (for example, a surname which might also represent an `occupation', such as Fletcher) to be defined as such, the user then having to choose at the information retrieval stage how to interpret such data. Elements are contained in groups of information (not unlike records in relational database design) which are related to each other in a hierarchical fashion, i.e. are logically subordinate or superordinate to each other. However, these groups can contain elements with the same name, so that Kleio can `implicitly join' these related elements where `the system suspects that the user might eventually be interested to view the two of them as being one and the same type of reality' (see footnote 2). The size of a database is limited only by the capacity of the machine being used; the complexity of a database is virtually unlimited (elements can contain up to 2 million characters; there can be up to 32,000 different element names, and 32,000 different groups).
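
The terms introduced above (groups, elements, entries, aspects, views and visibility) can be made more concrete with a small sketch. The Python below is an illustration only: it is not Kleio's command language, nor its internal representation, and every class name, field name and the 0 to 1 visibility scale is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """One value of an element (all field names here are hypothetical)."""
    value: str                       # main version used for processing
    original: Optional[str] = None   # aspect: spelling as found in the source
    comment: Optional[str] = None    # aspect: editor's comment
    views: dict = field(default_factory=dict)  # e.g. {"latin": ..., "vernacular": ...}
    visibility: float = 1.0          # assumed 0..1 estimate of reliability


@dataclass
class Group:
    """A group of elements, placed in a hierarchy through `children`."""
    name: str
    elements: dict = field(default_factory=dict)   # element name -> list of Entry
    children: list = field(default_factory=list)   # subordinate groups

    def add(self, element: str, entry: Entry) -> None:
        # An element may hold any number of entries (a multi-value variable).
        self.elements.setdefault(element, []).append(entry)

    def values(self, element: str, min_visibility: float = 0.0) -> list:
        # The interpretation of fuzzy data is postponed until retrieval:
        # the caller decides how reliable an entry must be to count.
        return [e.value for e in self.elements.get(element, [])
                if e.visibility >= min_visibility]


household = Group("household")
person = Group("person")
household.children.append(person)

person.add("surname", Entry("Fletcher"))
person.add("occupation", Entry("weaver", original="wever"))
person.add("occupation", Entry("innkeeper"))
# The surname might also describe an occupation; record that as a doubtful entry.
person.add("occupation", Entry("fletcher", comment="possibly surname only",
                               visibility=0.3))

print(person.values("occupation"))
print(person.values("occupation", min_visibility=0.5))
```

The point of the sketch is that both occupations, and the doubtful third reading, stay attached to the same person; whether the doubtful entry counts is settled by the query, not at data entry.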

All these concepts add up to Kleio's data model, which has been described as a `semantic network tempered by hierarchical considerations' (see footnote 3). This model seems most suited for the representation of complex historical data. Moving from the technical computing metaphor to the historical, it also displays close parallels with the concept of scholarly editing. It is no accident that recent literature on Kleio has stressed the concept of `The Database as Edition' (see footnote 4).


2. A logical environment

Kleio provides a working environment where tools which implement solutions for history-specific problems can be developed and used. Examples of such problems might be the recognition of names with variant spellings or the solution of chronological problems encountered in historical data. Kleio contains a number of basic algorithms which allow it to handle unprocessed historical data. If we input data in a form as close to the source as possible, it will raise certain historical problems which, while they could be untangled or solved by the historical researcher, would more efficiently and effectively be solved by computer. Kleio provides these tools in the form of algorithms which can be altered by the researcher to cope with the problems inherent in their particular form of data. For nominal record linkage, these tools include a variant on the well-known Soundex algorithm, an algorithm for the pre-treatment (i.e. before Soundex) of phonologically or orthographically similar names, and the Guth algorithm, which quantifies the degree of similarity between two character strings in numerical terms. There are also algorithms to assist in the conversion of different calendar systems to a pre-defined format, coping with Roman, Mosaic, Islamic and Byzantine calendars as well as dates in the format <Period> <Feast Day> <Year> (e.g. `4 days before Maundy Thursday 1853'), and algorithms to cope with complex numbers, mainly to assist in the interpretation of different currency systems.
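
As an indication of what a name-matching algorithm of this kind does, the following Python sketch implements the standard (American) Soundex code. It is offered as background only: Kleio uses its own variant together with a pre-treatment step, neither of which is reproduced here, and the function name `soundex` is simply a label chosen for the example.

```python
# Standard Soundex: letters with similar sounds share a digit.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character standard Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as it is
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(letter, "")    # vowels (and unknown letters) give ""
        if code and code != previous:
            result += code
        previous = code                 # a vowel resets the comparison
    return (result + "000")[:4]         # pad or truncate to four characters


for variant in ("Smith", "Smyth", "Meyer", "Maier", "Robert", "Rupert"):
    print(variant, soundex(variant))
```

Variant spellings such as Smith and Smyth receive the same code and can therefore be proposed as candidates for linkage; whether they really denote the same person remains a decision for the researcher.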

Some of these algorithms reside in the system; others have to be more fully defined so that they can become an integral part of the database, affecting data only when required. This means that assumptions about historical data can be administered within a database but entirely independently of the data itself. These algorithms, which together amount to Kleio's logical environment, can be seen as a form of expert system, which can be developed to make full use of context-sensitive historical data (see footnote 5).
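
The resolution of a date given in the form <Period> <Feast Day> <Year> can be sketched in the same spirit. The Python below is not Kleio's algorithm. It assumes the Gregorian calendar throughout (a source dated by the Julian calendar would need a different Easter computation), it uses the widely published anonymous Gregorian Easter computation, and the names `gregorian_easter`, `FEAST_OFFSETS` and `resolve` are invented for the illustration.

```python
from datetime import date, timedelta


def gregorian_easter(year: int) -> date:
    """Easter Sunday in the Gregorian calendar (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


# Movable feasts expressed as a number of days relative to Easter Sunday.
FEAST_OFFSETS = {
    "Maundy Thursday": -3,
    "Good Friday": -2,
    "Easter Sunday": 0,
    "Ascension Day": 39,
    "Whit Sunday": 49,
}


def resolve(days_before: int, feast: str, year: int) -> date:
    """Resolve '<n> days before <feast> <year>' to a calendar date."""
    feast_day = gregorian_easter(year) + timedelta(days=FEAST_OFFSETS[feast])
    return feast_day - timedelta(days=days_before)


# '4 days before Maundy Thursday 1853'
print(resolve(4, "Maundy Thursday", 1853))   # 1853-03-20
```

Because the rule is kept apart from the data, the source text `4 days before Maundy Thursday 1853' can be stored exactly as written and converted only when a query requires a calendar date.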


3. Functionality

Kleio provides a full set of basic database operations, such as information retrieval and report generation, accessing the whole structure of a database which is based on an historical source. Kleio administers sources effectively and efficiently within databases, allowing the user to navigate easily through highly complex structures, and producing results to complex and unwieldy queries. Different databases can be linked (for example, two different databases can be accessed with a single query) or joined (in such a way as to allow data to be retrieved from either or both); queries can be framed in such a way as to produce output in a variety of formats.


4. Integration

Kleio aims to provide, where appropriate, simple integrated versions of applications which would usually have to be realised within different software modules. This goal has been achieved in two main areas, full-text analysis and mapping. Kleio can perform a limited number of operations relating to full-text material, including a system of embedded classifications within a text, as well as searching facilities which allow the user to integrate full-text retrieval along with more structured material. Choropleth and distribution maps can be produced, provided that digitised coordinates are prepared for Kleio to administer.


5. Compatibility

Kleio provides interfaces to other general-purpose packages. For example, it is possible to extract material from structures of source-material into statistical cases, which can be immediately understood by statistical processing software such as SPSS and SAS. Kleio also transforms data within complex structures into a format that statistical applications can understand, using flexible information retrieval and report generation facilities and its ability to combine information from different subsets of a database.
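
What it means to turn a source structure into statistical cases can be shown with a short sketch. The Python below is an assumption-laden illustration and has nothing to do with Kleio's own export commands: the nested dictionaries stand in for a hierarchical database, the field names are invented, and plain CSV is used only because statistical packages such as SPSS and SAS can import it.

```python
import csv
import io

# Hypothetical hierarchical data: households containing persons.
households = [
    {"id": "H1", "street": "High Street", "persons": [
        {"name": "John Fletcher", "age": 41, "occupations": ["weaver", "innkeeper"]},
        {"name": "Mary Fletcher", "age": 38, "occupations": []},
    ]},
    {"id": "H2", "street": "Mill Lane", "persons": [
        {"name": "Ann Smith", "age": 63, "occupations": ["laundress"]},
    ]},
]


def to_cases(households, max_occupations=2):
    """Flatten the hierarchy into one rectangular case per person."""
    cases = []
    for household in households:
        for person in household["persons"]:
            # Household information is repeated on every person's case.
            case = {"household": household["id"],
                    "street": household["street"],
                    "name": person["name"],
                    "age": person["age"]}
            # Multi-value entries become a fixed number of padded columns.
            padded = (person["occupations"] + [""] * max_occupations)[:max_occupations]
            for number, occupation in enumerate(padded, start=1):
                case[f"occupation_{number}"] = occupation
            cases.append(case)
    return cases


def to_csv(cases):
    """Write the cases as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(cases[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(cases)
    return buffer.getvalue()


cases = to_cases(households)
print(to_csv(cases))
```

The flattening necessarily loses something (here, any third occupation would be dropped), which is why it is performed at the moment of export and not imposed on the database itself.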


The Historical Workstation Project

As the Kleio project broadened in scope, fanned out and developed, it took on the aspect of a collaborative venture, inspired and led from Göttingen by its creator, Manfred Thaller, but augmented by research contributions from elsewhere. The result is that a number of refinements, additional algorithms or related packages have been developed. This is not the place to trace the history of all these developments, several of which are still in progress. (Those interested will be able to find further information in the bibliography of English-language publications relating to Kleio at the end of this book.) Below are described those initiatives which have led to software releases which are already available.


Lemmatisation

The full-text system of Kleio has been enhanced by the integration of a Latin lemmatisation program, developed in Rome, which is available on request (see footnote 6).


StanFEP

Alongside Kleio, Manfred Thaller developed a program in the 1980s called the Standard Format Exchange Program. The purpose of this is to enable historians (and, of course, others) to mark up electronic documents in a way which keeps integrated the various versions that might be deemed necessary for the various tasks that the scholar might wish to perform (e.g. diplomatic transcription, pre-edition, coding for use in a database system, final edition). The current version of StanFEP is included on disk with Kleio. The software uses English commands and conventions, though unfortunately the manual and tutorial still await translation from German (see footnote 7). A second stage of development is currently in progress (a collaborative project of the Max-Planck-Institut für Geschichte, Göttingen, and the Historical Informatics Laboratory, Lomonossov University, Moscow).


Kleio Image Analysis System (Kleio IAS)

The most radical addition to the original scope of the software has been a program which applies the principles of Kleio to the processing of images. In collaboration with the Max-Planck-Institut für Geschichte, Göttingen, the Institut für Realienkunde des Mittelalters und der Frühen Neuzeit, Krems, has developed Kleio IAS. In this system the textual description of an image can be bound to the image itself (rather like hypertext) and displayed together with it. A variety of tools for digital image analysis is also provided for the enhancement of images, for immediate image retrieval and, most recently, for pattern recognition (see footnote 8).


Current versions of Kleio

The principles of image processing are not so dissimilar to those for other types of data as to make it necessary for Kleio IAS to be a separate program; on the other hand image processing obviously makes much heavier demands of computing resources. As a consequence, Kleio currently runs in two versions.

*     Version 5.1.1 comprises the command language and the non-graphical menu systems and is machine independent. Because of this machine independence, if a database should become too large for a PC, a user can move to a more powerful one or to a mainframe to continue work.

*     Version 6.1.1 (Kleio IAS) offers the same facilities as Version 5.1.1 but is limited in use to a number of UNIX platforms, and it also provides a graphical user interface which is geared to the handling and processing of images. The manual for Kleio IAS is however included with Version 5.1.1, because it contains a high proportion of material relating to the present version of the system, and because we believe that all Kleio users may find it helpful to see how Kleio handles image data. We also include this volume as a taster of the Windows NT version of Kleio, due for release during 1994, which will not only handle images but also provide a graphical interface for the whole of the Kleio software.

It should also be pointed out that in both 5.1.1 and 6.1.1 the Latin/German and the English versions are integrated; that is, the user is offered a choice of language at installation. It is expected that the Latin/German version will be supported for a period of three years only.


How to use this book

It will already be apparent to anyone who has browsed through the pages of this book that Kleio is both rich in what it offers and demanding of the user. Kleio cannot be `picked up' in an afternoon of dexterous exploration of pull-down menus and help screens. Some of the underlying concepts are complex, and there are no short cuts to learning the command language, or at least those parts of it which are relevant to the work of the researcher.

Some of the most successful Kleio users have been those who have taken one of the many Kleio courses that are run in various parts of Europe. The Kleio Support Team (see below, p. 345) will gladly provide details of such courses; increasingly they are being offered in English, and are using the English data sets that have been prepared in connection with this volume and the current release of the software. This book is however intended as a practical guide for all those wishing to learn how to use Kleio, whether by taking a course or on their own. Each of the three parts of the book discusses separate aspects of Kleio.

*     Part I, "Getting Started", introduces the basic concepts and terminology of Kleio and the simple accessing of a Kleio database, and shows how simple queries can be effected.

*     Part II, "Kleio Basics", introduces the most frequently used features of Kleio. All the concepts described in this part are likely to be used on a regular basis. These include methods of creating a database, more complicated query facilities, the integration of knowledge within databases, the processing of textual material, the creation of `look-up tables' for the coding and classification of data, and also some features to produce more sophisticated output.

*     Part III, "Specialised Features", introduces more complicated features of Kleio, including nominal record linkage and automated cartography. It also introduces techniques for family reconstitution. Also included in this part are more advanced concepts of database design and an alternative method of constructing databases.

In each section the points made and the techniques introduced are illustrated by examples and exercises. The practice databases and exercise files are an integral part of the tutorial and should be installed with the software. The installation notes provided with the software will explain how to install the software and the tutorial files. Answers to the exercises can be found at the end of the book.

Finally, it is worth stressing that this book should be used in conjunction with the Reference Manual (see footnote 9), the full formal description of Kleio, which provides a fuller account of its many features, including many which there was not space to describe here. The present volume provides frequent cross-references to the Reference Manual.


Footnote: 1 A French introduction was also produced in 1990; Josef Smets, Créer une base de données historiques avec Kleio. Halbgraue Reihe zur historischen Fachinformatik, A7 (St. Katharinen, 1990).
Footnote: 2 Manfred Thaller, `What is "Source Oriented Data Processing"; What is a "Historical Information Science"?', paper given to conference on `New Information Technologies in Historical Research and Teaching', June 1992 in Uzhgorod, Ukraine, published in Russian in Istoriia i comp'iuter. Novye informatsionnye tekhnologii v istoricheskikh issledovaniiakh i obrazovanii, eds. Leonid I. Borodkin & Wolfgang Levermann. Halbgraue Reihe zur historischen Fachinformatik, A15 (St. Katharinen, 1993), pp. 5–18. English typescript, p. 4.
Footnote: 3 Manfred Thaller, `The Historical Workstation Project', Computers and the Humanities, 25 (1991), pp. 149–62 (p. 155).
Footnote: 4 Ibid., pp. 156–59; see also Susanne Botzem, Ingo H. Kropac, `Integrated Computer Supported Editing, Approaches and Strategies', in Historical Social Research/Historische Sozialforschung, 16:4 (1991), pp. 106–15, and Susanne Botzem, Ingo H. Kropac, `As You Like It or Archiving, Editing and Analysing Medieval Manuscripts', in Histoire et Informatique. Ve Congrès `History & Computing', 4–7 Septembre 1990 à Montpellier, ed. J. Smets (Montpellier, 1992), pp. 267–78.
Footnote: 5 Manfred Thaller, `Databases and Expert Systems as Complementary Tools for Historical Research', Tijdschrift voor Geschiedenis, 103 (1990), pp. 233–47 (pp. 240–42).
Footnote: 6 Andrea Bozzi & Giuseppe Cappelli, `A Latin Morphological Analyser', in Data Base Oriented Source Editions. Papers from two sessions at the 23rd International Congress of Medieval Studies, Kalamazoo, 5–8 May 1988, ed. M. Thaller, pp. 47–54.
Footnote: 7 The manual is Kathrin Homann, StanFEP. Programm zur freien Konvertierung von Daten. Halbgraue Reihe zur historischen Fachinformatik, B6 (St. Katharinen, 1990); the tutorial is Martin Gierl, Thomas Grotum & Thomas Werner, Der Schritt von der Quelle zur historischen Datenbank. StanFEP: Ein Arbeitsbuch. Halbgraue Reihe zur historischen Fachinformatik, A6 (St. Katharinen, 1990). An English introduction is Kathrin Homann, `StanFEP - Standardization without Standards', in Histoire et Informatique. Ve Congrès "History & Computing", 4–7 Septembre 1990 à Montpellier, ed. J. Smets (Montpellier, 1992), pp. 289–99.
Footnote: 8 The manual is Gerhard Jaritz, Images. A Primer of Computer-Supported Analysis with Kleio IAS. Halbgraue Reihe zur historischen Fachinformatik, A22 (St. Katharinen, 1993). See also Manfred Thaller, `The Processing of Manuscripts', in Images and Manuscripts in Historical Computing, ed. M. Thaller. Halbgraue Reihe zur historischen Fachinformatik, A14 (St. Katharinen, 1992), pp. 41–72, and idem, `The Archive on the Top of your Desk? On Self-Documenting Image Files', in Image Processing in History: towards Open Systems, eds. Jurij Fikfak & Gerhard Jaritz. Halbgraue Reihe zur historischen Fachinformatik, A16 (St. Katharinen, 1993), pp. 21–44.
Footnote: 9 Manfred Thaller, Kleio. A Database System. Halbgraue Reihe zur historischen Fachinformatik, B11 (St. Katharinen, 1993).