Technische Universität München
Institut für Informatik
VD 17 - Cooperative Cataloging in a Scalable
Digital Library System
Complete reprint of the dissertation approved by the Faculty of Informatics of the Technische Universität München for the award of the academic degree of
Doktor der Naturwissenschaften (Doctor of Natural Sciences).
Chair: Univ.-Prof. Dr. Christoph Zenger
Examiners of the dissertation:
1. Univ.-Prof. Rudolf Bayer, Ph.D.
2. Hon.-Prof. Dr. Albert Endres, Universität Stuttgart
The dissertation was submitted to the Technische Universität München on 12 October 1999 and accepted by the Faculty of Informatics on 10 December 1999.
This work was carried out during my time as a research scientist at FORWISS,
the Bavarian Research Center for Knowledge-Based Systems in Munich, Germany,
from 1996 to 1999. In particular, this thesis is based on experience and ideas
gained in VD17, a long-term digital library project in Germany funded by the
Deutsche Forschungsgemeinschaft (German Research Foundation). I would like to
thank everyone who has contributed to my success in this endeavor.
First, my special thanks go to my advisor and the initiator of the VD17
project, Prof. Rudolf Bayer, Ph.D., who, although incredibly busy, was always
available for fruitful discussions and great guidance. I would also like to
thank my co-advisor, Prof. Dr. Albert Endres, for his interest in serving on
the dissertation committee.
I am grateful to all my colleagues in the Knowledge Base research group for
the wonderful working atmosphere, especially Dr. Peter Baumann, Ulrike Sommer,
Wolfgang Wohner, and Paula Furtado for their cooperative and constructive
remarks. I sincerely hope to work again with every single one of these
incredible people.
A number of other people have made important contributions as well: in
particular, Dr. Stephan Wiesener from Andersen Consulting for his encouragement
and constructive discussions; Dr. Marianne Dörr from the Bavarian State Library
for her great collaboration in the VD17 project; Dr. Thomas Baker from the Asian
Institute of Technology, Thailand, and GMD Bonn, Dr. Shigeo Sugimoto from ULIS,
Japan, and Pavel Vogel from the Institut für Informatik of the Technische
Universität München for reviewing this work and for their helpful comments,
which helped make this dissertation the best it could be; Liselotte
Steinherr-Isen for her linguistic help; Otto Krischer for his support in
implementing the VD17 digital library system; and many other colleagues and
friends from different working groups and international digital library
activities who have helped me broaden and deepen my understanding of digital
libraries and shape the research topics and perspectives presented in this work.
I would not have finished this thesis without the support and energy my wife
Luisa gave me. Last but not least, my special thanks and admiration go to my
parents, Miloud and Mimouna, without whom this would never have been possible.
I alone have made this dissertation, but
together all of these exceptional people have made me.
With the advent of the World Wide Web, many collections of information and
multimedia applications have become easily accessible worldwide. Digital
libraries play a key role in integrating these heterogeneous, multi-format,
multimedia, and multilingual information resources. Within the framework of the
VD17 project, a digital library system has been developed in which all prints
of the 17th century published in the German-speaking area of the time are being
cooperatively cataloged, partially digitized, and made publicly accessible via
the Internet. Synchronizing the catalogers' online transactions and ensuring the
scalability of the system were of central importance in this R&D project.
Therefore, this thesis implements a scalable system architecture that aims to
improve performance and to allow further interested partners to join the
project. Furthermore, a collection of benchmark programs has been developed to
evaluate this approach and to identify the configuration that yields the best
performance.
With the invention of the WWW, a great deal of information and many multimedia applications have become easily accessible worldwide. Digital libraries play a decisive role in integrating heterogeneous, multilingual multimedia information resources. In the VD17 project, a library system has been developed in which all prints of the 17th century from the German-speaking area of the time are cooperatively cataloged, partially digitized, and made publicly available via the Internet. A synchronization concept for cooperative cataloging and the scalability of the system are of great importance in this R&D project. To this end, a scalable system architecture has been implemented that enables increased performance and the participation of further libraries in the project. In addition, various benchmarking programs have been developed to evaluate this system architecture and to identify the best configuration.
1. Introduction
   1.1 Motivation and Goal of this Thesis
   1.2 Outline of the Subsequent Chapters
2. The Field of Digital Libraries
   2.1 Historical Development
   2.2 An Overview of Selected DL Projects
3. System Architecture of Digital Libraries
   3.1 Information Formats
   3.2 Distribution
   3.3 Information Discovery and Retrieval
   3.4 Multilingual Information Retrieval and Access
   3.5 Metadata
   3.6 Methods of Federations
   3.7 User Interface
   3.8 Storage Techniques
   3.9 Preservation
   3.10 Summary
4. The Digital Library Project VD17
   4.1 Background
   4.2 System Architecture of VD17
      4.2.1 MAB Format
      4.2.2 MAB Database
      4.2.3 Document Database
      4.2.4 Import and Export Interfaces
      4.2.5 Update Manager
      4.2.6 Image Management Component
      4.2.7 Web Gateway
      4.2.8 User Interface
   4.3 User Studies and Evaluation
   4.4 Summary
5. Scalability and Tuning
   5.1 Introduction
   5.2 Basic Tuning Principles
   5.3 Discussion of Relevant Solutions
      5.3.1 Optimistic Concurrency Control
      5.3.2 Object Versioning
      5.3.3 Structured Lock Protocols
      5.3.4 Hierarchical Locks
   5.4 Summary
6. A Partitioning Approach
   6.1 Description
   6.2 A Scalable Architecture
      6.2.1 Database Partitioning
      6.2.2 Partition Manager
      6.2.3 Update Manager
      6.2.4 Query Manager
   6.3 Synchronization and Locking Concept
   6.4 Measurements
7. Conclusion
   7.1 Summary
   7.2 Limitations and Perspectives
8. Bibliography