Technische Universität München

Institut für Informatik

 

 

 

 

 

VD 17 - Cooperative Cataloging in a Scalable Digital Library System

 

 

 

 

Dissertation

 

 

 

 

 

El Hachmi Haddouti

 

 

 

 

 

 

 

 

 


 

 

Complete reprint of the dissertation approved by the Fakultät für Informatik (Faculty of Informatics) of the Technische Universität München for the award of the academic degree of Doktor der Naturwissenschaften (Doctor of Natural Sciences).

 

 

Chair:                   Univ.-Prof. Dr. Christoph Zenger

Examiners of the dissertation:

1.      Univ.-Prof. Rudolf Bayer, Ph.D.

2.      Hon.-Prof. Dr. Albert Endres, Universität Stuttgart

 

 

The dissertation was submitted to the Technische Universität München on 12 October 1999 and accepted by the Fakultät für Informatik on 10 December 1999.

 

 

Acknowledgements

 

This work was carried out while I was a research scientist, from 1996 to 1999, at FORWISS, the Bavarian Research Center for Knowledge-Based Systems, in Munich, Germany. In particular, this thesis is based on experience and ideas gained in VD17, a long-term digital library project in Germany funded by the Deutsche Forschungsgemeinschaft (German Research Foundation). I would like to thank all who have contributed to my success in this endeavor.

First, my special thanks are owed to my advisor and the initiator of the VD17 project, Prof. Rudolf Bayer, Ph.D., who, although incredibly busy, was always available for fruitful discussions and great guidance. I would also like to thank my co-advisor, Prof. Dr. Albert Endres, for his interest in serving on the dissertation committee.

I am grateful to all my colleagues in the Knowledge Base research group for the wonderful working atmosphere, especially Dr. Peter Baumann, Ulrike Sommer, Wolfgang Wohner, and Paula Furtado for their cooperative and constructive remarks. I sincerely hope to work again with every single one of these incredible people.

A number of other people have made important contributions as well, in particular Dr. Stephan Wiesener from Andersen Consulting, for his encouragement and constructive discussions; Dr. Marianne Dörr from the Bavarian State Library, for her great collaboration in the VD17 project; Dr. Thomas Baker from Asian Information Technology Thailand and GMD Bonn, Dr. Shigeo Sugimoto from ULIS Japan, and Pavel Vogel from the Institut für Informatik of the Technische Universität München, for their reviews and helpful comments, which helped produce the best possible dissertation; Liselotte Steinherr-Isen, for her linguistic help; and Otto Krischer, for his support in implementing the digital library system VD17. I also thank the many other colleagues and friends from different working groups and international digital library activities who have helped me broaden and deepen my understanding of digital libraries and shape the research topics and perspectives discussed in this work.

I would not have finished this thesis without the support and energy my wife Luisa gave me. Last but not least, my special thanks and admiration go to my parents, Miloud and Mimouna, without whom this would never have been possible.

I alone have made this dissertation, but together all of these exceptional people have made me.

 

 

 

Abstract

 

With the advent of the WWW, many collections of information and multimedia applications have become easily accessible worldwide. Digital libraries play a key role in integrating these heterogeneous, multi-format, multimedia, and multilingual information resources. Within the VD17 project, a digital library system has been developed in which all prints of the 17th century published in German-speaking countries are being cataloged, partially digitized, and made publicly accessible via the Internet. Synchronizing the online transactions of catalogers and ensuring scalability were of great importance in this R&D project. Therefore, a scalable system architecture has been implemented in this thesis; it aims to improve working performance and to give other interested partners the opportunity to join the project. Furthermore, a collection of benchmark programs has been developed to measure the best-case performance of this approach.

 

 

 

Kurzfassung (German Abstract)

 

With the invention of the WWW, a great deal of information and many multimedia applications have become easily accessible worldwide. Digital libraries play a decisive role in integrating heterogeneous and multilingual multimedia information resources. In the VD17 project, a library system was developed in which all prints of the 17th century from the German-speaking area of that time are cooperatively cataloged, partially digitized, and made publicly available via the Internet. A synchronization concept for cooperative cataloging and the scalability of the system are of great importance in this R&D project. To this end, a scalable system architecture was implemented that increases performance and allows further libraries to join the project. In addition, several benchmarking programs were developed to evaluate this system architecture and to identify the best configuration.

  

 

 

 

 

 

 

 

 

Table of Contents

 

 

 

 

1. Introduction

1.1 Motivation and Goal of this Thesis

1.2 Outline of the Subsequent Chapters

 

2. The Field of Digital Libraries

2.1 Historical Development

2.2 An Overview of Selected DL Projects

 

3. System Architecture of Digital Libraries

3.1 Information Formats

3.2 Distribution

3.3 Information Discovery and Retrieval

3.4 Multilingual Information Retrieval and Access

3.5 Metadata

3.6 Methods of Federations

3.7 User Interface

3.8 Storage Techniques

3.9 Preservation

3.10 Summary

 

4. The Digital Library Project VD17

4.1 Background

4.2 System Architecture of VD17

4.2.1 MAB Format

4.2.2 MAB Database

4.2.3 Document Database

4.2.4 Import and Export Interfaces

4.2.5 Update Manager

4.2.6 Image Management Component

4.2.7 Web Gateway

4.2.8 User Interface

4.3 User Studies and Evaluation

4.4 Summary

 

5. Scalability and Tuning

5.1 Introduction

5.2 Basic Tuning Principles

5.3 Discussion of Relevant Solutions

5.3.1 Optimistic Concurrency Control

5.3.2 Object Versioning

5.3.3 Structured Lock Protocols

5.3.4 Hierarchical Locks

5.4 Summary

 

6. A Partitioning Approach

6.1 Description

6.2 A Scalable Architecture

6.2.1 Database Partitioning

6.2.2 Partition Manager

6.2.3 Update Manager

6.2.4 Query Manager

6.3 Synchronization and Locking Concept

6.4 Measurements

 

7. Conclusion

7.1 Summary

7.2 Limitations and Perspectives

 

8. Bibliography