Using an Advanced Text Index Structure for Corpus Exploration in Digital Humanities

Authors: Tobias Englmeier, CIS, Ludwig-Maximilians University, Munich, Germany
Marco Büchler, Institute of Computer Science, University of Göttingen, Göttingen, Germany
Stefan Gerdjikov, FMI, University of Sofia “St. Kliment Ohridski”, Sofia, Bulgaria
Klaus U. Schulz, CIS, Ludwig-Maximilians University, Munich, Germany

Source: Digital Humanities Quarterly, 2021, Volume 15, Number 1
Tobias Englmeier

Tobias Englmeier is a PhD candidate at the Centrum für Informations- und Sprachverarbeitung (CIS) at the Ludwig Maximilians University of Munich. His PhD project centers on string matching and OCR postcorrection. He has also been involved in the conception and implementation of numerous Digital Humanities projects coordinated by the IT Gruppe Geisteswissenschaften (ITG) at the Ludwig Maximilians University of Munich.

Marco Büchler 

Marco Büchler holds a Diploma in Computer Science. From 2006 to 2014 he worked as a Research Associate in the Natural Language Processing Group at Leipzig University. From April 2008 to March 2011 he served as the technical Project Manager for the eAQUA project, and he continued in that role for the subsequent eTRACES project. In March 2013 he received his PhD in eHumanities. Since May 2014 he has led a Digital Humanities Research Group at the Göttingen Centre for Digital Humanities. His research covers Natural Language Processing on Big Humanities Data, specifically Historical Text Reuse Detection and its application in the business world. In addition to his primary responsibilities, he manages the Medusa project (a big-scale co-occurrence and n-gram framework) and the TRACER machine for detecting historical text reuse.

Stefan Gerdjikov 

Stefan Gerdjikov is an Assistant Professor at the Faculty for Informatics and Mathematics at the University of Sofia. He holds a PhD in Mathematics from the University of Sofia. His primary research area is Natural Language Processing, where he studies approximate search techniques and index structures for text mining.

Klaus U. Schulz 

Klaus U. Schulz is Professor of Computational Linguistics and, since 1992, the technical director of the Centrum für Informations- und Sprachverarbeitung (CIS) at the Ludwig Maximilians University of Munich. His work concentrates on Semantic Search, the Construction of Ontologies and Taxonomies, Digital Libraries, Language Technology for Optical Character Recognition and Document Analysis, and Finite-State Technology.