eScience Center and FRT use Machine Learning for Digital Textual Analysis

The Faculty of Religion and Theology has received a grant from the eScience Center for the project “Morphological Parser for Inflectional Languages Using Deep Learning”.

04/22/2021 | 5:05 PM

The project builds on more than four decades of experience in the computational linguistic analysis of the Hebrew Bible at the Eep Talstra Centre for Bible and Computer (ETCBC), and on some unique aspects of the encoding of the ETCBC linguistic database of the Hebrew Bible. These aspects do justice to the fact that Biblical Hebrew, like many other Semitic languages, is an inflectional language with rich morphology.

Machine Learning to automate adding linguistic annotations 
What can be said with one word in Biblical Hebrew sometimes needs five or six words in an English translation. When adding linguistic annotations to such a text, it is therefore better to encode the smaller parts of a word (morphemes) than complete words, which is the usual practice when preparing, for example, English or Dutch text corpora. Encoding at the morpheme level, however, is very labour-intensive. The new project will endeavour to use Machine Learning to automate this process for Hebrew and Syriac texts. The approach also has potential for the automatic encoding of other inflectional languages, such as Arabic or Sanskrit.

The team 
The project team consists of Wido van Peursen (applicant), Constantijn Sikkel, Martijn Naaijer, Mathias Coeckelbergs, Cody Kingham, and computer scientists from the eScience Center. The research will take place in the context of the research group “Digital Approaches to Sacred Texts”.

What is the eScience Center?
Van Peursen: “The eScience Center is a leading institution for the use of digital methods in academic research, empowering researchers across all disciplines. We are very grateful to have this opportunity to cooperate with the dedicated team of Research Software Engineers of the eScience Center.”

