Seminario di Cultura Digitale – GROBID

Wednesday, 3 March 2021
Ioana Marasescu-Galleron (Université Sorbonne-Nouvelle)
GROBID Dictionaries and the annotation of the Dictionnaire universel (1701)

First published in 1690 by Antoine Furetière, the Dictionnaire universel was reprinted in a heavily corrected and enlarged version in 1701. With more than 3000 pages, the dictionary offers a wealth of information not only about the 18th-century French language, but also about the trends, discoveries and mentalities of the era.
Annotating this complex work by hand is an extremely time-consuming task. To speed up data acquisition, the French ANR project BasNum decided to use the GROBID-dictionaries module. GROBID is a CRF-based system, implemented in Java, that automatically extracts TEI structures from various document types.
Despite the inconsistencies of our historical dictionary, GROBID-dictionaries helped us to produce a working document split into entries, senses, etymologies and other fields. In this presentation, I will detail our workflow, focusing on the specificities and difficulties of each step, and include a short demonstration of how the system works.