Fabrizio Sebastiani – Supervised Machine Learning for Written Text

Fabrizio Sebastiani, ISTI-CNR

Supervised Machine Learning for the Analysis and Management of Written Text. The ISTI-CNR experience.

Seminario di Cultura Digitale – Wednesday, 5 October 2022

Many text analysis tasks are expensive, time-consuming, tedious, or difficult to carry out. Examples are (a) assigning subject codes or thesaurus entries (from a predefined taxonomy) to scientific papers, (b) guessing, among a set of candidates, the most likely author of a text of unknown or disputed authorship, and (c) determining whether a textual comment (on a product, on a political candidate, etc.) conveys a positive or a negative opinion about its subject. Can these tasks be automated, so that high volumes of text can be processed without effort and in no time?
Alternatively, can we build tools that support the work of humans who carry out these tasks manually? These are the goals of machine learning, possibly the most important subdiscipline of artificial intelligence, when applied to automatic text analysis. In this lecture I will give an overview of the work that the AI4Text research group at ISTI-CNR is carrying out, focusing in particular on tasks such as (a) automatically guessing the author of a text of uncertain or disputed authorship, (b) supporting (via “technology-assisted review” techniques) the work of humans who need to annotate large quantities of textual documents, and (c) automatically classifying texts written in resource-poor languages (e.g., Pashto) by piggybacking on resource-rich ones (e.g., English).