The 4th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2024


The 4th International Conference on Natural Language Processing for Digital Humanities (NLP4DH 2024) will be organized together with EMNLP 2024. The proceedings of the conference will be published in the ACL Anthology. The conference will take place in Miami, USA on November 16, 2024.

The focus of the conference is on applying natural language processing techniques to digital humanities research. The topics can be anything of digital humanities interest with a natural language processing or generation aspect. A list of suitable topics includes but is not limited to:


Paper submission

We solicit original and unpublished work related to digital humanities and natural language processing (NLP4DH). Short papers can be up to 4 pages in length and long papers up to 8 pages. Both submission formats can have an unlimited number of pages for references. All submissions must follow the ACL stylesheet (Overleaf template).

Submissions must be anonymous and will be peer-reviewed by our program committee. The peer review is double-blind.

Papers must be submitted using SoftConf by the submission deadline. At least one of the authors of an accepted paper must attend the event to present the paper. EMNLP 2024 is in charge of registration fees.

We also accept papers already reviewed in the ACL Rolling Review (ARR) that have not been committed to another venue. A paper may not be simultaneously under review through ARR and NLP4DH. A paper that has received or will receive reviews through ARR may not be submitted for direct review to NLP4DH; it must instead use the ARR submission track on SoftConf and provide the URL to the OpenReview forum of the ARR submission (https://openreview.net/forum?id=XXXXXXXXXXX).

Accepted papers (short and long) will be published in the proceedings that will appear in the ACL Anthology. Accepted papers will also be given an additional page to address the reviewers’ comments. A camera-ready submission can therefore be up to 5 pages for a short paper and up to 9 pages for a long paper, with an unlimited number of pages for references.

The authors of the accepted papers will be invited to submit an extended version of their paper to a special issue in the Journal of Data Mining & Digital Humanities.

Lightning talk submission

You may also contribute to the event by submitting a lightning talk. Lightning talks are submitted as 750-word abstracts using Google Forms. Lightning talks are suited for discussing ideas or presenting work in progress. The lightning talk proceedings will be published on Zenodo.

Important dates

All times are Anywhere on Earth (AoE).


If you have any questions, you can email mika.hamalainen@metropolia.fi


Schedule

16.11.2024 - All times are local time in Miami



9:00 - 9:10 Merrick 2

Opening words


9:10 - 10:30 Merrick 2

Oral session 1 - Chair: Mika Hämäläinen


9:10 - 9:30

Lightning talks


9:30 - 9:50

Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits - Emily Sofi Öhman and Aatu Liimatta


9:50 - 10:10

Tracing the Genealogies of Ideas with Sentence Embeddings - Lucian Li


10:10 - 10:30

Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark - Funing Yang and Carolyn Jane Anderson


10:30 - 11:00

Coffee break ☕


11:00 - 12:40 Merrick 2

Oral session 2 - Chair: So Miyagawa


11:00 - 11:20

Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis - Ray Umphrey, Jesse Roberts, Lindsey Roberts


11:20 - 11:40

Extracting Relations from Ecclesiastical Cultural Heritage Texts - Giulia Cruciani


11:40 - 12:00

Constructing a Sentiment-Annotated Corpus of Austrian Historical Newspapers: Challenges, Tools, and Annotator Experience - Lucija Krusic


12:00 - 12:20

It is a Truth Individually Acknowledged: Cross-references On Demand - Piper Vasicek, Courtni Byun, Kevin Seppi


12:20 - 12:40

Extracting Position Titles from Unstructured Historical Job Advertisements - Klara Venglarova, Raven Adam, Georg Vogeler


12:40 - 13:10

Lunch 🍔


13:10 - 15:30 Merrick 2

Oral session 3 - Chair: Emily Öhman


13:10 - 13:30

Language Resources From Prominent Born-Digital Humanities Texts are Still Needed in the Age of LLMs - Natalie Hervieux, Peiran Yao, Susan Brown, Denilson Barbosa


13:30 - 13:50

NLP for Digital Humanities: Processing Chronological Text Corpora - Adam Pawłowski, Tomasz Walkowiak


13:50 - 14:10

A Multi-task Framework with Enhanced Hierarchical Attention for Sentiment Analysis on Classical Chinese Poetry: Utilizing Information from Short Lines - Quanqi Du and Veronique Hoste


14:10 - 14:30

Exploring Similarity Measures and Intertextuality in Vedic Sanskrit Literature - So Miyagawa, Yuki Kyogoku, Yuzuki Tsukagoshi, Kyoko Amano


14:30 - 14:50

Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction - Laura Manrique-Gomez, Tony Montes, Arturo Rodriguez Herrera, Ruben Manrique


14:50 - 15:10

Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900) - Pascale Feldkamp, Alie Lassche, Jan Kostkan, Márton Kardos, Kenneth Enevoldsen, Katrine Baunvig, Kristoffer Nielbo


15:10 - 15:30

Deciphering Psycho-Social Effects of Eating Disorder: Analysis of Reddit Posts using Large Language Models and Topic Modeling - Medini Chopra, Anindita Chatterjee, Lipika Dey, Partha Pratim Das


15:30 - 16:30 Riverfront Central

Posters and coffee ☕


15:30 - 16:30

Topic-Aware Causal Intervention for Counterfactual Detection - Thong Thanh Nguyen, Truc-My Nguyen


15:30 - 16:30

UD for German Poetry - Stefanie Dipper, Ronja Laarmann-Quante


15:30 - 16:30

Molyé: A Corpus-Based Approach to Language Contact in Colonial France - Rasul Dent, Juliette Janes, Thibault Clerice, Pedro Ortiz Suarez, Benoît Sagot


15:30 - 16:30

Improving Latin Dependency Parsing by Combining Treebanks and Predictions - Hanna-Mari Kristiina Kupari, Erik Henriksson, Veronika Laippala, Jenna Kanerva


15:30 - 16:30

From N-Grams to Pre-Trained Multilingual Models for Language Identification - Thapelo Andrew Sindane, Vukosi Marivate


15:30 - 16:30

Visualising Changes in Semantic Neighbourhoods of English Noun Compounds over Time - Malak Rassem, Myrto Tsigkouli, Chris W Jenkins, Filip Miletić, Sabine Schulte im Walde


15:30 - 16:30

SEFLAG: Systematic Evaluation Framework for NLP Models and Datasets in Latin and Ancient Greek - Konstantin Schulz, Florian Deichsler


15:30 - 16:30

A Two-Model Approach for Humour Style Recognition - Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat


15:30 - 16:30

N-Gram-Based Preprocessing for Sandhi Reversion in Vedic Sanskrit - Yuzuki Tsukagoshi, Ikki Ohmukai


15:30 - 16:30

Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams - Roberts Darģis, Guntis Bārzdiņš, Inguna Skadiņa, Baiba Saulite


15:30 - 16:30

Computational Methods for the Analysis of Complementizer Variability in Language and Literature: The Case of Hebrew "she-" and "ki" - Avi Shmidman, Aynat Rubinstein


15:30 - 16:30

From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations - Erik Henriksson, Amanda Myntti, Saara Hellström, Selcen Erten-Johansson, Anni Eskelinen, Liina Repo, Veronika Laippala


15:30 - 16:30

Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages - J. A. Meaney, Beatrice Alex, William Lamb


15:30 - 16:30

Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus - Craig Messner, Thomas Lippincott


15:30 - 16:30

Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts - Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, Taro Watanabe


15:30 - 16:30

Evaluating LLM Performance in Character Analysis: A Study of Artificial Beings in Recent Korean Science Fiction - Woori Jang, Seohyon Jung


15:30 - 16:30

Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin - Svetlana Gorovaia, Gleb Schmidt, Ivan P. Yamshchikov



16:30 - 17:30

Virtual posters (Zoom breakout rooms) - Chair: Yuri Bizzoni


16:30 - 17:30

Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models - Nikita Neveditsin, Ambuja Salgaonkar, Pawan Lingras, Vijay Mago


16:30 - 17:30

Enhancing Swedish Parliamentary Data: Annotation, Accessibility, and Application in Digital Humanities - Shafqat Mumtaz Virk, Claes Ohlsson, Nina Tahmasebi, Henrik Björck, Leif Runefelt


16:30 - 17:30

Adapting Measures of Literality for Use with Historical Language Data - Adam Roussel


16:30 - 17:30

Vector Poetics: Parallel Couplet Detection in Classical Chinese Poetry - Maciej Kurzynski, Xiaotong Xu, Yu Feng


16:30 - 17:30

Intersecting Register and Genre: Understanding the Contents of Web-Crawled Corpora - Amanda Myntti, Liina Repo, Elian Freyermuth, Antti Kanner, Veronika Laippala, Erik Henriksson


16:30 - 17:30

Text vs. Transcription: A Study of Differences Between the Writing and Speeches of U.S. Presidents - Mina Rajaei Moghadam, Mosab Rezaei, Gülşat Aygen, Reva Freedman


16:30 - 17:30

Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language - Xinmeng Hou


16:30 - 17:30

Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration - Ryo Igarashi, So Miyagawa


16:30 - 17:30

Exploring Large Language Models for Qualitative Data Analysis - Tim Fischer, Chris Biemann


16:30 - 17:30

Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs - Chahan Vidal-Gorène, Nadi Tomeh, Victoria Khurshudyan


16:30 - 17:30

Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference for Cost-Effective Cultural Heritage Dataset Generation - William Thorne, Ambrose Robinson, Bohua Peng, Chenghua Lin, Diana Maynard


16:30 - 17:30

Assessing Large Language Models in Translating Coptic and Ancient Greek Ostraca - Audric-Charles Wannaz, So Miyagawa


16:30 - 17:30

The Social Lives of Literary Characters: Combining Citizen Science and Language Models to Understand Narrative Social Networks - Andrew Piper, Michael Xu, Derek Ruths


16:30 - 17:30

Multi-Word Expressions in Biomedical Abstracts and Their Plain English Adaptations - Sergei Bagdasarov, Elke Teich


16:30 - 17:30

Assessing the Performance of ChatGPT-4, Fine-Tuned BERT and Traditional ML Models on Moroccan Arabic Sentiment Analysis - Mohamed Hannani, Abdelhadi Soudi, Kristof Van Laerhoven


16:30 - 17:30

Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-Based User Embeddings - Mika Hämäläinen, Jack Rueter, Khalid Alnajjar


16:30 - 17:30

Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification - Keito Inoshita


16:30 - 17:30

Generating Interpretations of Policy Announcements - Andreas Marfurt, Ashley Thornton, David Sylvan, James Henderson


16:30 - 17:30

Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses - Erkki Mervaala, Ilona Kousa


16:30 - 17:30

CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration - Anton Eklund, Mona Forsman, Frank Drewes


16:30 - 17:30

Empowering Teachers with Usability-Oriented LLM-Based Tools for Digital Pedagogy - Melany Vanessa Macias, Lev Kharlashkin, Leo Einari Huovinen, Mika Hämäläinen


Organizers

Mika Hämäläinen

Metropolia University of Applied Sciences

Emily Öhman

Waseda University

So Miyagawa

The University of Tsukuba / National Institute for Japanese Language and Linguistics (NINJAL)

Khalid Alnajjar

F-Secure Oyj

Yuri Bizzoni

Aarhus University

Program committee