The 4th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2024

The 4th International Conference on Natural Language Processing for Digital Humanities (NLP4DH 2024) will be co-located with EMNLP 2024. The conference proceedings will be published in the ACL Anthology. The conference will take place in Miami, USA, on November 16, 2024.

The conference focuses on applying natural language processing techniques to digital humanities research. Topics can be anything of digital humanities interest with a natural language processing or generation aspect. Suitable topics include, but are not limited to:

Text analysis and processing related to the humanities using computational methods
Thorough error analysis of an NLP system using (digital) humanities methods
Dataset creation and curation for NLP (e.g., digitization, digitalization, datafication, and data preservation)
Research on cultural heritage collections such as national archives and libraries using NLP
NLP for error detection and correction, normalization, and data denoising
Generation and analysis of literary works such as poetry and novels
Analysis and detection of text genres


Paper submission

We solicit original, unpublished work at the intersection of digital humanities and natural language processing. Short papers may be up to 4 pages long and long papers up to 8 pages; both formats allow an unlimited number of pages for references. All submissions must follow the ACL stylesheet (Overleaf template).

Submissions must be anonymous; they will be peer-reviewed by our program committee in a double-blind process.

Papers must be submitted via SoftConf by the submission deadline. At least one author of each accepted paper must attend the event to present the paper. Registration fees are handled by EMNLP 2024.

We also accept papers that have already been reviewed through ACL Rolling Review (ARR) and have not been committed to another venue. A paper may not be under review through ARR and NLP4DH simultaneously. A paper that has received or will receive reviews through ARR may not be submitted for direct review to NLP4DH; instead, it must use the ARR submission track on SoftConf and provide the URL of the OpenReview forum of the ARR submission (https://openreview.net/forum?id=XXXXXXXXXXX).

Accepted papers (short and long) will be published in the proceedings, which will appear in the ACL Anthology. Accepted papers will also be given one additional page to address the reviewers’ comments, so a camera-ready submission can be up to 5 pages for a short paper and 9 pages for a long paper, plus an unlimited number of pages for references.

Authors of accepted papers will be invited to submit an extended version of their paper to a special issue of the Journal of Data Mining & Digital Humanities.

Lightning talk submission

You may also contribute to the event by submitting a lightning talk. Lightning talks are submitted as 750-word abstracts via Google Forms and are suited to discussing ideas or presenting work in progress. The lightning proceedings will be published on Zenodo.

Important dates

Direct paper submission (long and short): September 1, 2024
ARR commitment submission: September 22, 2024
Notification of acceptance (direct submissions): September 22, 2024
Notification of acceptance (ARR submissions): September 27, 2024
Camera ready deadline (direct and ARR): October 4, 2024
Conference: November 16, 2024

All times are Anywhere on Earth (AoE).

If you have any questions, please email mika.hamalainen@metropolia.fi.

Schedule

November 16, 2024 (all times are local time in Miami)

9:00 - 9:10 Merrick 2

Opening remarks

9:10 - 10:30 Merrick 2

Oral session 1 - Chair: Mika Hämäläinen

9:10 - 9:30

Lightning talks

9:30 - 9:50

Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits - Emily Sofi Öhman and Aatu Liimatta

9:50 - 10:10

Tracing the Genealogies of Ideas with Sentence Embeddings - Lucian Li

10:10 - 10:30

Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark - Funing Yang and Carolyn Jane Anderson

10:30 - 11:00

Coffee break ☕

11:00 - 12:40 Merrick 2

Oral session 2 - Chair: So Miyagawa

11:00 - 11:20

Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis - Ray Umphrey, Jesse Roberts, Lindsey Roberts

11:20 - 11:40

Extracting Relations from Ecclesiastical Cultural Heritage Texts - Giulia Cruciani

11:40 - 12:00

Constructing a Sentiment-Annotated Corpus of Austrian Historical Newspapers: Challenges, Tools, and Annotator Experience - Lucija Krusic

12:00 - 12:20

It is a Truth Individually Acknowledged: Cross-references On Demand - Piper Vasicek, Courtni Byun, Kevin Seppi

12:20 - 12:40

Extracting Position Titles from Unstructured Historical Job Advertisements - Klara Venglarova, Raven Adam, Georg Vogeler

12:40 - 13:10

Lunch 🍔

13:10 - 15:30 Merrick 2

Oral session 3 - Chair: Emily Öhman

13:10 - 13:30

Language Resources From Prominent Born-Digital Humanities Texts are Still Needed in the Age of LLMs - Natalie Hervieux, Peiran Yao, Susan Brown, Denilson Barbosa

13:30 - 13:50

NLP for Digital Humanities: Processing Chronological Text Corpora - Adam Pawłowski, Tomasz Walkowiak

13:50 - 14:10

A Multi-task Framework with Enhanced Hierarchical Attention for Sentiment Analysis on Classical Chinese Poetry: Utilizing Information from Short Lines - Quanqi Du and Veronique Hoste

14:10 - 14:30

Exploring Similarity Measures and Intertextuality in Vedic Sanskrit Literature - So Miyagawa, Yuki Kyogoku, Yuzuki Tsukagoshi, Kyoko Amano

14:30 - 14:50

Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction - Laura Manrique-Gomez, Tony Montes, Arturo Rodriguez Herrera, Ruben Manrique

14:50 - 15:10

Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900) - Pascale Feldkamp, Alie Lassche, Jan Kostkan, Márton Kardos, Kenneth Enevoldsen, Katrine Baunvig, Kristoffer Nielbo

15:10 - 15:30

Deciphering Psycho-Social Effects of Eating Disorder: Analysis of Reddit Posts using Large Language Models and Topic Modeling - Medini Chopra, Anindita Chatterjee, Lipika Dey, Partha Pratim Das

15:30 - 16:30 Riverfront Central

Posters and coffee ☕

15:30 - 16:30

Topic-Aware Causal Intervention for Counterfactual Detection - Thong Thanh Nguyen, Truc-My Nguyen

15:30 - 16:30

UD for German Poetry - Stefanie Dipper, Ronja Laarmann-Quante

15:30 - 16:30

Molyé: A Corpus-Based Approach to Language Contact in Colonial France - Rasul Dent, Juliette Janes, Thibault Clerice, Pedro Ortiz Suarez, Benoît Sagot

15:30 - 16:30

Improving Latin Dependency Parsing by Combining Treebanks and Predictions - Hanna-Mari Kristiina Kupari, Erik Henriksson, Veronika Laippala, Jenna Kanerva

15:30 - 16:30

From N-Grams to Pre-Trained Multilingual Models for Language Identification - Thapelo Andrew Sindane, Vukosi Marivate

15:30 - 16:30

Visualising Changes in Semantic Neighbourhoods of English Noun Compounds over Time - Malak Rassem, Myrto Tsigkouli, Chris W Jenkins, Filip Miletić, Sabine Schulte im Walde

15:30 - 16:30

SEFLAG: Systematic Evaluation Framework for NLP Models and Datasets in Latin and Ancient Greek - Konstantin Schulz, Florian Deichsler

15:30 - 16:30

A Two-Model Approach for Humour Style Recognition - Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat

15:30 - 16:30

N-Gram-Based Preprocessing for Sandhi Reversion in Vedic Sanskrit - Yuzuki Tsukagoshi, Ikki Ohmukai

15:30 - 16:30

Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams - Roberts Darģis, Guntis Bārzdiņš, Inguna Skadiņa, Baiba Saulite

15:30 - 16:30

Computational Methods for the Analysis of Complementizer Variability in Language and Literature: The Case of Hebrew "she-" and "ki" - Avi Shmidman, Aynat Rubinstein

15:30 - 16:30

From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations - Erik Henriksson, Amanda Myntti, Saara Hellström, Selcen Erten-Johansson, Anni Eskelinen, Liina Repo, Veronika Laippala

15:30 - 16:30

Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages - J. A. Meaney, Beatrice Alex, William Lamb

15:30 - 16:30

Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus - Craig Messner, Thomas Lippincott

15:30 - 16:30

Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts - Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, Taro Watanabe

15:30 - 16:30

Evaluating LLM Performance in Character Analysis: A Study of Artificial Beings in Recent Korean Science Fiction - Woori Jang, Seohyon Jung

15:30 - 16:30

Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin - Svetlana Gorovaia, Gleb Schmidt, Ivan P. Yamshchikov

16:30 - 17:30

Virtual posters (Zoom breakout rooms) - Chair: Yuri Bizzoni

16:30 - 17:30

Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models - Nikita Neveditsin, Ambuja Salgaonkar, Pawan Lingras, Vijay Mago

16:30 - 17:30

Enhancing Swedish Parliamentary Data: Annotation, Accessibility, and Application in Digital Humanities - Shafqat Mumtaz Virk, Claes Ohlsson, Nina Tahmasebi, Henrik Björck, Leif Runefelt

16:30 - 17:30

Adapting Measures of Literality for Use with Historical Language Data - Adam Roussel

16:30 - 17:30

Vector Poetics: Parallel Couplet Detection in Classical Chinese Poetry - Maciej Kurzynski, Xiaotong Xu, Yu Feng

16:30 - 17:30

Intersecting Register and Genre: Understanding the Contents of Web-Crawled Corpora - Amanda Myntti, Liina Repo, Elian Freyermuth, Antti Kanner, Veronika Laippala, Erik Henriksson

16:30 - 17:30

Text vs. Transcription: A Study of Differences Between the Writing and Speeches of U.S. Presidents - Mina Rajaei Moghadam, Mosab Rezaei, Gülşat Aygen, Reva Freedman

16:30 - 17:30

Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language - Xinmeng Hou

16:30 - 17:30

Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration - Ryo Igarashi, So Miyagawa

16:30 - 17:30

Exploring Large Language Models for Qualitative Data Analysis - Tim Fischer, Chris Biemann

16:30 - 17:30

Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs - Chahan Vidal-Gorène, Nadi Tomeh, Victoria Khurshudyan

16:30 - 17:30

Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference for Cost-Effective Cultural Heritage Dataset Generation - William Thorne, Ambrose Robinson, Bohua Peng, Chenghua Lin, Diana Maynard

16:30 - 17:30

Assessing Large Language Models in Translating Coptic and Ancient Greek Ostraca - Audric-Charles Wannaz, So Miyagawa

16:30 - 17:30

The Social Lives of Literary Characters: Combining Citizen Science and Language Models to Understand Narrative Social Networks - Andrew Piper, Michael Xu, Derek Ruths

16:30 - 17:30

Multi-Word Expressions in Biomedical Abstracts and Their Plain English Adaptations - Sergei Bagdasarov, Elke Teich

16:30 - 17:30

Assessing the Performance of ChatGPT-4, Fine-Tuned BERT and Traditional ML Models on Moroccan Arabic Sentiment Analysis - Mohamed Hannani, Abdelhadi Soudi, Kristof Van Laerhoven

16:30 - 17:30

Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-Based User Embeddings - Mika Hämäläinen, Jack Rueter, Khalid Alnajjar

16:30 - 17:30

Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification - Keito Inoshita

16:30 - 17:30

Generating Interpretations of Policy Announcements - Andreas Marfurt, Ashley Thornton, David Sylvan, James Henderson

16:30 - 17:30

Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses - Erkki Mervaala, Ilona Kousa

16:30 - 17:30

CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration - Anton Eklund, Mona Forsman, Frank Drewes

16:30 - 17:30

Empowering Teachers with Usability-Oriented LLM-Based Tools for Digital Pedagogy - Melany Vanessa Macias, Lev Kharlashkin, Leo Einari Huovinen, Mika Hämäläinen

Organizers

Mika Hämäläinen

Metropolia University of Applied Sciences

Emily Öhman

Waseda University

So Miyagawa

The University of Tsukuba / National Institute for Japanese Language and Linguistics (NINJAL)

Khalid Alnajjar

F-Secure Oyj

Yuri Bizzoni

Aarhus University

Program committee

Joshua Wilbur, University of Tartu
Stefania Degaetano-Ortlieb, Saarland University
Luke Gessler, University of Colorado Boulder
Leo Leppänen, University of Helsinki
Quan Duong, University of Helsinki
Iana Atanassova, University of Franche-Comté
Won Ik Cho, Samsung
Tyler Shoemaker, Dartmouth College
Jouni Tuominen, University of Helsinki
Enrique Manjavacas Arevalo, University of Leiden
Kenichi Iwatsuki, Mirai Translate
Matej Martinc, Jožef Stefan Institute
Maciej Janicki, University of Helsinki
Shuo Zhang, Bose
Aynat Rubinstein, The Hebrew University of Jerusalem
Frederik Arnold, Humboldt University of Berlin
Thibault Clerice, National Institute for Research in Digital Science and Technology
Nicolas Gutehrlé, University Bourgogne Franche-Comté
Lama Alqazlan, University of Warwick
Lidia Pivovarova, University of Helsinki
Balázs Indig, Eötvös Loránd University
Pierre Magistry, Institut national des langues et civilisations orientales
Yoshifumi Kawasaki, The University of Tokyo
Anna Dmitrieva, University of Helsinki
Antti Kanner, University of Helsinki
Maria Antoniak, Allen Institute for AI
Katerina Korre, University of Bologna
Daniela Teodorescu, University of Alberta
Dongqi Pu, Saarland University
Nils Hjortnaes, Indiana University Bloomington
Noémi Ligeti-Nagy, Hungarian Research Centre for Linguistics
Allison Lahnala, University of Bonn
Gabriel Simmons, University of California, Davis
Vilja Hulden, University of Colorado Boulder
Jaihyun Park, Nanyang Technological University
Jonne Sälevä, Brandeis University
Martin Ruskov, University of Milan
Youngsook Song, Sionic AI
Pascale Moreira, Aarhus University
Maciej Kurzynski, Lingnan University
Aatu Liimatta, University of Helsinki
Sourav Das, Indian Institute of Information Technology Kalyani
Sebastian Oliver Eck, University of Music Franz Liszt Weimar
Elissa Nakajima Wickham, Waseda University
Nicole Miu Takagi, Waseda University
Ken Kawamura, Revelata Inc
Bo Dang, San Francisco Bay University
Jack Rueter, University of Helsinki
