The 4th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2024
The 4th International Conference on Natural Language Processing for Digital Humanities (NLP4DH 2024) will be organized together with EMNLP 2024. The proceedings of the conference will be published in the ACL Anthology. The conference will take place in Miami, USA, on November 16, 2024.
The focus of the conference is on applying natural language processing techniques to digital humanities research. Submissions may address any topic of interest to digital humanities that has a natural language processing or generation aspect. Suitable topics include, but are not limited to:
Text analysis and processing related to humanities using computational methods
Thorough error analysis of an NLP system using (digital) humanities methods
Dataset creation and curation for NLP (e.g. digitization, digitalization, datafication, and data preservation)
Research on cultural heritage collections such as national archives and libraries using NLP
NLP for error detection, correction, normalization, and denoising of data
Generation and analysis of literary works such as poetry and novels
Analysis and detection of text genres
Paper submission
We solicit original and unpublished work related to digital humanities and natural language processing (NLP4DH). Short papers can be up to 4 pages in length and long papers up to 8 pages. Both submission formats can have an unlimited number of pages for references. All submissions must follow the ACL stylesheet (Overleaf template).
Submissions must be anonymous and will be peer-reviewed by our program committee. The peer review is double-blind.
Papers must be submitted using SoftConf by the submission deadline. At least one of the authors of an accepted paper must attend the event to present the paper. EMNLP 2024 is in charge of registration fees.
We also accept papers already reviewed in the ACL Rolling Review (ARR) that have not been committed to another venue. A paper may not be simultaneously under review through ARR and NLP4DH. A paper that has received or will receive reviews through ARR may not be submitted for direct review to NLP4DH; it must instead use the ARR submission track on SoftConf and provide the URL to the OpenReview forum of the ARR submission (https://openreview.net/forum?id=XXXXXXXXXXX).
Accepted papers (short and long) will be published in the proceedings that will appear in the ACL Anthology. Accepted papers will also be given an additional page to address the reviewers’ comments. The length of a camera-ready submission can then be 5 pages for a short paper and 9 pages for a long paper, with an unlimited number of pages for references.
The authors of the accepted papers will be invited to submit an extended version of their paper to a special issue in the Journal of Data Mining & Digital Humanities.
Lightning talk submission
You may also contribute to the event by submitting a lightning talk. Lightning talks are submitted as 750-word abstracts using Google Forms. Lightning talks are suited for discussing ideas or presenting work in progress. The lightning proceedings will be published on Zenodo.
Important dates
Direct paper submission (long and short): September 1, 2024
ARR commitment submission: September 22, 2024
Notification of acceptance (direct submissions): September 22, 2024
Notification of acceptance (ARR submissions): September 27, 2024
Camera-ready deadline (direct and ARR): October 4, 2024
Conference: November 16, 2024
All deadlines are Anywhere on Earth (AoE).
If you have any questions, you can email mika.hamalainen@metropolia.fi
Schedule
16.11.2024 - All times are local time in Miami
9:00 - 9:10 Merrick 2
Opening words
9:10 - 10:30 Merrick 2
Oral session 1 - Chair: Mika Hämäläinen
9:10 - 9:30
Lightning talks
9:30 - 9:50
Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits - Emily Sofi Öhman and Aatu Liimatta
9:50 - 10:10
Tracing the Genealogies of Ideas with Sentence Embeddings - Lucian Li
10:10 - 10:30
Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark - Funing Yang and Carolyn Jane Anderson
10:30 - 11:00
Coffee break ☕
11:00 - 12:40 Merrick 2
Oral session 2 - Chair: So Miyagawa
11:00 - 11:20
Investigating Expert-in-the-Loop LLM Discourse Patterns for Ancient Intertextual Analysis - Ray Umphrey, Jesse Roberts, Lindsey Roberts
11:20 - 11:40
Extracting Relations from Ecclesiastical Cultural Heritage Texts - Giulia Cruciani
11:40 - 12:00
Constructing a Sentiment-Annotated Corpus of Austrian Historical Newspapers: Challenges, Tools, and Annotator Experience - Lucija Krusic
12:00 - 12:20
It is a Truth Individually Acknowledged: Cross-references On Demand - Piper Vasicek, Courtni Byun, Kevin Seppi
12:20 - 12:40
Extracting Position Titles from Unstructured Historical Job Advertisements - Klara Venglarova, Raven Adam, Georg Vogeler
12:40 - 13:10
Lunch 🍔
13:10 - 15:30 Merrick 2
Oral session 3 - Chair: Emily Öhman
13:10 - 13:30
Language Resources From Prominent Born-Digital Humanities Texts are Still Needed in the Age of LLMs - Natalie Hervieux, Peiran Yao, Susan Brown, Denilson Barbosa
13:30 - 13:50
NLP for Digital Humanities: Processing Chronological Text Corpora - Adam Pawłowski, Tomasz Walkowiak
13:50 - 14:10
A Multi-task Framework with Enhanced Hierarchical Attention for Sentiment Analysis on Classical Chinese Poetry: Utilizing Information from Short Lines - Quanqi Du and Veronique Hoste
14:10 - 14:30
Exploring Similarity Measures and Intertextuality in Vedic Sanskrit Literature - So Miyagawa, Yuki Kyogoku, Yuzuki Tsukagoshi, Kyoko Amano
14:30 - 14:50
Historical Ink: 19th Century Latin American Spanish Newspaper Corpus with LLM OCR Correction - Laura Manrique-Gomez, Tony Montes, Arturo Rodriguez Herrera, Ruben Manrique
14:50 - 15:10
Canonical Status and Literary Influence: A Comparative Study of Danish Novels from the Modern Breakthrough (1870–1900) - Pascale Feldkamp, Alie Lassche, Jan Kostkan, Márton Kardos, Kenneth Enevoldsen, Katrine Baunvig, Kristoffer Nielbo
15:10 - 15:30
Deciphering Psycho-Social Effects of Eating Disorder: Analysis of Reddit Posts using Large Language Models and Topic Modeling - Medini Chopra, Anindita Chatterjee, Lipika Dey, Partha Pratim Das
15:30 - 16:30 Riverfront Central
Posters and coffee ☕
15:30 - 16:30
Topic-Aware Causal Intervention for Counterfactual Detection - Thong Thanh Nguyen, Truc-My Nguyen
15:30 - 16:30
UD for German Poetry - Stefanie Dipper, Ronja Laarmann-Quante
15:30 - 16:30
Molyé: A Corpus-Based Approach to Language Contact in Colonial France - Rasul Dent, Juliette Janes, Thibault Clerice, Pedro Ortiz Suarez, Benoît Sagot
15:30 - 16:30
Improving Latin Dependency Parsing by Combining Treebanks and Predictions - Hanna-Mari Kristiina Kupari, Erik Henriksson, Veronika Laippala, Jenna Kanerva
15:30 - 16:30
From N-Grams to Pre-Trained Multilingual Models for Language Identification - Thapelo Andrew Sindane, Vukosi Marivate
15:30 - 16:30
Visualising Changes in Semantic Neighbourhoods of English Noun Compounds over Time - Malak Rassem, Myrto Tsigkouli, Chris W Jenkins, Filip Miletić, Sabine Schulte im Walde
15:30 - 16:30
SEFLAG: Systematic Evaluation Framework for NLP Models and Datasets in Latin and Ancient Greek - Konstantin Schulz, Florian Deichsler
15:30 - 16:30
A Two-Model Approach for Humour Style Recognition - Mary Ogbuka Kenneth, Foaad Khosmood, Abbas Edalat
15:30 - 16:30
N-Gram-Based Preprocessing for Sandhi Reversion in Vedic Sanskrit - Yuzuki Tsukagoshi, Ikki Ohmukai
15:30 - 16:30
Evaluating Open-Source LLMs in Low-Resource Languages: Insights from Latvian High School Exams - Roberts Darģis, Guntis Bārzdiņš, Inguna Skadiņa, Baiba Saulite
15:30 - 16:30
Computational Methods for the Analysis of Complementizer Variability in Language and Literature: The Case of Hebrew "she-" and "ki" - Avi Shmidman, Aynat Rubinstein
15:30 - 16:30
From Discrete to Continuous Classes: A Situational Analysis of Multilingual Web Registers with LLM Annotations - Erik Henriksson, Amanda Myntti, Saara Hellström, Selcen Erten-Johansson, Anni Eskelinen, Liina Repo, Veronika Laippala
15:30 - 16:30
Testing and Adapting the Representational Abilities of Large Language Models on Folktales in Low-Resource Languages - J. A. Meaney, Beatrice Alex, William Lamb
15:30 - 16:30
Examining Language Modeling Assumptions Using an Annotated Literary Dialect Corpus - Craig Messner, Thomas Lippincott
15:30 - 16:30
Evaluating Language Models in Location Referring Expression Extraction from Early Modern and Contemporary Japanese Texts - Ayuki Katayama, Yusuke Sakai, Shohei Higashiyama, Hiroki Ouchi, Ayano Takeuchi, Ryo Bando, Yuta Hashimoto, Toshinobu Ogiso, Taro Watanabe
15:30 - 16:30
Evaluating LLM Performance in Character Analysis: A Study of Artificial Beings in Recent Korean Science Fiction - Woori Jang, Seohyon Jung
15:30 - 16:30
Sui Generis: Large Language Models for Authorship Attribution and Verification in Latin - Svetlana Gorovaia, Gleb Schmidt, Ivan P. Yamshchikov
16:30 - 17:30
Virtual posters (Zoom breakout rooms) - Chair: Yuri Bizzoni
16:30 - 17:30
Classification of Buddhist Verses: The Efficacy and Limitations of Transformer-Based Models - Nikita Neveditsin, Ambuja Salgaonkar, Pawan Lingras, Vijay Mago
16:30 - 17:30
Enhancing Swedish Parliamentary Data: Annotation, Accessibility, and Application in Digital Humanities - Shafqat Mumtaz Virk, Claes Ohlsson, Nina Tahmasebi, Henrik Björck, Leif Runefelt
16:30 - 17:30
Adapting Measures of Literality for Use with Historical Language Data - Adam Roussel
16:30 - 17:30
Vector Poetics: Parallel Couplet Detection in Classical Chinese Poetry - Maciej Kurzynski, Xiaotong Xu, Yu Feng
16:30 - 17:30
Intersecting Register and Genre: Understanding the Contents of Web-Crawled Corpora - Amanda Myntti, Liina Repo, Elian Freyermuth, Antti Kanner, Veronika Laippala, Erik Henriksson
16:30 - 17:30
Text vs. Transcription: A Study of Differences Between the Writing and Speeches of U.S. Presidents - Mina Rajaei Moghadam, Mosab Rezaei, Gülşat Aygen, Reva Freedman
16:30 - 17:30
Mitigating Biases to Embrace Diversity: A Comprehensive Annotation Benchmark for Toxic Language - Xinmeng Hou
16:30 - 17:30
Enhancing Neural Machine Translation for Ainu-Japanese: A Comprehensive Study on the Impact of Domain and Dialect Integration - Ryo Igarashi, So Miyagawa
16:30 - 17:30
Exploring Large Language Models for Qualitative Data Analysis - Tim Fischer, Chris Biemann
16:30 - 17:30
Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs - Chahan Vidal-Gorène, Nadi Tomeh, Victoria Khurshudyan
16:30 - 17:30
Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference for Cost-Effective Cultural Heritage Dataset Generation - William Thorne, Ambrose Robinson, Bohua Peng, Chenghua Lin, Diana Maynard
16:30 - 17:30
Assessing Large Language Models in Translating Coptic and Ancient Greek Ostraca - Audric-Charles Wannaz, So Miyagawa
16:30 - 17:30
The Social Lives of Literary Characters: Combining Citizen Science and Language Models to Understand Narrative Social Networks - Andrew Piper, Michael Xu, Derek Ruths
16:30 - 17:30
Multi-Word Expressions in Biomedical Abstracts and Their Plain English Adaptations - Sergei Bagdasarov, Elke Teich
16:30 - 17:30
Assessing the Performance of ChatGPT-4, Fine-Tuned BERT and Traditional ML Models on Moroccan Arabic Sentiment Analysis - Mohamed Hannani, Abdelhadi Soudi, Kristof Van Laerhoven
16:30 - 17:30
Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-Based User Embeddings - Mika Hämäläinen, Jack Rueter, Khalid Alnajjar
16:30 - 17:30
Corpus Development Based on Conflict Structures in the Security Field and LLM Bias Verification - Keito Inoshita
16:30 - 17:30
Generating Interpretations of Policy Announcements - Andreas Marfurt, Ashley Thornton, David Sylvan, James Henderson
16:30 - 17:30
Order Up! Micromanaging Inconsistencies in ChatGPT-4o Text Analyses - Erkki Mervaala, Ilona Kousa
16:30 - 17:30
CIPHE: A Framework for Document Cluster Interpretation and Precision from Human Exploration - Anton Eklund, Mona Forsman, Frank Drewes
16:30 - 17:30
Empowering Teachers with Usability-Oriented LLM-Based Tools for Digital Pedagogy - Melany Vanessa Macias, Lev Kharlashkin, Leo Einari Huovinen, Mika Hämäläinen
Organizers
Metropolia University of Applied Sciences
Waseda University
The University of Tsukuba / National Institute for Japanese Language and Linguistics (NINJAL)
F-Secure Oyj
Aarhus University
Program committee
Joshua Wilbur, University of Tartu
Stefania Degaetano-Ortlieb, Saarland University
Luke Gessler, University of Colorado Boulder
Leo Leppänen, University of Helsinki
Quan Duong, University of Helsinki
Iana Atanassova, University of Franche-Comté
Won Ik Cho, Samsung
Tyler Shoemaker, Dartmouth College
Jouni Tuominen, University of Helsinki
Enrique Manjavacas Arevalo, University of Leiden
Kenichi Iwatsuki, Mirai Translate
Matej Martinc, Jožef Stefan Institute
Maciej Janicki, University of Helsinki
Shuo Zhang, Bose
Aynat Rubinstein, The Hebrew University of Jerusalem
Frederik Arnold, Humboldt University of Berlin
Thibault Clerice, National Institute for Research in Digital Science and Technology
Nicolas Gutehrlé, University Bourgogne Franche-Comté
Lama Alqazlan, University of Warwick
Lidia Pivovarova, University of Helsinki
Balázs Indig, Eötvös Loránd University
Pierre Magistry, Institut national des langues et civilisations orientales
Yoshifumi Kawasaki, The University of Tokyo
Anna Dmitrieva, University of Helsinki
Antti Kanner, University of Helsinki
Maria Antoniak, Allen Institute for AI
Katerina Korre, University of Bologna
Daniela Teodorescu, University of Alberta
Dongqi Pu, Saarland University
Nils Hjortnaes, Indiana University Bloomington
Noémi Ligeti-Nagy, Hungarian Research Centre for Linguistics
Allison Lahnala, University of Bonn
Gabriel Simmons, University of California, Davis
Vilja Hulden, University of Colorado Boulder
Jaihyun Park, Nanyang Technological University
Jonne Sälevä, Brandeis University
Martin Ruskov, University of Milan
Youngsook Song, Sionic AI
Pascale Moreira, Aarhus University
Maciej Kurzynski, Lingnan University
Aatu Liimatta, University of Helsinki
Sourav Das, Indian Institute of Information Technology Kalyani
Sebastian Oliver Eck, University of Music Franz Liszt Weimar
Elissa Nakajima Wickham, Waseda University
Nicole Miu Takagi, Waseda University
Ken Kawamura, Revelata Inc
Bo Dang, San Francisco Bay University
Jack Rueter, University of Helsinki