The following papers were accepted at NLP4DH 2026, co-located with ACL 2026 in San Diego, California.
In Search of Lost Adventure Novels: Supervised Genre Retrieval and Corpus Refinement in Gallica
Jean Barré
Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan
Ahan Chatterjee, Matthias Schöffel, Matthias Aßenmacher, Marinus Wiedner, Esteban Garces Arias
Scaling Sentence Similarity for Classical Tibetan with Automatic Annotations
Shay Cohen, Jingyi Yang, Gal Rabinovitz, Sonam Choden, Ofir Shtrosberg, Nicola Bajetta, Goody Ben Horin, Rebecca Sundén, Omri Drori, Sonam Jamtsho, Dorji Wangchuk, Kfir Bar, Orna Almogi, Shai Fine
Computational Authorship Attribution in the Children's Tales of Oscar and Constance Wilde: The Case of "The Selfish Giant"
Liviu P Dinu, Alina Iacob, Cosmin Ciotlos
Perspectives – Interactive Document Clustering for Qualitative Data Analysis
Tim Fischer, Chris Biemann
Directional Alignment and Narrative Agency in Human–LLM Co-Writing
Halfdan Nordahl Fundal, Yuri Bizzoni
Exploring Topological Invariance in Semantic Embeddings
Fangzhou Gao, Justin Brody
Tracing Thematic Change in Early English-Language Science Fiction, 1818-1930
Jonathan Gordon
Fluency and Faithfulness in Human and Machine Literary Translation
Sarah Griebel, Ted Underwood
Artistic Interventions for NLP Annotation Challenges: The Stress Test of Machinic Glossolalia
Tyler Grimes, Marshall Washington
Between Whispers and Screams: Loudness Standard Deviation as a Proxy for Explicit Content Detection in US Romance Novels
Svenja Guhr
From OCR to Analysis: Tracking Correction Provenance in Digital Humanities Pipelines
Haoze Guo, Ziqi Wei
Register Mixing Is the Norm on the Web
Erik Henriksson, Alireza Razzaghi, Tuomas Lundberg, Antti Kanner, Veronika Laippala
Measuring Embedding Sensitivity to Authorial Style in French: Comparing Literary Texts with Language Model Rewritings
Benjamin Icard, Lila Sainero, Alice Breton, Evangelia Zve, Jean-Gabriel Ganascia
Frequency Accelerates Semantic Change: Evidence from 500 Years of Korean
Cheonkam Jeong, Yeeun Choi
Narrative Landscape: Mapping Narrative Dispositions Across LLMs
Donghoon Jung, Jiwoo Choi, Songeun Chae, Seohyon Jung
From Advocacy to Judgment: Training-Free Analytic Essay Scoring with Multi-Agent Debate and Exemplar Retrieval
Ali Keramati, Shiyuan Zhou, Sharad Mehrotra, Mark Warschauer
StoicLLM: Preference Optimization for Philosophical Alignment in Small Language Models
Ishmam Khan, Sindhuja Thogarrati, Shuo Zhang
Evaluating Open-Source LLMs for Text Summarization and Named Entity Recognition in Long, Unstructured Text
Pauline Kister, Miriam Schirmer
Twenty's Plenty: Semantic Scaffolding and Span Architecture for 19-Label NER in Medieval Latin Charters
Tamás Kovács, Giuseppe Consolo, Georg Vogeler
Never Care For What They Say? Platform vs Genre Rules in Online Horror Narratives (2007–2024)
Alexandre Lionnet-Rollin, Florian Cafiero
Quantifying Text Reuse Across Three Kṛṣṇa Yajurveda Recensions: Using Multi-Algorithm Computational Collation
So Miyagawa, Kyoko Amano, Yuzuki Tsukagoshi, Yuki Kyogoku
Bias Mitigation in Hiring-Related NLP: Interactions Between Masking, Rewriting, and Adversarial Debiasing
Alexandre Puttick, Rami El-Wazzi
Temporal Text Classification with Large Language Models
Nishat Raihan, Marcos Zampieri
Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining
Sebastian Reichbauer, Shu Okabe, Alexander Fraser
Unlocking Medieval Texts: How Large Language Models Transform POS Tagging for Historical Romance Languages
Matthias Schöffel, Esteban Garces Arias
Computational Modeling of Educational Theory in Low-Socioeconomic Contexts
Jadon Swearingen, Mustafa Ocal, Md Tarique Hasan Khan, Labiba Jahan
PHMartialLawNER: A Tagalog Named Entity Recognition Corpus for the Philippine Martial Law Era
Abdiel Clarence Tabuzo, Vladimir Gray Velazco, Cassandra Cabral, Moneah Shaila Lacsam, Charmaine Salvador Ponay
Modeling the "Dalet" Clitic in Historical Hebrew Texts: A New Prefix-Segmented BERT Model and Stylistic Analysis
Rachel Tal, Cheyn Shmuel Shmidman, Avi Shmidman
Statistical Structure in Indus Sign Sequences
Tanishk Tiwari
Data Contamination in Neural Hieroglyphic Translation: A Reproducibility Study
Ammar Toutou, Abdelrahman Harb, Christine Basta
Beyond Genre Categories: How Narrative Pattern Coherence and Spanning Distance Shape Film Success
Zhichao Wang, Zeyu Lyu
Prompting the Past: Linguistic Transformations and Cultural Accuracy in AI-Generated Image Reconstructions for Multivocal Cultural Heritage
Ravini Wimalasuriya, Lea Krause, Gert-Jan Burgers
Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke
Yu Wu, Ananth Mahadevan, Filip Ginter, Michael Mathioudakis, Mikko Tolonen
100,000+ Movie Reviews from Kazakhstan: Russian, Kazakh, and Code-Switched Texts
Rustem Yeshpanov
Beyond Prompt-Sensitive Emotion Words: Stable Embeddings for Tang Poetry Analysis
Linyue Zhang, Feiyue Li