Selected Publications
Below is a curated list of 11 of my publications, in reverse chronological order, each with a short summary highlighting its impact.
A full list of publications is available here.
2026
The Hidden Bias: A Study on Explicit and Implicit Political Stereotypes in Large Language Models
K. Löhr, S. Yuan, M. Färber
EACL 2026
📄 Read the paper
This paper studies political bias and stereotype propagation in major LLMs. Beyond explicit persona prompting, it shows that multilingual variation can reveal even stronger implicit stereotypes. The work highlights the societal relevance of trustworthy AI and the need for more careful evaluation of LLM behavior.
2025
Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery
Y. Susanti, M. Färber
KDD 2025
📄 Read the paper
While traditional methods rely on observational data, knowledge-based causal discovery uses metadata (like variable names or context) to infer causality – a promising but currently unreliable approach when using LLMs alone. To improve stability and accuracy, we propose a method that combines LLMs with Knowledge Graphs. By identifying informative metapath-based subgraphs and ranking them using a Learning-to-Rank model, our approach improves zero-shot LLM prompts for causal inference.
SQuAI: Scientific Question-Answering with Multi-Agent Retrieval-Augmented Generation
I. Besrour, J. He, T. Schreieder, M. Färber
CIKM 2025
📄 Read the paper
SQuAI is a multi-agent retrieval-augmented generation framework for scientific question answering. It combines decomposition, retrieval, filtering, and citation-grounded answer generation to provide more faithful and transparent answers over large scholarly corpora. The system reflects my current work on trustworthy LLMs and AI for science.
ComplexTempQA: A 100m Dataset for Complex Temporal Question Answering
R. Gruber, A. Abdallah, M. Färber, A. Jatowt
EMNLP 2025
📄 Read the paper
ComplexTempQA introduces a large-scale benchmark for temporal question answering with over 100 million question-answer pairs. It substantially expands the scale and scope of temporal QA resources and supports research on more realistic, time-aware AI systems.
2024
Embedded Named Entity Recognition using Probing Classifiers
N. Popovič, M. Färber
EMNLP 2024
📄 Read the paper
Streaming text generation makes language model applications such as chat assistants more responsive, and real-time semantic extraction, such as named entity recognition (NER), is valuable for tasks like fact-checking and retrieval-augmented generation. To make such extraction possible while text is being generated, we introduce EMBER, a method for streaming NER in decoder-only language models that avoids fine-tuning and adds minimal inference overhead.
Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-Based Causal Discovery
Y. Susanti, M. Färber
ISWC 2024
📄 Read the paper
We introduce a method for extracting causal relationships from text by integrating small language models (under 1B parameters) with knowledge graphs via prompt-based learning. The approach outperforms larger models while being more efficient and less costly, highlighting the power of combining knowledge graphs with compact AI models.
GNNAVI: Navigating the Information Flow in Large Language Models by Graph Neural Network
S. Yuan, E. Nie, M. Färber, H. Schmid, H. Schütze
Findings of ACL 2024
📄 Read the paper
This paper proposes GNNAVI, a parameter-efficient fine-tuning technique for large language models that uses Graph Neural Networks (GNNs) to improve information flow. GNNAVI achieves state-of-the-art accuracy in few-shot tasks while updating less than 0.5% of model parameters, demonstrating high efficiency and scalability.
2023
SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
M. Färber, D. Lamprecht, J. Krause, L. Aung, P. Haase
ISWC 2023 – Best Paper Award
📄 Read the paper
SemOpenAlex is one of the largest scholarly knowledge graphs, with 26 billion RDF triples. It provides open access to global research metadata and supports scientific discovery, interdisciplinary research, and large-scale knowledge integration. The paper received the ISWC Best Paper Award.
2022
Few-Shot Document-Level Relation Extraction
N. Popovič, M. Färber
NAACL 2022
📄 Read the paper
This paper defines a new benchmark for few-shot learning in document-level relation extraction. It addresses the lack of scalable, domain-specific datasets and introduces a strategy for building more realistic NLP benchmarks in low-resource settings.
2019
The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data
M. Färber
ISWC 2019
📄 Read the paper
As the sole author, I introduced the largest open scholarly knowledge graph at the time, with over 8 billion triples. This dataset has since supported numerous research and industry applications.
2018
Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO
M. Färber, F. Bartscherer, C. Menne, A. Rettinger
Semantic Web Journal, 2018 – Outstanding Paper Award
📄 Read the paper
This highly cited journal article presents a framework for evaluating the quality of major knowledge graphs. It remains a foundational reference for semantic web research and linked data quality assessment, and it received the Outstanding Paper Award of the Semantic Web Journal.