highlights | Michael Färber's Research Group

Selected publications (curated). For the full list, see /publications/.

2026

PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs

Zhan Qu, Shuzhou Yuan, and Michael Färber

In AAAI, Singapore, 2026

For insiders: We build a framework for generating classical Chinese Songci that satisfies strict tone and rhyme constraints.

For everyone: Songci poetry has rules like sheet music, and we teach an AI to write it while staying on the right tones and rhymes, with a strict checker that keeps it honest.

This paper presents a systematic investigation into the constrained generation capabilities of large language models (LLMs) in producing Songci, a classical Chinese poetry form characterized by strict structural, tonal, and rhyme constraints defined by Cipai templates. We first develop a comprehensive, multi-faceted evaluation framework that includes: (i) a formal conformity score, (ii) automated quality assessment using LLMs, (iii) human evaluation, and (iv) classification-based probing tasks. Using this framework, we evaluate the generative performance of 18 LLMs, including 3 proprietary models and 15 open-source models across four families, under five prompting strategies: zero-shot, one-shot, completion-based, instruction-tuned, and chain-of-thought. Finally, we propose a Generate-Critic architecture in which the evaluation framework functions as an automated critic. Leveraging the critic’s feedback as a reward signal, we fine-tune three lightweight open-source LLMs via supervised fine-tuning (SFT), resulting in improvements of up to 5.88% in formal conformity. Our findings offer new insights into the generative strengths and limitations of LLMs in producing culturally significant and formally constrained literary texts.
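
To make the idea of a formal conformity score concrete, here is a minimal sketch. It is not the metric from the paper: the template encoding (`P` for a level tone, `Z` for an oblique tone, `X` for either), the `Slot` and `Syllable` types, and the function name `conformity_score` are assumptions made for illustration. Tones and rhyme groups are supplied by hand instead of being looked up in a rhyme dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    """One position in a hypothetical Cipai-style template."""
    tone: str            # "P" (level), "Z" (oblique), or "X" (either)
    rhyme: bool = False  # True if this position must rhyme


@dataclass(frozen=True)
class Syllable:
    """One generated character, annotated by hand for this sketch."""
    tone: str         # "P" or "Z"
    rhyme_group: str  # hypothetical rhyme-group label


def conformity_score(template, poem):
    """Fraction of tone and rhyme checks that the poem passes."""
    if len(template) != len(poem):
        return 0.0  # assumption: a length mismatch counts as non-conforming
    checks = 0
    passed = 0
    rhyme_groups = []
    for slot, syllable in zip(template, poem):
        if slot.tone != "X":
            checks += 1
            passed += slot.tone == syllable.tone
        if slot.rhyme:
            rhyme_groups.append(syllable.rhyme_group)
    # Every later rhyme position must share the group of the first one.
    for group in rhyme_groups[1:]:
        checks += 1
        passed += group == rhyme_groups[0]
    return passed / checks if checks else 1.0


template = [Slot("P"), Slot("Z", rhyme=True), Slot("X"), Slot("Z", rhyme=True)]
poem = [Syllable("P", "a"), Syllable("Z", "ang"),
        Syllable("Z", "i"), Syllable("P", "ang")]
print(conformity_score(template, poem))  # 0.75: one tone check out of four checks fails
```

A score of this kind could serve as the automated part of a critic, although the paper's critic also includes LLM-based and human assessments that this sketch does not model.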

2025

Paths to Causality: Finding Informative Subgraphs within Knowledge Graphs for Knowledge-Based Causal Discovery

Yuni Susanti and Michael Färber

In KDD, 2025

For insiders: We propose a neurosymbolic method for knowledge-based causal discovery that selects relevant knowledge graph subgraphs to ground LLM prompting.

For everyone: We help AI reason about cause and effect by highlighting the most informative routes through a knowledge map, like handing a detective the best trail of clues.

Inferring causal relationships between variable pairs is crucial for understanding multivariate interactions in complex systems. Knowledge-based causal discovery—which involves inferring causal relationships by reasoning over the metadata of variables (e.g., names or textual context)—offers a compelling alternative to traditional methods that rely on observational data. However, existing methods using Large Language Models (LLMs) often produce unstable and inconsistent results, compromising their reliability for causal inference. To address this, we introduce a novel approach that integrates Knowledge Graphs (KGs) with LLMs to enhance knowledge-based causal discovery. Our approach identifies informative metapath-based subgraphs within KGs and further refines their selection using Learning-to-Rank models. The top-ranked subgraphs are then incorporated into zero-shot prompts, improving the effectiveness of LLMs in inferring the causal relationship. Extensive experiments on biomedical and open-domain datasets demonstrate that our method outperforms most baselines by up to 44.8 points in F1 scores, evaluated across diverse LLMs and KGs. Our code and datasets are available on GitHub.
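
As a rough illustration of what a metapath-based subgraph is, the sketch below enumerates the paths between two variables in a toy graph whose node types follow a given metapath. The graph, the node names, the type labels, and the function `metapath_paths` are all hypothetical. The Learning-to-Rank step and the prompt construction described in the abstract are not modeled here.

```python
def metapath_paths(edges, node_type, source, target, metapath):
    """Return all simple paths from source to target whose node types follow `metapath`."""
    neighbors = {}
    for u, v in edges:
        # Assumption: the toy graph is treated as undirected.
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    if node_type[source] != metapath[0]:
        return []
    results = []

    def extend(path):
        if len(path) == len(metapath):
            if path[-1] == target:
                results.append(list(path))
            return
        wanted = metapath[len(path)]
        for nxt in sorted(neighbors.get(path[-1], ())):
            if nxt not in path and node_type[nxt] == wanted:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([source])
    return results


# Made-up toy knowledge graph, not data from the paper.
node_type = {
    "gene_A": "Gene",
    "pathway_X": "Pathway",
    "compound_C": "Compound",
    "disease_B": "Disease",
}
edges = [
    ("gene_A", "pathway_X"),
    ("pathway_X", "disease_B"),
    ("gene_A", "compound_C"),
    ("compound_C", "disease_B"),
]
print(metapath_paths(edges, node_type, "gene_A", "disease_B",
                     ["Gene", "Pathway", "Disease"]))
```

Each returned path is a candidate subgraph. In the approach described above, such candidates would then be ranked and the best ones placed into a zero-shot prompt.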

2024

Embedded Named Entity Recognition using Probing Classifiers

Nicholas Popovic and Michael Färber

In EMNLP, Miami, FL, USA, 2024

For insiders: EMBER enables fast NER in decoder-only language models via probing, adding minimal overhead and avoiding destructive fine-tuning.

For everyone: We add “live labels” to a chatbot as it writes, like sticky notes appearing while you type instead of only after the text is finished.

Streaming text generation has become a common way of increasing the responsiveness of language-model-powered applications, such as chat assistants. At the same time, extracting semantic information from generated text is a useful tool for applications such as automated fact checking or retrieval-augmented generation. Currently, this requires either separate models during inference, which increases computational cost, or destructive fine-tuning of the language model. Instead, we propose an approach called EMBER, which enables streaming named entity recognition in decoder-only language models without fine-tuning them and while incurring minimal additional computational cost at inference time. Specifically, our experiments show that EMBER maintains high token generation rates, with only a negligible decrease in speed of around 1% compared to a 43.64% slowdown measured for a baseline. We make our code and data available online, including a toolkit for training, testing, and deploying efficient token classification models optimized for streaming text generation.
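
The general idea of a probing classifier that labels tokens as they are generated can be sketched as follows. This is a simplified illustration, not EMBER's actual architecture: the linear probe, the two-dimensional hidden states, the label set, and the function names `probe_token` and `stream_with_labels` are assumptions. The language model itself is stood in for by a list of `(token, hidden_state)` pairs.

```python
def probe_token(hidden_state, weights, bias, labels):
    """Linear probe: score each label from one frozen hidden state, return the best label."""
    scores = [
        sum(w * h for w, h in zip(row, hidden_state)) + b
        for row, b in zip(weights, bias)
    ]
    return labels[scores.index(max(scores))]


def stream_with_labels(token_stream, weights, bias, labels):
    """Yield (token, label) pairs as tokens arrive; the language model is never updated."""
    for token, hidden_state in token_stream:
        yield token, probe_token(hidden_state, weights, bias, labels)


# Made-up probe parameters and hidden states, for illustration only.
labels = ["O", "ENT"]
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
stream = [("Paris", [0.1, 0.9]), ("is", [0.8, 0.2])]

for token, label in stream_with_labels(stream, weights, bias, labels):
    print(token, label)
```

Because the probe only reads hidden states that the model computes anyway, the added cost per token is one small matrix-vector product, which is in line with the low overhead reported in the abstract.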

GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network

Shuzhou Yuan, Ercong Nie, Michael Färber, and 2 more authors

In ACL, Bangkok, Thailand, 2024

For insiders: GNNavi guides information flow in prompt-based fine-tuning via a graph neural layer, improving few-shot learning while updating only a small fraction of parameters.

For everyone: GNNavi steers how information flows during prompting, like a traffic controller that routes signals to the right places so the model learns better.

Large Language Models (LLMs) exhibit strong In-Context Learning (ICL) capabilities when prompts with demonstrations are used. However, fine-tuning remains crucial to further enhance their adaptability. Prompt-based fine-tuning proves to be an effective fine-tuning method in low-data scenarios, but high demands on computing resources limit its practicality. We address this issue by introducing a prompt-based parameter-efficient fine-tuning (PEFT) approach. GNNavi leverages insights into ICL’s information flow dynamics, which indicate that label words act in prompts as anchors for information propagation. GNNavi employs a Graph Neural Network (GNN) layer to precisely guide the aggregation and distribution of information flow during the processing of prompts by hardwiring the desired information flow into the GNN. Our experiments on text classification tasks with GPT-2 and Llama2 show that GNNavi surpasses standard prompt-based fine-tuning methods in few-shot settings while updating just 0.2% to 0.5% of parameters. We compare GNNavi with prevalent PEFT approaches, such as prefix tuning, LoRA, and Adapter, in terms of performance and efficiency. Our analysis reveals that GNNavi enhances information flow and ensures a clear aggregation process.

2023

SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples

Michael Färber, David Lamprecht, Johan Krause, and 2 more authors

In ISWC, 2023

For insiders: We release SemOpenAlex, a scholarly knowledge graph with 26 billion triples, dumps, SPARQL access, and embeddings, enabling large-scale semantic science analytics and search.

For everyone: SemOpenAlex is an open “Google Maps for science”, built as a connected map of papers and authors so others can navigate research at web scale.

We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems, such as recommending collaborators, publications, and venues, including explainability capabilities. Finally, SemOpenAlex can serve for RDF query optimization benchmarks, creating scholarly knowledge-guided language models, and as a hub for semantic scientific publishing.
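
For readers who have not used a SPARQL endpoint before, the sketch below shows how a query can be packed into a GET request URL, following the standard SPARQL 1.1 Protocol convention of a `query` parameter. The endpoint URL is a placeholder and must be replaced with the real one. The query is deliberately generic (any five triples) so that it makes no assumptions about the SemOpenAlex ontology. The function name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder, not the real SemOpenAlex endpoint.
ENDPOINT = "https://example.org/sparql"

# Generic query: fetch any five triples, with no ontology-specific terms.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint, sparql):
    """Build a SPARQL 1.1 Protocol GET URL with the query in the `query` parameter."""
    return f"{endpoint}?{urlencode({'query': sparql})}"


print(build_query_url(ENDPOINT, QUERY))
```

The resulting URL can be opened with any HTTP client. Asking for JSON results is usually done with an `Accept: application/sparql-results+json` header, which this sketch leaves to the caller.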

2022

Few-Shot Document-Level Relation Extraction

Nicholas Popovic and Michael Färber

In NAACL, Seattle, WA, USA, 2022

For insiders: We introduce a few-shot benchmark for document-level relation extraction and reveal challenges beyond sentence-level settings, including realistic NOTA behavior.

For everyone: We create a benchmark for extracting relationships from whole documents with only a few examples, like testing whether a student understood the full story and not just one line.

We present FREDo, a few-shot document-level relation extraction (FSDLRE) benchmark. As opposed to existing benchmarks which are built on sentence-level relation extraction corpora, we argue that document-level corpora provide more realism, particularly regarding none-of-the-above (NOTA) distributions. Therefore, we propose a set of FSDLRE tasks and construct a benchmark based on two existing supervised learning data sets, DocRED and sciERC. We adapt the state-of-the-art sentence-level method MNAV to the document-level and develop it further for improved domain adaptation. We find FSDLRE to be a challenging setting with interesting new characteristics such as the ability to sample NOTA instances from the support set. The data, code, and trained models are available online.

2020

Citation Recommendation: Approaches and Datasets

Michael Färber and Adam Jatowt

Int. J. Digit. Libr., 2020

For insiders: We survey citation recommendation methods and datasets and highlight evaluation pitfalls and open challenges for assisting scientific writing.

For everyone: This survey explains citation recommendation, like a GPS for references, and summarizes which datasets and tests are needed before such tools deserve trust.

Citation recommendation addresses the task of recommending relevant citations for a given text. Owing to the rapid growth of published scientific literature on the one hand and the necessity of citing the most appropriate works when authoring scientific texts on the other hand, citation recommendation has emerged as an important research area. In recent years, numerous approaches and evaluation datasets have been proposed. However, to the best of our knowledge, no comprehensive literature survey has explicitly focused on citation recommendation. In this article, we provide a thorough introduction to automatic citation recommendation research. We then present an overview of existing approaches and datasets, identifying their differences and commonalities across multiple dimensions. Finally, we discuss evaluation methods, highlight general challenges in evaluation, and outline ways to address them. While we restrict our survey to citation recommendation for scientific publications, as this document type has been studied most extensively, many of the observations and discussions are also applicable to other text types, such as news articles and encyclopedic content.

2018

Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO

Michael Färber, Frederic Bartscherer, Carsten Menne, and 1 more author

Semantic Web, 2018

For insiders: We compare major knowledge graphs using a systematic data-quality framework and help practitioners choose the right graph for their application needs.

For everyone: We create a consumer-style test report for major knowledge graphs, comparing data quality so developers can choose the right “map of the world” for their use case.

In recent years, several large, cross-domain, and openly available knowledge graphs (KGs) have been created, including DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Despite their widespread use, these knowledge graphs have not yet been subjected to a comprehensive comparative analysis. In this survey, we introduce a set of data quality criteria for systematically analyzing knowledge graphs and apply these criteria to compare the aforementioned KGs. In addition, we propose a framework to support the selection of the most suitable knowledge graph for a given application scenario.