highlights

Selected publications with short, scannable takeaways. For the full list, see /publications/.

2026

  1. PoeTone: A Framework for Constrained Generation of Structured Chinese Songci with LLMs
    Zhan Qu, Shuzhou Yuan, and Michael Färber
    In AAAI, Singapore, 2026
    For insiders: We build a framework for generating classical Chinese Songci that satisfies strict tone and rhyme constraints.
    For everyone: Songci poetry has rules like sheet music, and we teach an AI to write it while staying on the right tones and rhymes, with a strict checker that keeps it honest.

2025

  1. Paths to Causality: Finding Informative Subgraphs within Knowledge Graphs for Knowledge-Based Causal Discovery
    Yuni Susanti and Michael Färber
    In KDD, 2025
    For insiders: We propose a neurosymbolic method for knowledge-based causal discovery that selects relevant knowledge graph subgraphs to ground LLM prompting.
    For everyone: We help AI reason about cause and effect by highlighting the most informative routes through a knowledge map, like handing a detective the best trail of clues.

2024

  1. Embedded Named Entity Recognition using Probing Classifiers
    Nicholas Popovic and Michael Färber
    In EMNLP, Miami, FL, USA, 2024
    For insiders: EMBER enables fast NER in decoder-only language models via probing, adding minimal overhead and avoiding destructive fine-tuning.
    For everyone: We add “live labels” to a chatbot as it writes, like sticky notes appearing while you type instead of only after the text is finished.
  2. GNNavi: Navigating the Information Flow in Large Language Models by Graph Neural Network
    Shuzhou Yuan, Ercong Nie, Michael Färber, and 2 more authors
    In ACL, Bangkok, Thailand, 2024
    For insiders: GNNavi guides information flow in prompt-based fine-tuning via a graph neural layer, improving few-shot learning while updating only a small fraction of parameters.
    For everyone: GNNavi steers how information flows during prompting, like a traffic controller that routes signals to the right places so the model learns better.

2023

  1. SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
    Michael Färber, David Lamprecht, Johan Krause, and 2 more authors
    In ISWC, 2023
    For insiders: We release SemOpenAlex, a scholarly knowledge graph with 26 billion triples, dumps, SPARQL access, and embeddings, enabling large-scale semantic science analytics and search.
    For everyone: SemOpenAlex is an open “Google Maps for science”, built as a connected map of papers and authors so others can navigate research at web scale.

2022

  1. Few-Shot Document-Level Relation Extraction
    Nicholas Popovic and Michael Färber
    In NAACL, Seattle, WA, USA, 2022
    For insiders: We introduce a few-shot benchmark for document-level relation extraction and reveal challenges beyond sentence-level settings, including realistic NOTA behavior.
    For everyone: We create a benchmark for extracting relationships from whole documents with only a few examples, like testing whether a student understood the full story and not just one line.

2020

  1. Citation Recommendation: Approaches and Datasets
    Michael Färber and Adam Jatowt
    Int. J. Digit. Libr., 2020
    For insiders: We survey citation recommendation methods and datasets and highlight evaluation pitfalls and open challenges for assisting scientific writing.
    For everyone: This survey explains citation recommendation, which works like a GPS for references, and summarizes which datasets and tests are needed before such tools deserve trust.

2018

  1. Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO
    Michael Färber, Frederic Bartscherer, Carsten Menne, and 1 more author
    Semantic Web, 2018
    For insiders: We compare major knowledge graphs using a systematic data-quality framework and help practitioners choose the right graph for their application needs.
    For everyone: We create a consumer-style test report for major knowledge graphs, comparing data quality so developers can choose the right “map of the world” for their use case.