datasets
Selected datasets and resources (curated). For context and publications, see /publications/ and /projects/.
- Knowledge Graph · 2025 · Open Science, GitHub, RDF, Knowledge Graph: RDF KG of ~200k GitHub repos linked to scientific papers (incl. SPARQL endpoint + ontology). License: CC BY 4.0
- Dataset · 2025 · Hugging Face, Model Metadata, LLMs: Structured dataset of 10,000+ Hugging Face NLP models with enriched metadata. License: CC BY 4.0. Links: Zenodo
- GML Dataset · 2024 · Graph ML, Benchmark, RDF: Heterogeneous graph ML dataset derived from the LinkedMDB RDF KG (AutoRDF2GML). License: CC BY 4.0. Links: Zenodo
- GML Dataset · 2024 · Graph ML, Benchmark, RDF: Heterogeneous graph ML dataset derived from the AIFB RDF KG (AutoRDF2GML). License: CC BY 4.0. Links: Zenodo
- Knowledge Graph · 2024 · Machine Learning, RDF, Knowledge Graph: RDF KG for ~400k ML papers incl. tasks, datasets, methods, evaluations, and results. License: CC BY 4.0
- Knowledge Graph · 2023 · RDF, Knowledge Graph, SemOpenAlex: Subset of SemOpenAlex focusing on the Semantic Web community (with embeddings). Links: Zenodo
- GML Dataset · 2023 · Graph ML, Benchmark, SemOpenAlex: Heterogeneous graph ML benchmark derived from SemOpenAlex-SemanticWeb. License: CC0. Links: Zenodo
- GML Dataset · 2023 · Graph ML, Benchmark, LPWC: Heterogeneous graph ML benchmark derived from LPWC (papers/datasets/tasks/methods + features). License: CC BY-SA 4.0. Links: Zenodo
- Dataset · 2023 · Scholarly Full Text, NLP, Citations: Structured full-text corpus from arXiv (open subset: permissively licensed papers). License: See arXiv licenses
- Knowledge Graph · 2023 · Scholarly Data, RDF, Knowledge Graph: Massive scholarly RDF knowledge graph (publications, authors, venues, concepts). License: CC0
- Dataset · 2021 · NLP, Citations, Low-resource: Synthetic + manually annotated references for citation field extraction in Cyrillic scripts. Links: Zenodo
- Knowledge Graph · 2021 · Scholarly Data, RDF, Knowledge Graph: Massive RDF knowledge graph of scholarly metadata.
- Knowledge Graph · 2021 · Datasets, RDF, Knowledge Graph: RDF KG of datasets linked to publications that mention them.
- Dataset · 2020 · Linked Data, NLP, Media, Brexit: Semantically enriched cross-media dataset for Brexit (news, social media, TV; multilingual). Links: Zenodo
- Dataset · 2020 · NLP, Media Bias, Crowdsourcing: Sentence- and article-level bias annotations for news (Ukraine crisis; multi-dimensional labels). License: CC BY-NC 4.0. Links: Zenodo
- Knowledge Graph · 2019 · Neural Networks, Metadata, RDF, Knowledge Graph: RDF dataset with metadata about neural networks (FAIRnets KG + search). License: CC BY 4.0. Links: Zenodo · Search system
- Software · 2018 · Linked Data, API, Knowledge Graph: Source code for a Linked Data API (wrapper) for Crunchbase. License: CC BY 4.0. Links: Zenodo
- Dataset · 2016 · Linked Data, Knowledge Graph, Startups: RDF dataset snapshot of Crunchbase (jobs, websites, organizations, people, products, acquisitions). License: CC BY-NC 4.0 / CC BY 4.0 (historic snapshot; see record note). Links: DOI
- Dataset · 2014 · Linked Data, NLP, Cross-lingual: Cross-lingual Linked Data lexica (RDF) for multilingual/cross-lingual information access. Links: Zenodo