Downloads 2016-10

This page provides all DBpedia datasets as links to bzip2-compressed files. The DBpedia datasets are licensed under the terms of the Creative Commons Attribution-ShareAlike License and the GNU Free Documentation License.
In addition to the RDF version of the data, we also provide a tabular version of some of the core DBpedia data sets as CSV and JSON files. See DBpediaAsTables.

See also the change log for recent changes and developments.


1. Wikipedia Input Files

All XML source files of Wikipedia are now hosted alongside the extracted data and can be found in the dataset table as the 'pages-articles' dataset.

All datasets were extracted from these Wikipedia dump files, generated in October 2016. Please refer to the DataIDs of this release for more information about the source datasets.

2. Ontology

The ontology version used while extracting all datasets can be downloaded here:

 

 

3. Datasets

 

The following table lists all datasets extracted by the extraction framework for every Wikipedia language edition with more than 10,000 articles.
Select the languages you are interested in at the top of the table, and filter the list of datasets with the search function.
Click on the dataset names to obtain additional information. Click on the question mark next to a download link to preview file contents.

We provide all datasets in two serializations:

  • turtle (ttl): provides data in N-Triples format (<subject>  <predicate>  <object> .), a subset of the Turtle serialization
  • quad-turtle (tql): the quad-turtle serialization (<subject>  <predicate>  <object>  <graph/context> .) adds context information to every triple, namely the graph name and provenance information.
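
As a minimal sketch of how the two serializations can be consumed, the following Python reads a bzip2-compressed dump line by line and splits each statement into three or four terms. The function names are assumptions made for illustration and are not part of the release. The regular expression covers only the common cases (IRIs, blank nodes, and plain, typed or language-tagged literals) and is not a complete N-Triples or N-Quads parser.

```python
import bz2
import re

# One RDF term: an <IRI>, a blank node, or a quoted literal with an
# optional ^^<datatype> or @language suffix.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'

# Three terms for a triple (ttl), plus an optional fourth term for the
# graph/context of a quad (tql), terminated by a dot.
STATEMENT = re.compile(
    r'^' + TERM + r'\s+' + TERM + r'\s+' + TERM + r'(?:\s+' + TERM + r')?\s*\.$'
)


def parse_statement(line):
    """Return (subject, predicate, object, graph) for one line.

    graph is None for triples. Blank lines and comment lines yield None.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = STATEMENT.match(stripped)
    if match is None:
        raise ValueError('unrecognised statement: ' + stripped)
    return match.groups()


def iter_statements(path):
    """Stream statements from a .ttl.bz2 or .tql.bz2 file without unpacking it."""
    with bz2.open(path, 'rt', encoding='utf-8') as handle:
        for line in handle:
            statement = parse_statement(line)
            if statement is not None:
                yield statement
```

Calling `iter_statements` with the path of a downloaded file (for example a hypothetical `labels_en.ttl.bz2`) yields one tuple per statement, so even large dumps can be processed without loading them into memory.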

Some datasets are available in two versions:

  • localized: These datasets contain triples extracted from the respective Wikipedia, including the ones whose URIs do not have an equivalent Wikidata entry.
  • canonicalized: These datasets contain triples extracted from the respective Wikipedia whose subject and object resources have an equivalent Wikidata entry (marked with an *). Using these datasets provides the same unique subject URI for the same entity across all language editions.
 
 

 

4. Links to other datasets

 

A table with all links will appear here in a short while. In the meantime please refer directly to the download directory to get all link-sets of this release:

Links

 

5. NLP Datasets

 

DBpedia also includes a number of NLP datasets: datasets specifically targeted at supporting Computational Linguistics and Natural Language Processing (NLP) tasks. Among those, we highlight the Lexicalization Dataset, Topic Signatures, Thematic Concepts and Grammatical Genders.

With this release, DBpedia also includes three large datasets in the Natural Language Interchange Format (NIF), containing the entire text of each wiki page. The exact content of these three datasets is exemplified below:

nif-context.ttl

The full text of a wiki page as the context for all subsequent information about this page.

dbr:Anthropology?dbpv=2016-04&nif=context     a     nif:Context .

dbr:Anthropology?dbpv=2016-04&nif=context    nif:isString    "Anthropology is the study of humanity. Its main subdivisions are social anthropology and cultural anthropology, which describes the workings of societies around the world, linguistic anthropology, which investigates the influence of language in social life, and biological or physical anthropology, which concerns long-term development of the human organism. Archaeology, which studies past human cultures through investigation of physical evidence, is thought of as a branch of anthropology in the United States, although in Europe, it is viewed as a discipline in its own right, or grouped under related disciplines such as history." .

dbr:Anthropology?dbpv=2016-04&nif=context    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:endIndex      "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:sourceUrl     <http://en.wikipedia.org/wiki/Anthropology> .
dbr:Anthropology?dbpv=2016-04&nif=context    nif:predLang     <http://lexvo.org/id/iso639-3/eng> .
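
The index properties can be read as character offsets into the nif:isString value. The following is a minimal sketch of a consistency check, assuming that offsets count Python string characters (Unicode code points) and that a context always starts at offset 0, as in the statements above. The function name is hypothetical.

```python
def context_offsets_match(text, begin_index, end_index):
    """Check that a context's offsets span exactly its nif:isString text.

    Assumption: a context starts at 0 and its end index equals the
    number of characters in the text.
    """
    return begin_index == 0 and end_index - begin_index == len(text)
```

A check like this can flag contexts whose text was truncated or re-encoded after extraction, since every other NIF offset of the page refers to this string.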

nif-page-structure.ttl

The structure of the wiki page as nif:Structure instances, such as Section, Paragraph and Title.

dbr:Anthropology?dbpv=2016-04&nif=context    nif:hasSection    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .

dbr:Anthropology?dbpv=2016-04&nif=section_0_634    a    nif:Section    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:hasParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:firstParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    .
dbr:Anthropology?dbpv=2016-04&nif=section_0_634    nif:lastParagraph    dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:beginIndex    "0"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:endIndex    "330"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .

dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    a    nif:Paragraph    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:beginIndex    "331"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:endIndex    "634"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context    .
dbr:Anthropology?dbpv=2016-04&nif=paragraph_331_634    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=section_0_634    .
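
The section and paragraph offsets above nest inside each other. The following sketch, with a hypothetical function name, checks that a set of paragraph spans lies within its section and that no two paragraphs overlap. It assumes spans are given as (beginIndex, endIndex) pairs and that gaps between paragraphs (such as the single character between 330 and 331) are allowed.

```python
def paragraphs_fit_section(section, paragraphs):
    """Return True if every (begin, end) paragraph span lies inside the
    section span and no two paragraph spans overlap."""
    sec_begin, sec_end = section
    ordered = sorted(paragraphs)
    # Each paragraph must be non-empty and contained in the section.
    for begin, end in ordered:
        if not (sec_begin <= begin < end <= sec_end):
            return False
    # Neighbouring paragraphs may touch or leave a gap, but not overlap.
    for (_, prev_end), (next_begin, _) in zip(ordered, ordered[1:]):
        if next_begin < prev_end:
            return False
    return True
```

For the example above, `paragraphs_fit_section((0, 634), [(0, 330), (331, 634)])` holds, whereas a paragraph whose end index is smaller than its begin index would be rejected.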

 

nif-text-links.ttl

All in-text links of a wiki page as nif:Word or nif:Phrase.

dbr:Anthropology?dbpv=2016-04&nif=word_29_37    a    nif:Word .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:beginIndex    "29"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:endIndex    "37"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Human .
dbr:Anthropology?dbpv=2016-04&nif=word_29_37    nif:anchorOf    "humanity" .

dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    a    nif:Phrase    .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:referenceContext    dbr:Anthropology?dbpv=2016-04&nif=context .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:beginIndex    "65"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:endIndex    "84"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger> .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:superString    dbr:Anthropology?dbpv=2016-04&nif=paragraph_0_330 .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    <http://www.w3.org/2005/11/its/rdf#taIdentRef>    dbr:Social_anthropology .
dbr:Anthropology?dbpv=2016-04&nif=phrase_65_84    nif:anchorOf    "social anthropology" .
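
The offsets in a link resource can be checked against the context text. The sketch below assumes the URI pattern seen in the examples (a `nif` query parameter of the form kind_begin_end), assumes that `dbr:` expands to `http://dbpedia.org/resource/`, and assumes that offsets count Python string characters. The function and constant names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed expansion of the dbr: prefix used in the examples.
RESOURCE = "http://dbpedia.org/resource/Anthropology"

# Opening of the context text from the nif-context example.
SAMPLE_TEXT = (
    "Anthropology is the study of humanity. "
    "Its main subdivisions are social anthropology"
)


def parse_nif_uri(uri):
    """Split a NIF resource URI into (kind, begin, end).

    Offsets are None for URIs such as '...&nif=context' that carry none.
    """
    query = parse_qs(urlsplit(uri).query)
    parts = query["nif"][0].split("_")
    if len(parts) == 3:
        return parts[0], int(parts[1]), int(parts[2])
    return parts[0], None, None


def anchor_matches(context_text, uri, anchor):
    """Check that the span named in the URI equals the nif:anchorOf value."""
    _, begin, end = parse_nif_uri(uri)
    return begin is not None and context_text[begin:end] == anchor
```

With the sample text, the URIs ending in `nif=word_29_37` and `nif=phrase_65_84` select "humanity" and "social anthropology", matching the nif:anchorOf values above.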


6. Dataset Metadata as DataIDs

 

Starting with release 2016-04, we provide extensive dataset metadata by adding DataIDs for all extracted languages to the respective language directories. Use these files to gather additional information about the datasets and the files that represent them.
A dcat:Catalog file (ttl, json-ld) pointing to all DataIDs (via dcat:record) can be found in the root folder of this release.


7. Previous versions of DBpedia