| Best Practice | Benefits | DBpedia with DataID Statement |
|---|---|---|
| 1. Provide metadata: Provide metadata for both human users and computer applications. | | Central concept of DataID. |
| 2. Provide descriptive metadata: Provide metadata that describes the overall features of datasets and distributions. | | Central concept of DataID. |
| 3. Provide structural metadata: Provide metadata that describes the schema and internal structure of a distribution. | | Using void:vocabulary points out the DBpedia ontology in use. |
| 4. Provide data license information: Provide a link to or copy of the license agreement that controls use of the data. | | Licensing of data can be provided via dct:license and odrl:Policy instances on Dataset and Distribution level; in the case of DBpedia: http://purl.oclc.org/NET/rdflicense/cc-by-sa3.0 (a Turtle sketch follows the table). |
| 5. Provide data provenance information: Provide complete information about the origins of the data and any changes you have made. | | Central concept of DataID; for DBpedia: a complete record of the involved Agents and source Datasets. |
| 6. Provide data quality information: Provide information about data quality and fitness for particular purposes. | | Not supported by the DataID core; one of the DataID extension ontologies will cover data quality by importing DQV. |
| 7. Provide a version indicator: Assign and indicate a version number or date for each dataset. | | Version numbers are provided in the query part of the DataID/Dataset/Distribution URI; without that parameter, the URI references the latest version of the resource. All prov:Entities carry pointers to their next, previous, and latest version (a Turtle sketch follows the table). |
| 8. Provide version history: Provide a complete version history that explains the changes made in each version. | | Provided indirectly by the diff between the various DataIDs and the general documentation of new releases. |
| 9. Use persistent URIs as identifiers of datasets: Identify each dataset by a carefully chosen, persistent URI. | | True; the URIs are defined in a DataID graph. |
| 10. Use persistent URIs as identifiers within datasets: Reuse other people's URIs as identifiers within datasets where possible. | | DBpedia resource URIs. |
| 11. Assign URIs to dataset versions and series: Assign URIs to individual versions of datasets as well as to the overall series. | | See: Provide a version indicator. |
| 12. Use machine-readable standardized data formats: Make data available in a machine-readable, standardized data format that is well suited to its intended or potential use. | | DBpedia is published as Linked Data in RDF. |
| 13. Use locale-neutral data representations: Use locale-neutral data structures and values, or, where that is not possible, provide metadata about the locale used by data values. | | Partially true for DBpedia (e.g. dates). |
| 14. Provide data in multiple formats: Make data available in multiple formats when more than one format suits its intended or potential use. | | DBpedia is published in multiple RDF serializations and on a public SPARQL endpoint. |
| 15. Reuse vocabularies, preferably standardized ones: Use terms from shared vocabularies, preferably standardized ones, to encode data and metadata. | | DBpedia: rdfs, dct, and others. |
| 16. Choose the right formalization level: Opt for a level of formal semantics that fits both data and the most likely applications. | | Difficult to address, since DBpedia is a community effort; in general we try to keep the DBpedia ontology as shallow as possible. |
| 17. Provide bulk download: Enable consumers to retrieve the full dataset with a single request. | | True for sub-datasets; whole language editions cannot be collected with one click. |
| 18. Provide Subsets for Large Datasets: If your dataset is large, enable users and applications to readily work with useful subsets of your data. | | True: DataIDs are structured into 'Main Datasets' for each DBpedia language edition, each containing multiple sub-datasets. |
| 19. Use content negotiation for serving data available in multiple formats: Use content negotiation in addition to file extensions for serving data available in multiple formats. | | Yes, as far as the official endpoint is concerned. |
| 20. Provide real-time access: When data is produced in real time, make it available on the Web in real time or near real-time. | | Provided by DBpedia Live; the official DBpedia releases are snapshots of the data. |
| 21. Provide data up to date: Make data available in an up-to-date manner, and make the update frequency explicit. | | See: Provide real-time access. |
| 22. Provide an explanation for data that is not available: For data that is not available, provide an explanation about how the data can be accessed and who can access it. | | The primary data provided are static dump files, which should always be accessible for every release. Data that is not represented in the public endpoint is not explicitly accounted for; no explanation is given for its absence there. |
| 23. Make data available through an API: Offer an API to serve data if you have the resources to do so. | | Some of the data (mostly from the English language edition) is available via the official SPARQL endpoint of DBpedia. |
| 24. Use Web Standards as the foundation of APIs: When designing APIs, use an architectural style that is founded on the technologies of the Web itself. | | True: the SPARQL endpoint sponsored by OpenLink Software. |
| 25. Provide complete documentation for your API: Provide complete information on the Web about your API. Update documentation as you add features or make changes. | | Outside of the scope of DBpedia; the official endpoint conforms to SPARQL 1.1, and the API documentation is provided by OpenLink Software, the provider of the endpoint. |
| 26. Avoid Breaking Changes to Your API: Avoid changes to your API that break client code, and communicate any changes in your API to your developers when evolution happens. | | Outside of the scope of DBpedia; since the official DBpedia endpoint follows the SPARQL 1.1 specification, this should not be an issue. |
| 27. Preserve identifiers: When removing data from the Web, preserve the identifier and provide information about the archived resource. | | DBpedia follows Wikipedia when it comes to deleted wiki pages, providing dbo:redirect to point out the resource Wikipedia redirects to; the identifier itself is preserved (a Turtle sketch follows the table). |
| 28. Assess dataset coverage: Assess the coverage of a dataset prior to its preservation. | | Difficult to realize. |
| 29. Gather feedback from data consumers: Provide a readily discoverable means for consumers to offer feedback. | | DBpedia is in the process of providing a triple-level feedback loop. At the moment, feedback is collected via multiple mailing lists. |
| 30. Make feedback available: Make consumer feedback about datasets and distributions publicly available. | | All current and future means of feedback will be readily available for anyone. |
| 31. Enrich data by generating new data: Enrich your data by generating new data when doing so will enhance its value. | | New data is being generated, for example based on NLP algorithms applied to the Wikipedia page texts. |
| 32. Provide Complementary Presentations: Enrich data by presenting it in complementary, immediately informative ways, such as visualizations, tables, Web applications, or summaries. | | This is a task for the DBpedia community; we do, however, provide DBpedia releases as tables. |
| 33. Provide Feedback to the Original Publisher: Let the original publisher know when you are reusing their data. If you find an error or have suggestions or compliments, let them know. | | Difficult to extend the feedback loop to Wikipedia editors. |
| 34. Follow Licensing Terms: Find and follow the licensing requirements from the original publisher of the dataset. | | We follow the licenses put in place by Wikipedia. |
| 35. Cite the Original Publication: Acknowledge the source of your data in metadata. If you provide a user interface, include the citation visibly in the interface. | | We point out the original source in the dataset metadata (the original XML dump) as well as on the triple level (the original Wikipedia page); a Turtle sketch follows the table. |
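To make Best Practices 3 and 4 more concrete, the following Turtle sketch shows how void:vocabulary, dct:license and an odrl:Policy link could be attached on Dataset and Distribution level. It is only a minimal illustration: the ex: namespace, the resource names, the dataid: prefix binding and the use of dcat:distribution and odrl:hasPolicy are assumptions made for this sketch, not excerpts from an actual DBpedia DataID file.

```turtle
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .   # assumed DataID core namespace
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix odrl:   <http://www.w3.org/ns/odrl/2/> .
@prefix void:   <http://rdfs.org/ns/void#> .
@prefix ex:     <http://example.org/dataid/> .            # hypothetical namespace for illustration

ex:mappingbased-objects
    a                 dataid:Dataset ;
    void:vocabulary   <http://dbpedia.org/ontology/> ;                     # BP 3: ontology/schema in use
    dct:license       <http://purl.oclc.org/NET/rdflicense/cc-by-sa3.0> ;  # BP 4: license on Dataset level
    dcat:distribution ex:mappingbased-objects_en_ttl .

ex:mappingbased-objects_en_ttl
    a               dcat:Distribution ;
    dct:license     <http://purl.oclc.org/NET/rdflicense/cc-by-sa3.0> ;    # BP 4: license on Distribution level
    odrl:hasPolicy  ex:policy-cc-by-sa .                                   # link to an odrl:Policy instance

ex:policy-cc-by-sa a odrl:Policy .
```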
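The versioning scheme summarized for Best Practice 7 might look roughly as follows. The query-parameter layout of the URIs and the property names ex:previousVersion, ex:nextVersion and ex:latestVersion are placeholders invented for this sketch; only the idea of version-carrying URIs and of next/previous/latest pointers on prov:Entities is taken from the table.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/versioning/> .   # hypothetical namespace for illustration

# The version is carried in the query part of the URI; without the
# version parameter the URI refers to the latest version of the resource.
<http://example.org/dataid.ttl?dataset=instance-types&version=2016-04>
    a prov:Entity ;
    ex:previousVersion <http://example.org/dataid.ttl?dataset=instance-types&version=2015-10> ;  # hypothetical property
    ex:nextVersion     <http://example.org/dataid.ttl?dataset=instance-types&version=2016-10> ;  # hypothetical property
    ex:latestVersion   <http://example.org/dataid.ttl?dataset=instance-types> .                  # hypothetical property
```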
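Best Practice 27 boils down to a single kind of triple. The sketch below uses the dbo:redirect property named in the table; the two resources are made up for illustration.

```turtle
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix dbr: <http://dbpedia.org/resource/> .

# The identifier of the removed/merged page is preserved and points to the
# resource that Wikipedia now redirects to (resource names are illustrative).
dbr:Old_Page_Title dbo:redirect dbr:Current_Page_Title .
```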
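Finally, the source references mentioned for Best Practice 35 could be expressed along the following lines. Treat this as a guess at the modelling: dct:source on the dataset metadata and prov:wasDerivedFrom on the resource level are assumptions for this sketch, as are the concrete URLs and resource names; the table only states that the original XML dump and the original Wikipedia page are referenced.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dbr:  <http://dbpedia.org/resource/> .
@prefix ex:   <http://example.org/dataid/> .   # hypothetical namespace for illustration

# Dataset metadata level: reference to the original XML dump (illustrative URL).
ex:mappingbased-objects
    dct:source <https://dumps.wikimedia.org/enwiki/> .

# Resource/triple level: reference to the original Wikipedia page (illustrative resource).
dbr:Berlin
    prov:wasDerivedFrom <http://en.wikipedia.org/wiki/Berlin> .
```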