# DBpedia model artifact
Contains models of DBpedia Spotlight for different languages.
DBpedia model artifact version 2022.03.01
## Update, September 2020
This update generates the dbpedia-models and wikistatsextractor artifacts for all languages and uploads them to the DBpedia Databus.
It contains some fixes from [Klaus82](https://github.com/dbpedia-spotlight/model-quickstarter/pull/20).
The update is available in [Julio-Noe's GitHub repo](https://github.com/Julio-Noe/model-quickstarter) and will soon be merged into the Spotlight repo.
## wikistatsextractor (spotlight-wikistats)
The [wikistatsextractor](https://github.com/dbpedia-spotlight/wikistatsextractor/) extracts statistics from Wikipedia dump files. It produces the same four files originally produced by pignlproc for DBpedia Spotlight.
File | Long name | Line Format
------------- | ------------- | -------------
uriCount | Articles counts | ```\t\t```
pairCount | Pair-wise occurrence counts | ```\t\t```
sfAndTotalCount | Surface forms counts | ```\t\t```
tokenCount | Context vectors | ```\t{(context_token1,count1),(context_token2, count2),... }```
These files are now available for download from the Databus spotlight-wikistats repository. A short description of each file follows (examples can be found in the [DBpedia wiki](https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Raw-data)):
- uriCount: Contains the number of mentions for each article
- pairCount: Defines the number of times a surface form (plain text) was used to reference a resource (DBpedia URI)
- sfAndTotalCount: Establishes how many times a surface form was used as an anchor and how many times it occurred in text (regardless of whether it was part of an anchor).
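
To make the tokenCount line format from the table more concrete, here is a minimal Python sketch that splits one line into its leading field and a token-to-count mapping. It is only an illustration: the function name `parse_token_count_line`, the sample line, and the assumption that the field before the tab is a DBpedia URI are not taken from the wikistatsextractor code, and tokens that themselves contain commas or parentheses are not handled.

```python
import re
from typing import Dict, Tuple

# Matches one "(token,count)" pair; whitespace after the comma is tolerated.
_PAIR = re.compile(r"\(([^(),]+),\s*(\d+)\)")


def parse_token_count_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Split a tokenCount line into its key and a token -> count mapping.

    Hypothetical helper: assumes the key (taken here to be a DBpedia URI)
    comes first, followed by a tab and a {(token,count),...} list.
    """
    key, _, vector = line.rstrip("\n").partition("\t")
    counts = {token.strip(): int(count) for token, count in _PAIR.findall(vector)}
    return key, counts


# Made-up sample line, not real extracted data.
example = "http://dbpedia.org/resource/Berlin\t{(city,12),(germany, 7),(capital,5)}"
key, counts = parse_token_count_line(example)
print(key, counts)
```

The other three files are plain tab-separated lines, so `line.rstrip("\n").split("\t")` should be enough to read them once the column order has been checked against the examples in the DBpedia wiki.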
## model-quickstarter (spotlight-model)
The dbpedia-models artifacts were generated with the [DBpedia model-quickstarter](https://github.com/dbpedia-spotlight/model-quickstarter) tool.
## Citation
If you use the current (statistical) version of DBpedia Spotlight or the data/models created using this repository, please cite the following paper.
```bibtex
@inproceedings{isem2013daiber,
title = {Improving Efficiency and Accuracy in Multilingual Entity Extraction},
author = {Joachim Daiber and Max Jakob and Chris Hokamp and Pablo N. Mendes},
year = {2013},
booktitle = {Proceedings of the 9th International Conference on Semantic Systems (I-Semantics)}
}
```