Frankfurt-BigDataLab/RDF_DataQuality_Checking

RDF DataQuality Checking

Programmed in Java (version 20); see the pom file for dependencies.

Main class: GUI.DataCheck


Intro

The main idea of this tool is to execute predefined SPARQL queries (rules) against a SPARQL endpoint to identify data quality issues: inconsistencies within the data, missing values, and outliers. The results of these queries are compiled in an Excel file with one spreadsheet per query plus an overview spreadsheet. In the resulting Excel file, domain experts can comment on the status of each issue found, for example to mark it as not an error or to record the reason for an inconsistency or for missing data. The commented Excel file can then be used as input for the next run of the data quality check, and the tool will retain a) the date an issue was first reported and b) the comments made by the domain experts.
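The carry-over between runs can be pictured with a small sketch. This is not the tool's actual code: the class `IssueCarryOver`, the record `Issue`, and its fields are hypothetical names chosen for illustration, and the sketch assumes an issue can be identified by the rule that found it plus the affected resource.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: keep the first-reported date and expert comment
// of issues that were already present in the previous run.
class IssueCarryOver {

    /** One reported issue; all field names are assumptions for this example. */
    record Issue(String ruleId, String resource, LocalDate firstReported, String comment) {
        // Assumed identity of an issue: the rule plus the affected resource.
        String key() {
            return ruleId + "|" + resource;
        }
    }

    /** Returns the current issues, enriched with date and comment from the previous run. */
    static List<Issue> merge(List<Issue> previousRun, List<Issue> currentRun) {
        Map<String, Issue> previousByKey = new HashMap<>();
        for (Issue old : previousRun) {
            previousByKey.put(old.key(), old);
        }

        List<Issue> merged = new ArrayList<>();
        for (Issue found : currentRun) {
            Issue old = previousByKey.get(found.key());
            if (old == null) {
                // New issue: keep the date of the current run and the empty comment.
                merged.add(found);
            } else {
                // Known issue: retain the original date and the expert's comment.
                merged.add(new Issue(found.ruleId(), found.resource(),
                        old.firstReported(), old.comment()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Issue> previous = List.of(
                new Issue("missing-weight", "coin/1", LocalDate.of(2023, 1, 10), "weight unknown"));
        List<Issue> current = List.of(
                new Issue("missing-weight", "coin/1", LocalDate.of(2023, 6, 1), ""),
                new Issue("missing-weight", "coin/2", LocalDate.of(2023, 6, 1), ""));
        merge(previous, current).forEach(System.out::println);
    }
}
```

In this sketch, issues that no longer appear in the current run are simply dropped; whether the tool itself keeps or archives them is not stated here.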

We implemented the tool on top of RDF/SPARQL to make it easier to reuse. It can be used by anyone who makes their data available via a SPARQL endpoint, and groups sharing the same data model can reuse and share their SPARQL queries. The tool comes with the rules we created for our Corpus Nummorum coin data, which are based on the Nomisma.org ontology.
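To give an impression of what a rule might look like, here is a sketch of a missing-value check and of how such a query could be sent to an endpoint as a GET request with a `query` parameter, as defined by the SPARQL 1.1 Protocol. The rule is not taken from the shipped rule set; the class name `RuleQuerySketch`, the endpoint URL, and the choice of `nmo:NumismaticObject` and `nmo:hasWeight` are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one rule as a SPARQL string, plus the request URI for it.
class RuleQuerySketch {

    // Illustrative rule (an assumption, not one of the tool's rules):
    // list coins that have no weight recorded.
    static final String MISSING_WEIGHT_RULE = """
            PREFIX nmo: <http://nomisma.org/ontology#>
            SELECT ?coin WHERE {
              ?coin a nmo:NumismaticObject .
              FILTER NOT EXISTS { ?coin nmo:hasWeight ?weight }
            }
            """;

    /** Builds a GET request URI that carries the query in the "query" parameter. */
    static URI buildQueryUri(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        // The endpoint below is a placeholder, not a real service.
        URI uri = buildQueryUri("https://example.org/sparql", MISSING_WEIGHT_RULE);
        System.out.println(uri);
    }
}
```

Because a rule is just a query string, groups that share the same model can exchange rules without touching any Java code.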


Credits

The implementation of this tool was initiated to improve the quality of the data in the Corpus Nummorum (CN) project (https://www.corpus-nummorum.eu/).

The tool was supported by a number of theses:

  • Modernisierung eines Legacy-Systems zur Datenqualitätsprüfung und Entwicklung eines Testdatenmanagement-Tools im Kontext von Linked Open Data (Bachelor Thesis) – Anna-Lena Buccoli (https://zenodo.org/record/8403298)
  • Benutzerschnittstelle für die Erstellung von Datenqualitätsabfragen in SPARQL (Bachelor Thesis) – Elif Tugba Dichter and Vladyslav Matsuyev
  • Information Propagation across Versions in Context of a SPARQL-Rules System (Bachelor Thesis) – Kateryna Kvasnytsia
