RDF DataQuality Checking

Programmed in Java (version 20); see pom.xml for dependencies.

Main class: GUI.DataCheck

Intro

The main idea of this tool is to execute predefined SPARQL queries (rules) against a SPARQL endpoint to identify data quality issues: inconsistencies within the data, missing values and outliers. The results of these queries are compiled into an Excel file, with one spreadsheet per query plus an overview spreadsheet. In the resulting Excel file, domain experts can comment on the status of each issue found, for example to mark it as not an error or to record the reasons for an inconsistency or for missing data. This commented Excel file can be used as input for the next data quality check, and the tool will retain a) the date an issue was first reported and b) the comments made by the domain experts.
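The carry-over between runs can be pictured as a merge keyed on the rule and the flagged resource. The following Java sketch is illustrative only: the `Finding` and `Issue` records and the `IssueCarryOver.merge` method are hypothetical names, the Excel reading and writing is omitted, and it assumes that issues no longer returned by any query are simply dropped from the new report.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration; the tool's actual classes and Excel I/O are not shown.
/** A single query result: the rule that fired and the resource it flagged. */
record Finding(String ruleId, String resource) {}

/** A finding as it appears in the report, plus the tracking data kept across runs. */
record Issue(Finding finding, LocalDate firstReported, String comment) {}

class IssueCarryOver {
    /**
     * Builds this run's report from the current findings and the previous (commented) report.
     * A finding that was reported before keeps its first-reported date and expert comment;
     * a new finding is stamped with today's date and an empty comment.
     * Assumption: previous issues that are no longer found are not carried over.
     */
    static List<Issue> merge(List<Issue> previousReport, List<Finding> currentFindings, LocalDate today) {
        // Index the previous report by finding (records provide equals/hashCode).
        Map<Finding, Issue> previousByFinding = new HashMap<>();
        for (Issue issue : previousReport) {
            previousByFinding.put(issue.finding(), issue);
        }
        List<Issue> report = new ArrayList<>();
        for (Finding finding : currentFindings) {
            Issue earlier = previousByFinding.get(finding);
            report.add(earlier != null
                    ? earlier                          // seen before: keep date and comment
                    : new Issue(finding, today, ""));  // first time this issue is reported
        }
        return report;
    }
}
```

Using records as map keys keeps the matching logic short, since records derive `equals` and `hashCode` from their components. A real implementation would need a key that survives re-reading the Excel file, such as the rule name together with the resource IRI.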

We implemented the tool based on RDF/SPARQL to allow better reuse: it can be used by anyone who makes their data available via a SPARQL endpoint, and groups sharing the same data model can reuse and share their SPARQL queries. The tool comes with the rules we have created for the Corpus Nummorum coin data, which is based on the Nomisma.org ontology.
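Because rules are plain SPARQL, a rule can be sent to any endpoint over the standard SPARQL 1.1 Protocol. The Java sketch below only builds such a request (the query passed as the `query` parameter of an HTTP GET, with results requested as SPARQL JSON). The endpoint URL, the `RuleRequest` class and the example missing-value rule against the Nomisma ontology are illustrative assumptions, not rules shipped with the tool.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RuleRequest {
    // Hypothetical example rule (missing value): Nomisma coin types without a denomination.
    static final String MISSING_DENOMINATION = """
            PREFIX nmo: <http://nomisma.org/ontology#>
            SELECT ?type WHERE {
              ?type a nmo:TypeSeriesItem .
              FILTER NOT EXISTS { ?type nmo:hasDenomination ?denomination }
            }
            """;

    /** Builds a SPARQL 1.1 Protocol query-via-GET request; sending it is left to an HttpClient. */
    static HttpRequest build(String endpoint, String query) {
        // Percent-encode the query so it can be placed in the URL's query string.
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Long queries may exceed URL length limits on some servers; the protocol also allows sending the query in the body of a POST request instead.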

Credits

The implementation of this tool was initiated to improve the quality of the data in the Corpus Nummorum (CN) project (https://www.corpus-nummorum.eu/).

Development of the tool was supported by several theses:

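Modernisierung eines Legacy-Systems zur Datenqualitätsprüfung und Entwicklung eines Testdatenmanagement-Tools im Kontext von Linked Open Data [Modernisation of a legacy system for data quality checking and development of a test data management tool in the context of Linked Open Data] (Bachelor Thesis) – Anna-Lena Buccoli (https://zenodo.org/record/8403298)
Benutzerschnittstelle für die Erstellung von Datenqualitätsabfragen in SPARQL [User interface for creating data quality queries in SPARQL] (Bachelor Thesis) – Elif Tugba Dichter and Vladyslav Matsuyev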
Information Propagation across Versions in Context of a SPARQL-Rules System (Bachelor Thesis) – Kateryna Kvasnytsia


Frankfurt-BigDataLab/RDF_DataQuality_Checking
