Reusable transformations of Data Cube Vocabulary datasets from the fiscal domain

Publication at Faculty of Mathematics and Physics | 2016

Abstract

Shared data models provide leverage for reusable data transformations. Common modelling patterns and data structures can make data transformations applicable to diverse datasets.

Like data models, reusable data transformations promote separation of concerns, prevent duplication of effort, and reduce the time spent processing data. However, whereas data models can be shared as RDF vocabularies or ontologies, there is no well-established way of sharing data transformations. We propose a way to share data transformations as 'pipeline fragments' for LinkedPipes ETL (LP-ETL), an RDF-based tool for processing RDF data.

We describe the features of LP-ETL that enable development of reusable transformations as pipeline fragments. Pipeline fragments are represented in RDF as JSON-LD files that can be shared directly or via dereferenceable IRIs.

We demonstrate the use of pipeline fragments on data transformations for fiscal data described by the Data Cube Vocabulary (DCV). We cover both generic transformations applicable to any DCV-compliant data, such as DCV validation or DCV to CSV conversion, and transformations specific to the fiscal data used in the OpenBudgets.eu (OBEU) project, including conversion of Fiscal Data Package to RDF and normalization of monetary values.

The applicability of these transformations is shown in concrete use cases serving the goals of the OBEU project.