
Column Definitions

Abdul Kasim edited this page Sep 6, 2024 · 1 revision


| csvcubed qube config | CSVW | RDF Data Cube vocabulary (when targeting it) |
|---|---|---|
| "data_type" | "datatype" | xsd:integer, xsd:decimal, or xsd:string |
| "type" | "type" | qb:DataSet (a dataset), qb:Observation (an individual observation), qb:DimensionProperty or qb:MeasureProperty (dimensions or measures) |
| false | "suppressOutput" | |
| "label" | "label" | rdfs:label |
| "required" | "required" | |
| "from_existing" | "propertyUrl" | |
| | "virtual" | |

The table above shows how related properties are named and used across csvcubed, CSVW and the RDF Data Cube vocabulary. Some properties exist only in csvcubed, others only in CSVW, and so on. Some behave in the same way but have different names, as the table shows, while others share a name and a core use but can also be used in different ways. Each property is explained in more detail below.

datatype

An atomic property that contains either a single string that is the main datatype of the values of the cell or a datatype description object. If the value of this property is a string, it must be the name of one of the built-in datatypes defined in section 5.11.1 Built-in Datatypes of the W3C Metadata Vocabulary for Tabular Data, and this value is normalized to an object whose base property is the original string value. If it is an object, then it describes a more specialized datatype. If a cell contains a sequence (i.e. the separator property is specified and not null), then this property specifies the datatype of each value within that sequence. The normalized value of this property becomes the datatype annotation for the described column.

In csvcubed's qube config, data_type defines the type of data expected in a measure or dimension. Its value is a datatype such as xsd:string, xsd:integer, or xsd:decimal, which ensures that the data is typed appropriately and can be validated correctly within the RDF Data Cube structure.

In CSVW, datatype is used within a column description to specify the type of data contained in that column. This is crucial for validating the data and transforming it into RDF. For example, datatype could be set to a built-in datatype name such as number, string, or dateTime (which corresponds to xsd:dateTime), or to a datatype description object. When targeting the RDF Data Cube vocabulary, specifying the correct datatype ensures that each observation or measure is typed appropriately according to the model.
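The normalization rule quoted above (a string datatype becomes an object whose base is that string) can be sketched in Python. The normalize_datatype function and the BUILT_IN table are illustrative assumptions that cover only a small subset of the CSVW built-in datatypes; they are not a csvcubed or CSVW library API.

```python
import json

# Illustrative subset of CSVW built-in datatype names and the XSD types they denote.
XSD = "http://www.w3.org/2001/XMLSchema#"
BUILT_IN = {
    "string": XSD + "string",
    "integer": XSD + "integer",
    "decimal": XSD + "decimal",
    "number": XSD + "double",
    "dateTime": XSD + "dateTime",
}


def normalize_datatype(value):
    """Normalize a CSVW `datatype` value to a datatype description object.

    Hypothetical helper: a string must name a built-in datatype and becomes
    {"base": value}; an object is returned unchanged as a more specialized datatype.
    """
    if isinstance(value, str):
        if value not in BUILT_IN:
            raise ValueError(f"unknown built-in datatype: {value!r}")
        return {"base": value}
    if isinstance(value, dict):
        return value
    raise TypeError("datatype must be a string or an object")


# A CSVW column description for an observation-value column (names are hypothetical).
column = {"name": "value", "titles": "Value", "datatype": "decimal"}
print(json.dumps(normalize_datatype(column["datatype"])))
```

An object such as {"base": "integer", "minimum": 0} passes through unchanged, which is how a more specialized datatype (here, non-negative integers) would be expressed.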

propertyUrl

A URI template property that may be used to create a URI for a property if the table is mapped to another format. The value of this property becomes the property URL annotation for the described column and is used to create the value of the property URL annotation for the cells within that column, as described in section 5.1.3 URI Template Properties of the W3C Metadata Vocabulary for Tabular Data.

propertyUrl is typically defined on a column description. If defined on a schema description, table description or table group description, care must be taken to ensure that transformed cell values maintain an appropriate semantic relationship, for example by including the name of the column in the generated URL by using _name in the template.

In CSVW, propertyUrl is used to map a specific column to an RDF property. The values in the column are then associated with that property, for example a dimension property or qb:measureType. This allows the data in the CSV to be transformed accurately into RDF triples, where each value in the column is linked to the RDF property defined by the propertyUrl.
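A minimal sketch of how a column's propertyUrl could turn each cell into an RDF triple. It assumes a simplified expander that only handles plain {var} placeholders (not the full RFC 6570 URI Template syntax that CSVW uses); the example.org URIs, column names and the cell_triple helper are hypothetical.

```python
import re


def expand(template, variables):
    """Replace simple {var} placeholders; a simplification of RFC 6570 expansion."""
    return re.sub(r"\{(\w+)\}", lambda m: str(variables[m.group(1)]), template)


def cell_triple(about_url, property_url, column_name, row):
    """Build a (subject, predicate, object) triple for one cell.

    `_name` is made available to the templates, as in CSVW, so a propertyUrl
    shared across columns still yields a distinct property per column.
    """
    variables = dict(row, _name=column_name)
    subject = expand(about_url, variables)
    predicate = expand(property_url, variables)
    return subject, predicate, row[column_name]


row = {"area": "E92000001", "period": "2023", "count": "42"}
print(cell_triple(
    "http://example.org/obs/{area}/{period}",
    "http://example.org/def/{_name}",
    "count",
    row,
))
```

Using {_name} in the template is the technique mentioned above for a propertyUrl defined on a schema rather than on a single column.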

valueUrl

A URI template property that is used to map the values of cells into URLs. The value of this property becomes the value URL annotation for the described column and is used to create the value of the value URL annotation for the cells within that column, as described in section 5.1.3 URI Template Properties of the W3C Metadata Vocabulary for Tabular Data.

In CSVW, valueUrl specifies the template or pattern for generating URIs for the values in a column. This allows each value in the column to be mapped to a specific RDF resource. For example, if a column represents a dimension like "country", valueUrl could generate URIs that point to resources in a geographic ontology, ensuring that each country is linked to a consistent, dereferenceable RDF resource.
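The "country" example can be sketched as follows, again assuming a simplified {var} expander rather than full RFC 6570 templates. The http://example.org/country/ base is a hypothetical stand-in for a real geographic ontology.

```python
import csv
import io
import re


def expand(template, variables):
    """Replace simple {var} placeholders (simplified RFC 6570 expansion)."""
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], template)


VALUE_URL = "http://example.org/country/{country}"  # hypothetical valueUrl

data = "country,count\nFR,10\nDE,12\nFR,7\n"
value_urls = [expand(VALUE_URL, row) for row in csv.DictReader(io.StringIO(data))]
print(value_urls)
```

Both FR rows map to the same URI, which is what links every observation for a country to a single resource.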

required

A boolean atomic property taking a single value which indicates whether the cell value can be null. See Parsing Cells in the W3C Model for Tabular Data and Metadata on the Web for more details. The default is false, which means cells can have null values. The value of this property becomes the required annotation for the described column.

In csvcubed, every column other than attribute and observation columns must have a value in every row: attribute values may be left empty, and csvcubed also makes an exception that allows observation values to be missing.

In CSVW, required is used within the JSON metadata for columns to specify whether a value must be present in every row of the CSV. If set to true, the column cannot have null or missing values.
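A minimal sketch of the check implied by required, assuming that empty cells count as null (the CSVW default null value is the empty string). The column list and the find_missing_required helper are hypothetical.

```python
import csv
import io


def find_missing_required(csv_text, columns):
    """Return (row_number, column_name) pairs where a required column is empty.

    `columns` is a list of CSVW-style column descriptions; `required` defaults
    to False. Row numbers count data rows from 1, excluding the header.
    """
    required = [c["name"] for c in columns if c.get("required", False)]
    problems = []
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        for name in required:
            if (row.get(name) or "") == "":  # "" is CSVW's default null value
                problems.append((row_number, name))
    return problems


columns = [
    {"name": "area", "required": True},  # dimension column: must have a value
    {"name": "status"},                  # attribute column: may be empty
    {"name": "value"},                   # observation column: left optional here
]
data = "area,status,value\nE92000001,,42\n,provisional,\n"
print(find_missing_required(data, columns))
```

Only the empty area cell in the second data row is reported; the empty status and value cells are allowed because those columns are not marked required.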

virtual

A boolean atomic property taking a single value which indicates whether the column is a virtual column not present in the original source. The default value is false. The normalized value of this property becomes the virtual annotation for the described column. If present, a virtual column must appear after all other non-virtual column definitions.

In the context of CSVW targeting RDF Data Cube vocabulary, virtual is used within the JSON metadata to specify columns that are not present in the actual CSV file but are generated during the transformation process. These virtual columns are used to create additional RDF triples or to add necessary metadata without altering the original CSV file. For example, a virtual column could define a constant value for a qb:measureType across all rows.
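The constant qb:measureType example might look like the following CSVW column list, written as Python data so that the ordering rule above (virtual columns after all non-virtual ones) can be checked. The column names, the example.org URIs and the virtual_columns_last helper are hypothetical; the qb:measureType URI is the Data Cube one.

```python
import json

columns = [
    {"name": "area", "titles": "Area",
     "propertyUrl": "http://example.org/dimension/area"},
    {"name": "value", "titles": "Value", "datatype": "decimal"},
    # Virtual column: absent from the CSV, but emits the same measure type for every row.
    {"name": "measure_type", "virtual": True,
     "propertyUrl": "http://purl.org/linked-data/cube#measureType",
     "valueUrl": "http://example.org/measure/count"},
]


def virtual_columns_last(cols):
    """True if no non-virtual column follows a virtual one."""
    seen_virtual = False
    for col in cols:
        if col.get("virtual", False):
            seen_virtual = True
        elif seen_virtual:
            return False
    return True


print(virtual_columns_last(columns))
print(json.dumps({"tableSchema": {"columns": columns}}, indent=2))
```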

type

"type" exists in both, but it is used differently. Within csvcubed, it defines whether a column is a measure, attribute or dimension.

Within CSVW, "type" in the JSON metadata would be used to specify RDF classes relevant to the Data Cube, such as qb:DataSet, qb:Observation, qb:MeasureProperty, etc. This allows CSV data to be mapped accurately to RDF structures that conform to the RDF Data Cube model.
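To make the difference concrete, this sketch relates the csvcubed column types named above to the Data Cube component-property classes. The COMPONENT_CLASS dictionary and the component_class helper are illustrative assumptions for reading this page, not csvcubed's internal mapping.

```python
QB = "http://purl.org/linked-data/cube#"

# Hypothetical mapping from a csvcubed column type to the qb class of the
# component property that the column describes.
COMPONENT_CLASS = {
    "dimension": QB + "DimensionProperty",
    "attribute": QB + "AttributeProperty",
    "measure": QB + "MeasureProperty",
}


def component_class(csvcubed_type):
    """Return the qb component-property class for a csvcubed column type."""
    try:
        return COMPONENT_CLASS[csvcubed_type]
    except KeyError:
        raise ValueError(f"no component class for column type {csvcubed_type!r}") from None


print(component_class("dimension"))
```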

label

In CSVW, when targeting RDF Data Cube vocabulary, label is used within the JSON metadata to provide a human-readable name or description for a particular element, such as a column, table, or resource. This label is often mapped to the rdfs:label property in RDF, providing a clear, accessible name for the resource in the resulting RDF Data Cube.

In the qube config part of csvcubed, label is used to provide a human-readable name or description for various components of the dataset, such as dimensions, measures, attributes, or even the dataset itself. This label is often used to generate the rdfs:label property in the resulting RDF.
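A minimal sketch of turning a label into an rdfs:label triple in N-Triples syntax. The component URI is hypothetical, English labels are assumed, and the escaping covers only backslashes, double quotes and newlines, which is enough for typical labels.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def label_triple(subject_uri, label, lang="en"):
    """Serialize one rdfs:label triple as an N-Triples line with a language tag."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'<{subject_uri}> <{RDFS_LABEL}> "{escaped}"@{lang} .'


print(label_triple("http://example.org/dimension/area", "Area"))
```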

suppress

In CSVW, the suppressOutput property is used within the JSON metadata to indicate that a specific column (or table) should not be included in the output RDF serialization. It defaults to false.
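The effect of suppressOutput on columns can be sketched as a filter applied before any triples are generated. The column descriptions and the columns_to_emit helper are hypothetical.

```python
def columns_to_emit(columns):
    """Return the names of columns whose cells should produce RDF output.

    A column with suppressOutput set to True is skipped; the default is False.
    """
    return [c["name"] for c in columns if not c.get("suppressOutput", False)]


columns = [
    {"name": "area"},
    {"name": "notes", "suppressOutput": True},  # kept in the CSV, left out of the RDF
    {"name": "value"},
]
print(columns_to_emit(columns))
```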

false

In csvcubed, false is a Boolean value used to indicate that a particular feature is disabled or not applicable. For example, it might be used to disable a certain validation check, to indicate that a dimension is not required, or to prevent certain metadata fields from being generated. In the table at the top of this page it is paired with CSVW's suppress (suppressOutput): a column set to false is excluded from the output.
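Following the false/suppress pairing in the table at the top of this page, this sketch translates a qube-config-style columns mapping into CSVW column descriptions, turning a false entry into a column with suppressOutput set. The qube_columns content and the to_csvw_columns helper are hypothetical, and it assumes data_type values are CSVW built-in datatype names; csvcubed's real translation produces much richer metadata.

```python
def to_csvw_columns(qube_columns):
    """Map {column title: config dict or False} to CSVW column descriptions."""
    csvw_columns = []
    for title, config in qube_columns.items():
        column = {"titles": title}
        if config is False:
            column["suppressOutput"] = True  # column switched off in the qube config
        elif "data_type" in config:
            column["datatype"] = config["data_type"]  # data_type -> datatype
        csvw_columns.append(column)
    return csvw_columns


qube_columns = {
    "Area": {"label": "Area"},
    "Notes": False,
    "Value": {"data_type": "decimal"},
}
print(to_csvw_columns(qube_columns))
```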

from_existing

from_existing is used to create new components (such as dimensions, measures, or attributes) by referencing or reusing the definitions of existing components, identified by their URIs. This allows efficient configuration by building on pre-defined structures. In the table at the top of this page it is paired with CSVW's propertyUrl.
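Reading the from_existing/propertyUrl row of the table literally, the simplest form of reuse attaches a column's cells to the existing component's property URI. This sketch assumes from_existing holds that URI. The property_url_for helper and the example.org URIs are hypothetical, the SDMX refPeriod URI is just an example of a widely reused dimension, and csvcubed's actual output may differ.

```python
def property_url_for(column_config, new_property_url):
    """Choose a CSVW propertyUrl for a qube-config-style column (hypothetical helper).

    If the column reuses an existing component via from_existing, that URI is
    used directly; otherwise the URI for a newly defined property is used.
    """
    existing = column_config.get("from_existing")
    if existing:
        return existing
    return new_property_url


period = {"from_existing": "http://purl.org/linked-data/sdmx/2009/dimension#refPeriod"}
area = {"label": "Area"}
print(property_url_for(period, "http://example.org/dimension/period"))
print(property_url_for(area, "http://example.org/dimension/area"))
```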
