Description
I would like to be able to supply a fully resolved schema document via the dictionary_url setting of the Gen3 deployment config, one that looks like this:
{
"node1": {...},
"node2": {...},
}
where by "fully resolved" I mean it contains no $ref expressions, need not contain any magic keys like _definitions.yaml or _terms.yaml, and all top-level keys correspond to data nodes.
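To make that definition concrete, here is a minimal sketch of a check for it. The function names are hypothetical and not part of dictionaryutils, and treating any top-level key that starts with an underscore or ends in .yaml as a "magic" key is my assumption.

```python
def find_refs(node, path=""):
    """Yield the path of every `$ref` key found anywhere in a nested document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "$ref":
                yield child
            yield from find_refs(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_refs(item, f"{path}/{index}")


def is_fully_resolved(schema):
    """Return True if `schema` has no `$ref` and only node names as top-level keys.

    Assumption: a key starting with "_" or ending in ".yaml" marks the
    bespoke input format rather than a data node.
    """
    only_node_keys = not any(
        key.startswith("_") or key.endswith(".yaml") for key in schema
    )
    has_no_refs = next(find_refs(schema), None) is None
    return only_node_keys and has_no_refs
```

A document produced by LinkML or Hackolade would be expected to pass this check as-is, while the bespoke input format would not.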
I've been playing with dictionaryutils, and it seems that although it creates a fully resolved schema of this nature, it cannot be initialized from one. Being able to do that would be very useful, because such a document can easily be generated by a number of data modelling systems that know nothing about Gen3, e.g. LinkML or Hackolade.
Currently, if I model in those frameworks, I have to painstakingly back out my generic JSON schema document into a bespoke input format:
{
"node1.yaml": {...},
"node2.yaml": {...},
"_definitions.yaml": {...},
"_terms.yaml": {...},
}
just so that it passes DataDictionary.__init__, only to be resolved again on the way out in DataDictionary.schema.
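The shape of that round trip can be sketched as follows. This only illustrates the key renaming between the two layouts shown above; `to_bespoke_format` is a hypothetical helper, it does not call dictionaryutils, and whether DataDictionary.__init__ accepts empty _definitions.yaml and _terms.yaml entries is an assumption I have not verified.

```python
def to_bespoke_format(resolved):
    """Wrap a fully resolved {node: schema} document in the bespoke layout.

    Each node name gains a ".yaml" suffix, and placeholder entries are added
    for the two magic keys (assumed, not verified, to be allowed to be empty).
    """
    bespoke = {f"{name}.yaml": schema for name, schema in resolved.items()}
    bespoke.setdefault("_definitions.yaml", {})
    bespoke.setdefault("_terms.yaml", {})
    return bespoke
```

In practice the work is more painstaking than this, but the sketch shows that the extra layer carries no information that the resolved document lacks.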
To be clear, this is not about the Gen3 data model itself (e.g. internal custom structures such as links), but only about which format is accepted as valid input for the dictionary.
I have noted this here in dictionaryutils, though I guess for it to be operationally useful there would also need to be an alternative configuration point added to the deployment config, something like dictionary_resolved_url, so that datamodelutils, peregrine etc. would know what they are starting with.
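A rough sketch of how a consumer might choose between the two settings, assuming the config is available as a plain dict. `dictionary_resolved_url` is only the name proposed above and does not exist today, `pick_dictionary_source` is a hypothetical helper, and giving the resolved URL precedence is an arbitrary choice for illustration.

```python
def pick_dictionary_source(config):
    """Return (kind, url), where kind is "resolved" or "bespoke".

    `dictionary_resolved_url` is the proposed setting; `dictionary_url` is
    the existing one. The resolved URL wins if both are present (assumption).
    """
    if config.get("dictionary_resolved_url"):
        return ("resolved", config["dictionary_resolved_url"])
    if config.get("dictionary_url"):
        return ("bespoke", config["dictionary_url"])
    raise KeyError("neither dictionary_resolved_url nor dictionary_url is set")
```

Downstream tools could then branch on the returned kind instead of guessing the format from the document contents.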
I am happy to contribute some time to the work required, if it is considered worthwhile.