Skip to content

Conversation

@skinkie
Copy link
Contributor

@skinkie skinkie commented Oct 7, 2025

Description
This is the first working attempt to add additional information to NeTEx references, enabling access to the class that each reference points to. The implementation processes all VersionOfObjectRefStructure variants and builds a hierarchical tree of them. This allows defining defaults — for example, a ScheduledStopPointRef normally points to a ScheduledStopPoint.
If it instead points to a FareScheduledStopPoint, this is validated and allowed, because FareScheduledStopPoint is a descendant of ScheduledStopPoint. The reverse, however, is not allowed: a TimingPoint cannot be referenced by a ScheduledStopPointRef.

There are several issues that still need to be addressed. Therefore, this implementation deliberately takes an incremental approach rather than aiming for a 100% complete solution, especially since the NeTEx XML Schema itself is not fully consistent or correct.

How to review
The individual commits should be review stepping stones, then you understand why what happens.

  1. The big issue with the current approach was that there wasn't a uniform way how reference-classes were implemented. My suggestion for documentation (hint: @TuThoThai) is that we describe why reference classes should virtually always be a restriction, embedded in simpleContent. We don't extend references (ref, version, nameofrefclass, uri, etc.) but we restrict their types. Using extension element anywhere in this graph makes the leaf impossible. So convert the few remaining extensions, to restrictions like the rest. 237fe94

  2. For some reasons (let it be named: everything was done by hand, no validations in place, merge conflicts) we ended up with references to classes that do not exist, and maybe never did. NeTEx 3.0 will remove that stuff, but for now, we don't break what is in there, but also not let it break us. 76a7648

  3. Next step goes one step further, and just removes the default attributes on all places that suggests a NameOfClass that was, or has never been a class. For complaints: fix this stuff yourself, nobody could use it anyway. 477677c

  4. My personal believe is that NameOfClass in any case has been very under defined. Was the meaning "all elements", was it just EntityInFrame, nobody knows, but considering its use all was mixed. This commit creates the NameOfClass structure automatically. And fixes all examples, including crazy profile definitions that were never matching the NameOfClass definition in the first place. f7cd256

  5. Last step. The actual implementation of applying nameOfRefClass to all references that did not complain, and fix all the examples that were invalid. We should have a discussion on the excludelist.

How it works
The complex-types-included file is generated by a script that computes all *Ref elements ultimately derived from VersionOfObjectRefStructure. For these elements, all associated complex types are extracted and stored in complex-types-included. There is also a negative list, complex-types-excluded. During development of this feature, it became clear that several complex types or cascades caused issues; instead of fixing hundreds of validation errors immediately, this list allows us to focus on the positive results of the working cases first.

In addition to the exclusion list, the script/nameofclass.py logic skips variants for which no clear relationship can be found between a reference and its corresponding class.

Given the number of references now correctly resolved, it is clear how valuable this approach is and why it should be further pursued to reduce the exclusion list over time.

Urgency: High

@TuThoThai I would like you to make sure that this pull request is reviewed and merged as soon as possible. My preference would be regular merge (non-squash).

@skinkie skinkie self-assigned this Oct 7, 2025
@skinkie skinkie added the hygiene Technical dept, results in a breaking change. label Oct 7, 2025
This is the first working attempt to add additional information to NeTEx references,
enabling access to the class that each reference points to. The implementation
processes all VersionOfObjectRefStructure variants and builds a hierarchical tree of
them. This allows defining defaults, for example, a ScheduledStopPointRef normally
points to a ScheduledStopPoint. If it instead points to a FareScheduledStopPoint,
this is validated and allowed, because FareScheduledStopPoint is a descendant of
ScheduledStopPoint. The reverse, however, is not allowed: a TimingPoint cannot be
referenced by a ScheduledStopPointRef.

There are several issues that still need to be addressed. Therefore, this
implementation deliberately takes an incremental approach rather than aiming for
a 100% complete solution, especially since the NeTEx XML Schema itself is not
fully consistent or correct.

The complex-types-included file is generated by a script that computes all *Ref
elements ultimately derived from VersionOfObjectRefStructure. For these elements,
all associated complex types are extracted and stored in complex-types-included.
There is also a negative list, complex-types-excluded. During development of this
feature, it became clear that several complex types or cascades caused issues;
instead of fixing hundreds of validation errors immediately, this list allows us
to focus on the positive results of the working cases first.

In addition to the exclusion list, the script/nameofclass.py logic skips variants
for which no clear relationship can be found between a reference and its
corresponding class.

Given the number of references now correctly resolved, it is clear how valuable
this approach is and why it should be further pursued to reduce the exclusion
list over time.
@skinkie skinkie changed the title Transform all references having a extension to a regular restriction First referencing class constraints Oct 8, 2025
@skinkie skinkie added this to the netex_2.0 milestone Oct 8, 2025
@skinkie skinkie marked this pull request as ready for review October 8, 2025 00:07
@TuThoThai
Copy link
Collaborator

@skinkie
I did not have time (yet) to make a thorough review of this PR.
However, one thing: some xxxRef have been added in NeTEx to "systemize" some "free text", for example: <TypeOfFareStructureFactorRef ref="BetweenFirstAndLastValidation"/>
These are not ref to an object that is detailled somewhere else in the feed. It should be renamed differently to avoid future confusion (==> NeTEx 3.0) but if we go too fast on that topic we might break stuff (refer to the situation with DataSourceRef that was used as a free text before).
So, my request is for us to focus on freezing the XSD to match the documentation submitted. Then, work on all hygiene for 2.1.

@thbar et @Aurige, your view?

@ue71603
Copy link
Contributor

ue71603 commented Oct 8, 2025

@skinkie I did not have time (yet) to make a thorough review of this PR. However, one thing: some xxxRef have been added in NeTEx to "systemize" some "free text", for example: <TypeOfFareStructureFactorRef ref="BetweenFirstAndLastValidation"/> These are not ref to an object that is detailled somewhere else in the feed. It should be renamed differently to avoid future confusion (==> NeTEx 3.0) but if we go too fast on that topic we might break stuff (refer to the situation with DataSourceRef that was used as a free text before). So, my request is for us to focus on freezing the XSD to match the documentation submitted. Then, work on all hygiene for 2.1.

@thbar et @Aurige, your view?

When we do the hygiene we should stop using Ref for free textin the same go.

@TuThoThai
Copy link
Collaborator

When we do the hygiene we should stop using Ref for free textin the same go.

Agree, but as said it might be considered as breaking change and not just hygiene by some

@skinkie
Copy link
Contributor Author

skinkie commented Oct 8, 2025

@TuThoThai I do not see how your comment is related to this pull request.

This still required some manual intervention for both PlaceRef as AuthorityRef.
Will fix the automatic processing later (if possible).
@TuThoThai TuThoThai modified the milestones: netex_2.0, netex_2.1 Oct 8, 2025
@TuThoThai
Copy link
Collaborator

@skinkie
I was not referring to the great technical work you are doing on cleaning this up. But, it will take a massive amount of time to check every single point in order to make sure we do not break any useful references.
So, let's take some time to examine this for the v2.1

@skinkie
Copy link
Contributor Author

skinkie commented Oct 8, 2025

It does not break anything, because all examples are validated (and repaired).
Never the less, this is targeted for NeTEx 2.0, this is a release with breaking changes, but this specific change does not break things.

The script included all xsd sub-folders, but the examples naively referenced classes without any prefixed.
For GML we now have explicitly added them (manually) to the script, with the gml: prefix. The examples were
updated accordingly.

This significantly reduces the enumerations, and reduces them to NeTEx only (without prefix).

In addition substitutionGroups 'results' have been added to the enumeration of the classes.
@TuThoThai TuThoThai modified the milestones: netex_2.1, netex_2.0 Oct 9, 2025
@TuThoThai
Copy link
Collaborator

Release meeting as of 9 Oct 2025:

  • PR is ok to go to v2.0
  • @Aurige, @thbar and @TuThoThai will check a second time every single commit
  • If all is clear, approve & merge

Deadline: 14 Oct 2025

@thbar thbar mentioned this pull request Oct 9, 2025
19 tasks
Copy link
Collaborator

@trurlurl trurlurl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to put the automatically generated files in a separate folder or to otherwise signal that no manual editing is necessary? Or are the scripts only run sporadically?

trurlurl
trurlurl previously approved these changes Oct 9, 2025
@Aurige
Copy link
Contributor

Aurige commented Oct 9, 2025

Small point; PR make even more mandatory to have a "How to add an Element to NeTEx" to be available here inn the repo (not in the CEN Doc). There was already the xxxIdType, xxxx_VersionStructure, xxxGroup, xxxRef, xxxRefStructure, xxxRefs_RelStructure, etc. we now have that huge NameOfClass and authorised ref classes details which are in a totally separated file (netex_entity_support.xsd). Maybe we need a kind of plan to maintain in (it's likely that people will forget to update it).
Also note that this netex_entity_support.xsd is now refering (not by including, but by using the name of the class) to every Parts of NeTEx, and this is from the "lowest" part (framework) where up to now only descending dependencies were allowed (for upper parts to lower ones). Even if I think that this is acceptable here, that should be mentioned to the group and limited to that very specific case.

TuThoThai
TuThoThai previously approved these changes Oct 9, 2025
Copy link
Collaborator

@TuThoThai TuThoThai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with the general comment that whenever the documentation stays "Automatic reference class for" means it has been added via the script (i.e., automatically generated).
This is for future general documentation purpose on this repo (most probably via Wiki)

@thbar
Copy link
Collaborator

thbar commented Oct 9, 2025

@skinkie I will review that too. Can you confirm all the commits are out and pushed, and the PR is stable for review? Thanks!

ue71603
ue71603 previously approved these changes Oct 9, 2025
@skinkie
Copy link
Contributor Author

skinkie commented Oct 9, 2025

Would it be possible to put the automatically generated files in a separate folder or to otherwise signal that no manual editing is necessary? Or are the scripts only run sporadically?

@trurlurl This is not really possible, because the most parts that are changed are part of regular objects.

PR make even more mandatory to have a "How to add an Element to NeTEx" to be available here in the repo (not in the CEN Doc).

@Aurige I don't know if I send someone an email on exactly this point yesterday, but I totally agree. Especially the extension / restriction point is now a choice made. That should be documented with what, how and why.

Also note that this netex_entity_support.xsd is now refering (not by including, but by using the name of the class) to every Parts of NeTEx, and this is from the "lowest" part (framework) where up to now only descending dependencies were allowed (for upper parts to lower ones). Even if I think that this is acceptable here, that should be mentioned to the group and limited to that very specific case.

@Aurige Do you mean it should actually be included with imports, or was that already the case?

Approved with the general comment that whenever the documentation stays "Automatic reference class for" means it has been added via the script (i.e., automatically generated). This is for future general documentation purpose on this repo (most probably via Wiki)

@TuThoThai Update to Automatically generated scheduled.

I will review that too. Can you confirm all the commits are out and pushed, and the PR is stable for review? Thanks!

@thbar maybe if the IdTypes is fastly implemented I might update the python script to be more correct. The few missing classes could be included. But I am not going for the 1% if the 99% is what we want to fix today.

@TuThoThai
Copy link
Collaborator

Would it be possible to put the automatically generated files in a separate folder or to otherwise signal that no manual editing is necessary? Or are the scripts only run sporadically?

@trurlurl This is not really possible, because the most parts that are changed are part of regular objects.

PR make even more mandatory to have a "How to add an Element to NeTEx" to be available here in the repo (not in the CEN Doc).

@Aurige I don't know if I send someone an email on exactly this point yesterday, but I totally agree. Especially the extension / restriction point is now a choice made. That should be documented with what, how and why.

Also note that this netex_entity_support.xsd is now refering (not by including, but by using the name of the class) to every Parts of NeTEx, and this is from the "lowest" part (framework) where up to now only descending dependencies were allowed (for upper parts to lower ones). Even if I think that this is acceptable here, that should be mentioned to the group and limited to that very specific case.

@Aurige Do you mean it should actually be included with imports, or was that already the case?

Approved with the general comment that whenever the documentation stays "Automatic reference class for" means it has been added via the script (i.e., automatically generated). This is for future general documentation purpose on this repo (most probably via Wiki)

@TuThoThai Update to Automatically generated scheduled.

I will review that too. Can you confirm all the commits are out and pushed, and the PR is stable for review? Thanks!

@thbar maybe if the IdTypes is fastly implemented I might update the python script to be more correct. The few missing classes could be included. But I am not going for the 1% if the 99% is what we want to fix today.

@skinkie, ok for me and duly noted on your comments.
Good to go as soon as @thbar reviews!

thbar
thbar previously approved these changes Oct 9, 2025
Copy link
Collaborator

@thbar thbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! 🚀

Great work adding better typing & structure @skinkie.

Since I don't want to delay the merge: I'll do later (before final release) some testing against live data files, as mentioned in #963 (and in the future, will make sure I can do it at the PR review level more easily, with adequate tooling).

Suggestion: I feel the PR title & TL;DR could be updated (If I understand well) to better reflect the improvements (e.g. stronger typing, or any typing at all etc) & make it easier to prepare release notes later. But that's not a blocker given the amount of things already merged in 😄

@TuThoThai TuThoThai dismissed stale reviews from thbar, ue71603, trurlurl, and themself via 6341ea4 October 10, 2025 07:15
@TuThoThai
Copy link
Collaborator

@skinkie
I edited the comment in the script for English
@thbar, @Aurige, @ue71603 and @trurlurl : if you approve again, I will make a simple merge (not squash). So, we can identify each commits.

Copy link
Collaborator

@thbar thbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-approving (only change is without impact).

@skinkie
Copy link
Contributor Author

skinkie commented Oct 10, 2025

For the Idfixes I updated some other code too, but you'll see.

@thbar
Copy link
Collaborator

thbar commented Oct 10, 2025

For the Idfixes I updated some other code too, but you'll see.

@skinkie code that you will push on this PR & we must add to the review? (or that you'll push elsewhere?)

(asking since @TuThoThai was more or less about to merge in the current state, last time we discussed)

@TuThoThai TuThoThai merged commit 382e1fc into next Oct 10, 2025
1 check passed
@thbar
Copy link
Collaborator

thbar commented Oct 10, 2025

For the reader - the other code is at:

Moving forward!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hygiene Technical dept, results in a breaking change. needs wiki update

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants