Skip to content

Conversation

@MiguelRosaTauroni
Copy link
Collaborator

The following PR addresses the issues identified in issue #430 , which revealed execution problems and misalignments with the VTL 2.1 specification. In addition, an error-handling mechanism has been incorporated for the functions keep, rename, filter, and calc, in order to report both the cause of the error and the VTL statement that triggered it. Below is a detailed list of the main features included in this PR.

ClauseVisitor – Key Changes

General / Exceptions

  • Standardized error messages to include line, column, and full statement across these clauses: keep, rename, calc and filter.

KEEP/DROP

  • Added explicit validation for unknown columns: now throws VtlRuntimeException(new InvalidArgumentException(...)) when a requested component is not in the dataset.
  • Enforced VTL 2.1 rule: identifiers must not be explicitly listed in KEEP/DROP; the error now reports each offending identifier with role/type.
  • Result projection continues to preserve dataset order and guarantees identifiers remain present (both after KEEP and DROP).

CALC

  • Blocks overwriting existing IDENTIFIER components.
  • Detects duplicate targets within the same CALC clause and raises an engine exception.
  • Validates internal consistency of collected expressions / roles / expressionStrings before execution; on mismatch, raises a VtlRuntimeException.

RENAME

Validates and errors on:

  • duplicate 'from' names,
  • duplicate 'to' names,
  • non-existent 'from' columns, and collisions where a to name already exists among untouched columns.
  • All above use VtlRuntimeException(new InvalidArgumentException(...)) with precise context.

FILTER

  • Unchanged semantics; errors now surface as engine-native exceptions with line/column/statement.

UNITARY TESTS

  • Addapted

MiguelRosaTauroni and others added 4 commits August 25, 2025 07:20
…ply with the requirements of the VTL 2.1 specification. In addition, an error handling mechanism has been incorporated into all the aforementioned functions, providing both the cause and the statement that triggered the error in the event of a runtime failure
@MiguelRosaTauroni MiguelRosaTauroni self-assigned this Aug 26, 2025
@MiguelRosaTauroni MiguelRosaTauroni added the bug Something isn't working label Aug 26, 2025
Copy link
Collaborator

@hadrienk hadrienk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Miguel, very nice contribution thank you!

I took a first pass. There are a few important changes that we need to address.
Please don't hesitate to ask for clarifications if needed.

Comment on lines 142 to 147
if (!availableColumns.contains(requested)) {
String errorMsg =
String.format(
"Error: column '%s' not found in dataset. Line %d, position %d. Statement: [%s]",
requested, line, charPosition, statement);
throw new VtlRuntimeException(new InvalidArgumentException(errorMsg, fromContext(ctx)));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VtlRuntimeException handles reporting the position so you don't have to do it yourself. See the fromContext(...) that returns a positionable for the particular ANTLR context.

Suggested change
if (!availableColumns.contains(requested)) {
String errorMsg =
String.format(
"Error: column '%s' not found in dataset. Line %d, position %d. Statement: [%s]",
requested, line, charPosition, statement);
throw new VtlRuntimeException(new InvalidArgumentException(errorMsg, fromContext(ctx)));
if (!availableColumns.contains(requested)) {
throw new VtlRuntimeException(new InvalidArgumentException(String.format("'%s' not found in dataset", requested), fromContext(ctx)));

Comment on lines 110 to 113
// Error reporting context
final int line = ctx.getStart().getLine();
final int charPosition = ctx.getStart().getCharPositionInLine();
final String statement = ctx.getText();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Error reporting context
final int line = ctx.getStart().getLine();
final int charPosition = ctx.getStart().getCharPositionInLine();
final String statement = ctx.getText();

.filter(identifiers::containsKey)
.collect(Collectors.toCollection(LinkedHashSet::new));

if (!forbidden.isEmpty()) {
Copy link
Collaborator

@hadrienk hadrienk Sep 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you are building the complete set of possible errors. While this is a very nice thing for the end user, this makes the implementation very complex. I suggest a simple fail fast approach for now. If we want to offer more that one error we should bake that in the VtlScritpException API.

Copy link
Collaborator

@hadrienk hadrienk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Miguel,

Do you think you could split this PR into the different PR of each area? Something like:

  • Calc
  • Keep/drop
  • Rename
  • Filter

@MiguelRosaTauroni
Copy link
Collaborator Author

Hi hadrienk,

Apologies for the delay in replying; I was involved in other priorities.

To summarize the rationale behind the changes:
• The keep operator was not fully aligned with the specification, particularly regarding the use of identifiers.
• During large-scale testing with long concatenated VTL scripts (thousands of lines), we found it difficult to trace the real source of errors. Messages were too generic, and in some cases errors appeared in later operators instead of where the issue originated (e.g., rename vs. leftjoin).
• The PR introduces more precise validations and clearer error handling, providing specific fields and statements involved without performance impact. This is especially relevant in scenarios with large scripts.
• We also extended the same approach to rename, calc, and filter, which appeared frequently in testing and exposed similar issues.

Regarding your suggestion on ctx management, I did not fully understand the change. If you can provide a concrete example, I will adapt the code as you propose. I still believe error messages should explicitly include the fields and the VTL statement causing the runtime issue, otherwise the engine is not verbose enough for effective debugging.

For the other minor points, I agree with your proposals. On splitting the PR into four (one per operator), I will not be able to take that on due to project changes. I can either complete this PR with the requested adjustments or leave the splitting for a colleague who may continue this work.

…ply with the requirements of the VTL 2.1 specification. In addition, an error handling mechanism has been incorporated into all the aforementioned functions, providing both the cause and the statement that triggered the error in the event of a runtime failure

Unitary tests adapted
# Conflicts:
#	vtl-engine/src/main/java/fr/insee/vtl/engine/visitors/ClauseVisitor.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/ClauseVisitorTest.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/expression/ArithmeticExprTest.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/expression/ComparisonExprTest.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/expression/ConditionalExprTest.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/expression/UnaryExprTest.java
#	vtl-engine/src/test/java/fr/insee/vtl/engine/visitors/expression/functions/JoinFunctionsTest.java
@hadrienk
Copy link
Collaborator

Hi hadrienk,

Apologies for the delay in replying; I was involved in other priorities.

To summarize the rationale behind the changes: • The keep operator was not fully aligned with the specification, particularly regarding the use of identifiers. • During large-scale testing with long concatenated VTL scripts (thousands of lines), we found it difficult to trace the real source of errors. Messages were too generic, and in some cases errors appeared in later operators instead of where the issue originated (e.g., rename vs. leftjoin). • The PR introduces more precise validations and clearer error handling, providing specific fields and statements involved without performance impact. This is especially relevant in scenarios with large scripts. • We also extended the same approach to rename, calc, and filter, which appeared frequently in testing and exposed similar issues.

Regarding your suggestion on ctx management, I did not fully understand the change. If you can provide a concrete example, I will adapt the code as you propose. I still believe error messages should explicitly include the fields and the VTL statement causing the runtime issue, otherwise the engine is not verbose enough for effective debugging.

For the other minor points, I agree with your proposals. On splitting the PR into four (one per operator), I will not be able to take that on due to project changes. I can either complete this PR with the requested adjustments or leave the splitting for a colleague who may continue this work.

Hi Miguel, no problem. Thanks for taking the time.

The 'fromContext` method takes an ANTLR grammar object (token I think, reviewing from the phone). This contains all the information needed to construct an error message that is positioned.

For instance, if a variable is not found, you can pass the actual variable 'context' to the exception and one will be able to position the error (lines and columns + text value). Therefore, adding the name of the variables and their position is redundant.

There's still improvements to be made to the error system, but we can make improvements later as long as we pass the proper contextes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants