Skip to content

Conversation

@HMS17
Copy link

@HMS17 HMS17 commented Dec 3, 2024

Description

Story: BI-2055 - Remove missingValueString tablesaw values

Default tablesaw behavior silently converts certain values to an empty string when processing tables, which can lead to unexpected behavior when importing files in DeltaBreed. This follows up on work implemented in BI-1993 to also remove this silent conversion for N/A, NaN, *, and null values and update associated tests.

Dependencies

bi-api: bug/BI-2055

Testing

Will be tested in bi-api PR after this is merged, due to difficulties in utilizing tablesaw from a different branch

Checklist:

  • I have performed a self-review of my own code
  • I have tested my code and ensured it meets the acceptance criteria of the story
  • I have tested that my code works with both the brapi-java-server and BreedBase
  • I have create/modified unit tests to cover this change
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to documentation
  • I have run TAF: <please include a link to TAF run>

@HMS17 HMS17 changed the base branch from develop to master January 10, 2025 14:53
@HMS17 HMS17 marked this pull request as ready for review January 10, 2025 20:28
@HMS17 HMS17 requested review from davedrp and mlm483 January 10, 2025 20:33
@mlm483 mlm483 self-assigned this Jan 14, 2025
Copy link

@mlm483 mlm483 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of merging so we can test with bi-api.

I left a few questions in comments.

Comment on lines 33 to +36
// No default missing indicators
// TODO: Allow this to be configurable?
public static final ImmutableList<String> MISSING_INDICATORS =
ImmutableList.of(missingInd1, missingInd2, missingInd4, missingInd5);
ImmutableList.of();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to keep lines 25-31 around in this file at all, if those variables are not used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I was admittedly doing the bare minimum to obtain the desired change in functionality lest I accidentally break something else, but that is something that can be reasonably removed, so just added a commit!

Table t = Table.read().csv("../data/missing_values2.csv");
assertEquals(1, t.stringColumn(0).countMissing());
assertEquals(1, t.numberColumn(1).countMissing());
assertEquals(0, t.numberColumn(1).countMissing());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this change needed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test counts missing values, which includes both values in the MISSING_INDICATORS that were turned to an empty string, as well as the empty string itself. Removing values from MISSING_INDICATORS meant that one of the values in the csv was no longer a missing value, and so the count in the test had to be adjusted.

@HMS17 HMS17 merged commit feeef35 into master Jan 15, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants