
Conversation

@somasays
Contributor

@somasays somasays commented Dec 8, 2025

Summary

  • Fixes #2791 ("attempting to write smallint/tinyint into int column results in incompatibility with other iceberg APIs"): writing smaller integer types (uint8, int8, int16, uint16) to Iceberg IntegerType columns now correctly casts to int32/int64
  • PyIceberg was preserving original Arrow types in Parquet files, causing Spark to fail with Unsupported logical type: UINT_8
  • Added integer type widening logic in ArrowProjectionVisitor._cast_if_needed() following the same pattern as existing timestamp handling
  • Only widening conversions are allowed (e.g., uint8 → int32, int32 → int64); narrowing conversions continue to be rejected via promote()
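The widening-only rule can be sketched in plain Python. This is a hypothetical illustration of the decision logic, not the actual PyIceberg code (which lives in `ArrowProjectionVisitor._cast_if_needed()` and delegates narrowing rejection to `promote()`):

```python
# Hypothetical sketch of the widening-only rule; helper names are
# invented for illustration and are not PyIceberg APIs.

# Bit width and signedness of the Arrow integer types involved.
INT_TYPES = {
    "int8": (8, True), "int16": (16, True),
    "int32": (32, True), "int64": (64, True),
    "uint8": (8, False), "uint16": (16, False),
    "uint32": (32, False),
}

def can_widen(source: str, target: str) -> bool:
    """Return True iff every value of `source` fits in `target`."""
    src_bits, src_signed = INT_TYPES[source]
    tgt_bits, tgt_signed = INT_TYPES[target]
    if src_signed == tgt_signed:
        return src_bits <= tgt_bits
    if not src_signed and tgt_signed:
        # unsigned -> signed needs one extra bit for the sign
        return src_bits < tgt_bits
    # signed -> unsigned is never safe (negative values would wrap)
    return False
```

Under this rule `uint8 → int32` and `uint32 → int64` are accepted, while `int64 → int32` is rejected, matching the behavior described above.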

Test plan

  • All 3041 unit tests pass
  • Lint passes
  • New parameterized test covers: uint8, int8, int16, uint16 → int32 and uint32, int32 → int64
  • Existing test_projection_filter_add_column_demote still works (narrowing rejection)
  • Manual verification: uint8 data written to IntegerType column produces int32 in Parquet file

Closes #2791

…patibility

When writing PyArrow tables with smaller integer types (uint8, int8,
int16, uint16) to Iceberg tables with IntegerType columns, PyIceberg
preserves the original Arrow type in the Parquet file. This causes
Spark to fail with:

    java.lang.UnsupportedOperationException: Unsupported logical type: UINT_8

The fix casts smaller integer types to their canonical Iceberg
representation (int32 for IntegerType, int64 for LongType) during
write, ensuring cross-platform compatibility.

Only widening conversions are allowed - narrowing conversions (e.g.,
int64 to int32) continue to be rejected via the existing promote()
function.

Closes apache#2791
@somasays
Contributor Author

somasays commented Dec 8, 2025

Verification with issue reproduction scenario

Source parquet type: uint8
Written parquet type: int32 ✓ (Spark compatible)
Data integrity: preserved ✓

Tested with exact data from #2791 (uint8 column with values [None, None, None, None, 217, 163, 130, None, 69, 78]).
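As a quick sanity check on the data-integrity claim (independent of the PR's test suite), every non-null uint8 value from the issue fits losslessly in the int32 range, so the widening cast cannot alter any value:

```python
# Nullable uint8 column from issue #2791.
values = [None, None, None, None, 217, 163, 130, None, 69, 78]

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Widening to int32 is the identity on the values themselves;
# nulls pass through unchanged.
widened = [None if v is None else int(v) for v in values]
assert all(v is None or INT32_MIN <= v <= INT32_MAX for v in widened)
assert widened == values  # data integrity preserved
```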
