
Conversation

@somasays
Contributor

@somasays somasays commented Dec 8, 2025

Summary

  • Fixes #2791 ("attempting to write smallint/tinyint into int column results in incompatibility with other iceberg APIs"): writing smaller integer types (uint8, int8, int16, uint16) to Iceberg IntegerType columns now correctly casts to int32/int64
  • PyIceberg was preserving original Arrow types in Parquet files, causing Spark to fail with Unsupported logical type: UINT_8
  • Added integer type widening logic in ArrowProjectionVisitor._cast_if_needed() following the same pattern as existing timestamp handling
  • Only widening conversions are allowed (e.g., uint8 → int32, int32 → int64); narrowing conversions continue to be rejected via promote()
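The widening-only rule can be sketched in plain Python. This is a hypothetical illustration of the decision logic, not the actual PyIceberg code (which lives in `ArrowProjectionVisitor._cast_if_needed()` and delegates narrowing rejection to `promote()`):

```python
# Hypothetical sketch of the widening-only rule; helper names are
# invented for illustration and are not PyIceberg APIs.

# Bit width and signedness of the Arrow integer types involved.
INT_TYPES = {
    "int8": (8, True), "int16": (16, True),
    "int32": (32, True), "int64": (64, True),
    "uint8": (8, False), "uint16": (16, False),
    "uint32": (32, False),
}

def can_widen(source: str, target: str) -> bool:
    """Return True iff every value of `source` fits in `target`."""
    src_bits, src_signed = INT_TYPES[source]
    tgt_bits, tgt_signed = INT_TYPES[target]
    if src_signed == tgt_signed:
        return src_bits <= tgt_bits
    if not src_signed and tgt_signed:
        # unsigned -> signed needs one extra bit for the sign
        return src_bits < tgt_bits
    # signed -> unsigned is never safe (negative values would wrap)
    return False
```

Under this rule `uint8 → int32` and `uint32 → int64` are accepted, while `int64 → int32` is rejected, matching the behavior described above.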

Test plan

  • All 3041 unit tests pass
  • Lint passes
  • New parameterized test covers: uint8, int8, int16, uint16 → int32 and uint32, int32 → int64
  • Existing test_projection_filter_add_column_demote still works (narrowing rejection)
  • Manual verification: uint8 data written to IntegerType column produces int32 in Parquet file

Closes #2791

…patibility

When writing PyArrow tables with smaller integer types (uint8, int8,
int16, uint16) to Iceberg tables with IntegerType columns, PyIceberg
preserves the original Arrow type in the Parquet file. This causes
Spark to fail with:

    java.lang.UnsupportedOperationException: Unsupported logical type: UINT_8

The fix casts smaller integer types to their canonical Iceberg
representation (int32 for IntegerType, int64 for LongType) during
write, ensuring cross-platform compatibility.

Only widening conversions are allowed - narrowing conversions (e.g.,
int64 to int32) continue to be rejected via the existing promote()
function.

Closes apache#2791
@somasays
Contributor Author

somasays commented Dec 8, 2025

Verification with issue reproduction scenario

Source parquet type: uint8
Written parquet type: int32 ✓ (Spark compatible)
Data integrity: preserved ✓

Tested with exact data from #2791 (uint8 column with values [None, None, None, None, 217, 163, 130, None, 69, 78]).
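As a quick sanity check on the data-integrity claim (independent of the PR's test suite), every non-null uint8 value from the issue fits losslessly in the int32 range, so the widening cast cannot alter any value:

```python
# Nullable uint8 column from issue #2791.
values = [None, None, None, None, 217, 163, 130, None, 69, 78]

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

# Widening to int32 is the identity on the values themselves;
# nulls pass through unchanged.
widened = [None if v is None else int(v) for v in values]
assert all(v is None or INT32_MIN <= v <= INT32_MAX for v in widened)
assert widened == values  # data integrity preserved
```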
