
unexpand: -a uses Unicode display width instead of byte count for multibyte characters #9948

@ChrisDryden

Example Input:

"1ΔΔΔ5   99999" (3 spaces after the 5)
Δ (U+0394) is 2 bytes in UTF-8
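To see where the two tools diverge, here is a minimal Rust sketch of the two ways to count the column reached after `1ΔΔΔ5`. The function names are hypothetical (not uutils' API). Counting one column per `char` stands in for display width here, on the assumption that every character in this input is exactly one column wide.

```rust
/// Column reached after `prefix` when every UTF-8 byte counts as one column.
fn column_by_bytes(prefix: &str) -> usize {
    prefix.len() // str::len is the length in bytes
}

/// Column reached after `prefix` when every char counts as one column.
/// Assumption: each char is one display column wide (true for ASCII and 'Δ').
fn column_by_chars(prefix: &str) -> usize {
    prefix.chars().count()
}

fn main() {
    let prefix = "1ΔΔΔ5";
    println!("byte columns: {}", column_by_bytes(prefix)); // byte columns: 8
    println!("char columns: {}", column_by_chars(prefix)); // char columns: 5
}
```

With the default tab stops every 8 columns, the three spaces start at column 8 in the byte model and end at column 11, short of the next stop at 16, so they stay as spaces. In the character model they start at column 5 and end exactly on the stop at column 8, so they become a tab.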

printf '1ΔΔΔ5   99999\n' | unexpand -a | xxd

GNU unexpand output: keeps 3 spaces (no conversion)

  00000000: 31ce 94ce 94ce 9435 2020 2039 3939 3939  1......5   99999

uutils unexpand output: converts the 3 spaces to a tab

  00000000: 31ce 94ce 94ce 9435 0939 3939 3939 0a    1......5.99999.
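A simplified `unexpand -a` that takes the column-counting rule as a parameter reproduces both outputs above. This is a hypothetical sketch, not the uutils or GNU implementation: it assumes 8-column tab stops, ignores tabs already in the input and custom `-t` lists, and treats one `char` as one display column.

```rust
/// How a non-blank character advances the column.
#[derive(Clone, Copy)]
enum Width {
    Bytes, // one column per UTF-8 byte (what the issue reports GNU doing)
    Chars, // one column per char (assumed stand-in for display width)
}

const TAB_WIDTH: usize = 8; // tab stops every 8 columns

/// Hypothetical, simplified `unexpand -a` for a single line without tabs:
/// a run of spaces that reaches a tab stop is replaced by one tab.
/// Simplified: even a single space that lands on a stop becomes a tab.
fn unexpand_all(line: &str, width: Width) -> String {
    let mut out = String::new();
    let mut col = 0; // column after the last character processed
    let mut pending = 0; // spaces buffered since the last emitted char or tab
    for c in line.chars() {
        if c == ' ' {
            pending += 1;
            col += 1;
            if col % TAB_WIDTH == 0 {
                out.push('\t'); // the buffered run reached a stop
                pending = 0;
            }
            continue;
        }
        // Spaces that did not reach a stop are emitted unchanged.
        out.push_str(&" ".repeat(pending));
        pending = 0;
        out.push(c);
        col += match width {
            Width::Bytes => c.len_utf8(),
            Width::Chars => 1,
        };
    }
    out.push_str(&" ".repeat(pending));
    out
}

fn main() {
    let input = "1ΔΔΔ5   99999";
    println!("{:?}", unexpand_all(input, Width::Bytes)); // "1ΔΔΔ5   99999"
    println!("{:?}", unexpand_all(input, Width::Chars)); // "1ΔΔΔ5\t99999"
}
```

The only difference between the two runs is how much `col` advances for each `Δ`: 2 in byte mode, 1 in char mode. That alone decides whether the space run lands on a tab stop.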

Fixing this alone won't make the BusyBox non-UTF-8 tests pass, because BusyBox has different rules for trailing spaces, but this example comes from those tests.
