Unexpand: use byte count for multibyte characters for column width when using -a flag #9949

JaneIllario · 2025-12-31T16:27:18Z

Hi, thanks for the review! 😄

In #9948, the unexpand utility diverges from GNU behavior for multi-byte characters. This fix replaces the UnicodeWidthChar width calculation with nbytes, so that column counting matches GNU unexpand. I also added an integration test, based on the busybox test and the issue description, to make sure we don't regress to the old behavior.
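To illustrate the idea, here is a minimal sketch of byte-based column counting. It is not the code from this PR: `byte_width` and `end_column` are hypothetical helpers, and the tab-stop handling assumes evenly spaced stops with `tabstop > 0`.

```rust
/// Hypothetical helper: the column width of one character, counted in
/// UTF-8 bytes rather than Unicode display width.
fn byte_width(c: char) -> usize {
    c.len_utf8()
}

/// Hypothetical helper: the column reached after `line`, assuming tab
/// stops every `tabstop` columns (`tabstop` must be greater than zero).
fn end_column(line: &str, tabstop: usize) -> usize {
    let mut col = 0;
    for c in line.chars() {
        if c == '\t' {
            // Advance to the next tab stop.
            col += tabstop - col % tabstop;
        } else {
            // A multibyte character advances the column by its byte length.
            col += byte_width(c);
        }
    }
    col
}
```

With this counting, `end_column("\u{e9}", 8)` yields 2 because U+00E9 encodes as two UTF-8 bytes, whereas a display-width calculation would count it as one column.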

codspeed-hq · 2025-12-31T16:44:40Z

CodSpeed Performance Report

Merging #9949 will improve performance by 6.85%

Comparing JaneIllario:unexpand-unicode (2ded1fe) with main (fd68328)¹

Summary

⚡ 2 improvements
✅ 134 untouched
⏩ 15 skipped²

Benchmarks breakdown

| | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|
| ⚡ | `unexpand_many_lines[100000]` | 269.6 ms | 252.3 ms | +6.85% |
| ⚡ | `unexpand_large_file[10]` | 565.2 ms | 529 ms | +6.85% |

¹ No successful run was found on main (c8c412c) during the generation of this report, so fd68328 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
² 15 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

JaneIllario added 2 commits December 31, 2025 11:12

Remove Unicode width calculation for characters

208d661

Add test for unexpand with multibyte UTF-8 input

2ded1fe

cakebaker linked an issue Dec 31, 2025 that may be closed by this pull request

unexpand: -a uses Unicode display width instead of byte count for multibyte characters #9948

Open
