@woodhead2019

Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji
gt: optional Unicode wide-char width calculation (HB_GTI_WIDECHARWIDTH)

  • Add HB_GTI_WIDECHARWIDTH switch, disabled by default for zero overhead
  • Only activate width calculation when user calls
    hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
  • Based on public-domain mk_wcwidth; supports narrow(1)/wide(2)/zero(0)
  • 100% backward compatible, no binary bloat

Usage:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.) && enable

When using Harbour with a UTF-8 codepage (UTF8EX), the `col()`
function returns incorrect column positions for text containing
wide characters (e.g., Chinese 中文, Japanese 日, or Korean
Hangul), because every character is counted as one column.

This fixes cursor mis-alignment in UTF-8 terminals when output contains CJK, Emoji or other multi-column characters.

Key points:
  • New HB_GTI_WIDECHARWIDTH switch, disabled by default, zero run-time overhead.
  • Width logic is activated only after hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.).
  • Implementation based on public-domain mk_wcwidth; supports narrow(1), wide(2) and zero-width(0) characters.
  • No binary bloat, no breaking changes; existing applications compile and run unchanged.

Example:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.)  && enable wide-char calculation
The cursor now advances by actual display columns, so subsequent prompts are correctly aligned.
@alcz
Contributor

alcz commented Jan 5, 2026

I'm not able to look at it extensively right now, but this pulls in mk_wcwidth, which is not ANSI C and would cause compatibility problems. <wchar.h> and wchar_t date from around C95, so the code would have to be ported.
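
To make the porting concern concrete, here is a minimal sketch of a width lookup that avoids <wchar.h> and wchar_t entirely by passing code points as `unsigned long` and sticking to C89 syntax. It is not the code from this PR: the names `cp_range`, `in_ranges` and `sketch_cp_width` are hypothetical, and the tables are a small illustrative subset, not the full mk_wcwidth data.

```c
/* Illustrative subset only: NOT the complete mk_wcwidth tables. */
typedef struct
{
   unsigned long first;
   unsigned long last;
} cp_range;

/* Zero-width ranges, sorted by code point (assumed subset) */
static const cp_range s_zero[] = {
   { 0x0300UL, 0x036FUL },   /* Combining Diacritical Marks */
   { 0x200BUL, 0x200FUL },   /* zero-width space, joiners, direction marks */
   { 0xFE00UL, 0xFE0FUL }    /* Variation Selectors */
};

/* Double-width ranges, sorted by code point (assumed subset) */
static const cp_range s_wide[] = {
   { 0x1100UL, 0x115FUL },   /* Hangul Jamo initial consonants */
   { 0x3040UL, 0x30FFUL },   /* Hiragana and Katakana blocks */
   { 0x4E00UL, 0x9FFFUL },   /* CJK Unified Ideographs */
   { 0xAC00UL, 0xD7A3UL },   /* Hangul Syllables */
   { 0xFF01UL, 0xFF60UL }    /* Fullwidth Forms */
};

/* Binary search over a sorted, non-overlapping range table. */
static int in_ranges( unsigned long cp, const cp_range * tab, int count )
{
   int lo = 0;
   int hi = count - 1;

   while( lo <= hi )
   {
      int mid = lo + ( hi - lo ) / 2;

      if( cp < tab[ mid ].first )
         hi = mid - 1;
      else if( cp > tab[ mid ].last )
         lo = mid + 1;
      else
         return 1;
   }
   return 0;
}

/* Returns 0, 1 or 2 display columns; -1 for control characters,
   following the convention used by wcwidth-style functions. */
int sketch_cp_width( unsigned long cp )
{
   if( cp == 0 )
      return 0;
   if( cp < 0x20UL || ( cp >= 0x7FUL && cp < 0xA0UL ) )
      return -1;
   if( in_ranges( cp, s_zero, ( int ) ( sizeof( s_zero ) / sizeof( s_zero[ 0 ] ) ) ) )
      return 0;
   if( in_ranges( cp, s_wide, ( int ) ( sizeof( s_wide ) / sizeof( s_wide[ 0 ] ) ) ) )
      return 2;
   return 1;
}
```

Because the lookup only needs integer comparisons, nothing here depends on the platform's wchar_t size or locale support; a real port would need the complete range tables.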

src/rtl/cdpapi.c Outdated
#include "hbapi.h"
#include "hbapierr.h"
#include "hbapicdp.h"
#include "hbgtcore.h"

Calling the GT core API from here doesn't look like a good idea; respecting the setting should still be done somewhere in the terminal code.

@woodhead2019 woodhead2019 force-pushed the fix-utf8-cursor-col branch 2 times, most recently from d9bd0ae to 8331ff0 Compare January 6, 2026 14:22
The fix adds the `hb_cdpUTF8CharWidth` function in `cdpapi.c`, which checks
the GT driver's `fWideCharWidth` flag. When the flag is enabled, it uses the
public-domain `mk_wcwidth` implementation for accurate Unicode TR11
width calculation. When disabled, it returns the default width of 1 for
backward compatibility.
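
The behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not the PR's actual `hb_cdpUTF8CharWidth`: the flag is passed in as a plain parameter instead of being read from the GT driver, all `sketch_*` names are hypothetical, the wide-character test is a crude placeholder for the full tables, and the UTF-8 decoder does not reject overlong sequences.

```c
#include <stddef.h>

/* Crude placeholder for a real width table (assumption). */
static int sketch_wide( unsigned long cp )
{
   return ( cp >= 0x1100UL && cp <= 0x115FUL ) ||
          ( cp >= 0x2E80UL && cp <= 0xA4CFUL && cp != 0x303FUL ) ||
          ( cp >= 0xAC00UL && cp <= 0xD7A3UL ) ||
          ( cp >= 0xFF00UL && cp <= 0xFF60UL );
}

/* With the switch off every character counts as one column,
   which is the backward-compatible behaviour. */
int sketch_char_width( int wide_enabled, unsigned long cp )
{
   if( ! wide_enabled )
      return 1;
   return sketch_wide( cp ) ? 2 : 1;
}

/* Decode one UTF-8 sequence and return the number of bytes consumed.
   A malformed or truncated sequence consumes one byte and yields U+FFFD. */
static size_t sketch_utf8_next( const unsigned char * s, size_t len,
                                unsigned long * cp )
{
   size_t need, i;
   unsigned char c = s[ 0 ];

   if( c < 0x80 )
   {
      *cp = c;
      return 1;
   }
   else if( ( c & 0xE0 ) == 0xC0 ) { *cp = c & 0x1F; need = 1; }
   else if( ( c & 0xF0 ) == 0xE0 ) { *cp = c & 0x0F; need = 2; }
   else if( ( c & 0xF8 ) == 0xF0 ) { *cp = c & 0x07; need = 3; }
   else
   {
      *cp = 0xFFFDUL;
      return 1;
   }

   if( len <= need )
   {
      *cp = 0xFFFDUL;
      return 1;
   }
   for( i = 1; i <= need; i++ )
   {
      if( ( s[ i ] & 0xC0 ) != 0x80 )
      {
         *cp = 0xFFFDUL;
         return 1;
      }
      *cp = ( *cp << 6 ) | ( unsigned long ) ( s[ i ] & 0x3F );
   }
   return need + 1;
}

/* Number of display columns a UTF-8 string advances the cursor by. */
size_t sketch_columns( int wide_enabled, const char * str, size_t len )
{
   const unsigned char * s = ( const unsigned char * ) str;
   size_t pos = 0;
   size_t cols = 0;

   while( pos < len )
   {
      unsigned long cp;

      pos += sketch_utf8_next( s + pos, len - pos, &cp );
      cols += ( size_t ) sketch_char_width( wide_enabled, cp );
   }
   return cols;
}
```

For the 7-byte string "A中文", this sketch reports 3 columns with the switch off and 5 with it on, which is the difference that shows up as cursor mis-alignment.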

- Add HB_GTI_WIDECHARWIDTH switch, disabled by default for zero overhead
- Only activate width calculation when user calls
  hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
- Based on public-domain mk_wcwidth; supports narrow(1)/wide(2)/zero(0)
- 100% backward compatible, no binary bloat
- Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji

Usage:
  hb_cdpSelect("UTF8EX")
  hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)     && enable
@woodhead2019
Author

Thank you for your patient review and guidance—I’ve revised the code accordingly. I last used Clipper over thirty years ago and then left the IT industry. With the help of AI, I’m now trying to fix UTF-8 character-display issues in CJK terminals, and I hope the solution meets Harbour’s standards.
