Skip to content

Conversation

@Melnytskyi
Copy link

@Melnytskyi Melnytskyi commented Jul 1, 2024

This PR implements UTF-8 handling and, by the way, resolves #8 issue.
Also solves issue with redundant \n or \r characters at the end of commands.

Linux implementation not checked.

Copy link
Owner

@lionkor lionkor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! Please see my comments. I will code review in detail and test on all supported platforms soon, as well.

return res;
}

size_t utf8_length(const char* str, size_t siz = -1) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be reasonable to use a library to handle UTF-8?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there's a header-only library that might suit this repo. At first, I thought this library was too big for only three functions. Mb just add this lib as a Git submodule? Or some other lib?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this library and changed my mind, because it checks every character and throws exceptions on non-valid input and that needs to be handled. I have no idea how to handle such things in a separate thread for console reading and whether they make sense in the console. That's why I decided to improve the current functions so that they just return values that will definitely not lead to an out of range.

If it is not suitable, I can redo it.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for looking into it! I'll read up on UTF-8 so I can be sure that this is conformant, but it looks good. Thank you!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any news?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I genuinely have not had the time or motivation to understand the unicode utf 8 spec to a point where i can merge this code without concerns :/

Melnytskyi and others added 3 commits July 6, 2024 16:22
    - Fix UTF8 handling for `current_view_cursor_pos` and `current_view` functions
    - Move UTF8 handling to a separate header file
    - Rename and document UTF8 functions
    - Fix `get_terminal_width` function, not correctly using CONSOLE_SCREEN_BUFFER_INFO structure
@Melnytskyi Melnytskyi marked this pull request as ready for review July 9, 2024 19:41
@lionkor lionkor requested a review from Copilot September 20, 2025 14:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements UTF-8 character support for the interactive terminal backend and fixes issue #8 related to terminal width calculation. It also removes redundant newline/carriage return characters at the end of commands.

  • Adds UTF-8 character handling with new functions for reading multi-byte characters
  • Separates cursor position tracking into byte position and visual character position
  • Fixes terminal width calculation bug (was returning height instead of width)

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/windows_impl.cpp Adds UTF-8 console setup, new UTF-8 character reading function, fixes terminal width bug
src/linux_impl.cpp Implements UTF-8 character reading for Linux platform
src/impls.h Adds declaration for new UTF-8 character reading function
src/backends/utf8.hpp New utility functions for UTF-8 string manipulation and length calculation
src/backends/InteractiveBackend.h Adds new cursor position tracking member and multi-byte character method
src/backends/InteractiveBackend.cpp Integrates UTF-8 support throughout interactive input handling
CMakeLists.txt Updates CMake and C++ standard versions

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@lionkor
Copy link
Owner

lionkor commented Sep 20, 2025

I requested an AI review here to speed things up. I'll still do a manual review.

Melnytskyi and others added 3 commits September 20, 2025 16:59
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Make the UTF-8 character check consistent with other UTF-8 functions

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  - and also apply copilot's recommendation
@Melnytskyi
Copy link
Author

Melnytskyi commented Sep 20, 2025

Should I increase the CMake version to 3.5? Or higher? For 3.5 there's also a warning:

[cmake]   Compatibility with CMake < 3.10 will be removed from a future version of
[cmake]   CMake. 

@lionkor
Copy link
Owner

lionkor commented Sep 20, 2025

I don't see why we wouldn't pull it up to 3.10 or so. That's from 2018. I think we can safely use that ;)

@lionkor
Copy link
Owner

lionkor commented Sep 20, 2025

Why are we moving to C++23? From C++11, that's quite the jump.

EDIT: Nevermind, already changed back. My bad.

@lionkor
Copy link
Owner

lionkor commented Sep 20, 2025

Other than that, this looks pretty good, nice clean code in the utf8 file now. Good job! I'll do a thorough code-review and some testing still.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use WinAPI functions on Windows

2 participants