-
Notifications
You must be signed in to change notification settings - Fork 14
Implement utf8 support and solve #8 issue #47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
lionkor
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much! Please see my comments. I will code review in detail and test on all supported platforms soon, as well.
src/backends/InteractiveBackend.cpp
Outdated
| return res; | ||
| } | ||
|
|
||
| size_t utf8_length(const char* str, size_t siz = -1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be reasonable to use a library to handle UTF-8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's a header-only library that might suit this repo. At first, I thought this library was too big for only three functions. Mb just add this lib as a Git submodule? Or some other lib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this library and changed my mind, because it checks every character and throws exceptions on non-valid input and that needs to be handled. I have no idea how to handle such things in a separate thread for console reading and whether they make sense in the console. That's why I decided to improve the current functions so that they just return values that will definitely not lead to an out of range.
If it is not suitable, I can redo it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thanks for looking into it! I'll read up on UTF-8 so I can be sure that this is conformant, but it looks good. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any news?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I genuinely have not had the time or motivation to understand the unicode utf 8 spec to a point where i can merge this code without concerns :/
- Fix UTF8 handling for `current_view_cursor_pos` and `current_view` functions
- Move UTF8 handling to a separate header file
- Rename and document UTF8 functions
- Fix `get_terminal_width` function, not correctly using CONSOLE_SCREEN_BUFFER_INFO structure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements UTF-8 character support for the interactive terminal backend and fixes issue #8 related to terminal width calculation. It also removes redundant newline/carriage return characters at the end of commands.
- Adds UTF-8 character handling with new functions for reading multi-byte characters
- Separates cursor position tracking into byte position and visual character position
- Fixes terminal width calculation bug (was returning height instead of width)
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/windows_impl.cpp | Adds UTF-8 console setup, new UTF-8 character reading function, fixes terminal width bug |
| src/linux_impl.cpp | Implements UTF-8 character reading for Linux platform |
| src/impls.h | Adds declaration for new UTF-8 character reading function |
| src/backends/utf8.hpp | New utility functions for UTF-8 string manipulation and length calculation |
| src/backends/InteractiveBackend.h | Adds new cursor position tracking member and multi-byte character method |
| src/backends/InteractiveBackend.cpp | Integrates UTF-8 support throughout interactive input handling |
| CMakeLists.txt | Updates CMake and C++ standard versions |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
I requested an AI review here to speed things up. I'll still do a manual review. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Make the UTF-8 character check consistent with other UTF-8 functions Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- and also apply copilot's recommendation
|
Should I increase the CMake version to 3.5? Or higher? For 3.5 there's also a warning: [cmake] Compatibility with CMake < 3.10 will be removed from a future version of
[cmake] CMake. |
|
I don't see why we wouldn't pull it up to 3.10 or so. That's from 2018. I think we can safely use that ;) |
|
Why are we moving to C++23? From C++11, that's quite the jump. EDIT: Nevermind, already changed back. My bad. |
|
Other than that, this looks pretty good, nice clean code in the utf8 file now. Good job! I'll do a thorough code-review and some testing still. |
This PR implements UTF-8 handling and, by the way, resolves #8 issue.
Also solves issue with redundant
\nor\rcharacters at the end of commands.Linux implementation not checked.