-
Notifications
You must be signed in to change notification settings - Fork 18
fix: Enforce UTF-8 for AppMap I/O and add encoding regression tests #314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes UTF-8 encoding issues in AppMap I/O operations to prevent data corruption on systems with non-UTF-8 default character sets (like Windows-1252). The fix ensures consistent UTF-8 encoding when writing AppMap files, reading them back, and serving them over HTTP.
Changes:
- Enforced explicit UTF-8 charset specification for all file I/O operations (readers and writers)
- Added charset specification to HTTP Content-Type headers and corrected Content-Length calculation
- Introduced comprehensive regression tests to verify encoding behavior under non-UTF-8 system defaults
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| agent/src/main/java/com/appland/appmap/record/RecordingSession.java | Updated to use OutputStreamWriter with UTF-8 instead of FileWriter for writing AppMap files |
| agent/src/main/java/com/appland/appmap/record/Recording.java | Changed to use InputStreamReader with UTF-8 instead of FileReader for reading AppMap files |
| agent/src/main/java/com/appland/appmap/process/hooks/remoterecording/ServletRequest.java | Added charset=UTF-8 to Content-Type headers and fixed Content-Length calculation to use byte count |
| agent/test/encoding/encoding.bats | BATS test suite verifying UTF-8 encoding under Windows-1252 system defaults |
| agent/test/encoding/UnicodeTest.java | Test class that exercises AppMap recording with Unicode content |
| agent/test/encoding/ReadFullyTest.java | Test class validating Recording.readFully() with UTF-8 content |
| agent/test/encoding/encoding_test.cp1252 | Test data file with Windows-1252 encoded content |
| agent/test/encoding/appmap.yml | AppMap configuration for encoding tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
agent/src/main/java/com/appland/appmap/record/RecordingSession.java
Outdated
Show resolved
Hide resolved
agent/src/main/java/com/appland/appmap/record/RecordingSession.java
Outdated
Show resolved
Hide resolved
This commit ensures consistent UTF-8 character encoding for AppMap data across file system operations and HTTP responses, preventing corruption on systems where the default charset is not UTF-8 (e.g., Windows-1252). Key changes: - `RecordingSession.java`: Enforce `StandardCharsets.UTF_8` when creating writers for AppMap files, replacing reliance on default `FileWriter`. - `Recording.java`: Explicitly use `StandardCharsets.UTF_8` in `readFully` to correctly decode AppMap content during retrieval. - `ServletRequest.java`: Set `Content-Type: application/json; charset=UTF-8` for remote recording responses and calculate `Content-Length` based on UTF-8 byte size rather than string length. - `agent/test/encoding/`: Add a comprehensive regression test suite (`encoding.bats`, `UnicodeTest.java`, `ReadFullyTest.java`) to verify encoding handling for both reading and writing operations under non-UTF-8 environment settings.
… leaks The fastjson JSONWriter.close() method does not close the underlying Writer passed to its constructor - it only closes the internal SerializeWriter, which has no effect. This caused FileOutputStream and OutputStreamWriter instances to leak in RecordingSession. Changes: - Make AppMapSerializer implement AutoCloseable and track the underlying Writer to close it explicitly - Add try-with-resources for AppMapSerializer in checkpoint() to ensure proper cleanup on exceptions - Add try-finally in stop() to ensure serializer is closed even if write() fails - Consolidate JSON finalization and closing logic into close() method, removing the redundant finish() method Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1f70837 to
6f5c34e
Compare
This commit ensures consistent UTF-8 character encoding for AppMap data across file system operations and HTTP responses, preventing corruption on systems where the default charset is not UTF-8 (e.g., Windows-1252).
Key changes:
RecordingSession.java: EnforceStandardCharsets.UTF_8when creating writers for AppMap files, replacing reliance on defaultFileWriter.Recording.java: Explicitly useStandardCharsets.UTF_8inreadFullyto correctly decode AppMap content during retrieval.ServletRequest.java: SetContent-Type: application/json; charset=UTF-8for remote recording responses and calculateContent-Lengthbased on UTF-8 byte size rather than string length.agent/test/encoding/: Add a comprehensive regression test suite (encoding.bats,UnicodeTest.java,ReadFullyTest.java) to verify encoding handling for both reading and writing operations under non-UTF-8 environment settings.