@danipen danipen commented Dec 15, 2025

Summary

  • Replace ArrayPool<char> with direct char[] allocation when appending newlines in Grammar.Tokenize(). ArrayPool buffers can be reused with different content, which breaks Onigwrap's regex search caching (the cache relies on memory reference equality). Direct allocation gives each line a unique buffer, so the cache works as intended and overall tokenization performance improves.
  • Update Demo program to pass lines with line terminators included, using ReadOnlyMemory<char> slices from a single file read. This avoids per-line allocations and demonstrates the optimal usage pattern.
  • Add XML documentation to the IModelLines and ITMModel interfaces, including guidance that GetLineTextIncludingTerminators() (formerly GetLineText(), see Breaking Changes) should return lines with their terminators to avoid internal allocations.
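
For readers adapting their own callers, here is a minimal sketch of the "slices from a single file read" pattern described in the second bullet. `LineSlicer` and `SplitKeepingTerminators` are hypothetical names used for illustration only; they are not part of TextMateSharp, and the Demo program in this PR may be implemented differently.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of TextMateSharp.
public static class LineSlicer
{
    // Splits text into lines and keeps each terminator ("\n", "\r\n", or "\r")
    // inside its slice. Every slice points into the same backing memory,
    // so no per-line string or char[] is allocated.
    public static List<ReadOnlyMemory<char>> SplitKeepingTerminators(ReadOnlyMemory<char> text)
    {
        var lines = new List<ReadOnlyMemory<char>>();
        ReadOnlySpan<char> span = text.Span;
        int start = 0;

        for (int i = 0; i < span.Length; i++)
        {
            char c = span[i];
            if (c != '\n' && c != '\r')
                continue;

            // Treat "\r\n" as a single terminator.
            if (c == '\r' && i + 1 < span.Length && span[i + 1] == '\n')
                i++;

            lines.Add(text.Slice(start, i - start + 1));
            start = i + 1;
        }

        // Final line without a terminator, if any.
        if (start < span.Length)
            lines.Add(text.Slice(start));

        return lines;
    }
}
```

A caller could read the file once, for example with `File.ReadAllText(path).AsMemory()`, and hand each slice to its line model. Because every slice already ends with its terminator (except possibly the last), the tokenizer should not need to allocate a new buffer just to append a newline.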

⚠️ Breaking Changes

  • IModelLines.GetLineText() was renamed to IModelLines.GetLineTextIncludingTerminators() for API clarity.

@danipen danipen merged commit a80eb82 into master Dec 16, 2025
5 checks passed