Deduplicate memory files based on symlinks, content hash, and similarity #20
…ection Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
…e content Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
alexec left a comment:
@copilot Use https://github.com/adrg/strutil to dedupe based on similarity to existing memories.
Co-authored-by: alexec <1142830+alexec@users.noreply.github.com>
Done! Added similarity-based deduplication using github.com/adrg/strutil.
Commit: d5a709b
Note to self: not sure this is a real problem.
Memory files can be duplicated through symlinks, file copies, or very similar content, causing redundant content in generated prompts.
Changes
- Uses filepath.EvalSymlinks() to detect files accessed through different symlinks
- Uses github.com/adrg/strutil with Sorensen-Dice coefficient to detect very similar files (95%+ similarity threshold)
Implementation
Files with different frontmatter but identical body content are treated as distinct files since both hash and similarity checks use raw file content including frontmatter.
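The frontmatter behaviour can be seen in a minimal sketch that hashes the raw bytes of a file. SHA-256 is an assumption here, since the description does not name the hash function, and contentKey is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentKey is a hypothetical helper that returns a hex digest of the raw
// file bytes, frontmatter included. SHA-256 is an assumed choice of hash.
func contentKey(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same body, different frontmatter.
	a := []byte("---\ntitle: A\n---\nContact the team.\n")
	b := []byte("---\ntitle: B\n---\nContact the team.\n")

	fmt.Println(contentKey(a) == contentKey(a)) // true: identical bytes
	fmt.Println(contentKey(a) == contentKey(b)) // false: frontmatter differs
}
```

Because the digest covers every byte, a one-character change in the frontmatter yields a different key, which matches the behaviour described above.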
Similarity Detection
The similarity-based deduplication uses the Sorensen-Dice coefficient from github.com/adrg/strutil with a 95% similarity threshold.
This catches cases where files have minor wording differences (e.g., "Contact the team if you have any issues" vs "Contact the team if you encounter any problems") that would otherwise result in redundant content.
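To make the metric concrete, here is a hand-rolled Sorensen-Dice coefficient over character bigrams using only the standard library. It is a stand-in for illustration, not the strutil implementation, and its scores may differ from what the library returns. The names bigrams, diceSimilarity, and isNearDuplicate are hypothetical.

```go
package main

import "fmt"

// bigrams counts the overlapping two-rune sequences in s and returns the
// counts together with the total number of bigrams.
func bigrams(s string) (map[string]int, int) {
	r := []rune(s)
	counts := make(map[string]int)
	total := 0
	for i := 0; i+1 < len(r); i++ {
		counts[string(r[i:i+2])]++
		total++
	}
	return counts, total
}

// diceSimilarity returns 2*|A∩B| / (|A|+|B|) over bigram multisets,
// a value between 0 (nothing shared) and 1 (identical).
func diceSimilarity(a, b string) float64 {
	if a == b {
		return 1
	}
	ca, na := bigrams(a)
	cb, nb := bigrams(b)
	if na == 0 || nb == 0 {
		return 0 // too short to compare and not equal
	}
	overlap := 0
	for g, n := range ca {
		if m := cb[g]; m < n {
			overlap += m
		} else {
			overlap += n
		}
	}
	return 2 * float64(overlap) / float64(na+nb)
}

// isNearDuplicate applies a threshold such as the 0.95 mentioned above.
func isNearDuplicate(a, b string, threshold float64) bool {
	return diceSimilarity(a, b) >= threshold
}

func main() {
	a := "Contact the team if you have any issues"
	b := "Contact the team if you encounter any problems"
	fmt.Printf("similarity: %.2f\n", diceSimilarity(a, b))
	fmt.Println("near duplicate at 0.95:", isNearDuplicate(a, b, 0.95))
}
```

Two short sentences compared in isolation will usually score well below 0.95. The threshold is more plausible for whole files, where a small wording change is outweighed by the large amount of shared text.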