-
-
Notifications
You must be signed in to change notification settings - Fork 21
feat: Add filesystem caching with global refresh support #93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Implement CacheManager class for code execution result caching - Support hash-based caching (automatic invalidation on code changes) - Support custom cache IDs for cross-build persistence - Add MARKDOWN_EXEC_CACHE_REFRESH environment variable for global refresh - Add refresh option to force cache updates per code block - Store cache files in .markdown-exec-cache/ directory - Add comprehensive test suite with 14 cache-specific tests - Add caching documentation with usage examples - Update README with caching quickstart - Isolate test cache directories to prevent pollution This improves documentation build performance by caching expensive operations like plot generation, API calls, and computations.
|
It looks like you've used AI. Did you thoroughly review every code change and docs change? |
Yes, I have reviewed it. |
|
Thanks. What is your motivation here, are you a user of markdown-exec? |
pawamoy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the PR looks good, as often with code models, it doesn't really address what was said in #4. Specifically, the intent to actually track cached files (for use in CI) is not reflected in this PR. As a consequence, the PR also does nothing for removing stale cache entries.
| ``` | ||
| ```` | ||
|
|
||
| Use custom cache IDs for persistence across builds: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Force some reason the AI advertises custom IDs as providing persistence across builds throughout the change, while this is of course provided by the caching feature globally. There will be some rewording to do.
| ``` | ||
| ```` | ||
|
|
||
| Force cache refresh with `refresh="yes"`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced by the utility of this refresh feature. Having to update the code block, build docs, then modify the code block back seems tedious. I'd prefer a solution where an environment variable is used, as for global refreshes. This would require IDs of course, so, to be discussed.
Generally speaking, this feature doesn't seem to come from the requirements identified in #4. It doesn't address what was said there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have provide MARKDOWN_EXEC_CACHE_REFRESH=1 for global control and I think refresh="yes" is necessary for user who are modifying code and just affect a little piece of document. He can quickly modify project source code and just run one code block to see if everything is fine.
If user try to remove cache="true" to let block always run, when he add it back, the cache system will use previous cache result which will lead to mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could check every file under cache dir if it is currently mentioned in build system and and remove all legacy cache files. Thus we can remove the refresh="yes" option. It affect performance a little, but it sounds worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pawamoy which approach do you prefer? Use refresh="yes" or manage all cache files in cache dir?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stale cache entries must be automatically deleted for sure 🙂
| @@ -0,0 +1,245 @@ | |||
| # Caching | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The page generally looks good, except for some parts which don't make a lot of sense. It's also a bit too verbose. This would have to be reworked (I will provide more details).
| cache_manager = get_cache_manager() | ||
| cache_id = cache if isinstance(cache, str) else None | ||
| cache_options = { | ||
| "language": language, | ||
| "html": html, | ||
| "result": result, | ||
| "returncode": returncode, | ||
| "workdir": workdir, | ||
| "width": width, | ||
| "extra": extra, | ||
| } | ||
| cache_manager.set(cache_id, source_input, output, **cache_options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems wasteful to recompute everything. We can reuse the previously computed cache id and cache options, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, I'll change this.
|
I am also a bit annoyed by the fact that you (or the agent you used) didn't respect the PR template we provide. The original issue (#4) wasn't even linked properly. And the markup is broken, which makes me doubt you actually thoroughly reviewed this. |
What do you mean about "the markup is broken"? |
|
The Usage Examples in the PR body are ill-formatted (it failed to use quadruple-backtick fences). |
ab0abd5 to
91d8cab
Compare
Sorry, I am not an expert of markdown and this is the first time I know about |
|
No worries, thanks for updating the PR body. |
Summary
This PR implements a comprehensive filesystem-based caching system for code execution results, eliminating the need for in-memory caching and providing cross-build persistence.
Key Features
🚀 Filesystem Caching
.markdown-exec-cache/directory🔑 Custom Cache IDs
cache="my-plot"🔄 Cache Refresh Options
refresh="yes"forces re-executionMARKDOWN_EXEC_CACHE_REFRESH=1environment variable🧪 Test Improvements
📚 Documentation
docs/usage/caching.mdChanges
New Files
src/markdown_exec/_internal/cache.py- CacheManager implementationtests/test_cache.py- Comprehensive test suitedocs/usage/caching.md- User documentationModified Files
src/markdown_exec/_internal/formatters/base.py- Integrated cachingsrc/markdown_exec/_internal/main.py- Added cache/refresh optionstests/conftest.py- Test isolation fixturesrc/markdown_exec/__init__.py- Exposed CacheManager in public API.gitignore- Added .markdown-exec-cache/README.md- Added caching sectionmkdocs.yml- Added caching docs to navigationUsage Examples
Hash-Based Caching (Auto-Invalidation)
Custom Cache ID (Cross-Build Persistence)
Force Refresh
Global Refresh (All Caches)
Testing
All tests pass:
Performance Impact
Breaking Changes
None - this is a new feature. Existing code blocks work unchanged.
Related Issues
Addresses the requirement for filesystem-only caching with global refresh trigger.