1 change: 1 addition & 0 deletions .gitignore
@@ -23,3 +23,4 @@ uv.lock
.mypy_cache/
.ruff_cache/
__pycache__/
.markdown-exec-cache/
35 changes: 35 additions & 0 deletions README.md
@@ -111,6 +111,41 @@ grep extra_css README.md && exit 2
```
````

### Caching

Speed up your builds by caching execution results:

````md
```python exec="yes" cache="yes"
# Expensive computation
import time
time.sleep(5)
print("Done!")
```
````

Use custom cache IDs for persistence across builds:
Owner

For some reason, the AI advertises custom IDs as providing persistence across builds throughout the change, while this is of course provided by the caching feature globally. There will be some rewording to do.


````md
```python exec="yes" cache="my-plot"
# Generate plot - will be cached
import matplotlib.pyplot as plt
# ...
```
````

Force cache refresh with `refresh="yes"`:
Owner

I'm not convinced by the utility of this refresh feature. Having to update the code block, build docs, then modify the code block back seems tedious. I'd prefer a solution where an environment variable is used, as for global refreshes. This would require IDs of course, so, to be discussed.

Generally speaking, this feature doesn't seem to come from the requirements identified in #4. It doesn't address what was said there.

Author

I have provided `MARKDOWN_EXEC_CACHE_REFRESH=1` for global control, and I think `refresh="yes"` is necessary for users who are modifying code that only affects a small part of the documentation. They can quickly modify the project source code and re-run just one code block to check that everything is fine.

If a user removes `cache="true"` to let a block always run and later adds it back, the cache system will reuse the previous cached result, which leads to mistakes.

Author

Maybe we could check every file under the cache directory to see whether it is still referenced by the current build, and remove all stale cache files. That would let us drop the `refresh="yes"` option. It affects performance a little, but it sounds worth it.

Author

@pawamoy, which approach do you prefer: `refresh="yes"`, or managing all cache files in the cache directory?

Owner

The stale cache entries must be automatically deleted for sure 🙂
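For illustration, the automatic cleanup agreed on in this thread could look roughly like the sketch below. It assumes every entry is a `<key>.cache` file directly under `.markdown-exec-cache/` (the layout shown in the new docs page) and that the build can collect the set of keys it actually looked up; `prune_stale_entries` is a hypothetical helper, not an existing markdown-exec function.

```python
from pathlib import Path


def prune_stale_entries(cache_dir: Path, used_keys: set[str]) -> list[Path]:
    """Delete cache files whose key was not used by the current build.

    Hypothetical helper: assumes one `<key>.cache` file per entry in `cache_dir`.
    """
    removed: list[Path] = []
    if not cache_dir.is_dir():
        # Nothing has been cached yet.
        return removed
    for path in sorted(cache_dir.glob("*.cache")):
        # The key is the file name without the ".cache" suffix, e.g. "my-plot".
        if path.stem not in used_keys:
            path.unlink()
            removed.append(path)
    return removed
```

One caveat with this design: the set of used keys is only complete after a full build, so pruning after a partial rebuild would delete entries that are still valid.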


````md
```python exec="yes" cache="my-plot" refresh="yes"
# This will always re-execute
```
````

See [caching documentation](https://pawamoy.github.io/markdown-exec/usage/caching/) for more details.

---

See [usage](https://pawamoy.github.io/markdown-exec/usage/) for more details,
and the [gallery](https://pawamoy.github.io/markdown-exec/gallery/) for more examples!

259 changes: 259 additions & 0 deletions docs/usage/caching.md
@@ -0,0 +1,259 @@
# Caching
Owner

The page generally looks good, except for some parts which don't make a lot of sense. It's also a bit too verbose. This would have to be reworked (I will provide more details).


Markdown Exec supports filesystem-based caching of code execution results to speed up documentation builds and development workflows.

## Overview

When generating images, charts, or running expensive computations in your documentation, re-executing the same code on every build can significantly slow down the rendering process. The caching feature allows you to:

- **Speed up builds**: Reuse previously computed results instead of re-executing code
- **Persist across builds**: All cached results are stored on the filesystem, so they persist from one build to the next
- **Global cache refresh**: Force refresh of all cached results with a single environment variable

## Cache Storage

All cached results are stored in `.markdown-exec-cache/` in your project root directory:

```text
your-project/
├── docs/
├── mkdocs.yml
└── .markdown-exec-cache/
    ├── my-plot.cache         # Custom ID cache
    └── abc123def456.cache    # Hash-based cache files
```

Add this directory to your `.gitignore`:

```gitignore
.markdown-exec-cache/
```

## Usage

### Hash-Based Caching

Enable caching by adding `cache="yes"` to your code block. A hash is computed from the code content and execution options:

````md exec="1" source="tabbed-left" tabs="Markdown|Rendered"
```python exec="yes" cache="yes"
import time
print(f"Executed at: {time.time()}")
```
````

The cache is automatically invalidated when the code or execution options change.
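The automatic invalidation follows from how the key is built: the key is a digest of the code and its options, so any change produces a new key and the old entry is simply never looked up again. A minimal sketch, assuming the options can be serialized to JSON; `compute_cache_key` and the option names are illustrative, and the actual implementation may serialize things differently.

```python
import hashlib
import json


def compute_cache_key(code: str, options: dict) -> str:
    """Return a SHA-256 hex digest of a code block and its execution options (hypothetical helper)."""
    # sort_keys=True makes the serialization independent of option order.
    payload = json.dumps({"code": code, "options": options}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


options = {"language": "python", "html": False}
key = compute_cache_key('print("hi")', options)

# Changing the code yields a different key.
print(key != compute_cache_key('print("hello")', options))  # True
# Changing an option yields a different key too.
print(key != compute_cache_key('print("hi")', {**options, "html": True}))  # True
```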

### Custom Cache IDs

For more control, use a custom cache ID (string value). This is useful for expensive operations where you want explicit control over cache invalidation:

````md exec="1" source="tabbed-left" tabs="Markdown|Rendered"
```python exec="yes" cache="my-plot"
import matplotlib.pyplot as plt
# Expensive plot generation...
print("Generated plot")
```
````

The cache file will be stored as `.markdown-exec-cache/my-plot.cache`.

### Cache Invalidation

To force re-execution and update the cache for a specific code block, use `refresh="yes"`:

````markdown
```python exec="yes" cache="my-plot" refresh="yes"
# This will always re-execute and update the cache
print("Fresh execution!")
```
````

!!! note "refresh vs removing cache"
    **`refresh="yes"`** forces re-execution but **keeps the cache enabled**: it updates the cached result for future builds.

    **Removing the `cache` option** completely disables caching: the code executes every time, with no caching at all.

    Use `refresh="yes"` when you want to update a stale cache entry but keep the benefits of caching for subsequent builds.

### Global Cache Refresh

To refresh **all** cached results at once, set the `MARKDOWN_EXEC_CACHE_REFRESH` environment variable:

```bash
# Force refresh all caches during build
MARKDOWN_EXEC_CACHE_REFRESH=1 mkdocs build

# Or with other truthy values
MARKDOWN_EXEC_CACHE_REFRESH=yes mkdocs build
MARKDOWN_EXEC_CACHE_REFRESH=true mkdocs build
MARKDOWN_EXEC_CACHE_REFRESH=on mkdocs build
```

This is useful for:

- CI/CD pipelines where you want fresh builds
- Ensuring all documentation is up-to-date
- Debugging cache-related issues
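The accepted values above suggest a simple truthiness check on the variable. A minimal sketch, assuming the comparison ignores case and surrounding whitespace (not confirmed by this page); `global_refresh_requested` is a hypothetical name.

```python
import os
from collections.abc import Mapping

# Truthy spellings listed above for MARKDOWN_EXEC_CACHE_REFRESH.
TRUTHY_VALUES = {"1", "yes", "true", "on"}


def global_refresh_requested(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True when MARKDOWN_EXEC_CACHE_REFRESH holds a truthy value (hypothetical helper)."""
    value = environ.get("MARKDOWN_EXEC_CACHE_REFRESH", "")
    # Assumption: case and surrounding whitespace are ignored.
    return value.strip().lower() in TRUTHY_VALUES
```

Accepting a mapping instead of reading `os.environ` directly makes the check easy to test: `global_refresh_requested({"MARKDOWN_EXEC_CACHE_REFRESH": "on"})` never touches the real environment.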

## Clearing Cache

### Delete Specific Cache Entry

Remove the cache file for a specific custom ID:

```bash
rm .markdown-exec-cache/my-custom-id.cache
```

### Clear All Cache

Remove the entire cache directory:

```bash
rm -rf .markdown-exec-cache/
```

## How It Works

1. **Hash Computation**: For `cache="yes"`, a SHA-256 hash is computed from:

    - The code content
    - Execution options (language, HTML mode, working directory, etc.)

1. **Cache Lookup**: Before execution, the filesystem cache is checked for a matching entry.

1. **Execution & Storage**: If no cached result is found:

    - The code is executed
    - Its output is stored in the filesystem cache

1. **Cache Retrieval**: If a cached result is found, it is used instead of re-executing the code.
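Steps 2 to 4 amount to a get-or-execute pattern around one file per key. A minimal sketch, assuming each entry is a UTF-8 text file named `<key>.cache` and that a `refresh` flag stands in for `refresh="yes"` or `MARKDOWN_EXEC_CACHE_REFRESH`; `get_or_execute` is hypothetical and is not the `CacheManager` API exported by this change.

```python
from pathlib import Path
from typing import Callable


def get_or_execute(
    key: str,
    execute: Callable[[], str],
    cache_dir: Path = Path(".markdown-exec-cache"),
    refresh: bool = False,
) -> str:
    """Return cached output for `key`, executing and storing it on a miss (hypothetical helper)."""
    cache_file = cache_dir / f"{key}.cache"
    # Cache lookup, skipped when a refresh is requested.
    if not refresh and cache_file.is_file():
        # Cache retrieval: reuse the stored output without executing anything.
        return cache_file.read_text(encoding="utf-8")
    # Execution & storage: run the code, then persist its output for later builds.
    output = execute()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(output, encoding="utf-8")
    return output
```

Calling it twice with the same key runs `execute` only once; passing `refresh=True` runs it again and overwrites the stored output, matching what is described for `refresh="yes"`.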

## Best Practices

### When to Use Caching

✅ **Good use cases:**

- Generating plots, diagrams, or images
- Running expensive computations
- Calling external APIs or services
- Processing large datasets

❌ **Avoid caching for:**

- Simple print statements
- Code demonstrating output variations
- Time-sensitive or non-deterministic code

### Choosing Cache Type

- **`cache="yes"`** (hash-based):

    - Automatically invalidated when code changes
    - Great for development and production
    - No manual cache management needed

- **`cache="custom-id"`** (custom ID):

    - Use for expensive operations where you want explicit control
    - Easier to identify and manage specific cache files
    - Requires manual invalidation or `refresh="yes"` when code changes

### Cache Invalidation Strategy

**For hash-based caching (`cache="yes"`):**

- Cache is automatically invalidated when code or options change
- No manual intervention needed

**For custom ID caching (`cache="custom-id"`):**

1. **Change the ID** when you want to force re-execution:

    ```markdown
    cache="my-plot-v2"  # Changed from my-plot
    ```

1. **Use refresh temporarily**:

    ```markdown
    cache="my-plot" refresh="yes"  # Remove refresh="yes" after update
    ```

1. **Use global refresh** for all caches:

    ```bash
    MARKDOWN_EXEC_CACHE_REFRESH=1 mkdocs build
    ```

1. **Clear cache directory** before important builds:

    ```bash
    rm -rf .markdown-exec-cache/
    ```

## Examples

### Caching a Matplotlib Plot

````markdown
```python exec="yes" html="yes" cache="population-chart"
import matplotlib.pyplot as plt
import io
import base64

# Expensive plot generation
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_title("Population Growth")

# Save to base64
buffer = io.BytesIO()
plt.savefig(buffer, format='png')
buffer.seek(0)
img_str = base64.b64encode(buffer.read()).decode()
print(f'<img src="data:image/png;base64,{img_str}"/>')
plt.close()
```
````

### Caching API Calls

````markdown
```python exec="yes" cache="github-stars" refresh="no"
import requests
response = requests.get("https://api.github.com/repos/pawamoy/markdown-exec")
stars = response.json()["stargazers_count"]
print(f"⭐ **{stars}** stars on GitHub!")
```
````

## Troubleshooting

### Cache Not Working

1. Ensure the cache directory is writable
1. Check that you're using `cache="yes"` or a custom ID
1. Verify the cache directory exists: `ls -la .markdown-exec-cache/`

### Stale Cache Results

1. Use `refresh="yes"` to force re-execution
1. Delete the specific cache file
1. Clear the entire cache directory

### Large Cache Directory

Cache files accumulate over time. Periodically clean up:

```bash
# See cache directory size
du -sh .markdown-exec-cache/

# Remove all cache files
rm -rf .markdown-exec-cache/
```
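`du` is not available everywhere (for example, in a stock Windows shell). A small Python equivalent, assuming cache entries are the `*.cache` files shown in the directory layout above; unlike `du`, it ignores any other files in the directory, and `cache_usage` is a hypothetical helper:

```python
from pathlib import Path


def cache_usage(cache_dir: Path = Path(".markdown-exec-cache")) -> tuple[int, int]:
    """Return (number of cache files, total size in bytes) for the cache directory."""
    if not cache_dir.is_dir():
        return 0, 0
    files = [path for path in cache_dir.glob("*.cache") if path.is_file()]
    return len(files), sum(path.stat().st_size for path in files)


count, size = cache_usage()
print(f"{count} cache files, {size / 1024:.1f} KiB")
```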
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -25,6 +25,7 @@ nav:
- Pyodide: usage/pyodide.md
- Shell: usage/shell.md
- Tree: usage/tree.md
- Caching: usage/caching.md
- Gallery: gallery.md
- API reference: reference/api.md
- Development:
3 changes: 3 additions & 0 deletions src/markdown_exec/__init__.py
@@ -3,6 +3,7 @@
Utilities to execute code blocks in Markdown files.
"""

from markdown_exec._internal.cache import CacheManager, get_cache_manager
from markdown_exec._internal.formatters.base import (
ExecutionError,
base_format,
@@ -29,6 +30,7 @@

__all__ = [
"MARKDOWN_EXEC_AUTO",
"CacheManager",
"ExecutionError",
"HeadingReportingTreeprocessor",
"IdPrependingTreeprocessor",
@@ -43,6 +45,7 @@
"default_tabs",
"formatter",
"formatters",
"get_cache_manager",
"get_logger",
"markdown_config",
"patch_loggers",