feat: Add HTML representation #2236
Hi @flying-sheep, @Zethson, @ivirshup! Hope you're having a wonderful holiday season! Just a quick update: I've merged settylab/anndata#3 into this PR to keep everything in one place. That brought in the exported building blocks for packages like SpatialData/MuData, the
No rush at all with the holidays! Whenever you have a moment, I'd appreciate any feedback on the direction. Happy New Year!
I feel like lazy loading might become more common (especially as datasets and number of modalities grow larger). So, I decided to take a pragmatic approach for the HTML repr rather than showing no category information at all.
The trade-off: For
Why not avoid all loading? Showing just "(50 categories)" is much less useful than seeing the actual category names with color swatches. The cost of reading a few category strings is small compared to the value of the preview.
Implementation: We access
Visual examples: See tests 8b (partial loading) and 8c (metadata-only mode) in the live demo.
Rich HTML representation for AnnData
Summary
Implements rich HTML representation (`_repr_html_`) for AnnData objects in Jupyter notebooks. Builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.
Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)
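For readers unfamiliar with the hook this PR implements: Jupyter looks for a `_repr_html_` method on an object and, if it exists, renders the returned string as HTML instead of the plain `__repr__`. The sketch below shows that protocol on a toy class. `TinyTable` and its layout are hypothetical and are not the AnnData implementation.

```python
import html


class TinyTable:
    """Hypothetical container used only to illustrate the `_repr_html_` hook."""

    def __init__(self, columns):
        # Mapping of column name -> list of values.
        self.columns = dict(columns)

    def __repr__(self):
        # Plain-text fallback used outside rich frontends.
        return f"TinyTable with columns: {', '.join(self.columns)}"

    def _repr_html_(self):
        # Escape user-provided names so they cannot inject markup.
        rows = "".join(
            f"<tr><td>{html.escape(name)}</td><td>{len(values)}</td></tr>"
            for name, values in self.columns.items()
        )
        return f"<table><tr><th>column</th><th>n</th></tr>{rows}</table>"


table = TinyTable({"cell_type": ["B", "T"], "<score>": [0.1, 0.9]})
html_output = table._repr_html_()
```

In a notebook, evaluating `table` as the last expression of a cell would display the rendered table; in a plain terminal the `__repr__` text is shown instead.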
Screenshot
Features
Interactive Display
- `.raw` section showing unprocessed data (Report `n_vars` of `.raw` in `__repr__` #349)
Visual Indicators
- `uns` palettes (e.g., `cell_type_colors`)
- `uns` values
- `uns["README"]`)
Serialization Warnings
Proactively warns about data that won't serialize:
Compatibility
- `.anndata-repr` prevents style conflicts
- `read_lazy()` (categories, colors)
Extensibility
Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):
- `obst`/`vart`, `mod`)
See the Reviewer's Guide for examples and API documentation.
Testing
`python tests/visual_inspect_repr_html.py`
Related
- `scipy` inheritance #1927 (sparse scipy changes), feat: array-api compatibility #2063 (Array-API)
Acknowledgments
Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.
Technical Notes and Edits
Lazy Loading
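The pattern this section relies on keeps a cheap constants module importable at all times and defers the heavy rendering code until `_repr_html_()` is first called. It can be sketched roughly as follows. This is an illustration only: `Thing` and `DEFAULT_MAX_ITEMS` are hypothetical, and the standard-library `html` module stands in for the large rendering package.

```python
# Cheap constant that would live in a small, always-imported module
# (hypothetical stand-in for the role played by a constants file).
DEFAULT_MAX_ITEMS = 3


class Thing:
    """Hypothetical object whose HTML rendering code is imported lazily."""

    def __init__(self, items):
        self.items = list(items)

    def _repr_html_(self):
        # Deferred import: runs when the method is called, not when the
        # module defining `Thing` is imported. Python caches the module in
        # `sys.modules`, so later calls pay no repeated import cost.
        import html as renderer

        shown = self.items[:DEFAULT_MAX_ITEMS]
        cells = "".join(f"<li>{renderer.escape(str(x))}</li>" for x in shown)
        return f"<ul>{cells}</ul>"


out = Thing(["a", "<b>", "c", "d"])._repr_html_()
```

The trade-off is a small one-time delay on the first rich display in exchange for a faster top-level import.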
Constants are in `_repr_constants.py` (outside `_repr/`) to prevent loading ~6K lines on `import anndata`. The full module loads only when `_repr_html_()` is called.
Config Changes
`pyproject.toml`: Added `vart` to codespell ignore list (TreeData section name).
Edit (Dec 27, 2024)
To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.
What changed:
- `.raw` section - Expandable row showing unprocessed data (Report `n_vars` of `.raw` in `__repr__` #349)
Edit (Jan 4, 2025)
Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.
Code refactoring:
- `html.py` into focused modules for maintainability
- `components.py` (badges, buttons, icons)
- `sections.py` (obs/var, mapping, uns, raw)
- `core.py` (avoids circular imports)
- `utils.py`
- `FormatterContext` consolidates all 6 rendering settings (read once at entry, propagated via context)
- `html.py` reduced from ~2100 to ~740 lines, clean import hierarchy
New features:
- `read_lazy()` AnnData objects (experimental) - indicates when obs/var are xarray-backed
- `(lazy)` indicator on columns
Bug fixes:
- `adata-text-muted` class for uniform appearance
Related issue discovered:
- `read_lazy()` returns index values as byte-representation strings (e.g., `"b'cell_0'"` instead of `"cell_0"`) - see `ISSUE_READ_LAZY_INDEX.md`
Edit (Jan 6, 2025)
Smart partial loading for `read_lazy()` AnnData:
Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting `repr_html_max_lazy_categories` (default: 100, set to 0 for metadata-only mode).
Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).
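The idea behind the partial loading described above, reading only the first N category labels rather than the whole column, can be sketched in plain Python. This is not the PR's code: `LazyLabels` and `preview_categories` are hypothetical, and the `max_lazy_categories` argument merely mirrors the role of the `repr_html_max_lazy_categories` setting (default 100, 0 for metadata-only mode).

```python
from itertools import islice


class LazyLabels:
    """Hypothetical stand-in for on-disk category labels; counts reads."""

    def __init__(self, labels):
        self._labels = list(labels)
        self.reads = 0

    def __len__(self):
        # Length is treated as metadata: no label is read to answer it.
        return len(self._labels)

    def __iter__(self):
        for label in self._labels:
            self.reads += 1  # each yielded label simulates one storage read
            yield label


def preview_categories(labels, max_lazy_categories=100):
    """Return (preview, n_total), reading at most `max_lazy_categories` labels."""
    n_total = len(labels)
    if max_lazy_categories <= 0:
        return [], n_total  # metadata-only mode: nothing is read
    return list(islice(labels, max_lazy_categories)), n_total


labels = LazyLabels(f"type_{i}" for i in range(50))
preview, n_total = preview_categories(labels, max_lazy_categories=3)
```

Here `preview` holds three labels while `n_total` still reports all 50, so a repr could show the first few names followed by a count of the rest. Passing `max_lazy_categories=0` returns an empty preview without touching the labels.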