katosh commented Nov 29, 2025

Rich HTML representation for AnnData

Summary

Implements a rich HTML representation (_repr_html_) for AnnData objects in Jupyter notebooks. Builds on previous draft PRs (#784, #694, #521, #346) with a complete, production-ready implementation.

Live Demo | Reviewer's Guide (technical details, design decisions, extensibility examples)

Screenshot

screenshot2

Features

Interactive Display

  • Foldable sections with auto-collapse for large datasets
  • Search/filter with regex and case-sensitive toggles
  • Copy-to-clipboard for field names
  • Nested AnnData expansion with configurable depth
  • .raw section showing unprocessed data (see #349, "Report n_vars of .raw in __repr__")
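
The search box itself runs in the browser, so the following is only a Python sketch of how the two toggles could combine. The function name `filter_field_names` and its parameters are hypothetical and not part of this PR.

```python
import re


def filter_field_names(names, query, *, use_regex=False, case_sensitive=False):
    """Return the names that match `query` under the two search toggles."""
    if not query:
        # An empty query keeps every field visible.
        return list(names)
    flags = 0 if case_sensitive else re.IGNORECASE
    # With the regex toggle off, the query is treated as literal text.
    pattern_text = query if use_regex else re.escape(query)
    try:
        pattern = re.compile(pattern_text, flags)
    except re.error:
        # Assumption: an invalid pattern hides everything instead of raising.
        return []
    return [name for name in names if pattern.search(name)]


fields = ["cell_type", "n_genes", "X_umap", "X_pca", "leiden"]
print(filter_field_names(fields, "x_"))
print(filter_field_names(fields, "^X_", use_regex=True, case_sensitive=True))
```

Escaping the query when the regex toggle is off means characters such as `(` or `.` in a field name can be searched for literally.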

Visual Indicators

  • Category colors from uns palettes (e.g., cell_type_colors)
  • Type badges for views, backed mode, sparse matrices, Dask arrays
  • Serialization warnings for data that won't write to H5AD/Zarr
  • Value previews for simple uns values
  • README support via modal (renders markdown from uns["README"])
  • Memory info in footer
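
As an illustration of the category colors item, here is a minimal sketch of the `<column>_colors` lookup convention, using plain dicts and lists in place of a real AnnData object. The helper name `category_colors` and the fallback for a palette that is too short are assumptions, not the behavior of this PR.

```python
def category_colors(categories, uns, column):
    """Map each category to a color taken from uns['<column>_colors']."""
    colors = uns.get(f"{column}_colors")
    if colors is None or len(colors) < len(categories):
        # Assumption: with no palette, or one that is too short,
        # no category gets a swatch.
        return {category: None for category in categories}
    # Colors are matched to categories by position.
    return dict(zip(categories, colors))


uns = {"cell_type_colors": ["#1f77b4", "#ff7f0e"]}
print(category_colors(["B cell", "T cell"], uns, "cell_type"))
```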

Serialization Warnings

Proactively warns about data that won't serialize:

| Level | Issue | Related |
|---|---|---|
| 🔴 Error | datetime64/timedelta64 | #455, #2238 |
| 🔴 Error | Non-string keys | #321 |
| 🟡 Warning | Keys with / | #1447, #2099 |
| 🟡 Warning | Object columns with dicts/lists | #1923, #567, #636 |
| 🟡 Warning | String columns auto-converted to categorical | #534, #926 |
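
A rough sketch of how checks like these could be expressed, assuming a single key/value pair is inspected at a time. The function `check_entry` is hypothetical, and the standard-library `datetime` types stand in for NumPy's datetime64/timedelta64 so that the example has no dependencies.

```python
import datetime as dt


def check_entry(key, value):
    """Return a list of (level, message) tuples for one key/value pair."""
    issues = []
    if not isinstance(key, str):
        issues.append(("error", f"non-string key {key!r}"))
    elif "/" in key:
        issues.append(("warning", f"key {key!r} contains '/'"))
    # Stand-in for the datetime64/timedelta64 check.
    if isinstance(value, (dt.date, dt.timedelta)):
        issues.append(("error", f"{type(value).__name__} value"))
    elif isinstance(value, (list, tuple)) and any(
        isinstance(item, (dict, list)) for item in value
    ):
        issues.append(("warning", "nested dicts/lists"))
    return issues


print(check_entry("a/b", dt.timedelta(seconds=1)))
```

Returning all issues for an entry, rather than stopping at the first, lets the display show every reason a field might fail to write.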

Compatibility

  • Dark mode auto-detection (Jupyter Lab/VS Code)
  • No-JS fallback with graceful degradation
  • JupyterLab safe - CSS scoped to .anndata-repr prevents style conflicts
  • Lazy-loading safe - configurable partial loading for read_lazy() (categories, colors)
  • Zero dependencies added

Extensibility

Three extension mechanisms for ecosystem packages (MuData, SpatialData, TreeData):

  1. TypeFormatter - Custom visualization for value types
  2. SectionFormatter - Add new sections (e.g., obst/vart, mod)
  3. Building blocks - CSS/JS/helpers for packages needing full control

See the Reviewer's Guide for examples and API documentation.

Testing

  • 303 unit tests with ~92% coverage
  • Visual test cases: python tests/visual_inspect_repr_html.py

Related

Acknowledgments

Thanks to @selmanozleyen (#784), @gtca (#694), @VolkerH (#521), @ivirshup (#346, #675), and @Zethson (#675) for prior work and discussions.


Technical Notes and Edits

Lazy Loading

Constants are in _repr_constants.py (outside _repr/) to prevent loading ~6K lines on import anndata. The full module loads only when _repr_html_() is called.
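
The deferred-import pattern described here can be sketched as follows. The class `Table` and the constant `PREVIEW_MAX_CHARS` are hypothetical, and the standard-library `html` module stands in for the heavy `_repr` package.

```python
# Cheap module-level constant, available without loading the renderer.
# (Stand-in for the values kept in _repr_constants.py.)
PREVIEW_MAX_CHARS = 40


class Table:
    """Toy object whose HTML renderer is imported only on demand."""

    def __init__(self, name):
        self.name = name

    def _repr_html_(self):
        # Deferred import: executed on the first call, not at module import time.
        import html

        preview = self.name[:PREVIEW_MAX_CHARS]
        return f'<div class="anndata-repr">{html.escape(preview)}</div>'


print(Table("a<b")._repr_html_())
```

After the first call the imported module is cached in `sys.modules`, so repeated renders pay only for a dictionary lookup.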

Config Changes

pyproject.toml: Added vart to codespell ignore list (TreeData section name).


Edit (Dec 27, 2025)

To simplify review and reduce the diff, I've merged settylab/anndata#3 into this PR. That PR was originally created as a follow-up to explore additional features based on the discussion with @Zethson about SpatialData/MuData extensibility.

What changed:

  • Exported building blocks - CSS, JavaScript, and rendering helpers for external packages to build custom reprs while reusing anndata's styling
  • .raw section - Expandable row showing unprocessed data (see #349, "Report n_vars of .raw in __repr__")
  • Enhanced serialization warnings - Extended to cover datetime64, non-string keys, slashes in keys, and all sections
  • Regex search - Case-sensitive and regex toggles for filtering
  • Robust error handling - Failed sections show visible error indicators instead of being silently hidden

Edit (Jan 4, 2026)

Moved detailed implementation documentation (architecture, design decisions, extensibility examples, configuration reference) to the Reviewer's Guide to keep this PR description focused on features.

Code refactoring:

  • Split html.py into focused modules for maintainability
  • UI components extracted to components.py (badges, buttons, icons)
  • Section renderers moved to sections.py (obs/var, mapping, uns, raw)
  • Shared rendering primitives extracted to core.py (avoids circular imports)
  • Preview utilities moved to utils.py
  • FormatterContext consolidates all 6 rendering settings (read once at entry, propagated via context)
  • Result: html.py reduced from ~2100 to ~740 lines, clean import hierarchy

New features:

  • "Lazy" badge for read_lazy() AnnData objects (experimental) - indicates when obs/var are xarray-backed
  • Visual test for lazy AnnData (9b) - demonstrates lazy loading with (lazy) indicator on columns

Bug fixes:

  • Consistent meta column styling - all meta column text now uses adata-text-muted class for uniform appearance
  • Bytes index decoding - properly decode bytes values in index previews

Related issue discovered:

  • read_lazy() returns index values as byte-representation strings (e.g., "b'cell_0'" instead of "cell_0") - see ISSUE_READ_LAZY_INDEX.md
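
A minimal sketch of the kind of cleanup the bytes-decoding fix and the related issue call for, covering both real `bytes` values and strings that merely look like a bytes repr. The helper `clean_index_label` is hypothetical and is not the code in this PR.

```python
import ast


def clean_index_label(value):
    """Return a display string for an index value."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    if isinstance(value, str) and value[:2] in ("b'", 'b"'):
        # Looks like "b'cell_0'": try to parse it back into bytes.
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value
        if isinstance(parsed, bytes):
            return parsed.decode("utf-8", errors="replace")
    return str(value)


print(clean_index_label(b"cell_0"))
print(clean_index_label("b'cell_0'"))
```

One caveat of this approach: a genuine label that happens to be spelled like a bytes literal would also be rewritten, so it suits display previews rather than the stored index.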

Edit (Jan 6, 2026)

Smart partial loading for read_lazy() AnnData:

Previously, lazy AnnData showed no category previews to avoid disk I/O. Now we do minimal, configurable loading to get richer visualization cheaply: only the first N category labels and their colors are read from storage (not the full column data). New setting repr_html_max_lazy_categories (default: 100, set to 0 for metadata-only mode).

Visual tests reorganized: 8 (Dask), 8b (lazy categories), 8c (metadata-only), 9 (backed).


katosh commented Dec 27, 2025

Hi @flying-sheep, @Zethson, @ivirshup! Hope you're having a wonderful holiday season!

Just a quick update: I've merged settylab/anndata#3 into this PR to keep everything in one place. That brought in the exported building blocks for packages like SpatialData/MuData, the .raw section, enhanced serialization warnings, and a few other improvements. I've updated the PR description above to reflect all these changes.

No rush at all with the holidays! Whenever you have a moment, I'd appreciate any feedback on the direction. Happy New Year!

flying-sheep changed the title from "Add HTML representation" to "feat: Add HTML representation" on Jan 5, 2026


katosh commented Jan 6, 2026

I feel like lazy loading will become more common, especially as datasets and the number of modalities grow. So I decided to take a pragmatic approach for the HTML repr rather than showing no category information at all.

The trade-off: For read_lazy() AnnData, we now do minimal, configurable partial loading to get richer category previews:

  • Only the first N category labels are read from storage (not the full column data or codes)
  • Only the corresponding N colors from .uns are loaded
  • Controlled via ad.settings.repr_html_max_lazy_categories (default: 100, set to 0 for zero disk I/O)

Why not avoid all loading? Showing just "(50 categories)" is much less useful than seeing the actual category names with color swatches. The cost of reading a few category strings is small compared to the value of the preview.

Implementation: We access CategoricalArray._categories directly and use read_elem_partial() to read only what we need. This bypasses the @cached_property that would load all categories. See the design decision in the reviewer's guide for details.
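
To make the trade-off concrete, here is a simplified sketch of the "first N only" idea on plain Python sequences. It does not use CategoricalArray or read_elem_partial(); the function `preview_categories` and its return shape are assumptions for illustration.

```python
from itertools import islice


def preview_categories(categories, colors, max_categories=100):
    """Return (labels, swatches, n_hidden), reading at most `max_categories` items."""
    total = len(categories)
    if max_categories <= 0:
        # Metadata-only mode: report the count and read no values.
        return [], [], total
    labels = list(islice(categories, max_categories))
    # Read only as many colors as there are previewed labels.
    swatches = list(islice(colors, len(labels))) if colors is not None else []
    return labels, swatches, total - len(labels)


cats = [f"type_{i}" for i in range(250)]
cols = [f"#{i:06x}" for i in range(250)]
labels, swatches, hidden = preview_categories(cats, cols, max_categories=100)
print(len(labels), len(swatches), hidden)
```

With `max_categories=0` the function touches only the length, which mirrors the metadata-only mode described above.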

Visual examples: See tests 8b (partial loading) and 8c (metadata-only mode) in the live demo.
