Support Incremental Parsing for Streaming Markdown Input #414

@huihui4045

English

Background
Currently, commonmark-java parses Markdown via Parser.parse(String markdown), which always processes the entire input from scratch and rebuilds the AST.
In streaming scenarios (e.g., live Markdown preview, collaborative editing, chat messages), new content is appended continuously. Re-parsing the whole document after every append repeats work on text that has not changed, so when content arrives in many small chunks the total parsing cost grows roughly quadratically with document length.
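
To make the cost concrete, here is a rough model (not a benchmark) that counts how many characters a parser has to read when a document arrives in equal-sized chunks. It assumes parsing cost is proportional to the number of characters read; the class and method names are illustrative and not part of commonmark-java.

```java
// Rough cost model, not a benchmark: assumes parse cost is proportional to characters read.
final class ReparseCost {
    // Characters read when the whole document is re-parsed after every appended chunk.
    static long fullReparse(int chunks, int chunkSize) {
        long total = 0;
        for (int i = 1; i <= chunks; i++) {
            total += (long) i * chunkSize; // document length after the i-th append
        }
        return total;
    }

    // Characters read when only each new chunk is parsed.
    static long incremental(int chunks, int chunkSize) {
        return (long) chunks * chunkSize;
    }

    public static void main(String[] args) {
        // 1,000 appends of 100 characters each (a 100,000-character document).
        System.out.println(fullReparse(1000, 100)); // 50050000
        System.out.println(incremental(1000, 100)); // 100000
    }
}
```

Under this model, streaming a 100,000-character document in 1,000 appends reads about 500 times more characters with full re-parsing than with an incremental approach.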

Request
I would like to request incremental parsing support: the ability to parse only the newly appended Markdown content and update the existing AST without re-parsing unchanged parts.

Possible Approach

  • Maintain a parsing context (ParserContext) that stores:
    • Current open block parsers
    • AST root node
    • Current line number / position
  • Add a method like:
    Node parseIncremental(String newMarkdown, ParserContext context);
    which:
    • Resumes parsing from the saved context
    • Continues open blocks if possible (the role BlockParser.tryContinue() plays in the current parser)
    • Creates new blocks for new content
    • Parses inline elements only for new blocks
  • Update the AST in place, avoiding full rebuild.
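
The steps above could be sketched roughly as follows. This is a toy model under stated assumptions, not commonmark-java's internals: only two block kinds (paragraphs and `~~~` fenced code) stand in for real block parsers, and `Context`, `Block`, and `handleLine` are hypothetical names. It also adds one thing the list does not mention, a buffer for an unfinished last line, on the assumption that streamed chunks can end mid-line.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: two block kinds stand in for real block parsers.
final class IncrementalSketch {
    enum Kind { PARAGRAPH, CODE }

    static final class Block {
        final Kind kind;
        final StringBuilder text = new StringBuilder();
        boolean open = true;
        Block(Kind kind) { this.kind = kind; }
    }

    // Hypothetical counterpart of the proposed ParserContext.
    static final class Context {
        final List<Block> document = new ArrayList<>();    // stands in for the AST root
        Block openBlock;                                   // current open block, if any
        int lineNumber;                                    // complete lines consumed so far
        final StringBuilder pending = new StringBuilder(); // unfinished last line
    }

    // Consumes only the new chunk; blocks that are already closed are never revisited.
    static List<Block> parseIncremental(String chunk, Context ctx) {
        ctx.pending.append(chunk);
        int newline;
        while ((newline = ctx.pending.indexOf("\n")) >= 0) {
            String line = ctx.pending.substring(0, newline);
            ctx.pending.delete(0, newline + 1);
            ctx.lineNumber++;
            handleLine(line, ctx);
        }
        return ctx.document;
    }

    private static void handleLine(String line, Context ctx) {
        Block open = ctx.openBlock;
        boolean fence = line.startsWith("~~~");
        if (open != null && open.kind == Kind.CODE) {
            // An open code block continues until the closing fence.
            if (fence) close(ctx); else open.text.append(line).append('\n');
        } else if (fence) {
            close(ctx);               // a fence ends any open paragraph
            start(Kind.CODE, ctx);
        } else if (line.trim().isEmpty()) {
            close(ctx);               // a blank line ends a paragraph
        } else {
            if (open == null) open = start(Kind.PARAGRAPH, ctx);
            open.text.append(line).append('\n');
        }
    }

    private static Block start(Kind kind, Context ctx) {
        Block block = new Block(kind);
        ctx.document.add(block);
        ctx.openBlock = block;
        return block;
    }

    private static void close(Context ctx) {
        if (ctx.openBlock != null) {
            // The proposal's "inline parsing only for new blocks" step would hook in here.
            ctx.openBlock.open = false;
            ctx.openBlock = null;
        }
    }
}
```

Feeding `"Hello wor"`, then `"ld\n~~~\nint x"`, then `";\n~~~\n"` through the same `Context` yields one paragraph and one code block, with the paragraph closed as soon as the fence line arrives. A real implementation would have more to handle: for example, CommonMark allows a link reference definition to appear after the links that use it, so inline content in already closed blocks may need to be re-resolved when later input arrives.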

Benefits

  • Significant performance improvement for streaming or real-time editing scenarios.
  • Reduced memory and CPU usage for large documents.
  • Enables more responsive applications using commonmark-java.

Example Use Case

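// Proposed API: parseIncremental and ParserContext do not exist in commonmark-java today.
Parser parser = Parser.builder().build();
ParserContext context = new ParserContext();

// The first call parses the initial text and records the open blocks in the context.
Node doc = parser.parseIncremental(initialMarkdown, context);
// Later calls parse only the appended chunk and update the same AST.
doc = parser.parseIncremental(newMarkdownChunk, context);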

Would you consider adding this feature or providing hooks to implement it externally?


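Chinese

Background
Currently, commonmark-java parses Markdown through Parser.parse(String markdown), which processes the entire input from scratch and rebuilds the AST on every call.
In streaming scenarios (such as live Markdown preview, collaborative editing, or chat messages), content is appended continuously. Re-parsing everything each time wastes work, especially when the document is large.

Request
I would like support for incremental parsing: parse only the newly added Markdown content and update the existing AST, without re-parsing the parts that have not changed.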

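Possible Approach

  • Maintain a parsing context (ParserContext) that stores:
    • The block parsers that are currently open
    • The AST root node
    • The current line number / position
  • Add a method such as:
    Node parseIncremental(String newMarkdown, ParserContext context);
    which:
    • Restores the parsing state from the saved context
    • Continues existing blocks where possible (relying on the BlockParser.tryContinue() logic)
    • Creates new blocks for the new content
    • Runs inline parsing only on the new blocks
  • Update the original AST directly, avoiding a full rebuild.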

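Benefits

  • Significantly better performance in streaming or real-time editing scenarios.
  • Lower memory and CPU usage for large documents.
  • Smoother, more responsive applications built on commonmark-java.

Example Use Case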

Parser parser = Parser.builder().build();
ParserContext context = new ParserContext();

Node doc = parser.parseIncremental(initialMarkdown, context);
doc = parser.parseIncremental(newMarkdownChunk, context);
