Description
Some very short background: I am trying to create a pymupdf backend for the amazing Emacs pdf-tools package. Currently, pdf-tools uses a server written in C on top of the 'poppler' PDF library. A server built on 'pymupdf' would be much more hackable and would make it much easier to add features like line annotations, PDF forms support, etc. (I have already added a dirty hack for pymupdf support in the pymupdf-mode package, but that is not a clean solution.)
The pymupdf server/backend is already working, by the way (including annotations, but not yet annotating text in reading order).
Now to get to the point: I would like to be able to select text regions in reading order (over multiple lines). Looking at the pdf-tools epdfinfo server, I found that poppler 'simply' provides a function for this (poppler_page_get_selected_region), but I have found no equivalent in pymupdf, nor in your pymupdf-utilities.
However, looking at the annotation capabilities of the mupdf-gl reader (part of the mupdf library), I found that it supports selecting text in reading order perfectly, so I would guess that the mupdf library already provides a function for this (which would probably be much faster than any hack written in Python). Therefore, I was wondering whether you are already aware of this, and whether it would be possible to include this functionality in pymupdf. I hope you can identify the responsible function yourself, as I am not a C developer, but I would be happy to investigate it for you. It would be great if this were possible (I figured you might be interested in it yourself). Additionally, if pdf-tools switched to the pymupdf library, it would be a great showcase for pymupdf, as pdf-tools is really one of the most beautiful and most powerful PDF readers available, and with pymupdf surely the most hackable.
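To make the request concrete, here is a rough sketch of the kind of pure-Python hack I mean. It assumes the words come as tuples shaped like the output of `page.get_text("words")`, i.e. `(x0, y0, x1, y1, text, block_no, line_no, word_no)`, and that sorting by block, line and word number is a good enough approximation of reading order. The function name `selection_rects` and its `start`/`end` parameters are made up for illustration; this is not an existing pymupdf API.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed word layout: (x0, y0, x1, y1, text, block_no, line_no, word_no)
Word = Tuple[float, float, float, float, str, int, int, int]
Rect = Tuple[float, float, float, float]
Point = Tuple[float, float]


def selection_rects(words: Sequence[Word], start: Point, end: Point) -> List[Rect]:
    """Return one rectangle per text line covering the words between the
    words nearest to `start` and `end`, in (approximate) reading order.

    Hypothetical helper, not part of pymupdf.
    """
    # Approximate reading order: block number, then line, then word.
    ordered = sorted(words, key=lambda w: (w[5], w[6], w[7]))
    if not ordered:
        return []

    def nearest(pt: Point) -> int:
        # Squared distance from the point to a word rectangle
        # (zero if the point lies inside the rectangle).
        def dist(w: Word) -> float:
            dx = max(w[0] - pt[0], 0, pt[0] - w[2])
            dy = max(w[1] - pt[1], 0, pt[1] - w[3])
            return dx * dx + dy * dy

        return min(range(len(ordered)), key=lambda i: dist(ordered[i]))

    # Allow dragging in either direction.
    first, last = sorted((nearest(start), nearest(end)))

    # Merge the selected words into one rectangle per (block, line).
    # Dicts keep insertion order, so the result stays in reading order.
    rects: Dict[Tuple[int, int], Rect] = {}
    for x0, y0, x1, y1, _text, block, line, _word in ordered[first:last + 1]:
        key = (block, line)
        if key in rects:
            rx0, ry0, rx1, ry1 = rects[key]
            rects[key] = (min(rx0, x0), min(ry0, y0), max(rx1, x1), max(ry1, y1))
        else:
            rects[key] = (x0, y0, x1, y1)
    return list(rects.values())
```

Calling it with the word list of a page and the two mouse positions of a drag would give the per-line rectangles to highlight. It works at word granularity only and ignores multi-column layouts, which is one more reason a native function in the library would be preferable.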
Below are two screenshots of the functionality I would like to see: the first shows Emacs with pdf-tools using the current epdfinfo server, the second shows mupdf-gl, the mupdf library's own reader (in both images, the active areas were obtained with a single selection/mouse drag). I only need to obtain the rectangles; coloring the active area is something I already do using the Pillow imaging library.

[Screenshot: PDF-TOOLS]
Thank you!
