|
1 | | -# Python Redlines |
| 1 | +# Python-Redlines: Docx Redlines (Tracked Changes) for the Python Ecosystem |
2 | 2 |
|
3 | | -[](https://pypi.org/project/python-redlines) |
4 | | -[](https://pypi.org/project/python-redlines) |
| 3 | +## Project Goal - Democratizing DOCX Comparisons |
5 | 4 |
|
6 | | ------ |
| 5 | +The main goal of this project is to address the significant gap in the open-source ecosystem around `.docx` document |
| 6 | +comparison tools. Currently, the process of comparing and generating redline documents (documents that highlight |
| 7 | +changes between versions) is complex and largely dominated by commercial software. These |
| 8 | +tools, while effective, often come with cost barriers and limitations in terms of accessibility and integration |
| 9 | +flexibility. |
7 | 10 |
|
8 | | -**Table of Contents** |
| 11 | +`Python-redlines` aims to democratize the ability to generate tracked-change redlines for `.docx` files, providing the |
| 12 | +open-source community with a tool to create `.docx` redlines without the need for commercial software. This will let |
| 13 | +more legal hackers and hobbyist innovators experiment and create tooling for enterprise and legal use. |
9 | 14 |
|
10 | | -- [Installation](#installation) |
11 | | -- [License](#license) |
| 15 | +## Project Roadmap |
12 | 16 |
|
13 | | -## Installation |
| 17 | +### Step 1. Open-XML-PowerTools `WmlComparer` Wrapper |
14 | 18 |
|
15 | | -```console |
16 | | -pip install python-redlines |
17 | | -``` |
| 19 | +The [Open-XML-PowerTools](https://github.com/OpenXmlDev/Open-Xml-PowerTools) project historically offered a solid |
| 20 | +foundation for working with `.docx` files and has an excellent (if imperfect) comparison engine in its `WmlComparer` |
| 21 | +class. However, Microsoft archived the repository almost five years ago, and a forked repo is not being actively |
| 22 | +maintained, as its most recent commit dates from two years ago and its issues list is disabled. |
18 | 23 |
|
19 | | -## License |
| 24 | +As a first step, our project aims to bring the existing capabilities of `WmlComparer` into the Python world. Thankfully, |
| 25 | +Open-XML-PowerTools is fully cross-platform, as it is written in .NET and compiles with the still-maintained .NET 8. |
| 26 | +Binaries can be built for the latest versions of Windows and Linux (Ubuntu specifically, though other |
| 27 | +distributions should work fine too). |
20 | 28 |
|
21 | | -`python-redlines` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license. |
| 29 | +The initial release has a single engine, `XmlPowerToolsEngine`, which is just a Python wrapper for a simple C# utility |
| 30 | +written to leverage `WmlComparer` for 1-to-1 redlines. We hope this provides a stop-gap capability to Python developers |
| 31 | +seeking `.docx` redline capabilities. |
22 | 32 |
|
23 | | -Usage: |
| 33 | +**Note**: we don't plan to fork or maintain Open-XML-PowerTools. [Version 4.4.0](https://www.nuget.org/packages/Open-Xml-PowerTools/), |
| 34 | +which appears to be compatible only with [Open XML SDK < 3.0.0](https://www.nuget.org/packages/DocumentFormat.OpenXml), works |
| 35 | +for now, but it needs to be made compatible with the latest versions of the Open XML SDK to extend its life. There are |
| 36 | +also some [issues](https://github.com/dotnet/Open-XML-SDK/issues/1634) that the only maintainer of |
| 37 | +Open-XML-PowerTools seems unlikely to fix, and understanding the existing code base is no small task. |
24 | 38 |
|
25 | | -Setup project and create .net project in dir |
26 | | -``` |
27 | | -dotnet new console |
28 | | -``` |
| 39 | +### Step 2. Pure Python Comparison Engine |
29 | 40 |
|
30 | | -Build app |
| 41 | +Looking towards the future, rather than reverse engineer `WmlComparer` and maintain a C# codebase, we envision a |
| 42 | +comparison engine written in Python. We've done some experimentation with [`xmldiff`](https://github.com/Shoobx/xmldiff) |
| 43 | +as the engine to compare the underlying XML of `.docx` files. Specifically, we've built a prototype to unzip `.docx` files, |
| 44 | +execute an XML comparison using `xmldiff`, and then reconstruct a tracked-changes `.docx` with the proper Open XML |
| 45 | +(OOXML) tracked change tags. Preliminary experimentation with this approach has shown promise, indicating its |
| 46 | +feasibility for handling modifications such as simple span inserts and deletes. |
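
The first stage of that pipeline (unzipping and reading the document XML) can be sketched with the standard library alone. This is a minimal illustration, not the project's prototype: `difflib` stands in for `xmldiff`, the comparison is per paragraph rather than per XML node, and `paragraph_texts`, `paragraph_opcodes`, and `fake_docx` are hypothetical helper names. It assumes the main body lives at `word/document.xml` inside the archive, as it does in standard `.docx` packages.

```python
import difflib
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# WordprocessingML main namespace used by <w:p>, <w:r>, <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each <w:p> found in word/document.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def paragraph_opcodes(original: bytes, modified: bytes) -> list[tuple]:
    """Diff two documents paragraph by paragraph (difflib stands in for xmldiff)."""
    a, b = paragraph_texts(original), paragraph_texts(modified)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    # Keep only the non-matching regions: "replace", "insert", "delete".
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


def fake_docx(paragraphs: list[str]) -> bytes:
    """Build an archive holding only word/document.xml.

    Illustration only: a real .docx needs more parts than this.
    """
    body = "".join(
        f"<w:p><w:r><w:t>{escape(p)}</w:t></w:r></w:p>" for p in paragraphs
    )
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()
```

Each opcode names the changed paragraph ranges in the original and modified documents. A full engine would then need to emit the corresponding `<w:ins>` and `<w:del>` elements, which is the hard part this sketch leaves out.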
31 | 47 |
|
32 | | -``` |
33 | | -dotnet build |
34 | | -``` |
| 48 | +However, this ambitious endeavor is not without its challenges. The intricacies of `.docx` files and the potential for |
| 49 | +complex, corner-case scenarios necessitate a thoughtful and thorough development process. In the interim, `WmlComparer` |
| 50 | +is a great solution, as it has clearly been built to account for many such corner cases, through a development process |
| 51 | +shaped by issues discovered by a large user base. The `xmldiff`-based engine will take some time to reach |
| 52 | +a level of maturity similar to `WmlComparer`. At the moment it is NOT included. |
35 | 53 |
|
36 | | -Run via dotnet console |
37 | | -``` |
38 | | -dotnet run scrudato@umich.edu "/home/jman/Downloads/NVCA-Model-Document-Certificate-of-Incorporation.docx" "/home/jman/Downloads/Modified.docx" "/home/jman/Downloads/redline.docx" |
39 | | -``` |
| 54 | +## Getting started |
40 | 55 |
|
41 | | -Package for linux |
42 | | -``` |
43 | | -dotnet publish -c release -r linux-x64 --self-contained |
44 | | -``` |
| 56 | +If you just want to use the tool, jump into our [quickstart guide](docs/quickstart.md). |
45 | 57 |
|
46 | | -Package for windows |
47 | | -``` |
48 | | -dotnet public -c release -r win-x64 --self-contained |
49 | | -``` |
| 58 | +## Architecture Overview |
50 | 59 |
|
51 | | -"Install" on Linux |
| 60 | +`XmlPowerToolsEngine` is a Python wrapper class for the `redlines` C# command-line tool, whose source is available in |
| 61 | +[./csproj/Program.cs](./csproj/Program.cs). The `redlines` utility and wrapper let you compare two `.docx` files and |
| 62 | +show the differences as tracked changes (a "redline" document). |
52 | 63 |
|
53 | | -1. Install .net for Linux (whatever distro you're using) |
54 | | -2. Copy the Release contents from linux-x64 folder |
55 | | -3. run `chmod 777 ./redlines` |
56 | | -4. Then you can run `./redlines original_path.docx modified_path.docx` |
| 64 | +### C# Functionality |
57 | 65 |
|
58 | | -More complete package and install directions: |
| 66 | +The `redlines` C# utility is a command line tool that requires four arguments: |
| 67 | +1. `author_tag` - A tag to identify the author of the changes. |
| 68 | +2. `original_path.docx` - Path to the original document. |
| 69 | +3. `modified_path.docx` - Path to the modified document. |
| 70 | +4. `redline_path.docx` - Path where the redlined document will be saved. |
59 | 71 |
|
60 | | -https://ttu.github.io/dotnet-core-self-contained-deployments/ |
| 72 | +The Python wrapper, `XmlPowerToolsEngine`, and its main method, `run_redlines()`, simplify the use of `redlines` by |
| 73 | +orchestrating its execution with Python and letting you pass in bytes or file paths for the original and modified |
| 74 | +documents. |
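
To make the division of labor concrete, here is a minimal sketch of how a wrapper could drive the four-argument command-line tool from Python. It is not the actual `run_redlines()` implementation: `build_redlines_command` and `compare_bytes` are hypothetical names, and the path to the `redlines` binary is something the caller is assumed to supply.

```python
import subprocess
import tempfile
from pathlib import Path


def build_redlines_command(binary, author_tag, original, modified, redline):
    """Assemble argv in the order the C# tool expects its four arguments."""
    return [str(binary), author_tag, str(original), str(modified), str(redline)]


def compare_bytes(binary, author_tag: str, original: bytes, modified: bytes) -> bytes:
    """Write both documents to temp files, run the tool, return the redline bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        original_path = tmp_dir / "original.docx"
        modified_path = tmp_dir / "modified.docx"
        redline_path = tmp_dir / "redline.docx"
        original_path.write_bytes(original)
        modified_path.write_bytes(modified)
        command = build_redlines_command(
            binary, author_tag, original_path, modified_path, redline_path
        )
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(command, check=True, capture_output=True)
        return redline_path.read_bytes()
```

Writing inputs to a temporary directory is one way to accept in-memory bytes while still calling a tool that only understands file paths.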
61 | 75 |
|
62 | | -Archive and zip the release folder contents as .tar.gz, then distribute the archive. User install instructions would be: |
| 76 | +### Packaging |
63 | 77 |
|
64 | | -```commandline |
65 | | -$ mkdir redlines && cd redlines |
66 | | -$ wget <github release url> |
67 | | -$ tar -zxvf redlines-linux-x64.tar.gz |
68 | | -$ chmod +x redlines |
69 | | -$ ./redlines <author_tag> <original_path.docx> <modified_path.docx> <redline_path.docx> |
| 78 | +The project is structured as follows: |
| 79 | +``` |
| 80 | +python-redlines/ |
| 81 | +│ |
| 82 | +├── csproj/ |
| 83 | +│ ├── bin/ |
| 84 | +│ ├── obj/ |
| 85 | +│ ├── Program.cs |
| 86 | +│ ├── redlines.csproj |
| 87 | +│ └── redlines.sln |
| 88 | +│ |
| 89 | +├── docs/ |
| 90 | +│ ├── developer-guide.md |
| 91 | +│ └── quickstart.md |
| 92 | +│ |
| 93 | +├── src/ |
| 94 | +│ └── python_redlines/ |
| 95 | +│ ├── bin/ |
| 96 | +│ │ └── .gitignore |
| 97 | +│ ├── dist/ |
| 98 | +│ │ ├── .gitignore |
| 99 | +│ │ ├── linux-x64-0.0.1.tar.gz |
| 100 | +│ │ └── win-x64-0.0.1.zip |
| 101 | +│ ├── __about__.py |
| 102 | +│ ├── __init__.py |
| 103 | +│ └── engines.py |
| 104 | +│ |
| 105 | +├── tests/ |
| 106 | +│ ├── fixtures/ |
| 107 | +│ ├── test_openxml_differ.py |
| 108 | +│ └── __init__.py |
| 109 | +│ |
| 110 | +├── .gitignore |
| 111 | +├── build_differ.py |
| 112 | +├── extract_version.py |
| 113 | +├── License.md |
| 114 | +├── pyproject.toml |
| 115 | +└── README.md |
70 | 116 | ``` |
71 | 117 |
|
72 | | -"Install" on Windows |
| 118 | +- `src/python_redlines/`: Contains the Python wrapper code. |
| 119 | +- `dist/`: Contains the zipped C# binaries for different platforms. |
| 120 | +- `bin/`: Target directory for extracted binaries. |
| 121 | +- `tests/`: Contains test cases and fixtures for the wrapper. |
| 122 | + |
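The relationship between `dist/` and `bin/` can be sketched as follows. This is an assumption-laden illustration rather than the code in `engines.py`: `archive_name` and `extract_binaries` are hypothetical helpers, and the archive naming simply mirrors the two files shown in the tree above (`linux-x64-0.0.1.tar.gz` and `win-x64-0.0.1.zip`).

```python
import platform
import tarfile
import zipfile
from pathlib import Path


def archive_name(system: str, version: str) -> str:
    """Map an OS name, as returned by platform.system(), to an archive name."""
    if system == "Linux":
        return f"linux-x64-{version}.tar.gz"
    if system == "Windows":
        return f"win-x64-{version}.zip"
    raise RuntimeError(f"No bundled binaries for {system}")


def extract_binaries(dist_dir: Path, bin_dir: Path, version: str) -> Path:
    """Unpack the archive for the current platform from dist/ into bin/."""
    archive = dist_dir / archive_name(platform.system(), version)
    bin_dir.mkdir(parents=True, exist_ok=True)
    if archive.name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(bin_dir)
    else:
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(bin_dir)
    return bin_dir
```

Shipping compressed archives and extracting on demand keeps only one platform's binaries unpacked on disk.
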
| 123 | +### Detailed Explanation and Dev Setup |
| 124 | + |
| 125 | +If you want to contribute to the library or want to dive into some of the C# packaging architecture, go to our |
| 126 | +[developer guide](docs/developer-guide.md). |
| 127 | + |
| 128 | +## Additional Information |
| 129 | + |
| 130 | +- **Contributing**: Contributions to the project should follow the established coding and documentation standards. |
| 131 | +- **Issues and Support**: For issues, feature requests, or support, please use the project's issue tracker on GitHub. |
| 132 | + |
| 133 | +## License |
73 | 134 |
|
74 | | -TODO - run through this and document |
| 135 | +MIT |