
Commit 612a74c

Added great readme docs and quick start.
1 parent 3612c15 commit 612a74c


14 files changed (+322 -86 lines)


.gitignore

Lines changed: 1 addition & 1 deletion
```diff
@@ -14,7 +14,7 @@ csproj/obj/*
 .Python
 build/
 develop-eggs/
-dist/
+src/python_redlines/dist/
 downloads/
 eggs/
 .eggs/
```
File renamed without changes.

README.md

Lines changed: 111 additions & 50 deletions
````diff
@@ -1,74 +1,135 @@
-# Python Redlines
+# Python-Redlines: Docx Redlines (Tracked Changes) for the Python Ecosystem
 
-[![PyPI - Version](https://img.shields.io/pypi/v/python-redlines.svg)](https://pypi.org/project/python-redlines)
-[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/python-redlines.svg)](https://pypi.org/project/python-redlines)
+## Project Goal - Democratizing DOCX Comparisons
 
------
+The main goal of this project is to address the significant gap in the open-source ecosystem around `.docx` document
+comparison tools. Currently, the process of comparing and generating redline documents (documents that highlight
+changes between versions) is complex and largely dominated by commercial software. These
+tools, while effective, often come with cost barriers and limitations in terms of accessibility and integration
+flexibility.
 
-**Table of Contents**
+`Python-redlines` aims to democratize the ability to run tracked change redlines for .docx, providing the
+open-source community with a tool to create `.docx` redlines without the need for commercial software. This will let
+more legal hackers and hobbyist innovators experiment and create tooling for enterprise and legal use.
 
-- [Installation](#installation)
-- [License](#license)
+## Project Roadmap
 
-## Installation
+### Step 1. Open-XML-PowerTools `WmlComparer` Wrapper
 
-```console
-pip install python-redlines
-```
+The [Open-XML-PowerTools](https://github.com/OpenXmlDev/Open-Xml-PowerTools) project historically offered a solid
+foundation for working with `.docx` files and has an excellent (if imperfect) comparison engine in its `WmlComparer`
+class. However, Microsoft archived the repository almost five years ago, and a forked repo is not being actively
+maintained, as its most recent commit dates from 2 years ago and the repo issues list is disabled.
 
-## License
+As a first step, our project aims to bring the existing capabilities of `WmlComparer` into the Python world. Thankfully,
+Open-XML-PowerTools is fully cross-platform, as it is written in .NET and compiles with the still-maintained .NET 8. The
+resulting binaries can be compiled for the latest versions of Windows and Linux (Ubuntu specifically, though other
+distributions should work fine too).
 
-`python-redlines` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
+The initial release has a single engine, `XmlPowerToolsEngine`, which is just a Python wrapper for a simple C# utility
+written to leverage `WmlComparer` for 1-to-1 redlines. We hope this provides a stop-gap for Python developers
+seeking .docx redline capabilities.
 
-Usage:
+**Note**: we don't plan to fork or maintain Open-XML-PowerTools. [Version 4.4.0](https://www.nuget.org/packages/Open-Xml-PowerTools/),
+which appears to be compatible only with [Open XML SDK < 3.0.0](https://www.nuget.org/packages/DocumentFormat.OpenXml), works
+for now, but it needs to be made compatible with the latest versions of the Open XML SDK to extend its life. There are
+also some [issues](https://github.com/dotnet/Open-XML-SDK/issues/1634) that the only maintainer of
+Open-XML-PowerTools seemingly won't fix, and understanding the existing code base is no small task.
 
-Setup project and create .net project in dir
-```
-dotnet new console
-```
+### Step 2. Pure Python Comparison Engine
 
-Build app
+Looking towards the future, rather than reverse engineer `WmlComparer` and maintain a C# codebase, we envision a
+comparison engine written in Python. We've done some experimentation with [`xmldiff`](https://github.com/Shoobx/xmldiff)
+as the engine to compare the underlying XML of docx files. Specifically, we've built a prototype to unzip `.docx` files,
+execute an XML comparison using `xmldiff`, and then reconstruct a tracked-changes docx with the proper Open XML
+(OOXML) tracked change tags. Preliminary experimentation with this approach has shown promise, indicating its
+feasibility for handling modifications such as simple span inserts and deletes.
 
-```
-dotnet build
-```
+However, this ambitious endeavor is not without its challenges. The intricacies of `.docx` files and the potential for
+complex, corner-case scenarios necessitate a thoughtful and thorough development process. In the interim, `WmlComparer`
+is a great solution, as it was clearly built to account for many such corner cases through a development process
+shaped by issues discovered by a large user base. The `xmldiff` engine will take some time to reach
+a level of maturity similar to `WmlComparer`. At the moment it is NOT included.
 
-Run via dotnet console
-```
-dotnet run scrudato@umich.edu "/home/jman/Downloads/NVCA-Model-Document-Certificate-of-Incorporation.docx" "/home/jman/Downloads/Modified.docx" "/home/jman/Downloads/redline.docx"
-```
+## Getting Started
 
-Package for linux
-```
-dotnet publish -c release -r linux-x64 --self-contained
-```
+If you just want to use the tool, jump into our [quickstart guide](docs/quickstart.md).
 
-Package for windows
-```
-dotnet public -c release -r win-x64 --self-contained
-```
+## Architecture Overview
 
-"Install" on Linux
+`XmlPowerToolsEngine` is a Python wrapper class for the `redlines` C# command-line tool, the source of which is available in
+[./csproj/Program.cs](./csproj/Program.cs). The `redlines` utility and wrapper let you compare two docx files and
+show the differences as tracked changes (a "redline" document).
 
-1. Install .net for Linux (whatever distro you're using)
-2. Copy the Release contents from linux-x64 folder
-3. run `chmod 777 ./redlines`
-4. Then you can run `./redlines original_path.docx modified_path.docx`
+### C# Functionality
 
-More complete package and install directions:
+The `redlines` C# utility is a command-line tool that requires four arguments:
+1. `author_tag` - A tag to identify the author of the changes.
+2. `original_path.docx` - Path to the original document.
+3. `modified_path.docx` - Path to the modified document.
+4. `redline_path.docx` - Path where the redlined document will be saved.
 
-https://ttu.github.io/dotnet-core-self-contained-deployments/
+The Python wrapper, `XmlPowerToolsEngine`, and its main method `run_redlines()` simplify the use of `redlines` by
+orchestrating its execution with Python and letting you pass in bytes or file paths for the original and modified
+documents.
 
-Archive and zip the release folder contents as .tar.gz, then distribute the archive. User install instructions would be:
+### Packaging
 
-```commandline
-$ mkdir redlines && cd redlines
-$ wget <github release url>
-$ tar -zxvf redlines-linux-x64.tar.gz
-$ chmod +x redlines
-$ ./redlines <author_tag> <original_path.docx> <modified_path.docx> <redline_path.docx>
+The project is structured as follows:
+```
+python-redlines/
+│
+├── csproj/
+│   ├── bin/
+│   ├── obj/
+│   ├── Program.cs
+│   ├── redlines.csproj
+│   └── redlines.sln
+│
+├── docs/
+│   ├── developer-guide.md
+│   └── quickstart.md
+│
+├── src/
+│   └── python_redlines/
+│       ├── bin/
+│       │   └── .gitignore
+│       ├── dist/
+│       │   ├── .gitignore
+│       │   ├── linux-x64-0.0.1.tar.gz
+│       │   └── win-x64-0.0.1.zip
+│       ├── __about__.py
+│       ├── __init__.py
+│       └── engines.py
+│
+├── tests/
+│   ├── fixtures/
+│   ├── test_openxml_differ.py
+│   └── __init__.py
+│
+├── .gitignore
+├── build_differ.py
+├── extract_version.py
+├── License.md
+├── pyproject.toml
+└── README.md
 ```
 
-"Install" on Windows
+- `src/python_redlines/`: Contains the Python wrapper code.
+- `dist/`: Contains the zipped C# binaries for different platforms.
+- `bin/`: Target directory for extracted binaries.
+- `tests/`: Contains test cases and fixtures for the wrapper.
+
+### Detailed Explanation and Dev Setup
+
+If you want to contribute to the library or want to dive into some of the C# packaging architecture, go to our
+[developer guide](docs/developer-guide.md).
+
+## Additional Information
+
+- **Contributing**: Contributions to the project should follow the established coding and documentation standards.
+- **Issues and Support**: For issues, feature requests, or support, please use the project's issue tracker on GitHub.
+
+## License
 
-TODO - run through this and document
+MIT
````
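The README describes `XmlPowerToolsEngine.run_redlines()` as a thin layer over the four-argument `redlines` CLI that accepts either bytes or file paths. The following is a minimal sketch of that kind of orchestration. It is not the code in `engines.py` (which this commit does not show). The helper names `build_redlines_command`, `materialize` and `run_redlines_cli` are hypothetical, and the sketch assumes the binary's location is already known and that the tool writes the redline to its fourth argument.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Union

# A document can be given as a filesystem path or as raw .docx bytes.
DocInput = Union[str, Path, bytes]


def build_redlines_command(binary: Path, author_tag: str, original: Path,
                           modified: Path, redline: Path) -> List[str]:
    """Assemble argv in the order the README lists the four arguments."""
    return [str(binary), author_tag, str(original), str(modified), str(redline)]


def materialize(doc: DocInput, workdir: Path, name: str) -> Path:
    """Return a path for `doc`, writing raw bytes to `workdir/name` if needed."""
    if isinstance(doc, bytes):
        target = workdir / name
        target.write_bytes(doc)
        return target
    # Paths are passed through untouched.
    return Path(doc)


def run_redlines_cli(binary: Path, author_tag: str,
                     original: DocInput, modified: DocInput) -> bytes:
    """Run the external `redlines` tool and return the redline .docx bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        original_path = materialize(original, workdir, "original.docx")
        modified_path = materialize(modified, workdir, "modified.docx")
        redline_path = workdir / "redline.docx"
        command = build_redlines_command(binary, author_tag, original_path,
                                         modified_path, redline_path)
        # check=True raises subprocess.CalledProcessError on a non-zero exit.
        subprocess.run(command, check=True, capture_output=True, text=True)
        # Read the result before the temporary directory is deleted.
        return redline_path.read_bytes()
```

Writing byte inputs into a `TemporaryDirectory` keeps intermediate files from outliving the call. Assuming a Linux build has been extracted to, say, `src/python_redlines/bin/redlines`, a call would look like `run_redlines_cli(Path("src/python_redlines/bin/redlines"), "AuthorTag", "original.docx", modified_bytes)`.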

build_differ.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -54,11 +54,11 @@ def main():
 
     # Compress the Linux build
     linux_build_dir = './csproj/bin/Release/net8.0/linux-x64'
-    compress_files(linux_build_dir, f"./src/python_redlines/bin/linux-x64-{version}.tar.gz")
+    compress_files(linux_build_dir, f"./src/python_redlines/dist/linux-x64-{version}.tar.gz")
 
     # Compress the Windows build
     windows_build_dir = './csproj/bin/Release/net8.0/win-x64'
-    compress_files(windows_build_dir, f"./src/python_redlines/bin/win-x64-{version}.zip")
+    compress_files(windows_build_dir, f"./src/python_redlines/dist/win-x64-{version}.zip")
 
     print("Build and compression complete.")
 
```
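This hunk only changes where `build_differ.py` writes its archives; the body of `compress_files` lies outside the diff. The sketch below is hypothetical: a helper with the same call shape that picks `.tar.gz` or `.zip` from the output file name. The private helper `_iter_files` is likewise an illustrative name.

```python
import os
import tarfile
import zipfile
from typing import Iterator, Tuple


def _iter_files(source_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (full path, archive name relative to source_dir) pairs."""
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            full_path = os.path.join(root, name)
            yield full_path, os.path.relpath(full_path, source_dir)


def compress_files(source_dir: str, output_filename: str) -> None:
    """Archive the contents of source_dir; format chosen by file extension."""
    os.makedirs(os.path.dirname(output_filename) or ".", exist_ok=True)
    if output_filename.endswith(".tar.gz"):
        # tar records Unix permission bits, so the Linux binary stays executable
        with tarfile.open(output_filename, "w:gz") as tar:
            for full_path, arcname in _iter_files(source_dir):
                tar.add(full_path, arcname=arcname)
    elif output_filename.endswith(".zip"):
        with zipfile.ZipFile(output_filename, "w", zipfile.ZIP_DEFLATED) as zf:
            for full_path, arcname in _iter_files(source_dir):
                zf.write(full_path, arcname)
    else:
        raise ValueError(f"Unsupported archive format: {output_filename}")
```

Storing names relative to the build directory means both archives unpack flat, without the `./csproj/bin/Release/net8.0/...` prefix.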

csproj/Program.cs

Lines changed: 1 addition & 4 deletions
```diff
@@ -29,19 +29,16 @@ static void Main(string[] args)
         var originalBytes = File.ReadAllBytes(originalFilePath);
         var modifiedBytes = File.ReadAllBytes(modifiedFilePath);
         var originalDocument = new WmlDocument(originalFilePath, originalBytes);
-        Console.WriteLine(originalDocument);
         var modifiedDocument = new WmlDocument(modifiedFilePath, modifiedBytes);
-        Console.WriteLine(modifiedDocument);
+
         var comparisonSettings = new WmlComparerSettings
         {
             AuthorForRevisions = authorTag,
             DetailThreshold = 0
         };
 
         var comparisonResults = WmlComparer.Compare(originalDocument, modifiedDocument, comparisonSettings);
-        Console.WriteLine(comparisonResults);
         var revisions = WmlComparer.GetRevisions(comparisonResults, comparisonSettings);
-        Console.WriteLine(revisions);
 
         // Output results
         Console.WriteLine($"Revisions found: {revisions.Count}");
```
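With the debug `Console.WriteLine` calls removed, the only output left in this hunk is the `Revisions found: {revisions.Count}` summary (code outside the hunk may still print other lines). If a Python caller captures the tool's stdout, it could recover that count along these lines. `parse_revision_count` is a hypothetical helper, not part of the package.

```python
import re
from typing import Optional

# Matches the summary line printed by Program.cs: "Revisions found: <count>"
_REVISIONS_FOUND = re.compile(r"^Revisions found: (\d+)\s*$", re.MULTILINE)


def parse_revision_count(stdout: str) -> Optional[int]:
    """Return the revision count reported on stdout, or None if absent."""
    match = _REVISIONS_FOUND.search(stdout)
    return int(match.group(1)) if match else None
```

`re.MULTILINE` lets the pattern match the summary wherever it appears in the captured output, and `\s*` tolerates Windows `\r\n` line endings.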

docs/developer-guide.md

Lines changed: 114 additions & 0 deletions
````diff
@@ -0,0 +1,114 @@
+# Developer Guide for RedlinesWrapper
+
+## Prerequisites
+
+- Python 3.7 or higher installed
+- .NET SDK for building C# binaries or .NET Runtime to run them
+- Hatch for Python environment and package management
+
+## Setting Up the Project
+
+### Step 1: Clone the Repository
+
+Clone the RedlinesWrapper repository to your local machine.
+
+Use Git to clone the repository with the following command:
+
+```bash
+git clone [URL_OF_YOUR_REPOSITORY]
+cd [REPOSITORY_NAME]
+```
+
+Replace `[URL_OF_YOUR_REPOSITORY]` with the actual URL of your repository and `[REPOSITORY_NAME]` with the name of the directory where your repository is cloned.
+
+### Step 2: Install Hatch
+
+If Hatch is not already installed, install it using pip:
+
+```bash
+pip install hatch hatchling
+```
+
+### Step 3: Create and Activate the Virtual Environment
+
+Inside the project directory, create a virtual environment using Hatch:
+
+```bash
+hatch env create
+```
+
+Activate the virtual environment:
+
+```bash
+hatch shell
+```
+
+### Step 4: Install Dependencies
+
+Install the necessary Python dependencies:
+
+```bash
+pip install .[dev]
+```
+
+## Building the C# Binaries
+
+You can use the binaries distributed with the project or, if you want to build new binaries for some reason, you can
+use our build script, exposed as a Hatch script:
+
+```bash
+hatch run build
+```
+
+### Under the Hood
+
+We're just using dotnet to build binaries for [Program.cs](csproj/Program.cs), a command-line utility that exposes
+`WmlComparer`'s redlining capabilities. We currently target win-x64 and linux-x64 builds, but any runtime
+[supported by .NET](https://learn.microsoft.com/en-us/dotnet/core/rid-catalog) is theoretically supported.
+
+**Our build script does the following:**
+
+1. Build a binary for Linux:
+
+```bash
+dotnet publish -c Release -r linux-x64 --self-contained
+```
+
+2. Build a binary for Windows:
+
+```bash
+dotnet publish -c Release -r win-x64 --self-contained
+```
+
+3. Archive and package the binaries into `./src/python_redlines/dist/`.
+
+
+## Running Tests
+
+To ensure everything is set up correctly and working as expected, run the tests included in the `tests/` directory.
+
+### Step 1: Navigate to the Test Directory
+
+Change to the `tests/` directory in your project:
+
+```bash
+cd path/to/tests
+```
+
+### Step 2: Run the Tests
+
+Execute the tests using pytest:
+
+```bash
+pytest
+```
+
+This will run all the test cases defined in your test files.
+
+## Conclusion
+
+You've now set up the RedlinesWrapper project, built the necessary C# binaries, and run the test suite. Passing tests confirm that your setup is correct and the wrapper functions as expected.
+
+---
+
+Feel free to modify this developer guide to better fit the specifics of your project, such as the exact commands for building the C# binaries or the directory structure of your project. This guide provides a general framework to get users started with the RedlinesWrapper.
````
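The two `dotnet publish` steps listed in the guide could be driven from Python roughly as follows. This is an illustrative sketch, not the contents of `build_differ.py`; `TARGET_RUNTIMES`, `PROJECT_FILE`, `publish_command` and `build_all` are hypothetical names. It passes `csproj/redlines.csproj` explicitly as an assumption: the project tree shows a `.csproj` and a `.sln` side by side, and naming the project file avoids any ambiguity about which one `dotnet` should build.

```python
import subprocess
from typing import List, Sequence

# Runtime identifiers the guide says are currently targeted.
TARGET_RUNTIMES = ("linux-x64", "win-x64")

# Assumption: project file path relative to the repository root.
PROJECT_FILE = "csproj/redlines.csproj"


def publish_command(runtime: str, project: str = PROJECT_FILE) -> List[str]:
    """Mirror `dotnet publish -c Release -r <runtime> --self-contained`."""
    return ["dotnet", "publish", project, "-c", "Release", "-r", runtime,
            "--self-contained"]


def build_all(runtimes: Sequence[str] = TARGET_RUNTIMES) -> None:
    """Publish one self-contained build per runtime, stopping on failure."""
    for runtime in runtimes:
        subprocess.run(publish_command(runtime), check=True)
```

Supporting another [.NET runtime identifier](https://learn.microsoft.com/en-us/dotnet/core/rid-catalog) would then amount to adding it to the tuple, followed by the archiving step for the new build.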

docs/quickstart.md

Lines changed: 43 additions & 0 deletions
````diff
@@ -0,0 +1,43 @@
+# Python-Redlines Quickstart
+
+As discussed in the main README, the initial version wraps the C# `WmlComparer` API provided by
+Open-XML-PowerTools. This guide shows you how to use `XmlPowerToolsEngine` to run a redline.
+
+### Step 1: Import and Initialize the Wrapper
+
+In your Python script or interactive session, import and initialize the wrapper:
+
+```python
+from python_redlines.engines import XmlPowerToolsEngine
+
+wrapper = XmlPowerToolsEngine()
+```
+
+### Step 2: Run Redlines
+
+Use the `run_redlines` method to compare documents. You can pass the paths of the `.docx` files or their byte content:
+
+```python
+# Example with file paths
+output = wrapper.run_redlines('AuthorTag', '/path/to/original.docx', '/path/to/modified.docx')
+
+# Example with byte content
+with open('/path/to/original.docx', 'rb') as f:
+    original_bytes = f.read()
+with open('/path/to/modified.docx', 'rb') as f:
+    modified_bytes = f.read()
+
+output = wrapper.run_redlines('AuthorTag', original_bytes, modified_bytes)
+```
+
+In both cases, `output` contains the bytes of the resulting redline: a .docx with the differences shown as tracked
+changes.
+
+### Step 3: Handle the Output
+
+Process or save the output as needed. For example, to save the redline output to a file:
+
+```python
+with open('/path/to/redline_output.docx', 'wb') as f:
+    f.write(output)
+```
````
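Before saving or sharing a redline, it can be useful to confirm that the returned bytes actually carry revisions. A .docx is a ZIP package whose main body conventionally lives in `word/document.xml`, and WordprocessingML marks tracked insertions and deletions with `w:ins` and `w:del` elements. The sketch below counts those elements. `count_tracked_changes` is a hypothetical helper, not part of `python_redlines`. It is a rough check: it ignores headers, footnotes, moves and formatting changes, and it also counts revision marks on paragraph marks, so its totals need not match the tool's `Revisions found` figure.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from typing import Dict

# WordprocessingML main namespace (the `w:` prefix in document.xml).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_tracked_changes(docx_bytes: bytes) -> Dict[str, int]:
    """Count w:ins / w:del elements in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return {
        "insertions": sum(1 for _ in root.iter(f"{{{W_NS}}}ins")),
        "deletions": sum(1 for _ in root.iter(f"{{{W_NS}}}del")),
    }
```

Applied to the quickstart's result, `count_tracked_changes(output)` would return a dictionary such as `{"insertions": ..., "deletions": ...}`. A result of zero for both suggests that the comparison found no differences in the body text.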

src/python_redlines/bin/.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+*
```
Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+!*
```

linux-x64-0.0.1.tar.gz renamed to src/python_redlines/dist/linux-x64-0.0.1.tar.gz

65.7 MB
Binary file not shown.
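The platform archives now live under `dist/`, and `src/python_redlines/bin/` is git-ignored, which makes it the extraction target. A package could therefore unpack the matching archive on first use. The following is a hypothetical sketch: `archive_name` and `extract_binaries` are illustrative names. It assumes x64 hardware, since only `linux-x64` and `win-x64` archives are shipped, and it selects the archive based on `platform.system()`.

```python
import platform
import tarfile
import zipfile
from pathlib import Path


def archive_name(version: str, system: str = "") -> str:
    """Pick the bundled archive for this OS (only x64 builds are shipped)."""
    system = system or platform.system()
    if system == "Linux":
        return f"linux-x64-{version}.tar.gz"
    if system == "Windows":
        return f"win-x64-{version}.zip"
    raise OSError(f"No bundled redlines binary for platform: {system}")


def extract_binaries(package_dir: Path, version: str, system: str = "") -> Path:
    """Unpack dist/<archive> into bin/ and return the bin/ directory."""
    archive = package_dir / "dist" / archive_name(version, system)
    target = package_dir / "bin"
    target.mkdir(parents=True, exist_ok=True)
    if archive.name.endswith(".tar.gz"):
        with tarfile.open(archive, "r:gz") as tar:
            # Newer Pythons offer the "data" filter, which rejects unsafe paths.
            if hasattr(tarfile, "data_filter"):
                tar.extractall(target, filter="data")
            else:
                tar.extractall(target)
    else:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return target
```

Keeping only the compressed archives under version control, and extracting into an ignored `bin/`, means a developer's local extraction never produces a diff.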
