This script automates the process of collecting and downloading documentation pages from an Atlassian Confluence space. It was tested on the following URL:
https://dokumentacja-inpost.atlassian.net/wiki/spaces/PL/overview
- Recursively collects all subpage links from a given Confluence space.
- Downloads each page as a Word document (.doc or .docx) using Confluence's export functionality.
- Verifies which documents have been downloaded and logs their status.
- Exports a CSV file (confluence_links.csv) with all found links and their download status.
- Supports retrying downloads for failed pages.
- Logs all actions and errors to log.txt.
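The link-collection step can be pictured with a minimal sketch. It is not the script's actual code: `collect_links` and the `get_child_links` callback are hypothetical names, and the traversal here is an iterative breadth-first walk rather than literal recursion. In the real script the callback would presumably read child links from the rendered page through Selenium.

```python
from collections import deque
from typing import Callable, Iterable, List


def collect_links(start_url: str,
                  get_child_links: Callable[[str], Iterable[str]]) -> List[str]:
    """Return every page reachable from start_url, each listed once."""
    seen = {start_url}
    order = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for child in get_child_links(url):
            if child not in seen:  # guards against cycles and duplicate links
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

The `seen` set matters because navigation menus often link back to parent pages; without it the walk would never terminate on such a space.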
- Python 3.7+
- Google Chrome browser
- ChromeDriver (compatible with your Chrome version)
- Selenium
- python-dotenv
Install dependencies with:
pip install -r requirements.txt
- Place your .env file in the project directory if you want to override the default Confluence URL. Example:
CONFLUENCE_BASE_URL=https://your-confluence-url/wiki/spaces/PL/overview
- Run the script:
python script.py
To retry only failed downloads:
python script.py --retry-failed
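As a rough illustration of how the .env override and the --retry-failed flag might be read, here is a sketch under stated assumptions. `DEFAULT_URL`, `get_base_url`, and `parse_args` are hypothetical names, and the script's real default URL is not shown in this README. Only `CONFLUENCE_BASE_URL` and `--retry-failed` come from the text above.

```python
import argparse
import os

try:
    from dotenv import load_dotenv  # provided by python-dotenv
except ImportError:  # keeps the sketch runnable without the package
    load_dotenv = None

# Hypothetical fallback value; the script's actual default may differ.
DEFAULT_URL = "https://your-confluence-url/wiki/spaces/PL/overview"


def get_base_url() -> str:
    """Return CONFLUENCE_BASE_URL from the environment or .env, else the default."""
    if load_dotenv is not None:
        load_dotenv()  # loads variables from a .env file if one is found
    return os.environ.get("CONFLUENCE_BASE_URL", DEFAULT_URL)


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the command line; argparse maps --retry-failed to args.retry_failed."""
    parser = argparse.ArgumentParser(
        description="Download Confluence pages as Word documents.")
    parser.add_argument("--retry-failed", action="store_true",
                        help="retry only the downloads that failed earlier")
    return parser.parse_args(argv)
```

By default `load_dotenv()` does not overwrite variables that are already set in the environment, so an exported `CONFLUENCE_BASE_URL` takes precedence over the value in .env.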
- Downloaded Word documents are saved in the downloads/ directory.
- All found links and their download status are saved in confluence_links.csv.
- Logs are written to log.txt.
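The CSV output is what makes retrying possible. The sketch below shows one way a status file could be written and then read back to find pages that still need downloading. The column names (`url`, `status`) and the status values (`downloaded`, `failed`) are assumptions for illustration; the actual layout of confluence_links.csv may differ.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

FIELDS = ["url", "status"]  # assumed column names


def write_status(csv_path: Path, statuses: Dict[str, str]) -> None:
    """Write one row per link with its download status."""
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        for url, status in statuses.items():
            writer.writerow({"url": url, "status": status})


def failed_urls(csv_path: Path) -> List[str]:
    """Return the links whose status is anything other than 'downloaded'."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return [row["url"] for row in csv.DictReader(fh)
                if row["status"] != "downloaded"]


if __name__ == "__main__":
    # Demo in a temporary directory so no real output file is touched.
    demo_csv = Path(tempfile.mkdtemp()) / "confluence_links.csv"
    write_status(demo_csv, {"page-1": "downloaded", "page-2": "failed"})
    print(failed_urls(demo_csv))
```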
- The script uses Selenium in headless mode to automate Chrome.
- It expands all navigation menus to ensure all subpages are found.
- The script was tested on the InPost Confluence documentation space (see URL above).
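For readers unfamiliar with headless Selenium, the following sketch shows how a Chrome session could be configured to save downloads into downloads/ without a prompt. The helper names `chrome_prefs` and `make_driver` are hypothetical, and the script may configure Chrome differently. The Selenium import is deferred so the preference helper can be used on its own.

```python
from pathlib import Path


def chrome_prefs(download_dir: str) -> dict:
    """Chrome preferences that send downloads to download_dir without a prompt."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def make_driver(download_dir: str = "downloads"):
    """Start headless Chrome; requires Selenium, Chrome, and ChromeDriver."""
    # Imported lazily so chrome_prefs() works without Selenium installed.
    from selenium import webdriver

    Path(download_dir).mkdir(parents=True, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # older Chrome versions use "--headless"
    options.add_experimental_option("prefs", chrome_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

A caller would typically wrap the session in try/finally: create the driver with `make_driver()`, load a page with `driver.get(url)`, and call `driver.quit()` in the `finally` branch so Chrome does not linger after an error.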
- Make sure you have access rights to the Confluence space for exporting documents.
- Ensure ChromeDriver is installed and matches your Chrome version.
- If downloads do not start, check your permissions and network connection.
- Review log.txt for detailed error messages.
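A version mismatch between Chrome and ChromeDriver can be checked by comparing major versions. The sketch below is one possible way to do that and is not part of the script. The function names are hypothetical, and the executable names passed to `tool_version` (for example `chromedriver` or `google-chrome`) are assumptions that vary by operating system.

```python
import re
import subprocess
from typing import Optional


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from text such as 'ChromeDriver 120.0.6099.109'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def tool_version(command: str) -> Optional[int]:
    """Run '<command> --version' and return its major version, or None on failure."""
    try:
        result = subprocess.run([command, "--version"], capture_output=True,
                                text=True, check=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return None  # command missing, timed out, or exited with an error
    return major_version(result.stdout)
```

If `tool_version("chromedriver")` and `tool_version("google-chrome")` return different numbers, or either returns `None`, that is a likely reason for the browser failing to start.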
See LICENSE.txt for license information.