A tool to fetch endpoints from the Wayback Machine API and verify that they still exist. It has been updated to support Python 3.
- Pull URLs from the Wayback Machine
- Check those URLs and report the URL, status code, response length, content type, and redirect URL
- Multithreaded
- Load input from and write output to a file
- Accept URLs on stdin
- Optimized handling of old domains and timeouts
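Pulling archived URLs is usually done through the Wayback Machine's public CDX API. The following is a minimal Python 3 sketch of that step, not the code in waybacktool.py; the query parameters chosen (`output=json`, `fl=original`, `collapse=urlkey`) and the helper names `build_cdx_url`, `parse_cdx_json` and `pull` are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine CDX endpoint (assumption: waybacktool.py may query it differently).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(host):
    """Build a CDX query listing every archived URL under `host`."""
    params = {
        "url": host + "/*",    # trailing * = prefix match on everything under this host
        "output": "json",      # rows as JSON arrays; the first row is a header
        "fl": "original",      # return only the originally captured URL
        "collapse": "urlkey",  # fold repeated captures of the same URL together
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(body):
    """Turn a CDX JSON response into unique URLs, skipping the header row."""
    rows = json.loads(body) if body.strip() else []
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(row[0] for row in rows[1:] if row))


def pull(host, timeout=30):
    """Fetch the archived URL list for `host` (performs a live network request)."""
    with urlopen(build_cdx_url(host), timeout=timeout) as resp:
        return parse_cdx_json(resp.read().decode("utf-8", "replace"))
```

Calling `pull("example.com")` makes a live request and returns a list of URLs; printing them one per line gives output in the same shape as the `pull` example below. Keeping the parsing in its own function makes it easy to test without network access.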
With the transition to Python 3, there have been a few changes:
- The `print` statement is now a function, so it requires parentheses: instead of `print r.text.strip()`, use `print(r.text.strip())`.
- The `urlparse` module has been renamed to `urllib.parse`. Replace `import urlparse` with `from urllib.parse import urlparse`, and call `urlparse(...)` directly.
Here's the updated import line:

```python
from urllib.parse import urlparse
```

Fetch URLs:
```
$ python3 waybacktool.py pull --host example.com
http://example.com/example.html
https://example.com/testing.js
https://example.com/test.css
```

Check URLs:
```
$ python3 waybacktool.py pull --host example.com | python3 waybacktool.py check
http://example.com/example.html, 200, 1024, text/html
https://example.com/testing.js, 301, 58, text/plain; charset=utf-8, https://example.com/testing1234.js
https://example.com/test.css, 403, 103, text/html
```

Because `pull` prints one URL per line and `check` accepts stdin, you can filter the fetched URLs with `grep` (or any other line-oriented tool) before checking them. For example, this checks only the URLs containing `html`:
```
$ python3 waybacktool.py pull --host example.com | grep html | python3 waybacktool.py check
http://example.com/example.html, 200, 1024, text/html
```

Enjoy using this tool!
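For readers curious how a multithreaded `check` step might work, here is a minimal Python 3 sketch using only the standard library. It is not the code in waybacktool.py: the helper names `fetch`, `format_result` and `check_all`, the 10-worker pool, the 10-second timeout, and the choice to report the body size as "response length" are all assumptions. Redirects are deliberately not followed so that the status code and redirect URL can be reported, as in the `check` output above.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.error import HTTPError, URLError
from urllib.request import HTTPRedirectHandler, build_opener


class NoRedirect(HTTPRedirectHandler):
    """Refuse to follow redirects so 3xx responses can be reported as-is."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx status


_OPENER = build_opener(NoRedirect)


def fetch(url, timeout=10):
    """Return (status, body_length, content_type, location) for one URL."""
    try:
        resp = _OPENER.open(url, timeout=timeout)
    except HTTPError as err:
        resp = err  # 3xx/4xx/5xx: HTTPError still exposes status, headers and body
    try:
        body = resp.read()
        headers = resp.headers
        return (resp.getcode(), len(body),
                headers.get("Content-Type", ""), headers.get("Location", ""))
    finally:
        resp.close()


def format_result(url, status, length, content_type, location):
    """Render one comma-separated line; the redirect target is appended only if present."""
    fields = [url, str(status), str(length), content_type]
    if location:
        fields.append(location)
    return ", ".join(fields)


def check_all(urls, fetcher=fetch, workers=10):
    """Check URLs on a thread pool, skip unreachable ones, keep input order."""

    def one(url):
        try:
            return format_result(url, *fetcher(url))
        except (OSError, ValueError, http.client.HTTPException):
            return None  # URLError, timeouts and malformed URLs all land here

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [line for line in pool.map(one, urls) if line is not None]
```

To use it like the `check` subcommand, read URLs from `sys.stdin`, pass them to `check_all`, and print each returned line. Using `pool.map` keeps results in input order, which trades a little latency for output that lines up with what `pull` produced.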