Crawls websites and saves found URLs to a file.
Install Node.js and run npm install in ./crawler.
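For example, from the directory containing ./crawler:

  cd ./crawler
  npm install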
There are 2 required CLI arguments:
- First argument: domain to crawl
- Second argument: path to the file where the URLs should be saved
And 2 optional CLI arguments:
- Third argument: connection count limit. Default is 15.
- Fourth argument: redirect count limit. Default is 15.
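For reference, here is a minimal sketch of how these four arguments could be read. The variable names (domain, outputFile, connectionLimit, redirectLimit) are illustrative assumptions, not necessarily what index.js uses internally:

  // Sketch only: reads the arguments described above, with their defaults.
  const args = process.argv.slice(2);

  if (args.length < 2) {
    console.error('Usage: node ./index.js <domain> <output-file> [connection-limit] [redirect-limit]');
    process.exit(1);
  }

  const domain = args[0];                                         // required: domain to crawl
  const outputFile = args[1];                                     // required: where found URLs are saved
  const connectionLimit = args.length > 2 ? Number(args[2]) : 15; // optional, defaults to 15
  const redirectLimit = args.length > 3 ? Number(args[3]) : 15;   // optional, defaults to 15

  console.log({ domain, outputFile, connectionLimit, redirectLimit });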
For example, if you want to crawl example.com and save found URLs to ./test.txt, run the following command:
  node ./index.js example.com test.txt

To download the crawled URLs with Wget, point its --input-file option at the saved URL list (replace CHANGE_THIS with the path to that file):

  wget --input-file=CHANGE_THIS --warc-file="warc" --force-directories --tries=10
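For instance, using the test.txt file produced by the example above:

  wget --input-file=test.txt --warc-file="warc" --force-directories --tries=10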