Crawl a microformats2 site to find things like canonical URLs for h-entries.
Note: this module does not properly handle pages with more than one top-level microformats2 node.
npm install crawl-mf2
Start a crawl and log canonical h-entry URLs found on https://strugee.net/blog/:
var crawl = require('crawl-mf2');
var crawler = crawl('https://strugee.net/blog/');
crawler.on('h-entry', function(url, mf2node) {
  console.log(url);
});

The module exports a single function, crawlMf2, which takes a single argument, the base URL to crawl from.
It returns an EventEmitter.
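Because the return value is a standard EventEmitter, results can be gathered with ordinary listeners. The following is a minimal sketch, not part of the module: collectEntryUrls is a hypothetical helper, and the stand-in emitter only imitates the 'h-entry' event from the usage example so that the snippet runs without network access.

```javascript
var EventEmitter = require('events');

// Hypothetical helper (not part of crawl-mf2): records every URL reported
// through the 'h-entry' event on the given emitter.
function collectEntryUrls(crawler) {
  var urls = new Set();
  crawler.on('h-entry', function(url, mf2node) {
    urls.add(url);
  });
  return urls;
}

// Stand-in for the emitter returned by crawl('https://strugee.net/blog/').
var fakeCrawler = new EventEmitter();
var found = collectEntryUrls(fakeCrawler);

// Simulate the crawler reporting one entry twice and another entry once.
fakeCrawler.emit('h-entry', 'https://example.com/blog/a', {});
fakeCrawler.emit('h-entry', 'https://example.com/blog/a', {});
fakeCrawler.emit('h-entry', 'https://example.com/blog/b', {});

console.log(Array.from(found));
```

With the real module, pass the value returned by crawl(...) instead of the stand-in. The set fills in as events arrive; no completion event is documented here, so the sketch does not assume one.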
Emitted when an error occurs. Currently this means either the microformats2 parser failed or an HTTP error occurred.
Note: Node.js treats 'error' events specially: if no listener is registered for them, the emitter throws instead of failing silently.
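The special treatment can be seen with a plain EventEmitter from Node's events module. This sketch uses stand-in emitters rather than a real crawler, and the error message is made up for illustration; it only shows why attaching an 'error' listener before crawling is a good idea.

```javascript
var EventEmitter = require('events');

// Without an 'error' listener, emitting 'error' throws the error object.
var unhandled = new EventEmitter();
var threw = false;
try {
  unhandled.emit('error', new Error('HTTP error'));
} catch (err) {
  threw = true;
}

// With a listener attached, the error is delivered to it instead.
var handled = new EventEmitter();
var seen = [];
handled.on('error', function(err) {
  seen.push(err.message);
});
handled.emit('error', new Error('HTTP error'));

console.log(threw, seen);
```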
String: The URL being discovered
Emitted when a new URL is discovered, including the initial base URL.
String: The URL being parsed
Object: The parsed microformats2 node, returned by microformat-node's .get()
Emitted when a URL is parsed for microformats2 markup.
String: The URL containing the h-feed
Object: The parsed microformats2 node, returned by microformat-node's .get()
Emitted when an h-feed page is discovered.
String: The URL containing the h-entry
Object: The parsed microformats2 node, returned by microformat-node's .get()
Emitted when an h-entry page is discovered.
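To get from the parsed node to the URLs an entry declares for itself, a listener can read the entry's url property. The sketch below assumes the object handed to listeners follows the standard microformats2 JSON shape (a top-level items array whose members carry type and properties); that assumption has not been checked against this module. entryUrls is a hypothetical helper and the sample object is hand-written.

```javascript
// Hypothetical helper: returns the `url` property values of every top-level
// h-entry item in a microformats2 parse result.
function entryUrls(parsed) {
  var urls = [];
  (parsed.items || []).forEach(function(item) {
    var isEntry = Array.isArray(item.type) && item.type.indexOf('h-entry') !== -1;
    var props = item.properties || {};
    if (isEntry && Array.isArray(props.url)) {
      urls = urls.concat(props.url);
    }
  });
  return urls;
}

// Hand-written sample in the microformats2 JSON shape, for illustration only.
var sample = {
  items: [{
    type: ['h-entry'],
    properties: {
      name: ['Hello world'],
      url: ['https://example.com/blog/hello-world']
    }
  }],
  rels: {}
};

console.log(entryUrls(sample));
```

Inside an 'h-entry' listener this would be called as entryUrls(mf2node), falling back to the url argument when the result is empty.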
LGPL 3.0+
AJ Jordan alex@strugee.net