- https://en.wikipedia.org/wiki/Web_scraping
- https://en.wikipedia.org/wiki/Linked_data
- https://github.com/lorien/awesome-web-scraping/blob/master/python.md
- https://github.com/vinta/awesome-python#web-content-extracting
- https://github.com/vinta/awesome-python#web-crawling--web-scraping
- https://github.com/kennethreitz/requests-html
- https://github.com/miyakogi/pyppeteer (headless chrome)
- https://github.com/microsoft/playwright-python (headless chromium, webkit, firefox)
- https://github.com/Psycojoker/ipython-beautifulsoup
- https://github.com/mozilla/bleach
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes.
- https://github.com/tiran/defusedxml
The results of an attack on a vulnerable XML library can be fairly dramatic. With just a few hundred bytes of XML data, an attacker can occupy several gigabytes of memory within seconds. An attacker can also keep CPUs busy for a long time with a small to medium-sized request. Under some circumstances it is even possible to access local files on your server, circumvent a firewall, or abuse services to rebound attacks to third parties.
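To make the memory-exhaustion risk described above concrete, here is a minimal standard-library sketch that refuses any document carrying a DOCTYPE, since entity-expansion payloads are declared there. This is not defusedxml's API; `DTDForbidden`, `element_names_without_dtd`, and `SMALL_BOMB` are hypothetical names chosen for illustration, and rejecting every DTD is an assumption that may be too strict for some feeds.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical name)."""


# A tiny nested-entity payload; real attacks nest many more levels.
SMALL_BOMB = (
    '<!DOCTYPE lolz [<!ENTITY lol "lol">'
    '<!ENTITY lol2 "&lol;&lol;">]><lolz>&lol2;</lolz>'
)


def element_names_without_dtd(xml_text):
    """Return element names in document order, refusing any DTD."""
    names = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Abort before any entity declared in the DTD can be expanded.
        raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names
```

Calling `element_names_without_dtd("<a><b/></a>")` returns the element names, while passing `SMALL_BOMB` raises `DTDForbidden`. For real scraping code, the defusedxml package linked above is the safer choice.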
- https://github.com/scrapinghub/extruct
- RDFa (RDF in HTML attributes)
- https://en.wikipedia.org/wiki/RDFa
- https://schema.org/docs/full.html
- Facebook Open Graph: https://ogp.me
- Microdata
- Microformats
- JSON-LD
- https://github.com/CodeForAntarctica/codeforantarctica.github.io/pull/3
- Structured Data, Linked Data
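
As a rough illustration of the structured-data formats listed above, the sketch below pulls embedded JSON-LD out of a page using only the standard library. It is not extruct's API; `JsonLdExtractor`, `extract_jsonld`, and `SAMPLE_HTML` are hypothetical names, and the sample markup is invented for the example. Microdata, Microformats, and RDFa need attribute-level parsing that this sketch does not attempt.

```python
import json
from html.parser import HTMLParser

# Invented sample page with one JSON-LD block.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example Org"}
</script>
</head><body><p>Hello</p></body></html>
"""


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


items = extract_jsonld(SAMPLE_HTML)
print(items[0]["@type"])
```

For production use, extruct (linked above) covers the other formats as well.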