-
Notifications
You must be signed in to change notification settings - Fork 0
DOC: A few web scraping resources #1
Copy link
Copy link
Open
Description
westurner
opened on Aug 24, 2018
Issue body actions
- https://en.wikipedia.org/wiki/Web_scraping
- https://en.wikipedia.org/wiki/Linked_data
- https://github.com/lorien/awesome-web-scraping/blob/master/python.md
- https://github.com/vinta/awesome-python#web-content-extracting
- https://github.com/vinta/awesome-python#web-crawling--web-scraping
- https://github.com/kennethreitz/requests-html
- https://github.com/miyakogi/pyppeteer (headless chrome)
- https://github.com/microsoft/playwright-python (headless chromium, webkit, firefox)
- https://github.com/Psycojoker/ipython-beautifulsoup
- Note the XSS protections
- https://github.com/mozilla/bleach
- https://github.com/tiran/defusedxml
- https://github.com/scrapinghub/extruct
- RDFa (RDF in HTML attributes)
- https://en.wikipedia.org/wiki/RDFa
- https://schema.org/docs/full.html
- Facebook OpenGraph https://ogp.me
- Microdata
- Microformats
- JSON-LD
- https://en.wikipedia.org/wiki/JSON-LD
- RDFa (RDF in HTML attributes)
- https://github.com/CodeForAntarctica/codeforantarctica.github.io/pull/3
- Structured Data, Linked Data
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels