LaborChronicle/crawling

Labor Chronicle Crawler

Overview

The Labor Chronicle Crawler is a component of the Labor Chronicle application. It gathers labor-related news articles from news platforms and labor organizations, and is tuned to identify and retrieve content that pertains to labor issues.

Purpose

This crawler supports public awareness of labor rights and developments. It consolidates news from various sources into a single, accessible platform, making it easier for users to stay informed about significant labor issues.

Operational Details

  • Target Content: The crawler is programmed to search for and retrieve articles that explicitly relate to labor topics. It uses predefined keywords and categories (such as "labor rights," "unions," "wages," "employment law") to filter content during the crawling process.
  • Frequency: To minimize server load and respect each target website's bandwidth, the crawler runs once daily during off-peak hours.
  • User-Agent String: The crawler identifies itself with: LaborChronicleCrawler/1.0 (+https://github.com/LaborChronicle/crawling)
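The keyword filtering and self-identification described above might look roughly like the following. This is a minimal sketch, not the crawler's actual code: the keyword tuple contains only the four examples quoted above, and `is_labor_related` and `build_request` are hypothetical helper names introduced for illustration.

```python
import re
from urllib.request import Request

# Only the example keywords quoted in this README; the real list is an assumption.
LABOR_KEYWORDS = ("labor rights", "unions", "wages", "employment law")

# User-Agent string as documented above.
USER_AGENT = "LaborChronicleCrawler/1.0 (+https://github.com/LaborChronicle/crawling)"

# Case-insensitive match on whole words or phrases, so "reunions" does not match "unions".
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in LABOR_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def is_labor_related(title: str, body: str = "") -> bool:
    """Return True if the title or body mentions any labor keyword (hypothetical helper)."""
    return bool(_KEYWORD_RE.search(f"{title}\n{body}"))


def build_request(url: str) -> Request:
    """Build a request that carries the crawler's User-Agent; nothing is sent here."""
    return Request(url, headers={"User-Agent": USER_AGENT})
```

Matching on word boundaries is one possible design choice; a production filter would probably also use site categories or tags, as the Target Content item suggests.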

Compliance with robots.txt

  • Adherence to Directives: This crawler strictly adheres to the directives outlined in the robots.txt files of all target websites. It is configured to respect all Disallow and Allow rules to ensure compliance with each site's policy on automated access.
  • Respect for Site Architecture: The crawler navigates and parses websites in a way intended to avoid undue strain on their servers or impact on their operational performance.
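As an illustration of the robots.txt check, the sketch below uses Python's standard-library `urllib.robotparser`. It assumes the crawler matches rules against the product token `LaborChronicleCrawler`; `parser_from_text` and `may_fetch` are hypothetical names, and the robots.txt content is made up for the example. The repository's actual implementation may differ.

```python
from urllib.robotparser import RobotFileParser

# Product token taken from the documented User-Agent string (assumed to be
# what sites match against in their robots.txt rules).
AGENT_TOKEN = "LaborChronicleCrawler"


def parser_from_text(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if the rules allow this crawler to fetch the URL."""
    return parser.can_fetch(AGENT_TOKEN, url)


# Made-up robots.txt used only for this example.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    rules = parser_from_text(SAMPLE_ROBOTS)
    print(may_fetch(rules, "https://example.org/news/labor-rights"))
    print(may_fetch(rules, "https://example.org/private/drafts"))
```

In a live crawl the parser would typically be loaded with `set_url(...)` and `read()` once per site, and every candidate URL checked before it is requested.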

Contact Information

For any inquiries, feedback, or concerns about the Labor Chronicle Crawler, please contact:
