Crawler-OpenSource-List

Crawler OpenSource List | Open-Source Crawler Framework Index

Framework

Node

Python

  • Photon #Project#: Incredibly fast crawler that extracts URLs, emails, files, website accounts, and much more.

  • Gerapy #Project#: Distributed crawler management framework based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django, and Vue.js.

Golang

  • 2015-go_spider #Project#: An awesome concurrent crawler (spider) framework written in Go. The crawler is flexible and modular: it can easily be extended into a customized crawler, or you can use only the default crawl components.

  • 2017-Colly #Project#: Lightning Fast and Elegant Scraping Framework for Gophers.

  • 2018-ferret #Project#: ferret is a web scraping system that aims to simplify data extraction from the web for things such as UI testing, machine learning, and analytics.

Java

  • Crawler4j #Project#: crawler4j is an open source web crawler for Java that provides a simple interface for crawling the web. With it, you can set up a multi-threaded web crawler in a few minutes.

Content Analysis