Web Crawling
Research conducted on Web Crawling,
open source frameworks across languages
Web Crawler
(Known
JAVA
PYTHON
RUBY
PHP
C#, C++, CROSS PLATFORM
1. PYTHON BASED
APACHE NUTCH
SCRAPY
KIMONO
SCRAPING HUB
IMPORT.IO
GRUB
2. JAVA BASED
WEBCOLLECTOR
CRAWLER4J
EX-CRAWLER
BIXO
WEB-HARVEST
JOBO
ARACHNID
SMART AND SIMPLE WEB CRAWLER
WEBLECH
CAPEK
GRUNK
LARM
ARALE
SPINDLE
METIS
APERTURE
HOUNDER
WEB EATER
ANDJING
PYCREEP
LUCENE
3. PHP BASED
SPHIDER
OPEN WEB SPIDER
4. RUBY BASED
ANEMONE
CLOUD-CRAWLER
5. C#, C++ AND CROSS PLATFORM
DATAPARK SEARCH
GNU WGET
GRUB
HT://DIG
HTTRACK
ICDL CRAWLER
MNOGOSEARCH
OPEN SEARCH SERVER
ASPSEEK
HYPER ESTRAIER
OPEN WEB SPIDER
PAVUK
XAPIAN
ARACHNODE.NET
CRAWWWLER
OPESE
CCRAWLER
CONCLUSION:
Python is widely used for web crawling
Reasons:
Most efficient, highly distributed
The requests library is very powerful while being extremely
simple to use. Python also has an excellent HTML/XML parser in
lxml (a third-party library, not part of the standard library).
An alternative to lxml is Beautiful Soup.
A scripting language like Python/Perl offers excellent text
processing abilities in the form of regular expressions and low
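
As a rough illustration of the points above, the following minimal sketch fetches a page with requests and pulls out links with a regular expression. The names `fetch`, `extract_links` and `HREF_RE` are hypothetical and chosen only for this example. The pattern is deliberately simplified and will miss unusual markup, so a real crawler would more likely hand the HTML to lxml or Beautiful Soup instead.

```python
import re
from urllib.parse import urljoin

# Simplified pattern (an assumption for this sketch): matches href="..." or
# href='...' inside an <a> tag. It does not handle unquoted attribute values.
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch(url):
    """Download a page with the third-party requests library."""
    import requests  # imported here so extract_links works without requests installed

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_links(html, base_url):
    """Return absolute link URLs found in html, in order, without duplicates."""
    seen = set()
    links = []
    for href in HREF_RE.findall(html):
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


if __name__ == "__main__":
    sample = '<p><a href="/about">About</a> <a class="x" href="http://example.org/">Ext</a></p>'
    print(extract_links(sample, "http://example.com/index.html"))
```

To crawl a live page, the two helpers can be chained, for example `extract_links(fetch(url), url)`, and the returned links fed back into a queue of pages still to visit.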