Apache Nutch is an extensible and scalable open source web crawler software project. Nutch provides extensible interfaces such as Parse, Index, and ScoringFilter for custom implementations, e.g. Apache Tika for parsing.
Customer Reviews
Narendra A.
Advanced user of Apache Nutch
When I used Apache Nutch, I was amazed by the speed at which it crawls data, and by the libraries and data structures it provides for customising your crawl and reading the data in the desired format. I was crawling all of the IBM data to get insights and run text analytics on it. The support I got from the forums was also great. Overall, using the Apache Nutch crawler was a nice experience.
What I disliked was the video support available for it on the Internet.
It's nice to use and provides lots of flexibility.
I was solving a data analytics problem in my organisation, where we automate the whole bidding process with text analytics.