Discussion:
Crawler for Ferret
Huang, Zijian(Victor)
2009-03-19 21:32:44 UTC
Hi, guys:
Can you please recommend a good crawler for Ferret? Nutch is pretty
powerful on the Java side; do we have something similar in Ruby? It
would be great if the crawler also handled incremental index updates
easily.

Thanks

Victor
Jens Krämer
2009-03-19 22:17:52 UTC
Post by Huang, Zijian(Victor)
Can you please recommend a good crawler for Ferret? Nutch is
pretty powerful on the Java side; do we have something similar
in Ruby? It would be great if the crawler also handled incremental
index updates easily.
RDig can do HTTP crawling, but it cannot really be compared with Nutch
feature- or performance-wise, as it was designed for intranet use, say,
indexing the web pages of a few hosts.
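
To illustrate what "a few hosts" means for a crawler, here is a minimal sketch of a host filter that a small intranet crawler might apply to every discovered link. This is not RDig's code: the HostFilter class and the example host names are hypothetical, and it uses only Ruby's standard library.

```ruby
require 'uri'
require 'set'

# Hypothetical helper (not part of RDig): accepts only links that
# point at one of a small, fixed set of hosts.
class HostFilter
  def initialize(hosts)
    @hosts = Set.new(hosts.map(&:downcase))
  end

  # True if the URL is http(s) and its host is one we agreed to crawl.
  # Anything unparsable or pointing elsewhere is rejected.
  def allow?(url)
    uri = URI.parse(url)
    uri.is_a?(URI::HTTP) && !uri.host.nil? && @hosts.include?(uri.host.downcase)
  rescue URI::InvalidURIError
    false
  end
end

# Example host names are made up for illustration.
filter = HostFilter.new(['intranet.example.com', 'wiki.example.com'])
links = ['http://intranet.example.com/docs', 'http://elsewhere.example.org/']
to_crawl = links.select { |link| filter.allow?(link) }
```

A crawler built this way never leaves the listed hosts, which keeps it simple but also explains why it is not a substitute for a web-scale crawler.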


Cheers,
Jens


--
Jens Krämer
webit! Gesellschaft für neue Medien mbH
Schnorrstraße 76 | 01069 Dresden
Telefon +49351467660 | Telefax +493514676666
kraemer-jv+***@public.gmane.org | www.webit.de

Amtsgericht Dresden | HRB 15422
GF Sven Haubold
Timothy Goddard
2009-03-19 22:38:54 UTC
I wrote one called Suckr.

http://goddard.net.nz/projects/suckr/

It does the crawling, including incremental updates, and provides a
command-line search interface. I've had some periodic stability issues with it
on the old Debian box I run it on myself, so please test thoroughly.
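
For anyone unsure what an incremental update involves, one common approach is to remember a digest of each page and reindex only when it changes. The sketch below shows that idea only; it is not taken from Suckr, and the ChangeTracker class and its method names are hypothetical.

```ruby
require 'digest/sha1'

# Hypothetical tracker (not Suckr's actual code): remembers one
# content digest per URL so unchanged pages can be skipped.
class ChangeTracker
  def initialize
    @digests = {}
  end

  # Returns :new, :changed or :unchanged, and records the latest digest.
  def check(url, body)
    digest = Digest::SHA1.hexdigest(body)
    previous = @digests[url]
    @digests[url] = digest
    return :new if previous.nil?
    previous == digest ? :unchanged : :changed
  end
end
```

Pages reported as :new or :changed would be added to the index again, and :unchanged pages left alone. A real crawler would also need to persist the digests between runs and drop pages that have disappeared.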

It has some documentation in the README file. Please let me know if you have
any questions.

Cheers,

Tim
Hugh Sasse
2009-03-20 12:34:37 UTC
Post by Huang, Zijian(Victor)
Can you please recommend a good crawler for Ferret? Nutch is pretty
powerful on the Java side; do we have something similar in Ruby? It
would be great if the crawler also handled incremental index updates
easily.
And then this shows up in my news feeds:

http://www.rubyinside.com/building-a-search-engine-in-200ish-lines-of-ruby-1655.html

I've not followed the links off it, though, so YMMV.
Hugh
