Discussion:
ferret finds 'tests' but not 'test'
Alastair Moore
2006-09-05 19:51:52 UTC
Hello all,

Quick question (possibly!) - I've got a few records indexed, and a
search for 'test' results in no hits even though I know the word 'tests'
exists in the indexed field. A search for 'tests' produces a result.
I would have thought that 'test' would match 'tests' but no such luck!

Thanks,

Alastair
--
Posted via http://www.ruby-forum.com/.
David Balmain
2006-09-06 04:20:03 UTC
The default analyzer doesn't perform any stemming. You need to create
your own analyzer with a stemmer. Something like this:

require 'rubygems'
require 'ferret'

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)

index << "test"
index << "tests debate debater debating the for,"
puts index.search("test").total_hits
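
For readers who want to see the effect without installing Ferret, here is a minimal plain-Ruby sketch of the idea behind the analyzer above. It is an illustration under stated assumptions, not Ferret's implementation: toy_stem is a crude suffix stripper standing in for StemFilter, and ToyIndex is a hypothetical in-memory index. Neither name exists in Ferret.

```ruby
# Crude suffix stripper used only for illustration. Ferret's StemFilter
# uses a real stemming algorithm; this toy rule set is an assumption.
def toy_stem(word)
  word.downcase.sub(/(ing|er|es|e|s)\z/, '')
end

# Hypothetical in-memory inverted index. The same analyzer block is applied
# to every indexed word and to every search term.
class ToyIndex
  def initialize(&analyzer)
    @analyzer = analyzer || lambda { |w| w.downcase }
    @postings = Hash.new { |hash, key| hash[key] = [] }
    @doc_count = 0
  end

  # Add a document and record which terms it contains.
  def <<(text)
    doc_id = @doc_count
    @doc_count += 1
    text.scan(/\w+/).each { |w| @postings[@analyzer.call(w)] |= [doc_id] }
    self
  end

  # Return the ids of documents containing the analyzed term.
  def search(term)
    @postings.fetch(@analyzer.call(term), [])
  end
end

docs = ["test", "tests debate debater debating the for,"]

plain = ToyIndex.new
docs.each { |d| plain << d }
puts plain.search("test").size    # 1: only the exact token matches

stemmed = ToyIndex.new { |w| toy_stem(w) }
docs.each { |d| stemmed << d }
puts stemmed.search("test").size  # 2: "tests" was indexed as "test"
```

The point of the sketch is that stemming has to happen on both sides: the indexed words and the search term go through the same analyzer, otherwise the stored terms and the query terms no longer line up.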

Hope that helps,
Dave
Alastair Moore
2006-09-06 12:36:39 UTC
Hi Dave,

Many thanks for the help, it does help! However given the short timespan
for this project, I think the users of the site will just have to be a
bit more specific in their search terms :) Cheers and will bookmark your
reply for a later project.

Alastair
Albert
2006-09-29 12:37:04 UTC
Hi there,

Thanks for this useful piece of information! What I'm wondering is how
to do stemming on queries as well. My first try was:

query = Ferret::QueryParser.new(:analyzer =>
  Ferret::Analysis::StemmingAnalyzer.new).parse(query_string)

index.search_each(query) { |doc, score| ... }

But this does not work the way I would expect it to work, i.e., it seems
to deliver empty results independent of the input.

Does anybody have an idea what I'm doing wrong?

Cheers,

Albert
David Balmain
2006-09-29 14:31:04 UTC
Hi Albert,

Could you show us your implementation of StemmingAnalyzer as well?
Also, you need to be sure to use the same analyzer for both indexing
and searching, although I think you already knew this.

Cheers,
Dave
Albert
2006-09-29 16:45:10 UTC
Hi Dave,

Thanks for following up! The StemmingAnalyzer is actually just the
MyAnalyzer from the example above:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

I've been trying to find the error but with no success. The searching is
done this way:

i = Ferret::Index::Index.new(:path => index)
qp = Ferret::QueryParser.new(:analyzer =>
  Ferret::Analysis::StemmingAnalyzer.new)
query = qp.parse(query_string)
i.search_each(query) { |doc, score| ... }

What I don't get is that search_each(query) never returns a result
whereas when I use the original query string as in

i = Ferret::Index::Index.new(:path => index)
# qp = Ferret::QueryParser.new(:analyzer => Ferret::Analysis::StemmingAnalyzer.new)
# query = qp.parse(query_string)
i.search_each(query_string) { |doc, score| ... }

things work as expected (modulo the stemming, of course). So it may be
that I fundamentally misunderstand something or am making a stupid
mistake ...

Cheers,

Albert
David Balmain
2006-09-29 23:00:52 UTC
Sorry, I must have been tired last night. The problem is obvious to me
now. You need to set the :fields parameter. The above query parser
should work as long as you explicitly specify all fields in your
query. For example:

"content:(ruby rails) title:(ruby rails)"

But if you want to search all fields by default then you need to tell
the QueryParser what fields exist. The Index class will handle all of
this for you including using the same analyzer as is used during
indexing. It looks like you are using the Index class for your
searches so why not just leave the query parsing to it. Otherwise you
can get the fields from the reader.

query = Ferret::QueryParser.new(
  :analyzer => Ferret::Analysis::StemmingAnalyzer.new,
  :fields => reader.fields,
  :tokenized_fields => reader.tokenized_fields
).parse(query_string)

index.search_each(query) { |doc, score| ... }
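
As a rough illustration of why a parser without default fields returns nothing for a bare term, here is a plain-Ruby sketch. toy_parse is a hypothetical stand-in and not Ferret's QueryParser; it only mimics the behaviour described above, where "field:term" names its field explicitly and an unqualified term is expanded over whatever default fields the parser was given.

```ruby
# Hypothetical toy parser, not Ferret's QueryParser. It turns a query string
# into [field, term] clauses. Bare terms are expanded over default_fields.
def toy_parse(query_string, default_fields: [])
  query_string.split.flat_map do |token|
    field, term = token.split(':', 2)
    if term
      # "field:term" carries its own field, so no defaults are needed.
      [[field.to_sym, term]]
    else
      # A bare term produces one clause per default field. With no default
      # fields it produces no clauses at all, so nothing can match.
      default_fields.map { |f| [f, token] }
    end
  end
end

p toy_parse("ruby")                                      # []
p toy_parse("ruby", default_fields: [:title, :content])  # [[:title, "ruby"], [:content, "ruby"]]
p toy_parse("content:ruby")                              # [[:content, "ruby"]]
```

In this toy model the first call is the situation from the earlier posts: the query parses without error but contains no clauses, which looks like "empty results independent of the input".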

Hope that helps,
Dave
Albert
2006-09-30 07:04:02 UTC
Hi Dave,

Wonderful! Thanks! I should have taken a deeper look at the
documentation, indeed. Anyway, thanks for your patience!

Cheers,

Al.
anrake
2006-10-09 09:19:20 UTC
Hi, if I use this stemming analyzer, where do I put it ? /lib/ and
require it in each model?

-Anrake
Ghost
2006-10-26 20:06:55 UTC
Post by anrake
Hi, if I use this stemming analyzer, where do I put it ? /lib/ and
require it in each model?
-Anrake
Can someone give me an idiot's guide as to how to implement this custom
stemming analyser? I do not know where to start.

Thanks for your patience.
Andreas Korth
2006-10-26 21:36:19 UTC
Post by Ghost
Can someone give me an idiots guide as to how to implement this custom
stemming analyser. I do not know where to start.
1. Create the analyzer as David outlined it and name the file
"my_analyzer.rb". If you put it in /app/models you don't need any
require statements since every .rb file in /app/models gets
automagically 'required' by Rails.

# file: app/models/my_analyzer.rb
require 'rubygems'
require 'ferret'

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

2. When you create an Index instance, pass it your analyzer, like so:

index = Ferret::I.new(:analyzer => Ferret::Analysis::MyAnalyzer.new)


3. Test your analyzer, e.g.

index << "walking"
index << "walked"
index << "walks"

index.search("walk").total_hits # -> 3
Post by Ghost
Thanks for your patience.
You're welcome. And may I kindly ask you to use a valid email address
and perhaps your real name for future posts?

Kind regards,
Andreas
Ghost
2006-10-27 09:58:34 UTC
Hi I'm still having trouble with this. Probably something stupid but
here goes.

I'm using ferret version 0.13 and aaf.

I created this file in my app/models directory


require 'ferret'
include Ferret

module Ferret::Analysis
  class MyAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end

naming it my_analyzer.rb as directed.

and then in my ferret model I have the following declaration:

acts_as_ferret :fields => ['short_description'],
               :analyzer => Ferret::Analysis::MyAnalyzer.new

Rebuilding the index from the console then gives:

VoObject.rebuild_index
NameError: uninitialized constant MyAnalyzer
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:123:in `const_missing'
    from script/../config/../config/../app/models/vo_object.rb:14
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:140:in `load'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:56:in `require_or_load'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:30:in `depend_on'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:85:in `require_dependency'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:98:in `const_missing'
    from /usr/lib/ruby/gems/1.8/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:131:in `const_missing'
    from (irb):11

Nasty eh?

Any idea what is going on here? Why can't my VoObject model see the new
analyzer?
Thanks again.
Post by Andreas Korth
You're welcome. And may I kindly ask you to use a valid email address
and perhaps your real name for future posts?
I used to post with a valid email address. But then the number of spam
messages I received went from 1 or 2 a week to 50-60 a day. Ruby Forum
used to print the email addresses on the page. Here's a compromise.

Regards
Caspar
Andreas Korth
2006-10-27 12:44:05 UTC
Hi Caspar,
Post by Ghost
Hi I'm still having trouble with this. Probably something stupid but
here goes.
I created this file in my app/models directory
naming it my_analyzer.rb as directed.
VoObject.rebuild_index
NameError: uninitialized constant MyAnalyzer
Sorry, I forgot to mention that the directory structure needs to
resemble the module nesting, i.e. the file must go in
app/models/ferret/analysis instead of just app/models.
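
To make the naming rule concrete, here is a small plain-Ruby sketch of how a nested constant name maps to a relative file path under this convention. expected_path is a hypothetical helper written for this example; it is a simplified imitation of the underscore conversion Rails applies and not the Rails source.

```ruby
# Hypothetical helper: derive the relative file path that the autoloading
# convention expects for a (possibly nested) constant name.
def expected_path(const_name)
  parts = const_name.split('::').map do |part|
    # Insert an underscore at each lowercase-to-uppercase boundary,
    # then lowercase the whole segment: "MyAnalyzer" -> "my_analyzer".
    part.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end
  parts.join('/') + '.rb'
end

puts expected_path('Ferret::Analysis::MyAnalyzer')  # ferret/analysis/my_analyzer.rb
puts expected_path('VoObject')                      # vo_object.rb
```

So a file at app/models/my_analyzer.rb corresponds to a top-level MyAnalyzer constant, while Ferret::Analysis::MyAnalyzer is looked for under ferret/analysis/.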

Cheers,
Andy
Adam Thorsen
2007-03-05 04:13:24 UTC
I've been trying to use the solution for stemming discussed in this
thread and have run into a bit of trouble.

I'm using this analyzer:

module Ferret::Analysis
  class StemmingAnalyzer
    def token_stream(field, text)
      StemFilter.new(StandardTokenizer.new(text))
    end
  end
end


I've configured aaf thusly:

AAF_DEFAULT_FERRET_OPTIONS = {:analyzer => Ferret::Analysis::StemmingAnalyzer.new}

acts_as_ferret({:store_class_name => true,
                :fields => {:description => {:store => :yes}}}.merge(AAF_DEFAULT_OPTIONS),
               AAF_DEFAULT_FERRET_OPTIONS)


The first time I search for something a new index is created in index,
and it successfully returns a set of results. The second time I search,
however, I get a strange error:

uninitialized constant Ferret::Search

#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:264:in `load_missing_constant'
#{RAILS_ROOT}/vendor/rails/activesupport/lib/active_support/dependencies.rb:453:in `const_missing'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:160:in `query_for_record'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:152:in `document_number'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:135:in `highlight'
/opt/local/lib/ruby/1.8/monitor.rb:238:in `synchronize'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/local_index.rb:134:in `highlight'
#{RAILS_ROOT}/vendor/plugins/acts_as_ferret/lib/instance_methods.rb:30:in `highlight'

Perhaps it has something to do with loading an already created index?

Thanks,
-Adam
Max Williams
2009-04-09 14:34:31 UTC
This is just a postscript correction for this thread, in case anyone else
browses to it (like I did) and gets sent down a slightly wrong track.

If you're going to include the :analyzer option in your call to
acts_as_ferret, then it needs to live inside another option hash called
:ferret. E.g., some of the examples above say to do this:

acts_as_ferret :fields => ['short_description'],
               :analyzer => Ferret::Analysis::MyAnalyzer.new

This won't work - it needs to be like this:

acts_as_ferret :fields => ['short_description'],
               :ferret => {:analyzer => Ferret::Analysis::MyAnalyzer.new}
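
The difference between the two calls can be sketched in plain Ruby. toy_acts_as_ferret below is a hypothetical stand-in that only mimics the behaviour described in this post (index settings are read from the nested :ferret hash); it is not the acts_as_ferret source, and the :default_analyzer and :stemming symbols are placeholders.

```ruby
# Hypothetical stand-in, not the real acts_as_ferret. It reads index-level
# settings only from the nested :ferret hash, as described above.
def toy_acts_as_ferret(options = {})
  ferret_options = options.fetch(:ferret, {})
  {
    :fields   => options[:fields],
    # A top-level :analyzer key is never looked at, so the default is used.
    :analyzer => ferret_options.fetch(:analyzer, :default_analyzer)
  }
end

wrong = toy_acts_as_ferret :fields => ['short_description'],
                           :analyzer => :stemming
right = toy_acts_as_ferret :fields => ['short_description'],
                           :ferret => {:analyzer => :stemming}

puts wrong[:analyzer]  # default_analyzer
puts right[:analyzer]  # stemming
```

In this model the first call raises no error, which is why the mistake is easy to miss: the custom analyzer is just never used.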

Thanks to Jens for setting me straight on this :)
Clare
2006-09-30 08:50:08 UTC
Post by Alastair Moore
Hello all,
Quick question (possibly!) - I've got a few records indexed and doing a
search for 'test' results in no hits even though I know the word 'tests'
exists in the indexed field. Doing a search for 'tests' produces a
result. I would have thought that 'test' would match 'tests' but no such
luck!
Thanks,
Alastair
Alastair - if you only want to find the plural of something and not the
full stem of words, then ROR has a pluralization capability. It will take
"test" and bring back the plural, or take "tests" and bring back the
singular. You can then search on all of these words. It is not a full
stemmer, but in some circumstances this may be all that you want.

One thing that caught us out is that the standard pluralization of words
ending in 'ss' does not work properly. For example, "glass" would come
back as "glas" from the pluralizer. There is a simple fix on the ROR
forum that covers this.

I would only use the ROR pluralizer if all you are looking to do is
bring back plurals of words and you are not interested in full stemming.
For example, if you search on "tax", full stemming should also search on
"taxes" and "taxation". Pluralization would not search on "taxation".
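
The 'ss' pitfall can be reproduced with a deliberately naive rule in plain Ruby. Both functions below are hypothetical illustrations written for this example; they are not the Rails inflector and not the fix from the ROR forum mentioned above.

```ruby
# Naive rule: drop one trailing "s". This is the kind of rule that
# mangles words which merely end in "ss".
def naive_singular(word)
  word.sub(/s\z/, '')
end

# Same rule with a guard: leave words ending in "ss" untouched.
def guarded_singular(word)
  word.end_with?('ss') ? word : word.sub(/s\z/, '')
end

puts naive_singular("tests")      # test
puts naive_singular("glass")      # glas
puts guarded_singular("glass")    # glass
puts guarded_singular("taxation") # taxation
```

The last line also shows the limit described above: a singular/plural rule leaves "taxation" alone, so it never relates it to "tax" the way a stemmer might.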

Hope this helps.

Clare