Max Williams
2009-11-26 11:13:12 UTC
I'm using a custom stem analyser in my searches and my indexing. The
analyser is defined thus:
  module Ferret::Analysis
    class StemmingAnalyzer
      def token_stream(field, text)
        text.downcase!
        RAILS_DEFAULT_LOGGER.debug "SEARCHING, field = #{field.inspect}, text = #{text.inspect}"
        tokenizer = StandardTokenizer.new(text)
        filter = StemFilter.new(tokenizer)
        filter
      end
    end
  end
I use it in my indexing like this:
  acts_as_ferret({ :store_class_name => true,
                   :ferret => { :analyzer => Ferret::Analysis::StemmingAnalyzer.new },
                   :fields => { :property_names => { :boost => 3.0 },
                                ....etc
                              }})
And in a search like this:
  search_class.find_ids_with_ferret(search_term, { :limit => 10000, :analyzer => Ferret::Analysis::StemmingAnalyzer.new }) do |model, r_id, score|
    r_id = r_id.to_i
    ferret_ids << r_id
    self.scores_hash[r_id] = score
  end
I have a problem with case sensitivity. Basically, searches only work when
the search term is lowercase, even though the text stored in the index
appears to contain uppercase characters. From the console:
  resource.to_doc
  => {:resource_id=>"59", :property_names=>"Bb Clarinet Clarinet Family Woodwind Instrumental and Vocal Image Resources Types" }
  TeachingObject.find_with_ferret("Vocal", :page => 1, :per_page => 1000).include?(resource)
  => false
  TeachingObject.find_with_ferret("vocal", :page => 1, :per_page => 1000).include?(resource)
  => true
I think I have my stemming set up wrong, and I'm not sure it is even being
used. I implemented it so that searches would match both pluralised and
singular terms, and that part seems to work, e.g.
  TeachingObject.find_with_ferret("vocals", :page => 1, :per_page => 1000).include?(resource)
  => true
But the case sensitivity thing has me stumped. I thought that the downcase!
call on the search term would make case irrelevant for searching but that
seems not to be the case. Can anyone set me straight?
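
To make the expectation concrete, here is a minimal plain-Ruby sketch of the behaviour I'm after. It is not Ferret code: ToyAnalyzer, its regex tokenizer and its strip-a-trailing-"s" stemmer are hypothetical stand-ins, used only to show the same normalisation being applied to the stored text and to the search term so that "Vocal", "vocal" and "vocals" all match.

```ruby
# Hypothetical stand-in for an analyzer; none of these names come from Ferret.
class ToyAnalyzer
  # Normalise text into a list of lowercased, crudely stemmed tokens.
  def tokens(text)
    lowered = text.downcase               # non-mutating, caller's string is untouched
    words = lowered.scan(/[a-z0-9]+/)     # crude tokenizer: runs of letters and digits
    words.map { |word| stem(word) }
  end

  # True when every token of the query also occurs in the indexed text.
  def match?(indexed_text, query)
    (tokens(query) - tokens(indexed_text)).empty?
  end

  private

  # Crude "stemmer": drop one trailing "s" from words longer than three characters.
  def stem(word)
    word.length > 3 && word.end_with?("s") ? word[0..-2] : word
  end
end

analyzer = ToyAnalyzer.new
stored = "Instrumental and Vocal Image Resources Types"

%w[Vocal vocal vocals].each do |term|
  puts "#{term.inspect} matches: #{analyzer.match?(stored, term)}"
end
```

Because both sides go through the same tokens method, the case of the search term stops mattering in this toy version, which is what I assumed the analyser above would give me.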