Showing better highlighted search result fragments with Elasticsearch

Elasticsearch has a pretty awesome highlighting feature, but it comes with a major deficiency. When it truncates your document/string, it gives you no indication that it has done so.

Take a look at this screenshot.

As you can see, the text (bolded behind the dropdown of results) is truncated in the dropdown results themselves, but nothing indicates that this has happened.

Doesn’t seem like a big deal, but for the perfectionists and craftsmen out there, this has to make you itch, right? How is someone to know that there is more to that fragment of text than what they’re seeing?

Well, here’s some Ruby code to the rescue. Throw it in a helper and call it in your view or wherever.

  def ellipses_for_highlights(params_highlight, params_original)
    # highlighted fragments from ES come back with a trailing space, so strip it
    # along with the highlight tags before comparing
    stripped_highlighted_item = strip_tags(params_highlight).rstrip
    # if the original doesn't start with the highlighted text, the front has been clipped
    # (a plain string comparison, so regex special characters in the text can't break it)
    front_ellipsis = !params_original.start_with?(stripped_highlighted_item)
    # if the original doesn't end with the highlighted text, the back has been clipped
    back_ellipsis = !params_original.end_with?(stripped_highlighted_item)

    highlighted_item = front_ellipsis ? "... " + params_highlight : params_highlight
    back_ellipsis ? highlighted_item + " ..." : highlighted_item
  end

Link to github gist here

To use this, just pass in the highlighted string from Elasticsearch and the original string for comparison, so something like this:

    ellipses_for_highlights(item.highlight.name.first, item.name)

and you’ll get something like this

It only adds an ellipsis where Elasticsearch actually truncated the string: at the front, at the back, or on both ends if the fragment was clipped on both. Better, right?
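If you want to poke at the behavior outside of Rails, here is a minimal standalone sketch. It assumes the highlight tags are simple `<em>` tags and swaps in a naive regex-based `strip_tags` stand-in for the Rails helper; the sample sentence and fragments are made up for illustration. It checks the ends with `start_with?` and `end_with?`.

```ruby
# Naive stand-in for Rails' strip_tags (an assumption for this demo only;
# it is fine for simple <em> highlight tags, not for arbitrary HTML).
def strip_tags(html)
  html.gsub(/<[^>]*>/, "")
end

def ellipses_for_highlights(highlight, original)
  stripped = strip_tags(highlight).rstrip
  # clipped at the front if the original does not start with the fragment
  front_ellipsis = !original.start_with?(stripped)
  # clipped at the back if the original does not end with the fragment
  back_ellipsis = !original.end_with?(stripped)

  result = front_ellipsis ? "... " + highlight : highlight
  back_ellipsis ? result + " ..." : result
end

original = "The quick brown fox jumps over the lazy dog"

puts ellipses_for_highlights("<em>lazy</em> dog", original)
# prints: ... <em>lazy</em> dog
puts ellipses_for_highlights("The <em>quick</em> brown", original)
# prints: The <em>quick</em> brown ...
puts ellipses_for_highlights("<em>fox</em> jumps over", original)
# prints: ... <em>fox</em> jumps over ...
```

A fragment that covers the whole original string comes back unchanged, with no ellipsis on either side.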

Couple of things to note.
– This will only work cleanly if you have :term_vector set to “with_positions_offsets” in your mapping. This lets Elasticsearch break the fragment on word boundaries instead of truncating in the middle of a word. If you have it turned off (i.e. you’re just using the plain highlighter), you’ll get something that looks more like this (notice how the truncation happens in the middle of words)

– Also keep in mind that, because of the behavior explained above when using term vectors in your highlighting, the fragment_size will not exactly match the number you specify. That makes sense (the fragment has to break on a word, which can have any number of characters in it), but it isn’t documented anywhere that I could find.
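For reference, here is a rough sketch of the two settings mentioned above, written as plain Ruby hashes you could serialize to JSON. The `name` field matches the usage example earlier; the field type and the numbers are assumptions, and the exact shape of the surrounding mapping depends on your Elasticsearch version and client library.

```ruby
require "json"

# Field-level mapping for the `name` field.
# Assumption: "text" is the field type in recent Elasticsearch versions;
# older versions call it "string".
name_field_mapping = {
  type: "text",
  term_vector: "with_positions_offsets"
}

# Highlight section of a search request body.
# fragment_size is only a target; the returned fragment can differ from it
# because it is broken on word boundaries.
highlight_options = {
  fields: {
    name: { fragment_size: 50, number_of_fragments: 1 }
  }
}

puts JSON.pretty_generate(name: name_field_mapping)
puts JSON.pretty_generate(highlight: highlight_options)
```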

On craftsmanship and turning customers into fans

“Show me a man who cannot bother to do little things and I’ll show you a man who cannot be trusted to do big things.”
— Lawrence Bell

This morning, while I was getting ready for work, I picked my French Soccer jersey from a pile of clothes I had laundered last week and I noticed this, right behind where the Crest for the French Football Federation (facing outwards) sits on the chest of the jersey …

Turns out it means “our differences unite us”.
Très inspirant.

The quote sits right above a player’s heart while they have the jersey on (presumably striving for their country at the same time), and it hits so close to home on the issues the French have had historically, dealing with diversity, that I imagine a player would have a hard time forgetting it once they’ve seen it. It’s also put in a place where jersey makers typically expend the least effort (Adidas is notorious for this; some of their replica jerseys are so uncomfortable because the insides are horribly tailored). It’s the perfect blend of sentimentality and craftsmanship.

Needless to say, I’m a little bit more of a Nike and French Soccer team fan than I was yesterday, and all because of a silly little thing like an inspiring message on the inside of a soccer jersey.

It got me thinking about a Steve Jobs anecdote I had read a while ago.

In an interview a few years later, after the Macintosh came out, Jobs again reiterated that lesson from his father: “When you’re a carpenter making a beautiful chest of drawers, you’re not going to use a piece of plywood on the back, even though it faces the wall and nobody will ever see it. You’ll know it’s there, so you’re going to use a beautiful piece of wood in the back. For you to sleep well at night, the aesthetic, the quality, has to be carried all the way through.”

How does this apply to software?

The little things matter. The craftsmanship of your product, the care you show on tiny things like 404 pages, error messages, and page layouts looking perfect in all browsers … or even unseen things like formatting your code beautifully or leaving instructive comments: all of it becomes part of the DNA of your creation, that intangible signal of ‘high quality’ and sentiment that draws people in without them knowing why.

It’s just the sort of thing that might turn a customer into a fan.

PS: A fantastic place to see brilliant little design touches in software is Little Big Details.