Controlling Elasticsearch memory usage on OS X

By default, Elasticsearch uses between 256MB and 1GB of heap memory. On my dev machine memory is at a premium, and I don't have much data to index and play with, so I'd prefer this to stay closer to 256MB.

To change this, navigate to

(where x.xx.x is the Elasticsearch version number)
and open up

then replace the memory sizes set in the following lines:

if [ "x$ES_MIN_MEM" = "x" ]; then
if [ "x$ES_MAX_MEM" = "x" ]; then

It's recommended that you set ES_MIN_MEM and ES_MAX_MEM to the same value (the maximum you want) so the JVM never has to pause to resize the heap, which can hurt performance.

From the comments in that file:

min and max heap sizes should be set to the same value to avoid
stop-the-world GC pauses during resize, and so that we can lock the
heap in memory on startup to prevent any of it from being swapped
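The `if [ "x$ES_MIN_MEM" = "x" ]` checks mean the script only falls back to its built-in defaults when the variable is unset or empty, so exporting ES_MIN_MEM and ES_MAX_MEM yourself should have the same effect as editing the file. Here's a minimal Ruby sketch of that fallback logic, assuming your version's script has the same checks. `effective_heap` and `pinned_heap_env` are hypothetical helpers, and the 256m/1g defaults are taken from the range mentioned above.

```ruby
# Hypothetical sketch: mirrors the `if [ "x$VAR" = "x" ]` checks in the startup
# script, where a default only applies when the variable is unset or empty.
# The default values are assumptions based on the 256MB/1GB range above.
DEFAULT_HEAP = { "ES_MIN_MEM" => "256m", "ES_MAX_MEM" => "1g" }.freeze

# Returns the heap sizes the script would end up using for a given environment.
def effective_heap(env)
  DEFAULT_HEAP.map do |name, default|
    value = env[name].to_s
    [name, value.empty? ? default : value]
  end.to_h
end

# Environment that pins min and max heap to the same size, as the comments recommend.
def pinned_heap_env(size)
  { "ES_MIN_MEM" => size, "ES_MAX_MEM" => size }
end

p effective_heap({})                       # script defaults
p effective_heap(pinned_heap_env("256m"))  # both pinned to 256m

# To launch with this environment instead of editing the script
# (binary path is an assumption; -f keeps it in the foreground):
# system(pinned_heap_env("256m"), "/usr/local/bin/elasticsearch", "-f")
```

Pinning both variables to one value is what the file's own comments recommend: the heap never has to grow or shrink while the node is running.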

Showing better highlighted search result fragments with Elasticsearch

Elasticsearch has a pretty awesome highlighting feature, but it comes with a major deficiency. When it truncates your document/string, it gives you no indication that it has done so.

Take a look at this screenshot:

As you can see, the text (bolded behind the dropdown of results) is truncated in the dropdown itself, but nothing indicates that this has happened.

Doesn't seem like a big deal, but for the perfectionists and craftsmen out there, this has to make you itch, right? How is someone supposed to know that there is more to that fragment of text than what they're seeing?

Well, here's some Ruby code to the rescue. Throw it in a helper and call it from your view (or wherever).

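  # Adds "... " / " ..." to an Elasticsearch highlight fragment when the
  # fragment was clipped at the front and/or back of the original string.
  # strip_tags comes from Rails (ActionView::Helpers::SanitizeHelper).
  def ellipses_for_highlights(params_highlight, params_original)
    # have to do this because highlighted stuff from ES has a trailing space for whatever reason
    stripped_highlighted_item = strip_tags(params_highlight).rstrip
    # if the highlighted text doesn't start at position 0 of the original, it has been clipped
    # (String#index is a literal search, so regex metacharacters in the text are safe)
    front_ellipsis = params_original.index(stripped_highlighted_item) != 0
    # if the last 10 characters of the highlighted text don't match the original, same deal
    back_ellipsis = last_string_chars(stripped_highlighted_item, 10) != last_string_chars(params_original, 10)
    highlighted_item = front_ellipsis ? "... " + params_highlight : params_highlight
    back_ellipsis ? highlighted_item + " ..." : highlighted_item
  end

  # last n characters of str (the whole string if it is shorter than n)
  def last_string_chars(str, n)
    str.length > n ? str[-n, n] : str
  end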

Link to GitHub gist here

To use it, just pass in the highlighted string from Elasticsearch and the original string to compare against,
something like this:


and you'll get something like this:

It only adds an ellipsis to the front or the back of the string where Elasticsearch actually truncated it, and adds one to both ends when Elasticsearch truncated both. Better, right?
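To see the front/back logic in isolation, outside Rails, here's a plain-Ruby sketch of the same idea. `simple_strip_tags` is a hypothetical stand-in for Rails' strip_tags that only handles simple tags like `<em>`, `last_chars` is a hypothetical helper, and the sample strings are made up.

```ruby
# Hypothetical stand-in for Rails' strip_tags: removes simple tags like <em>.
def simple_strip_tags(html)
  html.gsub(/<\/?[^>]+>/, "")
end

# Last n characters of str, or the whole string if it is shorter than n.
def last_chars(str, n)
  str.length > n ? str[-n, n] : str
end

# Same comparison as the helper above: front clipped if the plain fragment
# doesn't start at index 0 of the original, back clipped if the tails differ.
def ellipses_for_highlight(highlight, original, tail = 10)
  plain = simple_strip_tags(highlight).rstrip
  clipped_front = original.index(plain) != 0
  clipped_back = last_chars(plain, tail) != last_chars(original, tail)
  result = highlight
  result = "... " + result if clipped_front
  result += " ..." if clipped_back
  result
end

original = "The quick brown fox jumps over the lazy dog near the river bank"
puts ellipses_for_highlight("brown <em>fox</em> jumps over the lazy", original)
# prints "... brown <em>fox</em> jumps over the lazy ..."
```

The tail comparison is a heuristic: two different endings that happen to share their last 10 characters would be treated as the same ending.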

A couple of things to note:
– This only works cleanly if you have :term_vector set to "with_positions_offsets" in your mapping. That lets Elasticsearch break the fragment on word boundaries instead of truncating in the middle of a word. If you have it turned off (i.e. you're just using the plain highlighter), you'll get something that looks more like this (notice how the truncation happens in the middle of words)

– Also keep in mind that, because of the behavior explained above, when you use term vectors for highlighting the fragment_size won't exactly match the number you specify. That makes sense (it has to break on a word, and a word can be any number of characters long), but it isn't described anywhere.
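For reference, here's roughly what the relevant mapping and highlight request look like, built as Ruby hashes and printed as JSON. This is a sketch: the `title` field, the `string` type (as used in 0.19/0.20-era mappings), and the fragment_size/number_of_fragments values are assumptions for illustration.

```ruby
require "json"

# Mapping for a hypothetical "title" field with term vectors that store
# positions and offsets, which the word-boundary highlighting relies on.
mapping = {
  "properties" => {
    "title" => { "type" => "string", "term_vector" => "with_positions_offsets" }
  }
}

# Highlight section of a search request. fragment_size is a target size in
# characters; with term vectors the actual fragment breaks on a word, so it
# won't match this number exactly.
highlight = {
  "highlight" => {
    "fields" => { "title" => { "fragment_size" => 50, "number_of_fragments" => 1 } }
  }
}

puts JSON.pretty_generate(mapping)
puts JSON.pretty_generate(highlight)
```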

Adding Autocomplete using elasticsearch

A commonly-requested feature in search applications is autocomplete or search suggestions. The basic idea is to give users instant feedback as they’re typing. Implementations of this feature can vary — sometimes the suggestions are based on popular queries (e.g., Google’s Autocomplete), other times they may be previews of results (e.g., Google Instant). The suggestions may be relatively simple, or they can be extremely complex, taking into account things like the user’s search history, generally popular queries, top results, trending topics, spelling mistakes, etc. Building the latter can consume the resources of a company the size of Google, but it’s relatively easy to add simple results-based autocomplete to an existing elasticsearch search application.
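As a minimal sketch of the simple, results-based kind, assuming a standard-analyzed `title` field (a hypothetical name), you can send the last word the user is typing as a prefix query and show the top few hits. The prefix query isn't analyzed, so the sketch lowercases the input itself to match the lowercased indexed terms. `autocomplete_query` is a hypothetical helper, and a production setup would more likely use edge n-gram analysis.

```ruby
require "json"

# Hypothetical helper: builds a search body that prefix-matches the last
# word being typed. Returns nil when there is nothing to complete yet,
# since an empty prefix would match every document.
def autocomplete_query(partial, field: "title", size: 5)
  term = partial.split.last  # split on whitespace; nil if nothing typed
  return nil if term.nil?
  { "size" => size, "query" => { "prefix" => { field => term.downcase } } }
end

puts JSON.generate(autocomplete_query("Getting Elas"))
# prints {"size":5,"query":{"prefix":{"title":"elas"}}}
```

Only the last word is used because, with a standard analyzer, each indexed term is a single lowercased word, so a multi-word prefix would never match one term.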


How to run multiple elasticsearch nodes on one machine

By default, Elasticsearch assumes a one-machine, one-node setup (you specify node settings in elasticsearch/config/elasticsearch.yml). So what happens if you want to run multiple nodes on one box, say, to play with a multi-node cluster on your dev machine?

The easy answer is to create multiple config files (elasticsearch.0.yml, elasticsearch.1.yml, etc.) and then start each instance from the command line, pointing it at its own config file.

For example:
/usr/local/bin/elasticsearch -fD es.config=/usr/local/Cellar/elasticsearch/0.xx.x/config/elasticsearch.0.yml

/usr/local/bin/elasticsearch -fD es.config=/usr/local/Cellar/elasticsearch/0.xx.x/config/elasticsearch.1.yml

That should get you most of the way there (the second node comes up on port 9201), but if you run into problems and need an alternative, read this detailed response on Stack Overflow.
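If you'd rather not hand-edit each copy, a small Ruby script can generate the per-node files. This is a sketch under assumptions: `write_node_configs` is a hypothetical helper, and the cluster name, node names, ports, and data paths are made up. Pinning each node's ports and data directory explicitly is just one way to keep the nodes apart and make it obvious which is which.

```ruby
require "tmpdir"

# Hypothetical generator: one config file per node, each with its own name,
# HTTP/transport ports and data directory. Returns the paths it wrote.
def write_node_configs(dir, count, cluster: "dev-cluster")
  (0...count).map do |i|
    path = File.join(dir, "elasticsearch.#{i}.yml")
    File.write(path, <<~YAML)
      cluster.name: #{cluster}
      node.name: node-#{i}
      http.port: #{9200 + i}
      transport.tcp.port: #{9300 + i}
      path.data: #{File.join(dir, "data-#{i}")}
    YAML
    path
  end
end

Dir.mktmpdir do |dir|
  write_node_configs(dir, 2).each do |cfg|
    # Same command form as above; the binary path is an assumption.
    puts "/usr/local/bin/elasticsearch -fD es.config=#{cfg}"
  end
end
```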


Getting Started with Elasticsearch

I've been doing a lot of Elasticsearch work at my full-time job and I'm liking it very much (I'm actually in San Francisco for an Elasticsearch conference right now). Anyway, I started reading this great article by Jon Tai about using Elasticsearch as a supplement to your database to get quicker results for unstructured or complex queries. Then I looked at the rest of his blog posts about Elasticsearch and quickly realized that if you're trying to get up to speed with Elasticsearch, there isn't clearer, more easily digestible writing on the web about the basics of Lucene and Elasticsearch.

Trust me, I know. I've been screwing with ES for the last six months or so, and what I know is pieced together from numerous Google searches, Stack Overflow questions, random one-off blog posts about Elasticsearch and Tire, and videos from the Elasticsearch site.

So once you actually get ES set up on your dev machine, go get yourself a good cup of whatever and snuggle up with the following (in this order):
Testing Lucene Analyzers with elasticsearch
Lucene Scoring and elasticsearch’s _all Field
Then watch this 40-minute video by Elasticsearch creator Shay Banon that explains how Elasticsearch is designed and how to use that to your advantage
Big Data, Search and Analytics (I’ve watched this 3 times since last August and I pick up something new each time)


Elasticsearch error – failed to connect to master

I restarted Elasticsearch and started getting a nasty stack trace in my Elasticsearch logs, the key line being:

failed to connect to master [[Buzzard][bC1NWlbVT8Wnq7adl3VetA][inet[/]]] 

There was no IP address like that on my network. It was maddening: no matter what I tried, it kept trying to find that nonexistent master node.

It turns out that older versions of Elasticsearch (I'm running 0.19.2; the current version is 0.20.x) have a problem where a client node can broadcast stale master ID information over the network. This probably happened because I took my laptop home from work and restarted Elasticsearch there (different network, IP address, etc.).

Eventually I found the solution here.

If you're getting this error when you start up Elasticsearch, multicast is probably not working properly. I'm running Elasticsearch on a single server (dev environment) and didn't need all the ceremony.

So I just went into elasticsearch.yml (mine was in /usr/local/Cellar/elasticsearch/0.19.2/config/elasticsearch.yml) and set multicast discovery to false.

That was it. Elasticsearch came right back up!
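If you'd rather script the change, here's a minimal Ruby sketch. The key name `discovery.zen.ping.multicast.enabled` is my assumption for the 0.19/0.20 zen discovery module, so verify it against the comments in your own elasticsearch.yml. `disable_multicast` is a hypothetical helper, and the demo runs against a throwaway file rather than your real config.

```ruby
require "tmpdir"

# Assumed setting name for 0.19/0.20-era zen discovery; double-check it
# against the comments in your own elasticsearch.yml before relying on it.
MULTICAST_KEY = "discovery.zen.ping.multicast.enabled"

# Sets "<MULTICAST_KEY>: false" in a YAML config file, replacing any active
# (uncommented) line for the same key and leaving every other line alone.
def disable_multicast(config_path)
  lines = File.exist?(config_path) ? File.readlines(config_path, chomp: true) : []
  lines.reject! { |line| line.strip.start_with?("#{MULTICAST_KEY}:") }
  lines << "#{MULTICAST_KEY}: false"
  File.write(config_path, lines.join("\n") + "\n")
end

# Demo against a throwaway copy rather than the real config file.
Dir.mktmpdir do |dir|
  cfg = File.join(dir, "elasticsearch.yml")
  File.write(cfg, "cluster.name: dev\n#{MULTICAST_KEY}: true\n")
  disable_multicast(cfg)
  puts File.read(cfg)
end
```

Commented-out lines are left untouched, so the explanatory comments that ship in the default config survive the edit.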