Searching Big Data with Solr

Solr is extremely efficient at search: not just keyword search, but filtering on any kind of criteria. Solr provides stable, time-tested ways of scaling out. Whenever a Solr appliance was overwhelmed with traffic, the solution was always to add more servers. It was easy and it worked. Solr achieves good performance when searching large amounts of data through its sophisticated caching capabilities. Though technically not a NoSQL solution, Solr has a lot in common with NoSQL technologies such as Cassandra.
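To make the filtering and caching point concrete, here is a minimal sketch of a Solr query that combines a keyword search with filter criteria. It uses Solr's standard `q` and `fq` request parameters; Solr can cache the result set of each `fq` clause separately in its filter cache, so a filter that recurs across requests can be reused. The host, core name (`profiles`), and field names (`age`, `city`) are made up for illustration and are not Zoosk's actual schema.

```python
from urllib.parse import urlencode


def build_select_url(base_url, keywords, filters, rows=10):
    """Build a Solr /select URL with one fq parameter per filter."""
    params = [("q", keywords), ("rows", str(rows)), ("wt", "json")]
    # Each filter goes in its own fq parameter so that Solr can cache
    # and reuse it independently of the other filters.
    params.extend(("fq", f) for f in filters)
    return f"{base_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/profiles",  # hypothetical host and core
    "hiking",
    ["age:[25 TO 35]", "city:Boston"],  # hypothetical fields
)
print(url)
```

Keeping each criterion in a separate `fq` rather than folding everything into `q` is what lets common filters be served from cache while the keyword part varies from request to request.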

Here at Zoosk, we currently use Solr in two different ways. Personals app users can find potential dating partners through what we call user profile search, which employs a sophisticated filtering and ranking algorithm. The heart and soul of the romantic social network, the news feed, uses a different Solr cluster. The challenge there is not so much searching as it is handling the high volume of expensive updates that flow in and propagate throughout each user’s social network. As of this writing, our Solr index for the news feed holds hundreds of millions of documents and grows by millions each day.

I gave a presentation on how Zoosk uses Solr at the Lucene Revolution 2012 conference in Boston as part of its big data track. The conference boasted high-profile vendors such as Hortonworks and Microsoft. There were also lots of other companies using Solr, such as Etsy and CareerBuilder.

Be sure to revisit this blog, where I will share our trials and tribulations in using Solr and Lucene at very large scale. Experience last-minute saves, late-night rebuilds, just-in-time deployments, traffic-shaping kung fu, and hours of log file analysis as I tell some stories of these high-tech dev-ops thrills.