


Archive for January, 2008

Super fast full text search on Rails with Sphinx and Ultrasphinx | January 30th, 2008

Full text search is difficult. I mean really hard to do right: good relevance, filtering, fast querying and fast indexing. Search is just not something that any business should be trying to solve as a core objective unless they are planning to take on Google… or they have reams of unstructured data in weird silos without any other way of accessing it. Search simply should not replace solid Information Architecture. Content should be inherently findable without an ambiguous search box. The use cases for a powerful full text search are those very rare instances where there is either just a tonne of content that is too difficult to classify or you are interacting with content that has little or no structure, metadata or taxonomy.

And, of course, my next big project required full text search. Massive amounts of data. Very little structure. Disparate content sources.

In past projects I’ve had some limited success, and spectacular failure, with Ferret, a Ruby port of the Java-based Lucene search. It was slow, leaked memory and generally caused more pain than it should have.

On the next full text search project I ran into, our client decided that the Google Mini Search Appliance would be the cheapest and quickest way to go. He was right. It provides an out of the box search interface experience with fantastic relevance. This is mighty Google in a box after all. The downside? Google Mini has limited customization without cycling through reams of skanky XSLT. Deep integration is nearly impossible. That’s an ok tradeoff depending on your goals. For our project it just wouldn’t work.

My next thought was a project called Solr, which is essentially an abstraction layer on top of (again) Lucene. Solr has an accompanying Rails plugin called, appropriately, acts_as_solr. The only issue I could immediately see with Solr is having to maintain a second server instance, which is something you’ll nearly always have to do for full text search anyway.

Then Sphinx appeared on my radar via the excellent Deploying Rails Google Group. To make a long story short, here are a couple of quotes from Ezra Zygmuntowicz, of Engine Yard fame, who literally wrote the book on deploying Rails applications:

Ferret is unstable in production. Segfaults, corrupted indexes galore. We’ve switched around 40 clients form ferret to sphinx and solved their problems this way. I will never use ferret again after all the problems I have seen it cause peoples production apps. Plus sphinx can reindex many many times faster then ferret and uses less cpu and memory as well.

Right about here you can hear me say in a western Canadian accent, “Sphinx eh..”. Ezra continues in a later post:

We have a bunch of clients using solr as well. In general it is more powerful then sphinx but a lot slower to reindex and querey. Also it uses 50 times the memory of sphinx. If you have a box or vm to put SOLR on by itself then it is a good option as well. but if sphinx can do everything you need from a a search indexer then it is a way better option cost wise.

Ok. I was fully intrigued by this crazy Sphinx. After a little searching around I came across an excellent article from Ben Smith about using Rails with Sphinx. Following his instructions for compiling with support for MySQL 5 installed via Darwin Ports was pretty easy. I chose the Ultrasphinx plugin, which makes working with Sphinx trivial. He also mentions acts_as_sphinx and the predictably named Sphincter, which I plan to revisit eventually.

So, how is Sphinx with Ultrasphinx? Awesome. I can’t quite reveal what exactly I’m working on, but I can say that I had recently downloaded all the page titles from Wikipedia and imported roughly 2.4 million of them into MySQL 5 into three columns. The first column is an integer index. The second is the raw page title. The final column is the page title with formatting removed and underscores replaced by spaces. You know, human readable and all that.
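To make that third column concrete, here is a minimal sketch of how a raw title might be turned into the readable version. The helper name `readable_title` is hypothetical, and "formatting removed" is assumed here to mean nothing more than collapsing and trimming stray whitespace; the actual import may have done more.

```ruby
# Hypothetical helper: converts a raw Wikipedia page title into a
# human readable one. Assumes the only cleanup needed is swapping
# underscores for spaces and tidying up whitespace.
def readable_title(raw_title)
  raw_title.tr("_", " ")  # underscores become spaces
           .squeeze(" ")  # collapse runs of spaces into one
           .strip         # drop leading and trailing whitespace
end

puts readable_title("Full_text_search")
```

The raw title stays in its own column, so nothing is lost if the cleanup rules need to change later.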

Running a handy rake task for indexing outputs:


collected 2409850 docs, 67.8 MB
sorted 8.9 Mhits, 100.0% done
total 2409850 docs, 67771103 bytes
total 34.220 sec, 1980424.75 bytes/sec, 70421.26 docs/sec

Roughly 34 seconds to index 2.4 million records. Not bad. I’ve had faster times so it might be time for me to reboot the old MacBook. But what about actual searching? Consistently under a second. Relevance? Very good.
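As a quick sanity check on the indexer output, the throughput figures can be recomputed from the totals it reports. This is just arithmetic on the numbers in the log; the `per_second` helper is a made-up name for illustration.

```ruby
# Hypothetical helper: average rate for a count processed in a given time.
def per_second(count, seconds)
  count / seconds.to_f
end

docs    = 2_409_850   # "total 2409850 docs"
bytes   = 67_771_103  # "67771103 bytes"
seconds = 34.220      # "total 34.220 sec"

docs_rate  = per_second(docs, seconds)
bytes_rate = per_second(bytes, seconds)

puts format("%.0f docs/sec, %.2f MB/sec", docs_rate, bytes_rate / 1_000_000.0)
```

The results land within a fraction of a percent of the indexer's own figures. The small difference is presumably because the log prints the elapsed time rounded to three decimals.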

The verdict: Sphinx rocks.

Posted in Uncategorized

A slightly better terminal experience for OS X | January 29th, 2008

Visor

A quake style terminal for OS X. Simply badass for the old school first person shooter gamer in you.

Tabs

Of course, having a terminal instance ready to ‘drop in’ whenever you want is sweet, but most of us need more than one terminal at a time. No problem. Add tab support to your terminal. Awesome.

Syntax highlighting?

From the same blog there is an article about syntax highlighting in Terminal but I cannot seem to get it working. Any suggestions from the lazy web are appreciated! I use TextMate for the most part so I’m surviving but it would be nice to have. Seems like a major shortcoming on Apple’s part.

Posted in Uncategorized

Serverside JavaScript with Jaxer | January 24th, 2008

JavaScript has a bright future. Millions of websites are being developed and every last one is looking for an edge. More often than not that edge is a better user experience. Web developers are required to understand JavaScript and, thanks to the many frameworks, very rich behavior can be achieved with very little code. And knowledge of JavaScript, in all its quirkiness, is growing fast. Aptana knows this. They build the most kick ass Ajax IDE out there.

Aptana also knows it is only a matter of time before these developers, their customers and ours, begin looking for deeper functionality which can only be achieved on the server side. There are many options and, while possible, JavaScript is not the most accessible language on a server. At least, it wasn’t.

Aptana just released Jaxer. An ajax server. It plays nice with Apache and apparently Java servers too. The docs are thin, these are early days, the community is brand new.

I created a Jaxer Google Group / mailing list to get us started.

Posted in Uncategorized

Just a friendly reminder for Feb 6 | January 21st, 2008

All the details on Upcoming!

Posted in Uncategorized

Vlad sports a new Rake | January 15th, 2008

Or, something cool like that anyhow.

Posted in Uncategorized, rails, ror, ruby, software

Vlad the Deployer set svn username | January 8th, 2008

Quick gotcha I ran into today deploying with Vlad (the Deployer): a Subversion repo where your username differs from the one on your deployment server. Here’s the variable you need to set in deploy.rb:

set :svn_cmd, "svn --username brian"

Of course once we add another developer to this project that hack isn’t going to hold up. If I magically find some extra time I’ll post a proper fix.
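In the meantime, one possible stopgap is to read the username from the environment so each developer can supply their own. This is only a sketch under assumptions: `svn_cmd_for` and the `SVN_USER` variable are hypothetical names, not part of Vlad. Only `set :svn_cmd` comes from the snippet above.

```ruby
require "shellwords"

# Hypothetical helper: builds the svn command string for a username.
# Falls back to plain "svn" when no username is given.
def svn_cmd_for(username)
  return "svn" if username.nil? || username.strip.empty?
  # Escape the name so odd characters cannot break the shell command.
  "svn --username #{Shellwords.escape(username.strip)}"
end

# In deploy.rb this might be used like so (SVN_USER is an assumed name):
#   set :svn_cmd, svn_cmd_for(ENV["SVN_USER"])
puts svn_cmd_for("brian")
```

Each developer would then export `SVN_USER` in their own shell before deploying, and anyone who leaves it unset gets the default svn behavior.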

Gotta say Vlad is really nicely written. Have a peek at the source if you get the chance.

Posted in Uncategorized

Free books to read this weekend | January 4th, 2008

Four free books for some light weekend reading on the subject of software development and entrepreneurship. Enjoy!

Posted in Uncategorized, software

Thoughts on Zed’s Rant | January 4th, 2008

I resisted the urge to comment on this but so many opinions are floating around and most of them are fucking stupid so I had to comment.

Zed’s rant means nothing to the technology of Ruby or Rails. His rant was about people he had taken issue with. That’s it. I’m reading absolutely retarded ‘sky is falling’ responses about failure of the technology. There was no failure of technology. This rant was the product of failed communication. I don’t agree with the opinions or the approach but I sure found it funny. Zed is an excellent flame writer and should probably be awarded a medal for best troll ever written in the history of the internets.

I’m reading posts all over the place that are using Zed’s rant as a rally against the technology of Ruby and that is bullshit. Zed predicted that technology will change. Here, read that again: Zed predicted that technology will change. No shit it will. Of course better solutions are going to emerge and evolve. It doesn’t mean the technology isn’t valid today.

I agree with Tim Bray and Dion Almaer. Rails is a solid technology appropriate for some scenarios but not all. Use it or don’t. That’s it. Calm down. Take a breath. Read the rant for the humor it is intended to be.

Lastly, if you are a programmer, take this rant as an opportunity to teach yourself some Python or Scala or Erlang or even Ruby. Then you can make your own intelligent, rational and possibly objective arguments for or against a technology.

Posted in rails, rant, ror, ruby

Ajax and Beer 2.0 | January 3rd, 2008

Everyone knows that a second release means maturity, refinement and more features.

Ajax and Beer 2.0

(Now with more Beer!)

Skip the presentation and head straight for the Shebeen Room in Vancouver on Feb 6, 2008. Enjoy beers with other ajax geeks. No pitches, presentations or demos.

Posted in Uncategorized

2008 Predictions | January 2nd, 2008

Continuing on from my recent ‘Goals for 2008’ post, here are my Predictions for 2008.

  • The price of music (and media in general) continues to slide to $0. DVD, HD DVD, Blu-ray and CD sales plummet.
  • New revenue models will emerge for traditional media and information. Not just advertising.
  • Marketing via Alternate Reality Games becomes more common.
  • User experience gets more attention, in particular Interaction Design, and Flow.
  • Low level cloud services continue to grow, enabling one click, infinitely scalable deployment of web applications. In an area pioneered by Amazon, we will see Microsoft, Yahoo and Google enter the fray, lowering the cost to developers. Mozilla could be a dark horse here.
  • Sustainability and climate change will become hazy platforms for polarized debate. The earth will continue to warm.
  • Canucks make the third round of the playoffs.
  • The mobile web gets a large following via microblogging tools and technology. Twitter gets acquired. Pownce releases an updated Atom API with OAuth.
  • AIR gets acceptance but doesn’t grow as quickly as Adobe expects. Changes to the security model sometime this summer will help.
  • Silverlight fails. (It won’t go away but it will continue to suck.)
  • APML, OpenID 2.0 and Attribute Exchange gain acceptance.
  • Dynamic languages continue to flourish, becoming acceptable glue for the Java and .NET VMs. People, like me, will roll their eyes and yawn.
  • The .NET community continues to totally misrepresent the concepts of REST and DSLs with their new MVC (blatant shit ripoff) Framework.
  • Web services continue to fail. REST and Atom (guised perhaps as GData) continue to succeed and grow. Techniques for offline / batch processing emerge.
  • Java popularity rises while Mono’s shrinks.

Posted in Uncategorized



You are currently browsing the Brian@Nitobi weblog archives for January, 2008.
