Develop With Style

Speeding Up SQL Queries Containing TEXT Fields

Last week, we encountered some performance issues with a particular MySQL query that was creating a lot of disk based temporary tables, despite the servers having plenty of room to create these temporary tables in RAM. After a little digging we discovered that when MySQL is running an ordered query that involves a TEXT field, it will create temporary tables on disk every time.

So a query like this would create temporary tables on disk, if any one of the included columns has a type of TEXT:

1
SELECT deals.* FROM deals ORDER BY created_at;

The above is a very simple example, and our query was much larger, so was even more pronounced for us. But you get the point - I hope!

So what we needed to do was avoid the inclusion of our TEXT fields in the query, but while still returning the value of our TEXT field. The following is what we came up with:

1
SELECT * FROM deals INNER JOIN ( SELECT DISTINCT deals.id FROM deals ORDER BY RAND() ) x ON (x.id = deals.id)

The above simply wraps the current query in an INNER JOIN, but the key to this is the change in what fields we are selecting. The old query is returning all fields using deals.* which includes any TEXT fields, but this new one is only including the ID with deals.id, which already makes things much faster, and of course avoids the need to create disk based temporary tables.

We ended up speeding up some of our SQL queries by up to 300% using this technique.

Lessons Learnt From Building a REST API

Late last year, I built a REST based API for ShermansTravel using Rails and ActsAsAPI. The API allowed us to tag and/or geotag any object, and was backed with MongoDB and a little MySQL for authentication. The API itself was - and still is - sound, but unfortunately it turned out to be very slow. This was amplified even further due to its use by another REST based API.

These last few weeks have allowed me to revisit the application, and to make changes to speed things up, often-times resulting in a 400% speed increase, not to mention increased concurrency. This post will briefly detail the lessons learnt building a high trafficked API, and what changes I have made.

The Right Way to Create an API

I know, I know, it’s been tooooo long. But whatever!

At Shermans we have begun a large project to rebuild the architecture around the travel deals system. And guess what? we chose to do it all in Ruby! Yay!

The deals server (as we are calling the entire system) will consist of a few separate apps, mostly Rails. A few of these apps will be REST based web API’s, allowing our partners and publishers to fetch deals. Surprisingly, there are very few Ruby Gems available to help with creating an API. But that probably has more to do with the great support for exposing resources built in to Rails using respond_with and co.

But based on my experience building the Codaset API, I knew I wanted a better way to expose the “views” for the API. For example, I don’t want all attributes of my model to be exposed, and Rails doesn’t provide a simple way to do that. Enter RABL!

From Zero to Hero: How I Built Twitious in Two Weeks

… and that wasn’t even part time!

I’ve been using Twitter for quite a while now, and I have quickly found that every time I come across a site or web page of interest - I will tweet it. And quite often, I have found myself either forgetting to add these links to my bookmarks in Firefox, or just not bothering. But that’s not good, as also quite often, I remember I tweeted a link that I need to refer back to, but just can’t remember the damn thing.

Twitter has become my primary means of bookmarking. But there is a problem with that - Twitter doesn’t give me an easy way to see all those bookmarks. It is for that reason that I promptly set about building a web app that will make it easy for me to find all my tweeted links and organise them by hashtags.

Dynamic Form and Dynamic Errors for Rails 3

Rails 3 is awesome! BUT

For some unknown reason, the powers that be decided to remove the error_message_on and error_messages_for form helpers. I’m not sure about every one else, but I use these in pretty much every single form that I create, so I just have no idea why it was removed.

Fortunately, all is not lost. The code for these helpers was extracted out into a plugin called dynamic_form. But unfortunately it doesn’t seem to have been kept up to date, as the last commit was back on June 12th. Running the tests failed all over the place, simply because the HTML returned has changed slightly in the release candidate. So I took it upon myself to fork the code and managed to get the tests passing with Rails 3 RC. I then also packaged it up into a Gem which you can grab now and install in your Rails 3 app.

My next task was to get the custom-err-msg plugin working with Rails 3. But rather than fix and refactor that as it is, I decided to simply integrate it into the dynamic_form plugin, as it’s functionality is closely related.

So the dynamic_form plugin has been enhanced a little, so please install and let me know what you think. I’ve put in a pull request to the official dynamic_form repo, so hopefully they will accept that soon. But in the meantime, you can still use my version by simply running gem install dynamic_form or adding to your Gemfile (if you are using Bundler) gem "dynamic_form".

Handling Dates in MongoDB

I’ve been using MongoDB quite a bit recently in several different projects, and it kicks ass. But the hardest thing to learn and to get used to with Mongo, is it’s schemaless structure. With SQL databases like MySQL, you always have a defined and very rigid structure, which is of course the schema for each table you create. You create a table with a few columns; each column has a set type, a length, and a whole host of other variables.

It’s this rigid structure that most of us have gotten used to for years, and that is the hardest thing to shake off when using NoSQL databases such as MongoDB.

But that is not to say that Mongo has no defined structure. In fact, that is far from the truth. Mongo and all schemaless databases have defined structure, but the difference is that that structure is defined by you. And that is what makes Mongo so cool. No need to actually create any tables and define each column - Just insert some data in your own defined way.

But there is a side effect of this schemaless idea. Even though you no longer have to define a schema in the traditional sense, you still have to think about the structure of each collection, and how that will look. In fact, there is a lot more to think about, as there are a lot more ways to structure your data with Mongo and other schemaless databases.

It’s somewhat of a steep learning curve, but well worth sticking with. I’m still leaning the best ways to structure certain data types, and dates is one that I think I’ve nailed.