Friday, November 11, 2011

Making HTTP requests and parsing JSON responses in Scala

One of the first things that I've tried to do on my own in Scala (not following a book or online example) is making HTTP requests and parsing the JSON responses. There are powerful libraries written in Scala for both: Dispatch for HTTP requests and Lift JSON for parsing.

Using these libraries as my first introduction to Scala proved to be a little more difficult than I expected. Coming from a Ruby background, I expected to be able to make a single call to do an HTTP request, then another call to parse the JSON into a hash/map. Not so in Scala. The added complexity helps when you want to do more advanced things, and it lets you give a clear definition of the expected JSON data, but it makes getting started a little more challenging.

Getting the libraries in your project
First, you'll need to add the dispatch and lift-json libraries to your project. For SBT, add the following to your project build class:
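A minimal sketch of such a build class, assuming an SBT 0.7-style project definition. The class name `MyProject` is hypothetical, and the lift-json version shown is an assumption (the post only says that 2.4 did not work with Scala 2.8), so adjust both to your setup.

```scala
import sbt._

// Hypothetical SBT 0.7-style project definition (project/build/MyProject.scala)
class MyProject(info: ProjectInfo) extends DefaultProject(info) {
  // Dispatch HTTP module, cross-built for your Scala version
  val dispatchHttp = "net.databinder" %% "dispatch-http" % "0.8.6"

  // Lift JSON; "2.3" is an assumed pre-2.4 version, not taken from the post
  val liftJson = "net.liftweb" %% "lift-json" % "2.3"
}
```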

Then type reload and update at the sbt prompt.

As of this writing, 0.8.6 is the latest dispatch release. There is a 2.4 for lift-json, but I couldn't get it to work with Scala 2.8.

Making the HTTP request
Next, let's make an actual HTTP request. For this example I'll be talking to the GitHub API. We'll make a request to get a list of all of my repos. See the GitHub API documentation for details on this API. Here is the code:
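A sketch of that request, laid out so that the line references in the walkthrough line up with it. It assumes Dispatch 0.8.x's classic API (`Http`, `url`, `>:+`, `as_str`) and GitHub's `/users/<name>/repos` endpoint. The user name `octocat` is a stand-in, and what is done with the headers on line 9 is only illustrative.

```scala
import dispatch._
import net.liftweb.json._

val h = new Http
val req = url("https://api.github.com/users/octocat/repos")

// Execute the request; the handler receives the response headers and the request
val rspStr = h(req >:+ { (headers, req) =>
  val contentType = headers.get("content-type") // header names are lowercased
  req as_str
})
```

This needs network access to run, and Dispatch throws an exception if the server answers with a non-2xx status.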

First things first, the import lines bring in everything from Dispatch and Lift JSON. Next, on line 4, we create a Dispatch Http object that we will perform operations on. Line 5 creates an object for the specific request. Line 8 is where it gets interesting. With Dispatch, when creating and processing requests, you can chain any number of operations together to perform whatever processing you want. The Periodic Table of Dispatch Operators lists most of the operators that you can use (although the one we're using is not there).

Confused? Totally lost? So was I when I tried to use Dispatch for the first time. I'm not sure why Dispatch makes the operations so cryptic. The point of having symbols for functions/operations is so that if you're calling the function many times it can save you some typing and brain power when reading. Well, no one is going to use any of these operators all of the time, so why make everything symbols? To make it so that you feel smarter by making the code more cryptic?

Alright, back to my code example. What >:+ does is execute the request on its left and pass the HTTP headers and the request itself into a function. headers is a Map of all the HTTP response headers. On line 9 I'm grabbing the headers; if you want to actually process them, you can do that in this anonymous function. The return value of this anonymous function is what gets returned from the call to the http object. On line 10, I'm using the second parameter of the anonymous function, which is the actual processed HTTP request. You can chain together any number of calls here if you want to do further processing, since you have full access to the HTTP request. In my case, the only other thing that I want to do is get the response body as a string, so I'm simply calling as_str on the request. So back to line 8: rspStr now contains the response body as a string.

Note that if you just want to get the response body and don't care about the headers, you could just write it like:

val rspStr = h(req as_str)

Processing the JSON data
Now that we have the data, how do we process the JSON? Dispatch has some built-in handlers to turn it into a JSON object, but I couldn't get my project to compile when I included them. Dispatch just uses the Lift JSON library anyway, so we might as well use it ourselves.

Lift JSON has some pretty advanced syntax for looking for specific things in the JSON returned. In my case, I want a typed object that holds all of the data from the response, so that other parts of the code can use it. It's pretty easy to do this with Lift JSON. First, we have to define a "case class" that will represent the data. A case class is a class in Scala that can easily be used for pattern matching.
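A minimal sketch of such a case class. The three fields are an illustrative subset of what GitHub returns for a repository, not the original class. Lift JSON's extraction fills constructor parameters by name, so the names here are assumed to match the JSON keys.

```scala
// Hypothetical subset of the fields GitHub returns for a repository.
// Parameter names must match the JSON keys for extraction to fill them in.
case class Repo(
  name: String,
  url: String,
  description: Option[String] // may be missing from the response
)
```

Define the class at the top level rather than inside a method or another class, so that it has a plain constructor for the extraction code to call.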

That represents all of the parameters returned from the JSON. If you have a field that may or may not be in the response, specify the type as Option[Type].

Now let's parse the response string as JSON and convert it into a list of Repo instances.
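A sketch of that step, continuing from the `rspStr` string and the `Repo` case class above, and again laid out so the line references in the walkthrough line up. The names `rspJVal` and `repoObj` are assumptions. Using `children` with `collect` is one way to get a `List[JObject]` and may differ from the original code.

```scala
import net.liftweb.json._
implicit val formats = DefaultFormats // required by extract
val rspJVal = parse(rspStr)

// The response is a JSON array of objects; keep only the JObject children
val rspList = rspJVal.children.collect { case obj: JObject => obj }
val rspRepos = for (repoObj <- rspList) yield repoObj.extract[Repo]
```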

Line 2 is really important: if you're using the extract method to convert JSON into a case class, you need it, or else you'll get a compile error saying "could not find implicit value for parameter formats". Line 3 calls the Lift JSON parse function, which returns a JValue. Since the GitHub response for this particular request is an array of objects, we need to get a list of these objects in order to loop through them. Line 6 does this; rspList is a List[JObject]. Finally, on line 7, we loop through each object (a JObject) and convert it into an instance of the case class Repo using the extract method. The code after yield gets run for each JObject in rspList. rspRepos now contains a list of Repos for all of my GitHub code repositories.

So that's it. It's a lot to explain for something seemingly simple, and it took me a while to really get the hang of it. But now that you know how to use Dispatch and Lift JSON, you can do some really powerful stuff.

If you want to see this in action, see my GithubApi Scala class:

1 comment:

Søren Bramer said...

The Lift Json library is wonderful to work with. I use it in this small (as in you can read and understand the code in less than an hour) open sourced website