Saturday, December 31, 2011

Hiring remote workers versus local

DHH recently posted a blog entry, "Stop whining and start hiring remote workers".  I posted a comment on it that I think deserves its own blog entry.  I think DHH is in a unique position at 37signals that allows him to hire the best of the best, which means he doesn't have to deal with many of the remote-worker issues that most other companies will run into.  Here is my comment:


"I partially agree with some of the points here. But in my experience people are much more open and frank if you're in the same room as them versus a Skype call. And from a management stand point it is easier to keep tabs on people, help them if they're struggling, etc. if you can just walk up to them and strike up a conversation directly.  If you have nothing but the best of the best developers, like 37signals, this isn't really an issue.  But most companies our there just don't have the reputation that 37signals does to attract the best devs, so we have to settle for many devs that aren't the best.  Especially if you're in consulting and contracting, and need to staff up.  Having remote devs that aren't the best of the best is much more difficult than if these same devs are local.

My experience has also been that remote devs are much less interested in doing any sort of management work.  And when they do management work they're typically less engaging than managers in the office.  They'll do OK managing those they work with day to day but once they need to interact with people from other teams at the company, someone else in the office is usually needed to facilitate the communication.

And there is the social aspect too that others have mentioned.  Several devs that worked remotely at previous jobs but are in the office now at my job have said they're much happier in their lives because they have friends in the office, they can go out to lunch and socialize, etc.  This is despite having to deal with commuting every day.  There are some devs that socially feel better working remotely, but these are in the definite minority from what I've seen.

There is an environmental factor here too that no one has mentioned.  Many cite the environmental benefit of working from home - you're not driving to work every day. But this is typically not a valid point. If you're flying everyone into the office 3 or 4 times a year, these 3 or 4 flights will usually have more of an impact on the environment than that person driving to work by themselves every day.  A lot of factors can throw this calculation off (like if the person only needs to take a direct 500 mile flight to get to your office, if the person commuting has a 50 mile commute, etc.) but overall 3 or 4 flights per year will equal or surpass commuting every day as far as environmental damage."

Thursday, December 22, 2011

Fixing FBML Facebook apps that use Facebooker

If you've written an FBML Facebook app using the Facebooker (not Facebooker2) library for Ruby on Rails, I'm sure you've noticed by now that it's not working any more. In early to mid November, Facebook changed the way they send requests to all canvas apps (FBML and iFrame). Previously, Facebook would send a bunch of parameters called fb_sig_... where each parameter represented something, like the ID of the user. Then there was an fb_sig parameter which you could use with your secret key to validate that the request came from Facebook. The Facebooker plugin/gem is coded to use this.

In early/mid November they switched to using a single parameter, signed_request. This contains all of the info encoded as JSON, along with a validation signature. They also no longer use a "session" to grant access; access is now granted with an OAuth 2.0 access token.

Unfortunately if you're still using the original Facebooker library like me, you've got some work on your hands to get your app back to a working state. I'll describe everything I did to get my app working again here.

Install Facebooker2 as a plugin
Development on the Facebooker plugin stopped long ago, and the original maintainer started a new Facebooker2 plugin/gem. Unfortunately, Facebooker2 doesn't support all of the FBML capabilities. There are some similarities between the two, but overall Facebooker2 was designed to be MUCH simpler and easier to use, and it definitely is. To put Facebooker2 in your project:
  1. Download Facebooker2 to your vendor/plugins directory
  2. Install the mogli and ruby-hmac gems (how you do this depends on your Rails version: either install them on the system, specify them as required gems in your environment.rb file, etc.)
  3. Download the original Facebooker gem/plugin to your lib folder (or just copy it from your vendor folder if you put the code in your project previously).
  4. Edit your config/facebooker.yml file, and change secret_key to secret for your environments.
  5. Edit config/facebooker.yml, and add an app_id property to each environment for your App ID (which can be viewed from the Facebook app page for your app).
  6. Create a file config/initializers/facebooker2.rb and put a single line of code in it:


Modify your ApplicationController
Now that you've got Facebooker2 installed, you'll have to modify your code to use it instead of Facebooker. Modify your ApplicationController to include this code:

Now that you've got this in place, you'll have to edit the rest of your code.

Client and User
Instead of using facebook_session and facebook_session_secured, you'll want to call the methods current_facebook_user and current_facebook_client. current_facebook_user will return a Mogli::User object. Here is some code showing some of the common things you may want to do:


Facebooker Helpers
To include all of the FBML helpers that the Facebooker library includes, add the following code to your application_helper.rb:


You'll also need to make a few changes to the Facebooker helper file. Edit lib/facebooker.../lib/facebooker/rails/helpers.rb and add the following:


Views
If you have named your view files .fbml.erb, you'll need to rename them to just .erb. The .fbml type isn't registered with Facebooker2.

Publishing Stories
Facebooker had a whole publisher syntax for posting stories to the user's wall. None of this is necessary any more. If you wish to publish stories to a user's wall, as long as the user has granted the offline_access permission to your app, do the following:
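One way to publish without any Facebooker publisher code is a plain Graph API call. The sketch below is only an illustration under assumptions: it uses Ruby's standard Net::HTTP rather than Facebooker2 or Mogli, it assumes the Graph API feed endpoint as it stood at the time of writing, and build_wall_post_request, the user ID, and the token are made-up placeholders. It builds the request without sending it.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a Graph API request
# that posts a message to a user's wall. The user ID and access token
# are placeholders; a real token must carry the permissions Facebook
# requires for publishing.
def build_wall_post_request(user_id, access_token, message)
  uri = URI("https://graph.facebook.com/#{user_id}/feed")
  request = Net::HTTP::Post.new(uri)
  request.set_form_data('message' => message, 'access_token' => access_token)
  [uri, request]
end

uri, request = build_wall_post_request('12345', 'ACCESS_TOKEN', 'Hello from my app')

# To actually send it (left commented out so the sketch has no side effects):
# Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent before pointing it at a real user.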


OK I think that's all there is to it. I have my Facebook app working with these changes.

The Future of FBML
You probably know that Facebook is completely abandoning FBML on June 1, 2012. Converting your app as I described here gives you some time to convert your app's views over to iframe instead of FBML. However, don't be surprised if stuff stops working before then. For instance, I can't get my app to display with https. I'm doing everything correctly but Facebook appears to not support this. Hopefully this blog post will give you some breathing time before June.

Saturday, December 3, 2011

Verifying Facebook's signed_request in Ruby

Facebook's signed_request parameter can be quite complicated to parse in Ruby. Facebook's examples are of course entirely in PHP. signed_request is their new way of delivering data to your app instead of individual fb_sig_ parameters for everything. Here is the code to properly verify the signed_request parameter, and return a hash with all of the data from the request. Just call parse_signed_request, passing the received params from the HTTP request and your app's secret key (issued by Facebook). An exception will be thrown if verification fails; otherwise, you'll get a hash of the data back.
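A minimal sketch of such a function, using only Ruby's standard library. It assumes the documented signed_request layout: a URL-safe base64 signature, a dot, then a URL-safe base64 JSON payload, with the signature being an HMAC-SHA256 of the encoded payload keyed by the app secret. The helper names base64_url_decode and secure_equal? and the SignedRequestError class are made up for this example.

```ruby
require 'base64'
require 'json'
require 'openssl'

# Hypothetical error class raised when verification fails.
class SignedRequestError < StandardError; end

# Facebook sends URL-safe base64 without padding, so restore the
# padding and the standard alphabet before decoding.
def base64_url_decode(str)
  padded = str + '=' * ((4 - str.length % 4) % 4)
  Base64.decode64(padded.tr('-_', '+/'))
end

# Compares two byte strings without stopping at the first difference.
def secure_equal?(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Returns the decoded payload as a Hash, or raises if the parameter is
# malformed or the signature does not match the secret.
def parse_signed_request(params, secret)
  encoded_sig, payload = params['signed_request'].to_s.split('.', 2)
  if encoded_sig.to_s.empty? || payload.to_s.empty?
    raise SignedRequestError, 'malformed signed_request'
  end

  data = JSON.parse(base64_url_decode(payload))
  unless data['algorithm'].to_s.upcase == 'HMAC-SHA256'
    raise SignedRequestError, 'unexpected signature algorithm'
  end

  # The signature covers the still-encoded payload string.
  expected = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), secret, payload)
  unless secure_equal?(base64_url_decode(encoded_sig), expected)
    raise SignedRequestError, 'signature mismatch'
  end

  data
end
```

In a Rails controller this would be called as parse_signed_request(params, your_secret), and the returned hash carries fields such as the user ID and OAuth token when the user has authorized the app.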

Sunday, November 27, 2011

Executing system commands in Scala

The general consensus from what I've read is that to execute a system command in Scala, you should use the Java runtime features. I don't like this; Scala should be more concise than Java. SBT has classes to execute system commands; however, these are generally only available in your build code. To use this code in your program, download these two files: Process.scala and ProcessImpl.scala (available at the Github repo xsbt's util process). Put these files in your src, and then:


Calling !! will block until the command is finished, output errors to stderr, and return the output from the command. If the command returns a non-zero code, an exception is thrown. Calling ! will block, output to stderr and stdout, and return the return code of the command. No exception is thrown when using !. Look at the code for the ProcessBuilder trait to see the other functions that you can call; there are many.

Saturday, November 26, 2011

Using Lift JSON to return a Map from JSON

It's easy to convert a JSON string in to a key/value pair Map with Lift JSON. I found out how to do this from the Stack Overflow question Can I use the Scala lift-json library to parse a JSON into a Map?

First, if you don't already have Lift JSON in your project, you'll need to add it. For SBT projects, add this to your project build file:


Here is the example code for how to get a Map back:


That's all there is to it. If you knew that every value was a particular type, like String, you could specify that instead of Any in the asInstanceOf conversion.

Friday, November 25, 2011

HTTP authentication with Scala Dispatch

I had some trouble getting basic HTTP authentication to work with the Dispatch library. The documentation and message posts that I've seen say to use the .as(username, password) function on the HTTP request. However, this did not work for me. I had to use the as_! function instead of as. Example:



I believe using as will make a request without authentication, look for a 401 response header, and then re-issue the request with authentication. The Github API, and many others I imagine, do not send a 401 response back when not authenticated.
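If that reading is right, the difference comes down to whether the Authorization header is sent on the very first request. The header is the same in any language; a minimal Ruby sketch of what a preemptive client sends, with basic_auth_header as a made-up helper name:

```ruby
require 'base64'

# Value of the Authorization header for HTTP Basic authentication:
# "Basic " followed by base64("username:password").
def basic_auth_header(username, password)
  'Basic ' + Base64.strict_encode64("#{username}:#{password}")
end

# A preemptive client attaches this header to the first request instead
# of waiting for the server to answer with a 401 challenge.
header_value = basic_auth_header('Aladdin', 'open sesame')
```

A server that never issues the 401 challenge will only ever see credentials from a client that sends this header up front.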

Sunday, November 13, 2011

Writing generic functions that take classes as parameters

It is possible to write generic functions that take a class as a parameter, allowing you to dynamically instantiate and/or reference a class within a function, without that function having to know what the class actually is. Scala has always had support for parameterized types. But if you want to do something with the class, like instantiate it or pass it to something else, you have to use the Manifest feature.

Say you want a function that will return a new instance of the type of class passed in to the function. Here is how you do it:

That's all there is to it! Just specify : Manifest in the definition of the type T, and you can specify it as any class. To access the type, you must pass it as the class to the manifest function. You can do operations on this directly, but if you want to instantiate, call .erasure on it. This returns the underlying runtime class (a java.lang.Class). Call newInstance to instantiate. Then call asInstanceOf to convert it in to the exact type you are expecting (T).

This is a really simple example. For a more complex example, let's look at my GithubApi class. I need to write a function that will make an HTTP request to Github's API (which returns a JSON array) for a specific API end point, and convert the response JSON in to a class that I specify when calling the function (which is the expected response from the particular end point). The class will always extend my base class "GithubClass". See my previous blog post Making HTTP Requests and parsing JSON responses in Scala for more info on how to make the HTTP request and parse the response.

That's a lot of code so let me try to break it down. First we define a base class. Then two classes which represent data that comes back from Github, that extend the base class. Then a GithubApi class which will just contain a function to make a request. Let's look at the definition of the function getData:

    def getData[T <: GithubClass : Manifest](path: String) : List[T] = {

First we're using the Scala parameterized type here to pass in the type of class that we expect. The T <: GithubClass means that the type passed in must extend GithubClass; if we try to send something that does not, it won't compile. The : Manifest means that the type will be accessible inside the function through its manifest. path is the path to the API. Finally, we're returning a list of the items of the type that we pass in. Now let's look at the line that actually converts the JSON data in to the type specified:

    for (jObj <- rspList) yield jObj.extract[T]

Here, we're looping through everything returned from Github, and for each item, we're converting the JSON in to the class of the type passed in (extract is a lift JSON function that converts their JSON in to an instance of your class). We can simply refer to the class as T.

Pretty powerful stuff, huh? This combines the best of dynamic languages like Ruby, where you can pass classes around anywhere you want, and static languages like Java where you can enforce types and know what you're passing in and getting back.

Friday, November 11, 2011

Making HTTP requests and parsing JSON responses in Scala

One of the first things that I've tried to do on my own in Scala (not following a book or online example) is making HTTP requests and parsing the JSON responses. There are powerful libraries written in Scala to do both of these things, Dispatch for HTTP requests and the Lift JSON library.

Using these libraries as my first introduction to Scala proved to be a little more difficult than what I expected. Coming from a Ruby background I expected to be able to make a single call to do an HTTP request, then a call to parse the JSON out in to a hash/map. Not so in Scala. The added complexity can help when you want to do more advanced things, and allows you to give a clear definition to the expected JSON data, but makes it a little more challenging to get going at first.

Getting the libraries in your project
First, you'll need to import the dispatch and lift-json libraries in to your project.  For SBT, add the following to your project build class:

Then type reload and update at the sbt prompt.

As of this writing, 0.8.6 is the latest dispatch. There is a 2.4 for lift-json, but I couldn't get it to work with Scala 2.8.

Making the HTTP request
Next, let's make an actual HTTP request. For this example I'll be talking with the Github API.  We'll make a request to get a list of all of my repos.  See http://developer.github.com/v3/repos/ for details on this API.  Here is the code:

First things first, the import lines will import everything for dispatch and Lift JSON. Next, line 4, we make a dispatch Http object that we will perform operations on. Line 5 is making an object for the specific request. Line 8 is where it gets interesting. With dispatch, when creating and processing requests, you can chain any number of operations together to perform any type of processing that you want. The Periodic Table of Dispatch Operators has most of the possible operators that you can use (although the one we're using is not there).

Confused? Totally lost? So was I when I tried to use Dispatch for the first time. I'm not sure why Dispatch makes the operations so cryptic. The point of having symbols for functions/operations is so that if you're calling the function many times it can save you some typing and brain power when reading. Well, no one is going to use any of these operators all of the time, so why make everything symbols? To make it so that you feel smarter by making the code more cryptic?

Alright, back to my code example. What >:+ does is execute the request on the left, and pass the HTTP headers and the request itself in to a function. headers is a Map of all HTTP headers. On line 9 I'm grabbing the headers; if you want to actually process them you can do that in this anonymous function. The return value of this anonymous function will get returned from the call to the http object. On line 10, I'm using the second parameter to the anonymous function, which is the actual processed HTTP request. You can chain together any number of calls here if you want to do further processing, since you have full access to the HTTP request. In my case, the only other thing that I want to do is get the response body as a string. I'm simply calling as_str on the request, which will return a string. So back to line 8, rspStr now contains the response body as a string.

Note that if you just want to get the response body and don't care about the headers, you could just write it like:

val rspStr = h(req as_str)


Processing the JSON data
Now that we have the data, how do we process the JSON? Dispatch has some built in handlers to turn it in to a JSON object, but I couldn't get my project to compile when I included this. Dispatch just uses the Lift JSON library anyway so we might as well use it ourselves.

Lift JSON has some pretty advanced syntax for looking for specific things in the JSON returned. In my case, I want an object that has all of the data from the response, with types so that way other parts of code can use the response. It's pretty easy to do this with Lift JSON. First, we have to define a "case class" that will represent the data. A case class is a class in Scala that can easily be used for pattern matching.

That represents all of the parameters returned from the JSON. If you have a field that may or may not be in the response, specify the type as Option[Type].

Now let's parse the response string as JSON and convert it in to a list of Repo instances.

Line 2 is really important: if you're using the extract method to convert JSON in to a case class, you need this or else you'll get an error when compiling saying "could not find implicit value for parameter formats". Line 3 calls the Lift JSON parse function, which will return a JValue. Since the Github response for this particular request is an array of objects, to loop through these we'll need to get a list of these objects. Line 6 does this; rspList is a List[JObject]. Finally, on line 7, we're looping through each object (a JObject), and converting it in to an instance of the case class Repo using the extract method. The code after yield gets run for each JObject in rspList. rspRepos now contains a list of Repos for all of my Github code repositories.

So that's it. It's a lot to explain for something seemingly simple, and this took me a while to really get the hang of it. But now that you know how to use Dispatch and Lift JSON you can do some really powerful stuff.

If you want to see this in action, see my GithubApi Scala class:
https://github.com/brentsowers1/GithubCodeFixer/blob/master/src/main/scala/GithubApi.scala

Thursday, November 10, 2011

Code highlighting on Blogger

Want to post code snippets on your blogs? It's very easy with the SyntaxHighlighter project.

Go in to your blog settings, click Design, then click Edit HTML. In the box below, right before the </head> tag, insert the following code:

<link href="http://alexgorbatchev.com/pub/sh/current/styles/shCore.css" rel="stylesheet" type="text/css"></link>
<link href="http://alexgorbatchev.com/pub/sh/current/styles/shThemeDefault.css" rel="stylesheet" type="text/css"></link>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shCore.js" type="text/javascript">
</script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shAutoloader.js" type="text/javascript">
</script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shBrushJScript.js" type="text/javascript">
</script>
<script src="http://alexgorbatchev.com/pub/sh/current/scripts/shBrushXml.js" type="text/javascript">
</script>

<script type="text/javascript">
//<![CDATA[
SyntaxHighlighter.all();
//]]>
</script>

This will load the default code highlighting for Javascript and XML. SyntaxHighlighter has 30 different languages, but you'll have to add support for the others manually. I'll explain this later.

Now to actually put a code snippet in your blog post, click the Edit HTML section. Add the following where you want your code snippet to show up. In this example I'll use Javascript:

<script class="brush: js" type="syntaxhighlighter">
<![CDATA[
// Put your code here, an example is below.

function foo()
{
if (counter <= 10)
return;
// it works!
}
]]>
</script>
Here is what it will look like on your blog:


So just put a script tag like above with a type of syntaxhighlighter. The class is important. brush: js tells it to use Javascript syntax highlighting. If we wanted XML/HTML, simply put the class as brush: xml.

To use other language syntax highlighting, you'll have to add a script tag for the language. The src follows the same pattern, /pub/sh/current/scripts/shBrushxxx.js, where xxx is the name of the brush. Some common types are Ruby, Python, Java, Scala, Cpp, CSharp, Css, Plain, Sql, etc. Download SyntaxHighlighter to see the full list.

A few things to note. If you are heavily using this, you should download the files and put them on your own server. Alex Gorbatchev is hosting the files on his server out of kindness, so if you want to heavily use it, either donate to him so he can pay his server bills or put them on your own server.

Also, I couldn't get the examples on the official site using the autoloader to work for me. That's why I've added specific script tags for each brush in my examples.

A big thanks to Alex for writing SyntaxHighlighter!

Sunday, October 23, 2011

Starting a Lift project with SBT and IntelliJ

Getting up and running with Scala and Lift requires a lot of tools, especially if you want to develop from an IDE.  Coming from a Ruby on Rails and PHP background, a lot of this was new to me, and the books typically assume you've done a lot of Java development and are used to dealing with build tools, compiling, dependencies, etc.

So this is my guide for how to get started with Scala and Lift development, using the IntelliJ IDEA IDE, based on me spending hours and hours trying to figure it out.  I've used Rubymine and PHPStorm which are based on IntelliJ so IntelliJ is definitely my favorite IDE.  I believe you can also do Scala development from Eclipse and Netbeans.

I won't use the very latest version of everything, because the very latest of one thing might not be compatible with the other.  Also, this assumes you're using Linux, but it will be almost the same on Windows and Mac as most of the action happens in the sbt console.  I've gathered much of the info here from the Lift In Action book which I would highly recommend for learning Lift.  The setup part of this book wasn't completely comprehensive though and I spent a lot of time with trial and error to get to this point.

Scala
First step is getting Scala set up so you can run Scala from the command line.  This is actually optional, but you may occasionally want to manually run Scala code (like irb in Ruby), or compile a stand alone Scala script.  We're going to use version 2.8.1, since this is the latest that is compatible with everything else as of this writing (2.9.1 is the latest release overall).

  1. Download Scala 2.8.1.  
  2. Extract it to your /opt directory
  3. "ln -s /opt/scala-2.8.1-final /opt/scala" to create a symlink to your scala folder without having to use the version number.  
  4. "ln -s /opt/scala/bin/scala /usr/local/bin/scala" and "ln -s /opt/scala/bin/scalac /usr/local/bin/scalac" to make links to the scala executable and scala compiler so they'll be in your path.
SBT
Simple Build Tool (SBT) is the command line utility that you'll do pretty much all of the work from, other than editing files.  It will compile your code, run your code, load dependencies, load the correct version of Scala for your app, etc.  As of this writing, 0.11.0 is the latest.  However, after version 0.7.7, sbt had a lot of pretty fundamental changes that I don't like (what happened to it creating the project directory/file structure for you, specifying plugins through the sbt console instead of editing files, etc.)  I think version 0.7.7 is easier to use, and it's what the Lift In Action book is written for, so we'll install 0.7.7.
  1. Download sbt-launch-0.7.7.jar to /usr/local/bin.
  2. Create a file named sbt in /usr/local/bin with the contents:
    java -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=1024m -Xmx512M -Xss4M -jar `dirname $0`/sbt-launch-0.7.7.jar "$@"
  3. "chmod a+x /usr/local/bin/sbt" to allow it to execute
Create The Project
This is a little different than what you may be used to.  You'll actually create the project with sbt from the command line, not from your IDE.
  1. Make a directory for your project and cd in to it
  2. Run "sbt".  It will ask you if you want to create a new project, enter y.
  3. Enter a name for your project
  4. Enter an organization.  
  5. Enter a version number for your app
  6. Enter 2.8.1 for the Scala version
  7. Use the default 0.7.7 for the sbt version
Lift
You don't need to download and install Lift, this is all done from a tool called Lifty which is a module to sbt.  It will give you a "lift" command in sbt that performs lift actions.  As of this writing, 1.6.1 is the latest released version of lifty, which only works with sbt 0.7.7.  There is a 1.7 beta that claims sbt 0.10 support.  We're going to go with 1.6.1.
  1. From an sbt prompt, enter "*lift is org.lifty lifty 1.6.1".  This will install the lifty module globally, to allow you to type lift.  (In newer versions of sbt you can't do this; you have to edit a file and put Scala code in it.  Why did they take out this ability?)
  2. Assuming you're in the project directory, run "lift create project-blank".  This will make your project a Lift project, with the default files. 
  3. Leave the default value for mainpack
  4. Enter 2.3 for the liftversion (this is important, do not use the default 2.3-RC version, or 2.4 which is not compatible with Scala 2.8.1)
  5. Enter "reload" at the prompt to reload the project definition
  6. Enter "update" to compile your blank app
Your app is now compiled and ready to run!  You can type jetty-run to start up a web server for your app, then navigate to localhost:8080.  Press Enter, then type jetty-stop to stop the server.


IntelliJ IDEA Project
I didn't forget about using IntelliJ IDEA.  Before you open the directory in IntelliJ, you need to create an IntelliJ project file so IntelliJ knows where everything is.  This is done from the sbt-idea processor.
  1. From an sbt prompt, type "*sbtIdeaRepo at http://mpeltonen.github.com/maven/".  sbt-idea is not in the standard scala-tools.org repo, so you'll need to add a reference to it with this command.
  2. Install it with "*idea is com.github.mpeltonen sbt-idea-processor 0.4.0".  This will add "idea" as an option to the sbt prompt.  
  3. Type "update" to update references
  4. In your project directory, type "idea".  This will create the project file for IntelliJ.
IntelliJ IDEA
Now on to the IntelliJ program itself.  
  1. Load IntelliJ IDEA, and before opening the project, go to File, Settings, Plugins. 
  2. In the Available tab, add the Scala and SBT plugins.
  3. Restart IntelliJ, go to File, Open Project, and find your project.  It will take some time to index everything.
  4. You'll see an error "Cannot load facet Web".  Click Details and click Yes to remove it.
  5. You MAY have to add the lib and lib_managed folders as libraries.  I can't explain it, I had to do this on the first project I set up, on the second I didn't.  If you do, go to File, Project Structure, Libraries, Add a New Project Library called Libs, Attach Jar Directories, and pick the lib and lib_managed/scala-2.8.1/compile folders.  Also for some reason, it will not pick up some of the Lift JAR files that are in lib_managed.  I could not get it to recognize mapper, so on this same page I had to click Add Classes and pick the mapper JAR file in lib_managed.  
  6. You'll have to configure the SBT plugin.  Go to File, Settings, SBT.  Under IDE settings, enter /usr/local/bin/sbt-launch-0.7.7.jar for SBT launcher JAR file
Now that this is configured, you can use the SBT console at the bottom.  Just click start, and you'll get a prompt for this project.  This is where you'll want to run everything from.

Debugging 
If you want to use break points and everything else that comes along with debugging, from IntelliJ:
  1. From the drop down on the top menu, click Edit Configurations
  2. Click +, then Remote.
  3. Give it a name (like jetty debug)
  4. In the Before Launch section, check Run SBT Action
  5. Click the ... beside Run SBT Action
  6. Check Run in current module
  7. Enter jetty-run for the action
  8. Click File, Settings, SBT.  
  9. Add the following to the end of the VM Parameters line under IDE settings: "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=5005"
You're all set.  Just click the Debug button, and you're good to go.  SBT will be launched with jetty-run, and you can now debug (note that if you already had an SBT console open, you'll have to kill it).


So, that's all there is to it!  :)  This is a lot to go through to get going, but now that you've got all of this set up, you should have everything you need to develop with Scala and Lift, and distribute the code when you're done.

I'm using Scala and Lift now

I've started learning the language Scala and the web framework built with it, Lift. I'll be making some blog posts soon on these.

I'm going to hold off on making any final judgments until I've written some real things in both.  But so far I'm really liking Scala and Lift. If you haven't looked in to them, I would recommend you at least give Scala a quick look over.  Even if you don't ever intend to use it for your main web app, it can be great for back end processing jobs, as concurrency is built in to the language.  And if, like me, you work on some projects that can only be distributed to customers' systems that only run the JVM, Scala is built right on top of Java.  Sometimes JRuby will allow you to distribute on to the JVM, but you don't always get the newest and best running JRuby.

Learning Scala and Lift and blog background

Several months ago I read the awesome book Seven Programming Languages In Seven Weeks, and knew I wanted to learn some more about Scala.  Most of the projects I work on are low volume and wouldn't necessarily benefit a whole lot from using Scala, so I couldn't motivate myself to learn a whole new language that I might not even use.  Then I read about the Lift framework, and knew I wanted to learn both and create something real with them, regardless of what the projects I do at work are.

I've read through Programming Scala, another great book published by The Pragmatic Bookshelf, and feel that I can really develop some good Scala applications.  I've just now started reading Lift In Action to learn about Lift.  Since the community around Lift is still pretty small, and there are a lot of rapidly moving pieces with Lift, I wanted to start blogging about using Scala and Lift.  My main goal here will be to post helpful info that will save others some time.

So some background on me.  I've been doing software development for over 9 years, with the past 5 years of that being web application development.  I've done mostly Ruby on Rails and some PHP for those 5 years.  While I do still like Ruby on Rails, I think it's time for me to learn another new language that overcomes some of the down sides of Ruby and Rails.  I was an early adopter of Rails (started using it in 2006, at version 1.1), and I loved programming in something new and revolutionary.  I have a feeling Scala and Lift will play this role in the next few years.  Soon I'm going to build a side project with Scala and Lift, and I'm hoping I can incorporate it in to projects at my job.

Saturday, June 11, 2011

HTTP Long Polling (aka Comet) with Nginx

Say you need to have live updates on your web site, like to receive chat messages. In this article, I will explain how to use the HTTP push module for the Nginx web server for this. There are several other ways to do this, but most of them have serious drawbacks.

Short Polling
Probably the easiest to implement is simple polling - just fire off an AJAX request every 5, 10, or 15 seconds to an action on your server that simply checks for new messages, and if there are new messages they are sent as the response. Either send the data back as JSON and have Javascript on the client end take care of what to do with it, or send back Javascript instructions for what to do with the data. This has two big issues that really preclude it from being used on a large scale:
  1. Client doesn't get real time updates. The time to get a new message is on average at least half of your poll time.
  2. A lot of server and client overhead. Every 5, 10, or 15 seconds an entirely new HTTP connection is established. This uses up resources on the server when you have more than just a few people. And the server resources needed to run the check for messages in your Rails code on every poll add up.
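
To put rough numbers on these two issues, here is a small back-of-the-envelope sketch in Ruby. The method name and the example figures (1000 clients, a 10 second poll) are made up for illustration; it only does the arithmetic described above and ignores network latency and request processing time.

```ruby
# Rough cost of short polling. Assumes every client polls on a fixed
# interval and that messages arrive at random times between polls.
def short_poll_cost(clients:, interval_seconds:)
  {
    # A message waits, on average, half an interval before the next poll.
    average_delay_seconds: interval_seconds / 2.0,
    # Every client opens a new HTTP request once per interval.
    requests_per_minute: clients * 60.0 / interval_seconds
  }
end

cost = short_poll_cost(clients: 1000, interval_seconds: 10)
puts "Average delay: #{cost[:average_delay_seconds]} seconds"
puts "Requests per minute: #{cost[:requests_per_minute]}"
```

With those example numbers, a message waits 5 seconds on average and the server handles 6000 requests a minute, most of which return nothing.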
Web Sockets
Maybe in 4 years this will be a real option, but right now (2011) it's not an option. The standard is not yet finalized. Firefox 4 and Chrome support it, but it's disabled in Firefox due to security concerns. IE doesn't yet support it.

Long Poll (aka Comet) As A Controller Action
Another pretty simple thing to set up is long polling. With this, the client issues an AJAX request to an action, and instead of returning immediately if there is no data on the server, you sleep for a second or two and check for messages again. Keep doing this in a loop until a message is there, or you hit a timeout period. On the client end, when a response is received, it's processed, and then a connection is immediately re-established to get more messages. This is more responsive than short polling but has two huge problems:
  1. Server resources for open connections. For each client that connects, you need to have an open HTTP connection. With frameworks like Ruby on Rails and servers using Passenger or FastCGI, that means a separate process for each connection, using up lots of memory. This severely limits the number of clients that can be on your server at once. Using JRuby and a Java server can help here but you'll still run into some problems.
  2. Server resources for checking for messages every 1 or 2 seconds. If you're checking the database for new messages this often, having lots of clients connected at once can really eat up the processor on your server.
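
The loop described above could be sketched roughly like this in plain Ruby. This is only an illustration of the idea, not code from a real app: the method name, the default timeout and interval, and the block that stands in for "check for new messages" are all assumptions. In a Rails controller action the block would be whatever query you use to find new messages for the user.

```ruby
# Long poll loop sketch: keep checking for messages until some arrive
# or the timeout passes. The block is a stand-in for the app's own
# "fetch new messages" query and must return an array.
def long_poll(timeout: 30, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    messages = yield
    # Respond as soon as there is something to send.
    return messages unless messages.empty?
    # Give up with an empty response once the timeout is reached;
    # the client is expected to reconnect right away.
    return [] if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end
```

Note that the request (and the process serving it) is tied up for the whole time this loop runs, which is exactly the first problem listed above.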
Nginx Push Module for Long Polling
While researching a better way to do this for a web site that I work on where traffic is ramping up, I stumbled upon the HTTP Push Module for Nginx. This is a module for the web server Nginx that acts as a message queue. Clients wishing to receive messages open an HTTP connection to a path on your server configured for "subscribing" and leave the connection open. Then anyone wishing to send a message issues an HTTP POST to another path on the server configured for publishing. Clients listening will get a response to the HTTP connection with the data. Any number of different channels can be used, so you can have a channel for each user of your system. The messages are queued up, so the client doesn't have to actually be listening when a message is published; it can check in at a later point to get the message.

This works similarly to the long polling I described above, except that you don't actually write the code to check for messages, and the open connections are all handled by Nginx - no Ruby. Nginx is extremely efficient at handling tons of open connections at once. I was able to test 5000 (yes, that's five THOUSAND) concurrent clients listening for messages, and publishing a message to EACH client once every minute on an Amazon EC2 micro instance, and the server didn't even break a sweat. The entire server (not just Nginx but every process on the system) was only using 220 megs of memory, and was averaging 1 or 2% CPU usage.

Server Configuration
To install Nginx with the push module, I did these steps on an Ubuntu server:
  1. sudo apt-get install libpcre3 libpcre3-dev
  2. wget http://nginx.org/download/nginx-0.8.54.tar.gz
  3. wget http://pushmodule.slact.net/downloads/nginx_http_push_module-0.692.tar.gz
  4. tar -xzf nginx-0.8.54.tar.gz
  5. tar -xzf nginx_http_push_module-0.692.tar.gz
  6. cd nginx-0.8.54
  7. ./configure --prefix=/usr/local/nginx-0.8.54 --add-module=/home/ubuntu/nginx_http_push_module-0.692
  8. make
  9. sudo make install
  10. sudo ln -s /usr/local/nginx-0.8.54 /usr/local/nginx
  11. sudo ln -s /usr/local/nginx/sbin/nginx /usr/local/sbin/nginx
  12. Add /etc/init.d/nginx from http://wiki.nginx.org/Nginx-init-ubuntu
  13. sudo chmod a+x /etc/init.d/nginx
  14. sudo /usr/sbin/update-rc.d -f nginx defaults
  15. sudo mkdir /var/log/nginx
Note that to use this, you have to actually re-compile Nginx with the push module. If you're using the Passenger module as well, you'll have to compile that in too. I have to confess, I'm just going to use the Nginx server as a load balancer and long poll server, so it will just forward actual application requests (other than long poll) on to another server that will actually execute the Rails code. So I don't know off the top of my head how to compile the Passenger module in as well. Maybe passenger-install-nginx-module has an option to include other modules?

That's not all, though: you'll also need to configure the end points for subscribing and publishing. Add the following to your nginx.conf file:
# internal publish endpoint (keep it private / protected)
location /publish {
    set $push_channel_id $arg_id; #/?id=239aff3 or somesuch
    push_publisher;
    push_store_messages on; # enable message queueing
    push_message_timeout 2h; # messages expire after 2 hours, set to 0 to never expire
    push_message_buffer_length 10; # store 10 messages
}
# public long-polling endpoint
location /subscribe {
    push_subscriber;
    # how multiple listener requests to the same channel id are handled
    # - last: only the most recent listener request is kept, 409 for others.
    # - first: only the oldest listener request is kept, 409 for others.
    # - broadcast: any number of listener requests may be long-polling.
    push_subscriber_concurrency broadcast;
    set $push_channel_id $arg_id;
    default_type text/plain;
}

If you really want to have a ton of clients connected, you'll need to edit some system settings to allow lots of open files. First, edit /etc/security/limits.conf and add:
* soft nofile 50000
* hard nofile 50000

Then edit /etc/sysctl.conf and add:
fs.file-max = 100000

Client Code
So that should be it for configuring the server. Here is an example page with full Javascript using Prototype to listen in the background for messages, and then display the messages with a timestamp. Enter a message to send and click Send Message, and you should see that message show up (as long as you save this file on a server with the Nginx HTTP push module configured as I describe above).

<html>
<head>
<script src="javascripts/prototype.js" type="text/javascript"></script>
<script type="text/javascript">
// Code to listen for messages from an Nginx server with
// the HTTP push module installed, and publish messages
// to this server.
// See http://rails.brentsowers.com/2011/06/http-long-polling-aka-comet-with-nginx.html
// for details on how to set the server up.
// Note that for everything to work properly, the Nginx
// server has to be the same server that this file is
// on. If you set up a separate server to handle this,
// the Javascript won't quite work as expected because
// of the Same Origin Policy.

// Just use a generic channel ID here, this can be any
// text string, for a real system you'll want this to be
// some sort of identifier for the client.
var channelId = "asdf";

// Default initial values
var etag=0;
var lm='Thu, 1 Jan 1970 00:00:00 GMT';

function doRequest() {
  new Ajax.Request('/subscribe?id=' + channelId, {
    method: 'GET',
    onSuccess: handleResponse,
    onFailure: handleFailure,
    // Custom HTTP headers have to be sent, based on
    // the HTTP response from the previous request.
    // This tells the server at which point to look
    // for messages after. If these aren't included,
    // the server will just return the first message
    // in the queue
    requestHeaders: [
      'If-None-Match', etag,
      'If-Modified-Since', lm
    ]
  });
}

function handleResponse(response) {
  var txt = response.responseText.stripScripts().stripTags();
  addMessage(txt);
  // Read the headers from the server response. The
  // header will contain a Last-Modified header that
  // is the date/time of the message we just received.
  // This time will be specified on the next request, so
  // we get messages after this time. There is no
  // acknowledgement of messages, messages stay in the
  // queue on the server until the limits set in the
  // server config are met.
  etag = response.getHeader("Etag") || 0;
  lm = response.getHeader("Last-Modified") ||
    'Thu, 1 Jan 1970 00:00:00 GMT';
  doRequest();
}

function handleFailure(response) {
  // Show the HTTP status of the failed request.
  addMessage("Error: request failed with status " + response.status);
}

function publishMessage() {
  var txt = $F('pubtext').stripScripts().stripTags();
  if (txt.length == 0) {
    alert("You must enter text to publish");
  } else {
    // The response is XML with how many messages are
    // queued up, no point in looking at it here.
    new Ajax.Request('/publish?id=' + channelId, {
      method: 'POST',
      postBody: txt
    });
  }
}

function addMessage(msg) {
  // Prefix each message with the local time it was displayed.
  var d = new Date();
  var line = d.toString() + ": " + msg;
  $('data').insert(line + "<br />");
}
</script>
</head>
<body onload="doRequest()">
Messages:
<div id="data">
</div>

<input type="text" name="pubtext" id="pubtext" />
<input type="button" value="Send Message" onclick="publishMessage()" />
</body>
</html>
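
If you want to listen from something other than a browser (a test script, for example), the same header handshake can be sketched in Ruby with net/http. This is a rough sketch under assumptions, not part of the page above: the Cursor struct, the method names, and the http://localhost base URL are made up for illustration. The /subscribe path and id parameter match the Nginx config in this article.

```ruby
require "net/http"
require "uri"

# Start of the epoch, used before any message has been received.
INITIAL_LAST_MODIFIED = "Thu, 1 Jan 1970 00:00:00 GMT"

# Remembers where we are in the channel's queue between requests.
Cursor = Struct.new(:etag, :last_modified)

# Build the GET for the subscribe end point, sending back the values
# from the previous response so the server returns the next message.
def build_subscribe_request(base_url, channel_id, cursor)
  uri = URI.join(base_url, "/subscribe")
  uri.query = URI.encode_www_form(id: channel_id)
  request = Net::HTTP::Get.new(uri)
  request["If-None-Match"] = cursor.etag.to_s
  request["If-Modified-Since"] = cursor.last_modified
  request
end

# Take the Etag and Last-Modified headers from a response and return
# the cursor to use on the next request. Falls back to the initial
# values if the headers are missing, like the Javascript above.
def advance_cursor(response)
  Cursor.new(response["Etag"] || "0",
             response["Last-Modified"] || INITIAL_LAST_MODIFIED)
end

cursor = Cursor.new("0", INITIAL_LAST_MODIFIED)
request = build_subscribe_request("http://localhost", "asdf", cursor)
puts request.path
```

To actually listen, you would send the request with Net::HTTP (with a read timeout long enough for a long poll), pass the response to advance_cursor, and loop.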


At some point I'll set up an example page on my server so you can actually see this in action.

Integration with your web app
This is a simple example, but you could use this in a complex system as I am. When one part of your app wants to send a message to a user, simply issue an HTTP POST request to the long poll server from within your controller action (or rake task, or whatever else), using net/http, HTTParty, or any other Ruby HTTP client. As long as the long poll server is on the same network as your app server, the response time for this will be extremely fast.
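
Here is a minimal sketch of what that POST could look like with net/http. The method names, the timeouts, and the http://longpoll.internal host are assumptions for illustration; the /publish path and id parameter come from the Nginx config above.

```ruby
require "net/http"
require "uri"

# Build the POST for the publish end point. The message text is the
# request body, and the channel is picked with the id parameter.
def build_publish_request(base_url, channel_id, message)
  uri = URI.join(base_url, "/publish")
  uri.query = URI.encode_www_form(id: channel_id)
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "text/plain"
  request.body = message
  request
end

# Send the message to the long poll server and return the response.
# Short timeouts keep a slow long poll server from stalling the app.
def publish(base_url, channel_id, message)
  request = build_publish_request(base_url, channel_id, message)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port,
                  open_timeout: 2, read_timeout: 2) do |http|
    http.request(request)
  end
end

# Hypothetical internal host name, only built here, not sent.
request = build_publish_request("http://longpoll.internal", "asdf", "Hello!")
puts "#{request.method} #{request.path}"
```

From a controller action you would call something like publish("http://longpoll.internal", channel_id, "You have a new message"), with the real address of your own long poll server.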

One big downside to the HTTP push module is that there is no authentication out of the box. So theoretically anyone could listen for messages to a user (by default any number of clients can be listening for messages on the same channel). A way to get around this is to dynamically generate a random channel ID for each subscriber every time they log in, and then store the mapping of your user ID to the current random channel ID. You'll need to set the push_authorized_channels_only setting to on (see the module's documentation for a description of this setting). This way, a subscriber cannot create a channel. Then when the user authenticates, issue a POST to create the channel. I haven't implemented this but I know it can be done.
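
As a rough idea of the mapping part, here is an untested-in-production sketch. The ChannelRegistry class and its methods are hypothetical, and it keeps the mapping in memory only to stay short; a real app would more likely store the channel ID in the database or the session, and would still need to issue the POST to create the channel as described above.

```ruby
require "securerandom"

# Maps each user ID to a hard to guess channel ID. A new channel ID
# is generated on every login, replacing the previous one.
class ChannelRegistry
  def initialize
    @channels = {}
  end

  # Call when the user logs in. Returns the new channel ID.
  def assign(user_id)
    @channels[user_id] = SecureRandom.hex(16)
  end

  # Look up the current channel ID when publishing to a user.
  # Returns nil if the user has not logged in.
  def channel_for(user_id)
    @channels[user_id]
  end
end

registry = ChannelRegistry.new
channel_id = registry.assign(42)
puts "Subscribe with /subscribe?id=#{channel_id}"
```

The page rendered for the logged in user would then use this channel ID in place of the hard coded "asdf" from the example page.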

Useful Links