Sunday, January 18, 2015

My 100 Favorite Albums By Statistical Analysis

Background

A co-worker asked for some album recommendations since he knows I'm a big metal fan, and it got me thinking. What are my favorite albums? Sure, I can probably pick my top 5 or so off the top of my head. But beyond that? I really can't say whether Metallica - Ride The Lightning is my 7th favorite album or my 15th; I know it's somewhere around there. Being a nerd, I'm always looking for reasons to make a new spreadsheet to answer some questions for me, so I decided to set one up to figure out my favorite albums, ranked 1 to 100.

Ranking Method

Ranking my favorite albums requires lots of data. Fortunately, I've already got it all in my huge Winamp library. For the past 7+ years I've been hoarding albums (almost 1,200 so far) and rating each song in Winamp from 1 to 5 stars. These ratings, along with each song's last played time, let me create an awesome smart playlist that I can just let play without ever skipping anything.

To make my ratings consistent, I took a quick look at every album in my list and adjusted some ratings. I didn't change much, though; I've found that my ratings really hold up. Even though Limp Bizkit has really fallen out of favor since 1999, I still rate Nookie 5 stars!

Winamp shows an average star rating for each album, but this isn't good enough. It doesn't show decimals, just 1, 2, 3, 4, or 5 (rounded up). Plus, this average weighs each track the same regardless of length, so on Opeth's Blackwater Park the 2 minute interlude Patterns In The Ivy counts the same as the 11 minute masterpiece The Drapery Falls. Unacceptable!

To get a true album average, meaning the average rating of a single second of the album, I created a formula that adds up the "star seconds" of every track on the album and divides by the album's length in seconds. A track's star seconds value is its star rating times its length in seconds. The end result is a true rating for the album, which I'm calling a Time Rating.

As an example, this is the data for Opeth - Blackwater Park:
Track # | Name | Length | Rating
1 | The Leper Affinity | 10:23 | 4
2 | Bleak | 9:15 | 5
3 | Harvest | 6:01 | 3
4 | The Drapery Falls | 10:53 | 5
5 | Dirge For November | 7:53 | 3
6 | The Funeral Portrait | 8:44 | 4
7 | Patterns In The Ivy | 1:52 | 1
8 | Blackwater Park | 12:08 | 5

If you add up the ratings and divide by the number of tracks, you get an average rating of 3.75 (30 / 8). But look at the lengths of the highest and lowest rated tracks: the lower rated ones are short (especially the 1 star track 7), and the highest rated ones are the longest. To get a better average, multiply each track's length in seconds by its rating, add those up, and divide by the total length of the album in seconds (4029):
(623*4 + 555*5 + 361*3 + 653*5 + 473*3 + 524*4 + 112*1 + 728*5) / 4029 = 16882 / 4029 = 4.190

So the true average rating of this album is 4.190, not 3.750, which is a huge difference!
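The Time Rating calculation is small enough to sketch in a few lines of Ruby. This is an illustration of the formula described above, not the spreadsheet that produced the rankings; the Track struct and the method names are made up for the example, and the data is the Blackwater Park table above.

```ruby
# Minimal sketch of the Time Rating: a length-weighted average of star
# ratings, i.e. total "star seconds" divided by total seconds.
# Track, to_seconds, time_rating and simple_average are illustrative names.
Track = Struct.new(:name, :length, :rating)

# Converts an "m:ss" length string to seconds, e.g. "10:23" -> 623.
def to_seconds(length)
  minutes, seconds = length.split(':').map(&:to_i)
  minutes * 60 + seconds
end

# Sum of (length in seconds * stars) over all tracks, divided by the
# album's total length in seconds.
def time_rating(tracks)
  total_seconds = tracks.sum { |t| to_seconds(t.length) }
  star_seconds = tracks.sum { |t| to_seconds(t.length) * t.rating }
  star_seconds.fdiv(total_seconds)
end

# Plain per-track average, which weighs every track the same.
def simple_average(tracks)
  tracks.sum(&:rating).fdiv(tracks.size)
end

# Opeth - Blackwater Park, from the table above.
BLACKWATER_PARK = [
  Track.new('The Leper Affinity', '10:23', 4),
  Track.new('Bleak', '9:15', 5),
  Track.new('Harvest', '6:01', 3),
  Track.new('The Drapery Falls', '10:53', 5),
  Track.new('Dirge For November', '7:53', 3),
  Track.new('The Funeral Portrait', '8:44', 4),
  Track.new('Patterns In The Ivy', '1:52', 1),
  Track.new('Blackwater Park', '12:08', 5)
].freeze

puts format('Simple average: %.3f', simple_average(BLACKWATER_PARK)) # Simple average: 3.750
puts format('Time Rating:    %.3f', time_rating(BLACKWATER_PARK))    # Time Rating:    4.190
```

Tracks trimmed under the data exceptions below (hidden tracks, long filler) would simply be entered with their shortened length.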

Data Collection

To collect this data, I had to manually enter the rating and length of each track on what I thought were my top 100+ albums. I just went through all of my albums and picked out the ones I thought might make my top 100. There's no data export from Winamp, so I had to type all of this in by hand.

Results

Caveats

So, these are my personal favorites. I am in no way saying that these are the best 100 albums of all time in all of rock music for anyone. These are the ones I like best. So yeah, get ready for lots of Metallica, Opeth, nu metal, and 2000s hard rock. If you disagree with my rankings, well, I don't really care. That said, I would love to hear recommendations for other albums you think I'd like.

Data



Data Exceptions

One note about the data: a lot of albums end with a long instrumental track that is meant more as filler than as a normal song, and I did not count these. One example is Days Of The New - Cling, a 13 minute closing track that is mostly ambient nature sounds. It doesn't really detract from the album for me because I never listen to it. Another is Korn - My Gift To You: a good 4 minute song, then a 5+ minute gap of silence, then an "extra" hidden song. I only counted the 4 minutes of My Gift To You. Several other closing tracks are like this as well, and for those I only counted the length of the first song. Fortunately, bands don't seem to be doing this anymore; it was a gimmick from before digital album sales. Also, I don't include bonus/extra tracks, only the tracks on the original album.

Reality Check

So, this is all good and analytical, but how does it compare to what I think my favorite albums actually are? Pretty well, it turns out. #1 through #6 are definitely what I would consider my top 6, in that order. Although, before I saw this data, I wouldn't have been able to pick a favorite between The Black Album and And Justice For All. #7, Opeth - Blackwater Park, is rated a little too high in this data, I feel. While I do love the 5 star songs on it, they are really long (especially the title track), and that length drags the album down slightly in my mind. The formula does the opposite, weighting those long tracks more heavily and pushing the album further up than I would want. I would probably put Alice In Chains - Dirt at 7, Led Zeppelin IV at 8, and Blackwater Park at 9. After that, it's just about right down through #25. Beyond that, while the data certainly isn't a perfect ranking of my favorites, it's very close: if I pick two albums at random, the ranking matches my opinion of which is better in almost all cases.

Bands You May Never Have Heard Of

If you've made it this far, you probably already know most of the bands in the list. There are a few though that you might not have heard of that I would highly recommend:

  • Opeth - If you're into metal a little but aren't a die-hard metalhead, you might not have heard of Opeth. They are an excellent progressive death metal band that I love, and they sound really unlike any other band. Check out their albums on this list (don't bother with their newer albums). Their stretch from 1997 through 2008 was amazing: 7 albums on this list (tied with Metallica for the most for me!).
  • Sevendust - They've got their own sound with elements of nu metal, thrash metal, and post grunge hard rock, and it all comes together so well. 4 albums in my top 100 is well deserved.
  • Clutch - a great bluesy hard rock band with a little metal mixed in. One thing I love about them is that each album has a really distinct sound, and like Opeth, there really is nothing else that sounds like Clutch. They've got 3 albums on the list, and a few more of theirs would land just a little below #100.
  • Iced Earth - power, speed, and thrash metal combined with some theatrical elements. Great rhythm guitar and vocal work. Most of their albums tell a really interesting sci-fi or horror story, like Framing Armageddon and Crucible Of Man.
  • Kyng - an awesome 3 piece hard rock band with a very distinct sound. Their debut, Trampled Sun, is my top album of the 2010s at #66. Not much is happening this decade in my opinion, and they are one bright spot.
  • Seventh Void - a doom metal supergroup with just one album so far, at #74 for me.
  • Ghost - a band new to the 2010s with a 70s metal sound. The band is a bit of an act, with most of the songs about Satan in some way and the lead singer dressed up as an evil pope, but the music is great: catchy and heavy. Their debut, Opus Eponymous, comes in at #81.
  • King Giant - unless you live in the Washington, DC area, you've probably never heard of them. A local band with a dark and gritty doom/sludge metal and southern rock sound. Their best album, Southern Darkness, comes in at #92.
  • The Obsessed - hard rock/doom metal with their own unique sound and awesome guitar riffs. Fronted by Wino (lead singer of Saint Vitus). The Church Within holds up the bottom of the list at #100.

Surprises 

For Me

There was nothing too surprising to me in this data, but a few things do stand out. Bands I've gotten into mostly in the past 6 years, like Clutch, Lamb Of God, and Judas Priest (yeah, I was never really that into Judas Priest until 6 years ago), ranked lower than I would have expected. And bands I've been into for longer than that, who are past their peak and whose albums from the past 6+ years haven't been as good, like Sevendust and Opeth, rank a little higher than I would have thought. This makes sense. And when I think back, yeah, I would rank Sevendust's and Opeth's best over Clutch's and Lamb Of God's best.

For Others

If you're a big metal fan and know me well, some albums and bands here might be a surprise. Yes, I really would rank Staind - Dysfunction at #12. Yes, I really would rate Alice In Chains that highly. And yes, even though Creed is now known more as a joke because of Scott Stapp and their overly dramatic, more pop-friendly songs, I think Creed in their prime was a great rock band, and I still love listening to them. Part of this might be because I was in my formative years in the late 90s, but there was some amazing hard rock and metal in that era. Even though hard rock and metal had more mainstream success then than at any other time, and it was all new bands (Korn, Limp Bizkit, Creed, Linkin Park, etc.), a lot of people want to forget about that era.

Trends

I collected some stats on averages, along with decade-by-decade results:


Yeah, not much going on this decade for me. Some bands have to step it up in the latter half of this decade!


Wednesday, May 2, 2012

Detecting non-ASCII characters in a git commit hook

If you don't want to allow non-ASCII characters in your code (they can sneak in when pasting text from Word), you can add a pre-commit hook to git to check for them. Create a file called pre-commit in the .git/hooks folder of your code repo with the following contents, and make it executable by its owner (chmod u+x .git/hooks/pre-commit). Git will then halt when you attempt to commit if there are non-ASCII characters in the commit (binary files are not checked). It will also display the character(s) found and show the diff of the file that includes them. Here is what the pre-commit file should look like:

If you need to add non-ASCII text that you know is safe, you can temporarily disable the script by running "chmod u-x .git/hooks/pre-commit", making your commit, and then running "chmod u+x .git/hooks/pre-commit" to re-enable it.
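The original pre-commit script isn't reproduced here. As a rough sketch of the same idea, assuming Ruby is installed on the machine, a hook could scan the staged diff for added lines that contain non-ASCII bytes. The non_ascii_additions method is a hypothetical name, and unlike the hook described above, this version prints only the offending lines rather than the full diff of each file.

```ruby
#!/usr/bin/env ruby
# Hypothetical pre-commit hook sketch, NOT the original script from this post.
# Save as .git/hooks/pre-commit, then: chmod u+x .git/hooks/pre-commit

# Returns the added lines of a unified diff that contain any byte outside
# the ASCII range. "+++ b/file" headers are skipped (so, in this simple
# sketch, is an added line whose own text starts with "++").
def non_ascii_additions(diff_text)
  diff_text.b.each_line.select do |line|
    line.start_with?('+') && !line.start_with?('+++') && !line.ascii_only?
  end
end

if __FILE__ == $PROGRAM_NAME
  # Staged changes only, with no context lines. Git prints
  # "Binary files ... differ" for binary files, so their bytes are not scanned.
  # stderr is discarded; if git fails, the diff is empty and the commit proceeds.
  diff = `git diff --cached -U0 2>/dev/null`
  offenders = non_ascii_additions(diff)
  unless offenders.empty?
    warn 'Commit aborted: non-ASCII characters in staged changes:'
    offenders.each { |line| warn line }
    exit 1
  end
end
```

Exiting with a non-zero status is what makes git abort the commit; the chmod trick above disables the hook the same way as before.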

Sunday, April 22, 2012

My Trips Facebook app will not work after June 1

Starting on June 1, 2012, the My Trips Facebook app will no longer be available. This is because Facebook will stop supporting FBML, the technology My Trips is built with. My Trips is just a fun little side project, completely outside of my regular job, and its usage is very low, so I can't justify spending the time it would take to redesign it with a supported technology.

In a nutshell, FBML allowed me to create My Trips pretty quickly without having to specify font sizes, colors, etc. Things like the tabbed look of My Trips are possible with a very simple FBML command. When I started work on My Trips in 2009, Facebook was promoting FBML as one way to create Facebook apps. Had it not been for FBML, I probably would not have created My Trips. However, in 2010 Facebook started discouraging the use of FBML. I suspect this is mainly because it uses too many resources on their servers. I don't agree with Facebook's decision to completely abandon FBML, but as a software developer I can understand why they would.

I'd like to thank everyone for using My Trips over the years. If you know of any Facebook apps that provide similar functionality, please post a comment here!

Thursday, March 29, 2012

jQuery .on performance

jQuery's .on() function is very useful. It allows you to bind event listeners for elements that haven't been created yet. On pages where you're dynamically adding elements, this can make the code much cleaner and more unobtrusive. Rather than attaching the event handler to every newly created element one at a time, simply give all new elements a class, and call .on() once for that class name with the event handler function when the page first loads.

.on() simply catches the event when it reaches the higher-level element you specify (usually document or a container div), checks whether the element that caused the event matches the selector of any of the added .on() calls, and if so, calls the corresponding handler.
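To make that dispatch step concrete, here is a toy Ruby model of delegated event handling. It is a hypothetical sketch, not how jQuery is implemented: the DelegatingContainer class, its on method, and matching on a single class name are assumptions for illustration, and it ignores bubbling through intermediate ancestors. It does show why each click costs a little more as the number of registered .on() selectors grows, since every delegate is checked in turn.

```ruby
# Toy model of delegated event handling; NOT jQuery's actual implementation.
# Class and method names are hypothetical.
Element = Struct.new(:id, :classes)

class DelegatingContainer
  def initialize
    @delegates = [] # [class_name, handler] pairs, one per on() call
  end

  # Loosely analogous to $(container).on('click', '.' + class_name, handler).
  def on(class_name, &handler)
    @delegates << [class_name, handler]
  end

  # Runs once per click that reaches the container. Every registered
  # delegate is checked, so the cost grows with the number of on() calls.
  def dispatch_click(target)
    @delegates.each do |class_name, handler|
      handler.call(target) if target.classes.include?(class_name)
    end
  end
end

container = DelegatingContainer.new
clicked = []
# Registered once, before any matching elements exist.
container.on('trip') { |el| clicked << el.id }

# Elements created later need no per-element binding.
container.dispatch_click(Element.new('trip-1', ['trip']))
container.dispatch_click(Element.new('note-1', ['note']))
p clicked # => ["trip-1"]
```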

This functionality is also provided by .live(), but as of jQuery 1.7, .live() is deprecated. Use .on() instead.

tl;dr

Use .on()! Using .on() to capture the event instead of attaching a handler directly to each element has virtually no performance impact when the event is triggered, even when there are a huge number of unique elements with their own .on() handler on the page. And using .on() has a very noticeable performance advantage when generating/rendering elements. So any performance arguments against .on() are invalid.

Measuring Performance

Because of the way that it works, you may think that there is a performance hit to using .on() instead of attaching the handler to each element when it's created.  So I decided to do some extensive testing to see if this was the case.

I wrote a simple test page that dynamically generates lots of clickable elements.  See this page at http://coordinatecommons.com/jquery-on-test.html.

For each test case, there are two different measures of performance. First is how long it takes to dynamically generate the elements. When using .on, this is mostly the time to simply generate the DOM elements. However, when using .click to bind the listener one at a time, it takes longer because of the added step to attach the listener at this point.

The second measure is how long it takes for the callback to be called after clicking. This is measured from the parent container's mousedown event to the moment the event handler is called. Because timing starts on mousedown, there is some variability from test to test depending on how long it took me to release the mouse button. Any result here can vary by 100-150 ms, so the results shouldn't be read any more precisely than 150 ms intervals. Realistically, you can probably subtract 80-100 ms on average from each of these to get the actual times.

Test Cases

  1. Generate 10,000 divs with the same class name, using .on - generate 10,000 of the same type of element that all use the same event handler. The same class name is attached to all elements, with one call to .on.
  2. Generate 10,000 divs with one of 100 different class names, click handler using .on - 100 different event handlers, 10,000 total elements. .on is called 100 times.
  3. Generate 1,000 divs with unique classes, click handler using .on - 1,000 unique event handlers for 1,000 elements. .on is called 1,000 times.
  4. Generate 10,000 divs with unique classes, click handler using .on - 10,000 unique event handlers for 10,000 elements. .on is called 10,000 times.
  5. Generate 1,000 divs with unique IDs, click handler using .click - attach an event listener to each element with .click as the element is added.
  6. Generate 10,000 divs with unique IDs, click handler using .click - same as above, but with 10,000 elements.

Tests 1 and 6 are the ones that most evenly compare the performance of attaching a handler to each element as it's added versus using .on.

Test Conditions

For Chrome, Firefox, and IE9, a desktop machine (quad core 3 GHz, 8 gigs of RAM) running Windows 7 Professional 64 bit was used. For IE6, 7, and 8, a Windows XP Virtualbox VM running on the desktop machine above was used.

Performance Results Table

Test (times in ms) | Chrome 17 | Firefox 11 | IE9 | IE8 | IE7 | IE6
Test 1 render: 10K, same class/handler, .on | 912 | 271 | 3020 | 3142 | 3668 | 3877
Test 1 click: 10K, same class/handler, .on | 70 | 74 | 110 | 121 | 110 | 133
Test 2 render: 10K, one of 100 classes, .on | 1081 | 344 | 3270 | 4857 | 5732 | 5965
Test 2 click: 10K, one of 100 classes, .on | 94 | 114 | 111 | 131 | 137 | 95
Test 3 render: 1,000 unique classes, .on | 328 | 164 | 832 | 1483 | 1385 | 1021
Test 3 click: 1,000 unique classes, .on | 140 | 162 | 107 | 140 | 107 | 120
Test 4 render: 10,000 unique classes, .on | 2772 | 1397 | 14050 | 15602 | 47609 | 29614
Test 4 click: 10,000 unique classes, .on | 245 | 252 | 149 | 421 | 409 | 442
Test 5 render: 1,000 unique IDs, .click | 281 | 175 | 898 | 1983 | 2133 | 2023
Test 5 click: 1,000 unique IDs, .click | 106 | 112 | 100 | 103 | 100 | 90
Test 6 render: 10,000 unique IDs, .click | 2826 | 1576 | 14618 | 50673 | 65835 | 66606
Test 6 click: 10,000 unique IDs, .click | 80 | 113 | 106 | 94 | 100 | 130

Results


Using .on() to capture the event instead of attaching a handler directly to each element has virtually no performance impact when the event is triggered, even when there are a huge number of unique elements with their own .on() handler on the page. I expected at least some noticeable lag in the click times with 10,000 unique elements (test 4), but it was only noticeable on IE8 and below, and just barely (roughly 410-440 ms versus 150-250 ms in the other browsers). And that's with .on() used in a way it shouldn't be. Test 1 is the way .on() should be used, and it performs wonderfully: its click times are essentially the same as test 6, where each element has a directly attached handler.

However, using .on() does have a very noticeable performance advantage when generating/rendering elements. This is obvious when comparing test 1 with test 6: for the same number of elements, rendering with .on() is anywhere from about 3 times faster (Chrome) to about 18 times faster (IE7) than attaching the handler to each rendered element!
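The render speedups can be recomputed directly from the results table by dividing each browser's Test 6 render time by its Test 1 render time. A small Ruby sketch of that arithmetic (the constant and method names are just for illustration):

```ruby
# Render-time speedup of .on (Test 1) over per-element .click (Test 6),
# using the millisecond values from the results table above.
BROWSERS = ['Chrome 17', 'Firefox 11', 'IE9', 'IE8', 'IE7', 'IE6'].freeze
TEST1_RENDER_MS = [912, 271, 3020, 3142, 3668, 3877].freeze
TEST6_RENDER_MS = [2826, 1576, 14618, 50673, 65835, 66606].freeze

# How many times faster the .on render was than the .click render.
def speedups(on_ms, click_ms)
  click_ms.zip(on_ms).map { |click, on| click.fdiv(on) }
end

BROWSERS.zip(speedups(TEST1_RENDER_MS, TEST6_RENDER_MS)).each do |browser, ratio|
  puts format('%-10s %4.1fx faster with .on()', browser, ratio)
end
```

Run as is, this reports about 3.1x (Chrome 17), 5.8x (Firefox 11), 4.8x (IE9), 16.1x (IE8), 17.9x (IE7) and 17.2x (IE6).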

So based on this my recommendation is to use .on() to attach event handlers any time there will be more than one element added with the same function used for the handler.

Other observations


Another thing I found interesting is that Firefox is the fastest on nearly all tests; Chrome is definitely behind Firefox here. Also, seeing the numbers for IE8, it's a real shame that nearly 25% of the world is using this browser. Microsoft did very little to improve performance between IE6 and IE8, and the improvements in IE9 are often very small. Microsoft, IE10 had better be blazingly fast! And please work on getting Windows XP users to upgrade to IE10. Firefox and Chrome run perfectly well on Windows XP, so your own browser should as well.

Sunday, January 22, 2012

Database migrations with deployed JRuby WAR files

In my previous post, Compiling Rails project for distribution in a WAR file with JRuby, I explained how to build a WAR file from a Rails project to distribute onto systems that have a Java app server like Tomcat or Glassfish. If you're running this in production, you're probably going to want to run database migrations after the WAR file is deployed. Unfortunately, this is not as straightforward as you might expect, but it's not too difficult either. To run database migrations, you must first create Warbler's configuration file in your project, config/warble.rb, with the following contents:

Then, add a file named script/db_migrate with the following contents:

Now, on the production system, after the WAR file has been deployed, from the root directory of your web app, run the command:

jruby -S ./script/db_migrate

If you're running in 1.9 mode, add --1.9 before the -S (jruby --1.9 -S ./script/db_migrate). This assumes that you have a jruby executable on your PATH on the server. There should be a way to run the JRuby that is bundled in the WAR file, but I haven't spent enough time looking into it to figure out how. Has anyone had success with this?

Tuesday, January 17, 2012

Compiling Rails project for distribution in a WAR file with JRuby

I recently started using JRuby for a Rails project, and overall the experience has been excellent. Using RVM, you can just switch to JRuby (rvm use jruby) to build a new project, and just about everything will work the same. One of the big features of JRuby is that you can bundle your entire app, including JRuby itself, into a WAR file that Java servers like Tomcat and Glassfish can serve up, so your app can be distributed onto servers that only have Java.

After you have JRuby installed, simply install the gem warbler.  You'll then get a command line tool, warble, to generate war files from your project.  Simply run warble from your project's directory, and a war file will be produced.  It's as easy as that!

Another great feature is that the Ruby code can be compiled down to Java class files, so your source code is not visible. This is great when you deploy to a server that other companies can access and you don't want them seeing your source code. However, this is not working for me. Warbler is supposed to support it: just run "warble compiled war" from your project's directory instead of "warble". This produces a WAR file with both .rb and .class files. The .rb files, however, are simply stubs that require the .class file; none of your code is in them. But for me, it's not generating .class files for all of my controllers. I've entered an issue for this at https://github.com/jruby/warbler/issues/72.


I will be making another post on how to do database migrations on the server you're deploying to.

Wednesday, January 4, 2012

Rails 3.1 "Could not find a JavaScript runtime" error

I just created my first Rails 3.1 project on JRuby. However, I keep getting the error "Could not find a JavaScript runtime ..." when trying to do pretty much anything with it. After digging into this some, it turns out that with the addition of CoffeeScript in 3.1, Rails needs to be able to run JavaScript code. A gem called execjs was added to Rails to allow this. EXCEPT, execjs itself needs something else to actually evaluate the JavaScript. See https://github.com/sstephenson/execjs for a list. If you're running on Mac or Windows, you're good; there are system libraries to do it. But if you're on Linux, it won't work unless you have node.js installed.

Let me repeat that.  Rails 3.1 apps by default will NOT work in Linux, unless you have node.js installed.  This is absolutely ridiculous... guess what Rails team, a lot of people are developing in Linux and not Mac.  I understand how stuff not working in Windows is released, but Linux??

So to get things to work you'll need to add one of the gems listed on the execjs page.  If you're running the MRI Ruby, just add:

gem 'therubyracer'

to your Gemfile. I've read a lot of complaints about therubyracer, but it appears to be the most popular option. If you're running JRuby like me, add:

gem 'therubyrhino'

to your Gemfile.  I've verified that this works correctly.

FOLLOW UP 1/13/2012:
See my comments. I have made a fix to Rails that puts the appropriate gem in the Gemfile if you are on Linux, and submitted pull requests to Rails core to incorporate this code. They have made some comments on it; hopefully it will be pulled into Rails soon.
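As a rough illustration of the idea behind that fix (not the code from the actual pull request), a generator could pick the execjs runtime gem from the Ruby engine and platform. The method name and the rules are assumptions based only on this post: therubyrhino on JRuby, therubyracer on Linux, and nothing on Mac or Windows, which already have a runtime execjs can use.

```ruby
# Hypothetical helper; not the actual Rails generator code.
# Returns the Gemfile line for an execjs JavaScript runtime, or nil when
# the platform already provides one (Mac OS X, Windows).
def js_runtime_gemfile_line(engine: RUBY_ENGINE, platform: RUBY_PLATFORM)
  if engine == 'jruby'
    "gem 'therubyrhino'"   # Rhino runs on the JVM that JRuby already uses
  elsif platform.include?('linux')
    "gem 'therubyracer'"   # embeds V8, for MRI on Linux
  end
end

p js_runtime_gemfile_line(engine: 'jruby', platform: 'java')           # => "gem 'therubyrhino'"
p js_runtime_gemfile_line(engine: 'ruby', platform: 'x86_64-linux')    # => "gem 'therubyracer'"
p js_runtime_gemfile_line(engine: 'ruby', platform: 'x86_64-darwin11') # => nil
```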