JRuby Deployment with GlassFish and Capistrano

Background

At CollegeJobConnect, we need to process Word and PDF-format resumes that we receive from our members.  In order to carefully curate our initial candidate pool, we have done a lot of this work by hand.  However, as we grow we need to automate as much of the registration process as we can.  This involves using automated text extraction to retrieve some basic attributes from these files.  Fortunately, there are some great open source tools out there that wrangle Word and PDF documents into raw text.  Unfortunately, they are written in Java.  Enter JRuby.

JRuby makes our job drop-dead simple.  We can tap into the Java libraries we need, while keeping our code in the language we love.  Unfortunately, the state of the art in JRuby deployment is in a bit of disarray.  Charles Nutter does a good job of guiding the decision in his August 2009 blog post.  We settled on his recommended approach for simple apps -- the GlassFish gem.  However, things fell apart in the details.  A post on Jacob Kessler's blog at Sun provided some guidance, but we had trouble getting his deployment script working properly.  We also wanted something that we could administer with initscripts, to DRY out process control.  After a few hours of head-scratching frustration, we got things running.  Here's how:

Prerequisites
  • Java 6 JRE (sun-java6-jre on Ubuntu)
  • JRuby 1.5
  • Git
(Our server is running Ubuntu 10.04 LTS "Lucid Lynx", but this should work in 9.10 and 8.04, and with minor adjustments in any similar Unix environment.)

Getting It Done
1. Install the GlassFish gem:
sudo jruby -S gem install glassfish
2. Capify your JRuby application. From your application's root directory, run:
capify .
3. Customize config/deploy.rb for GlassFish.  (Change the "example_app" references to match your app.)
4. Install the glassfish-example_app initscript in /etc/init.d.  (Change the "example_app" references and customize the GlassFish arguments to suit your fancy.)
5. Install the initscript in rc.d (this will only work on Ubuntu/Debian):
sudo update-rc.d glassfish-example_app defaults
6. Deploy your app with cap deploy.
... and you're done!  Post in the comments if you have any questions or feedback.  If this saves you time, let me know!
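
For reference, here is a minimal sketch of what an initscript like the one in step 4 could look like.  It is not the original glassfish-example_app script: the application path, PID file location, port, and the GlassFish gem flags (-e, -p, -d, -P) are all assumptions, so check them against jruby -S glassfish --help before relying on them.

```shell
#!/bin/sh
# Hypothetical sketch of /etc/init.d/glassfish-example_app (NOT the original script).
# Every path, the port, and the GlassFish gem flags below are assumptions.

APP_ROOT="${APP_ROOT:-/var/www/example_app/current}"        # assumed Capistrano "current" path
PIDFILE="${PIDFILE:-/var/run/glassfish-example_app.pid}"    # assumed PID file location
# Assumed flags: -e environment, -p port, -d run as daemon, -P PID file
START_CMD="${START_CMD:-jruby -S glassfish -e production -p 3000 -d -P $PIDFILE $APP_ROOT}"

# True when the PID file exists and names a live process.
is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start() {
    if is_running; then
        echo "already running"
        return 0
    fi
    # Left unquoted on purpose so the command splits into words
    # (this breaks if any path contains spaces).
    $START_CMD
}

stop() {
    if ! is_running; then
        echo "not running"
        return 0
    fi
    kill "$(cat "$PIDFILE")" && rm -f "$PIDFILE"
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; start ;;
        status)  if is_running; then echo "running"; else echo "stopped"; fi ;;
        *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
    esac
fi
```

Installed under /etc/init.d, a script of this shape would be driven with "sudo /etc/init.d/glassfish-example_app start" (or stop, restart, status), which is the kind of command a Capistrano task can call so that process control lives in one place.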

On Privacy (or: what Buzz failed to learn from Newsfeed)

On Friday, September 8, 2006, Mark Zuckerberg published an apology for the botched launch of Facebook Newsfeed.

The gist of his open letter was:
  • We did a poor job explaining the new features
  • We did not provide anywhere near enough privacy control
  • We are launching better privacy controls today, after a marathon 48-hour coding session
A public apology from the CEO of a major company is commendable.  But contrast this with his post on Tuesday night, just three days earlier:
  • Newsfeed is great and you need to give it a chance
  • Your privacy settings have not changed
  • Things that were private are still private
  • Your friends can see the exact same things they could see before
What changed?  Why the sudden change in tone?  Mark was coming to grips with a fundamental issue facing social software:

Companies don't understand privacy!

... and as a result, they make the same mistakes, over and over again.

We live in public.  And we always have.  More so now, perhaps.  But if you have ever had your picture taken, eaten at a restaurant, or had an argument in a public place, a little bit of your self has been copied into the ether.  Yet, just because we do these things does not mean we want them to be public.  That argument with your significant other would drop dead if a film crew showed up and pointed a camera in your direction.

In a declarative sense, one can think about privacy as action and context.  What we do, and where and with whom we do it.  Indeed, the Googles and Facebooks of the world have gotten pretty sophisticated about this kind of privacy.  "Share my vacation photos with my close friends."  "Invite my coworkers to my housewarming."

However, (as is typical with computers) things break down when you start to make inferences.  She wrote on someone's public wall, therefore she intended for everyone to read.  She posted the pictures on her public blog, therefore she wants her coworkers to see them.  It's pretty easy to see where the wheels come off.  That public argument we were discussing earlier?  It happened in the middle of Faneuil Hall with thousands of people around.  Let's podcast it automatically!

Action: audio conversation; context: extraordinarily public place.  Privacy setting: Everyone.

What's missing is intent.

Just because something is public does not mean it is intended to be seen.  We do things in public all the time that would be humiliating or destructive if broadcast.  We are so used to doing these things that sometimes we don't even notice the context in which we are acting!  Relying on the context of an action to determine intent is a recipe for failure.

Privacy and the human mind.

As we use software, we map its features to our intentions.  We learn how to perform actions, and we learn to control context using privacy settings.  But that map is not perfect.  We can be confused or misled by the abundance of options or by our interpretation of instructions.  We can be downright lazy.  Or, like the public argument, we can simply forget (or ignore) context.  When those things happen, we rely on software to understand our intent, and shield us from mistakes caused by our imperfect mental map.  And unfortunately, when it comes to understanding intent, computers fail.  (Sorry, Google.)

What about Buzz?

Back to Buzz.  The controversy over privacy does not stem from the design of Buzz.  It stems from default settings.  The Google Buzz team is attempting to make Buzz useful right out of the gate by guessing your preferred list of followers and followees.  But computers fail at intent, and despite thoughtful design of privacy settings, Buzz is a privacy failure.

A path forward?

Failed launches like Newsfeed and Buzz can teach us something about how people think about privacy, and how to design software to be understood and accepted by its users.  Facebook taught us that declarative privacy settings can work.  However, both of these failures highlight two more key issues:
  • Computers suck at intent.  Inferring privacy preferences for new software, based on prior actions in old software, is a recipe for failure, and a PR nightmare.
  • People assume computers are great at intent.  We publish things to much wider contexts than we intend, and don't notice or care until new products and features make incorrect inferences based on that.
The good news is that smart people are working on these problems.  Let's just hope they are learning from each other.

In summary
  • Shockingly, these mistakes have been made before
  • Companies like Facebook and Google are only just beginning to understand privacy
  • Explicit, declarative privacy settings are good
  • Users are loose with their settings, and trust software to "do the right thing"
  • When it comes to social software, computers suck at figuring out what the right thing is
  • Products can improve if competitors can learn from each other

Installing Redis on CentOS 4

TL;DR: CentOS 4 initscript for Redis at: http://bit.ly/7a23Fe

Vanity Is Nice

I am a huge fan of Vanity, Assaf Arkin's "Experiment Driven Development" framework.  (A/B testing and metric tracking.)  "Gem install" to "cap deploy" in under an hour.  Talk about reducing the time investment required to get going with data-driven development!  It Just Makes Life Easier.

Vanity uses Redis as its backing store.  Redis is fast and lightweight -- no problem there.  It builds on CentOS right out of the gate.  However, I want to use Redis with Vanity on a production site.  If Redis is down, Vanity crashes and takes my app with it.  I need it installed at runlevels 3, 4, and 5, so it always starts at boot time. (We need adequate monitoring, too, but that's a topic for another post.)

CentOS Needs a Friend

There is an Ubuntu initscript floating around, but building dpkg to get the start-stop-daemon binary is a little heavyweight for my taste.  Instead, I've gutted an old Nginx initscript, and it works handily.

You can find it at: http://bit.ly/7a23Fe
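
For readers who just want the general shape, here is a minimal sketch of a chkconfig-aware initscript.  It is not the script at the link above.  The start/stop priorities (85 and 15) and the use of "redis-cli shutdown" to stop the server are my assumptions; the paths match the install steps in this post.  The part that matters to chkconfig is the comment header: it reads the runlevels and priorities from the "chkconfig:" line and expects a "description:" line next to it.

```shell
#!/bin/sh
# redis-server   Hypothetical sketch of a chkconfig-aware initscript (NOT the linked script).
#
# chkconfig: 345 85 15
# description: Redis key-value store
#
# "345" lists the runlevels; 85 and 15 are assumed start/stop priorities.

REDIS_SERVER="${REDIS_SERVER:-/usr/local/sbin/redis-server}"
REDIS_CLI="${REDIS_CLI:-/usr/local/bin/redis-cli}"
CONF="${CONF:-/etc/redis/redis.conf}"
PIDFILE="${PIDFILE:-/var/run/redis.pid}"

# True when the PID file exists and names a live process.
running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start() {
    if running; then
        echo "redis-server is already running"
        return 0
    fi
    # Relies on "daemonize yes" in the config so this returns immediately.
    "$REDIS_SERVER" "$CONF"
}

stop() {
    if ! running; then
        echo "redis-server is not running"
        return 0
    fi
    # Assumption: ask Redis to shut itself down rather than sending a signal.
    "$REDIS_CLI" shutdown
}

status() {
    if running; then
        echo "redis-server is running"
    else
        echo "redis-server is stopped"
    fi
}

# Dispatch only when an action is given, so the file can also be sourced.
if [ $# -gt 0 ]; then
    case "$1" in
        start)   start ;;
        stop)    stop ;;
        restart) stop; sleep 1; start ;;   # give Redis a moment to exit
        status)  status ;;
        *)       echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
    esac
fi
```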

If You Came for the Show

How to install Redis on CentOS 4, in 13 easy steps:
  1. curl http://redis.googlecode.com/files/redis-1.02.tar.gz | tar zx
  2. cd redis-1.02
  3. make
  4. sudo cp redis-server /usr/local/sbin
  5. sudo cp redis-cli /usr/local/bin
  6. sudo mkdir /var/lib/redis /etc/redis
  7. sed -e "s/^daemonize no$/daemonize yes/" -e "s/^dir \.\//dir \/var\/lib\/redis\//" -e "s/^loglevel debug$/loglevel notice/" -e "s/^logfile stdout$/logfile \/var\/log\/redis.log/" redis.conf | sudo tee /etc/redis/redis.conf > /dev/null
  8. curl http://gist.github.com/gists/257849/download | tar -zxO > redis-server
  9. chmod u+x redis-server
  10. sudo mv redis-server /etc/init.d
  11. sudo /sbin/chkconfig --add redis-server
  12. sudo /sbin/chkconfig --level 345 redis-server on
  13. sudo /sbin/service redis-server start
Congratulations, you have a complete Redis installation!
  • Config: /etc/redis/redis.conf
  • Data: /var/lib/redis
  • Logfile: /var/log/redis.log
  • PID: /var/run/redis.pid
If you tweak the paths, you'll need to edit redis.conf and/or the redis-server initscript accordingly.
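
The sed call in step 7 packs four substitutions into one line, so here is a small sketch that runs the same expressions against a four-line sample instead of the real file.  The sample lines are my assumption about what the stock redis.conf contains, inferred from the patterns sed is matching; nothing here touches /etc.

```shell
# Assumed defaults, inferred from the patterns that step 7 matches.
sample_conf='daemonize no
dir ./
loglevel debug
logfile stdout'

# Same four expressions as step 7, applied to the sample text only.
rewritten=$(printf '%s\n' "$sample_conf" | sed \
    -e "s/^daemonize no$/daemonize yes/" \
    -e "s/^dir \.\//dir \/var\/lib\/redis\//" \
    -e "s/^loglevel debug$/loglevel notice/" \
    -e "s/^logfile stdout$/logfile \/var\/log\/redis.log/")

printf '%s\n' "$rewritten"
# prints:
#   daemonize yes
#   dir /var/lib/redis/
#   loglevel notice
#   logfile /var/log/redis.log
```

If you change any of the install paths, these replacement strings are the ones to adjust.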

Pirates Are People Too