Thursday, June 19, 2008

Google Visualization Library (Gap Minder) and a nice little plugin

Google bought Hans Rosling's Gap Minder over a year ago and I've been dying for them to release an API ever since. They've finally done it. The animated bubble charts were what I really wanted to sink my teeth into, so naturally I've built a Rails plugin to do just that. The API itself is JavaScript and Flash. It promises not to store/steal your data, but writing all your data out in JavaScript is less than desirable. So how about something like this:


<% gap_minder_for(@collection, :width => 500, :height => 300) do |gm| %>
  <% gm.label("Department") {|c| c.department } %>
  <% gm.time("Time") {|c| c.time } %>
  <% gm.x("X Procedure") {|c| c.x_procedure * 2 } %>
  <% gm.y("Y Procedure") {|c| c.y } %>
  <% gm.bubble_size("Volume of Cases") {|c| c.volume } %>
  <% gm.extra_column("Something extra to select") {|c| c.extra_stuff } %>
<% end %>


I like it.

To explain: it's simply a block helper that calls a templating class and does some funky meta-programming to make it as extensible as you want it. There are 5 required method calls on the gap minder object passed to the block (label, time, x, y, and bubble_size).

There are 2 other noteworthy methods. First, color. By default the color procedure keeps track of the label procedure values and assigns a color accordingly. You can override that with your own block, passed in the same way as for the other methods (the first argument is the title; the block receives an item of your collection).

The second method of note is extra_column. This can be called any number of times. With this tool x, y, z, and even the color metric used in the graph can be selected from drop-downs. By adding extra columns you can add other data as selectable options.
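For the curious, here is a rough sketch of how a block helper like this could collect its columns. It is not the plugin's actual source: the class name GapMinderSketch, the rows method, and the CaseRecord struct are all made up for illustration, and the real plugin has to emit JavaScript for the Google API rather than plain arrays.

```ruby
# A hypothetical, stripped-down builder; NOT the plugin's real code.
class GapMinderSketch
  REQUIRED = [:label, :time, :x, :y, :bubble_size]

  attr_reader :columns

  def initialize(collection)
    @collection = collection
    @columns = [] # each entry is [title, proc]
  end

  # Meta-programming bit: define one method per required column.
  REQUIRED.each do |name|
    define_method(name) do |title, &block|
      @columns << [title, block]
    end
  end

  # extra_column may be called any number of times.
  def extra_column(title, &block)
    @columns << [title, block]
  end

  # One header row followed by one row per item in the collection.
  def rows
    header = @columns.map { |title, _| title }
    data = @collection.map do |item|
      @columns.map { |_, block| block.call(item) }
    end
    [header] + data
  end
end

# Made-up record type standing in for whatever is in @collection.
CaseRecord = Struct.new(:department, :time, :x_procedure, :y, :volume)
cases = [CaseRecord.new("Cardiology", 2007, 3, 10, 120)]

gm = GapMinderSketch.new(cases)
gm.label("Department")            { |c| c.department }
gm.time("Time")                   { |c| c.time }
gm.x("X Procedure")               { |c| c.x_procedure * 2 }
gm.y("Y Procedure")               { |c| c.y }
gm.bubble_size("Volume of Cases") { |c| c.volume }

gm.rows.each { |row| puts row.inspect }
```

The point is only that each call stores a title and a block, and the blocks are run against every item later, when the chart data is written out.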

It's that simple. It's up on GitHub. Enjoy!

Special thanks to Mark Daly for trudging through the Google documentation for me and pushing me to meet deadlines. C2X, right?

Saturday, May 31, 2008

Migration: Change the file name, fix the pain?

So I'm watching the keynote by Jeremy Kemper and he had struggles... everyone does... with Rails migrations. Sadly, the big answer to our woes with migrations was changing the identifier on the front of the file name from an incrementing integer to a time stamp. I'm not going to dog on the change, because I've long thought it was obvious and needed, but why in the world are we putting that time stamp at the front of the file name? We name our migrations for a reason, and perhaps it's just my love of the command line, but I want to type the name I gave a migration and hit tab, not hunt down its time stamp first. We could argue about storing the names in the database table for migrations and all that jazz, but it's such a simple patch to put the time stamp at the end of the file name. Well, that's what I'll do. With arguably the nicest string manipulation methods in any standard library, I think I can pull off the ever-complicated reverse! method I'll have to use to do it.
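Just to show the string side of the idea, here is a minimal sketch that moves a leading time stamp to the end of a file name. The method name timestamp_to_end is mine, and it assumes names shaped like digits, an underscore, a name, then .rb. It is not a Rails patch: Rails would still have to be taught to read the version from the end of the name.

```ruby
# Move a leading time stamp to the end of a migration file name.
# Assumes "<digits>_<name>.rb"; anything else is returned unchanged.
def timestamp_to_end(file_name)
  match = file_name.match(/\A(\d+)_(.+)\.rb\z/)
  return file_name unless match
  "#{match[2]}_#{match[1]}.rb"
end

puts timestamp_to_end("20080531101500_create_users.rb")
# prints create_users_20080531101500.rb
```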

Rails = Success; Me = Failure

It's been forever and a half... although I've recently been told that the world will end in 2038, so I guess it's not that long. I've decided to stop letting better be the enemy of good on this blog and just write stuff. We'll see what people can get out of it. I fully intend to finish the meta-programming guide in Ruby ASAP, as exposure at RailsConf has led me to believe that it is desperately needed.

This year's RailsConf (Portland, OR) has been great so far. DHH gave a keynote last night that I very much appreciated, on doing something else with your life and taking advantage of the surplus of time we have as Rails developers (while it lasts) to improve who we are as programmers and people.

Avi Bryant has done some incredible work with stacking Ruby on top of Smalltalk technology. He and GemStone gave an incredibly promising presentation yesterday (MagLev).

Last night I caught the last MAX (1:13 AM) back to my hotel. Well worth it though. I hosted a "Semantics on Rails" birds of a feather session and there were some fantastic people there. I'm certainly more motivated to fix, finish, and for the love of Sammy Sosa write specs for ActiveSesame... which may soon be known as "Angular Cat."

Tonight we're going to have a semantic hack fest after the Science 2.0 BoF. I hope to get some junk done before then.

Saturday, January 12, 2008

Ruby Meta-Programming: a bread and butter guide (Part 1)

I was on the Rails forum the other day and someone was looking for Ruby meta-programming articles. I suggested "The Ruby Way," a fantastic book with a good section on meta-programming, but I'm surprised I had nothing but a book to suggest. So I asked myself, "Self, why hasn't PageRank stumbled across a grand and all-encompassing meta-programming guide for Ruby?" I have a guess: there aren't any. Ruby is, for the most part, Smalltalk reborn. But its hype has only come through the growing Rails community. Rails programmers, at least the new ones (which I think are the vast majority and will be for another 2 years or so), learn Rails and forget to learn Ruby. Ruby is not the cool thing to them. Plus, many come from a Java background, so they don't even know what higher order procedures are. To them Lambda is the Ultimate unknown. It's not their fault though... I blame their parents. So I'm going to take some time to give some introductory examples and explanations on Ruby meta-programming.

Higher Order Procedures: For those still writing Java/JSP/PHP/ASP code in Ruby, a higher order procedure is a procedure that takes a procedure as an argument or returns a procedure. Read that carefully. I did not say it takes the evaluation of a procedure. This is not a higher order procedure:



def not_higher_order(arg1)
  return performed_function(arg1)
end

not_higher_order(different_performed_function("blah"))


The above passes the returned value of different_performed_function("blah") into not_higher_order, which returns the return value of performed_function(arg1). Higher order procedures use procedures themselves (in Ruby, the Proc object) as the argument and/or return value, not the value left over after execution. So here is an example of a higher order procedure you must have used in Ruby. It's already built into Enumerable objects:


[1,2,3,4].collect {|i| i * 2 } # Returns [2,4,6,8]
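collect covers the "takes a procedure" half of the definition. The "returns a procedure" half can be sketched like this; multiplier is a made-up method name used only for illustration:

```ruby
# Returns a procedure (a lambda) rather than a computed value.
def multiplier(factor)
  lambda { |n| n * factor }
end

double = multiplier(2)
puts double.call(5)                         # prints 10
puts [1, 2, 3, 4].collect(&double).inspect  # prints [2, 4, 6, 8]
```

The last line hands the returned procedure straight back to collect, so both halves of the definition show up in one place.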


Blocks are Ruby syntactic sugar for using higher order procedures. You use them all the time in Ruby, which is why you can do so much with Ruby so quickly without extensive knowledge of and experience with a massive standard library. Every time you use a block you create an anonymous (unnamed) procedure and pass it into the called procedure (collect in this case) as an argument. As you will mostly use blocks instead of Proc.new or lambda, we will gloss over what those are at this point. However, you should look them up to get a better understanding of how procedures are first-class objects in Ruby. Let's build our first higher order procedure, modeled after one that already exists that you might not know about: inject:


# Monkey patching Enumerable (will explain shortly)
module Enumerable
  # the ampersand declares that the argument being passed is a Proc object
  def inject(memo, &block)
    # run the enumerable's each method and reset memo
    # with the evaluation of the passed block
    self.each {|enumerated_item| memo = yield(memo, enumerated_item) }
    return memo
  end
end

puts [1,2,3,4].inject(0) {|memo,item| memo + item } # prints 10 (the result of 0+1+2+3+4)
puts ["sam","likes","to","eat","bunnies"].inject("The person ") {|memo, item| memo + item} # prints "The person samlikestoeatbunnies"
puts [1,2,3,4].inject(8) {|memo,item| item % 2 == 0 ? memo + item : memo} # prints 14 (8+2+4)


What we just did is rebuild the inject method. We reopened the Enumerable module, which is "mixed in" to Array, Hash, etc., and overwrote the already existing inject method. For the calls above, the original and what we wrote do the same thing (the real inject also lets you leave off the initial value). Then we used the & in front of the argument named block to let the Ruby interpreter know that this is a procedure being passed as an argument, which makes the block available inside the method as a Proc object. Strictly speaking, yield works even without the &block argument. (Side note: you could also build procedures with lambda and pass them to any ordinary named argument. But often you'll only need/want one, so a block is a nice clean way to do it.) Then we called the each method and used the yield keyword within it. yield takes as its arguments the parameters you wish to pass to the block. You then catch those parameters in the block within the pipes: |memo, item|. The variable names in the pipes don't matter, just like in a collect or each statement. They are how you reference those values within the block itself.
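Here is a small sketch of the moving parts from that paragraph: a block caught with &, the same block run through both block.call and yield, and a lambda passed as an ordinary argument. The method names twice and apply are made up for the example.

```ruby
# Hypothetical method, just to show the moving parts.
def twice(&block)
  # block is a Proc object; block.call does the same job as yield here
  [block.call(1), yield(2)]
end

from_block = twice { |n| n * 10 }
puts from_block.inspect    # prints [10, 20]

# A lambda can stand in for the block when prefixed with &.
add_one = lambda { |n| n + 1 }
from_lambda = twice(&add_one)
puts from_lambda.inspect   # prints [2, 3]

# A procedure can also be passed as a plain named argument.
def apply(procedure, value)
  procedure.call(value)
end

puts apply(add_one, 41)    # prints 42
```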

If you've never done anything like this before take some time to experiment. See if you can recreate the Fibonacci numerical pattern and pass each iteration to an anonymous procedure (block). If you want to cheat there is an example of how to do this within the Ruby Pickaxe (Programming Ruby).
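If you'd rather peek at one possible answer first, here is a minimal sketch. It is my own version, not the Pickaxe example, and the method name fibonacci and its count argument are just choices made for illustration.

```ruby
# Yields the first `count` Fibonacci numbers to the block, one at a time.
def fibonacci(count)
  a, b = 0, 1
  count.times do
    yield a
    a, b = b, a + b
  end
end

fibonacci(8) { |n| print n, " " }   # prints 0 1 1 2 3 5 8 13
puts
```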

Higher order procedures are hugely important. If you are a Rails programmer there is no better way to DRY up your views than by building block helpers and using concat and capture. If you plan to build a domain specific language (DSL) for some strange task in Ruby or Rails, blocks are a must-have, so that you don't introduce 5000 extra keywords to Ruby and end up with documentation that needs a forklift to be moved. For a more in-depth look at why and where you might want to use higher order procedures in any language, check out the first chapter of "Structure and Interpretation of Computer Programs," also available as a video lecture.

ActiveSesame: In Progress

Working in the world of medical informatics, it's impossible to avoid ontology hype. Through the '90s the answer to every problem of disparate medical terminology between institutions, departments, and IT systems was coding. Let's give everything its own unique code and it will all be okay.
return coding == failure? build_ontologies : save_the_world!
It failed, adding yet another layer of complexity to an already confused industry. So the world started building ontologies. RDF (Resource Description Framework) and OWL (the Web Ontology Language, renamed from WOL when Tim Finin said WOL was a stinky name and suggested swapping letters) are all the rage. Web 3.0! Well, the Cyc project has been working since the '80s on a true ontology for the world... Ever hear of the Psych project? Well, that should tell you something... But the thing that irks me about the situation isn't the dreamers. We need those senior academics... maybe. What is missing is good tools for using ontologies in programs. So a while back I convinced my boss that I should do something about it and started work on ActiveSesame. I wanted a Ruby gem that would be the ActiveRecord of triple stores, with the obvious bonus: ActiveSesame would have to read the ontology and build Ruby objects on the fly. Why? Isn't ActiveRDF out there? The sad answer to that question is that I couldn't get ActiveRDF to connect to a triple store. I tried for a few days and gave up. Now, that could be my fault entirely, and I'm excited about what the ActiveRDF cats are doing, but it's good to have a few projects to choose from. And as of today the ActiveRDF docs still bite. So here is a feature list of what I'm shooting for before this year's RailsConf.

ActiveSesame Feature List:
  • Connect to and Interact with the AllegroGraph triple store via the Sesame Protocol
    • Make SPARQL queries
    • Add Triples to the store
  • Build Ruby Objects based on SPARQL XML return data
    • Build Classes whose instances are RDF individuals
      • Built dynamically or declared as a Model, allowing application specific extensions
    • Instances to include methods for all RDF attributes with a Domain of the class of the individual
    • When an attribute's range is an RDF class and not a literal, build new Ruby Object(s) based on cardinality rules
    • Handle Blank nodes
  • Save RDF Objects to the triple store
    • New Objects can be saved with .save
      • Include a uniqueness check
    • Objects already in the triple store can be updated
      • Only the attributes which have been changed will be updated
  • Abstract common SPARQL Queries into .find method
    • Grab classes by namespace (MyRDFClass.find(:first))
    • Build RDF XML to datatype methods to make find_by_sparql easier to use

So it's a big list, but I have a major chunk of it done. Certainly enough to want to show it off a bit at RailsConf. It wasn't terribly hard, actually. Ruby has some excellent meta-programming features. I say excellent, but really they should be in every language. The ideas behind them are not new. I've been hammered with a lot of other things at work recently and haven't been able to work on it for a while, but if my RailsConf proposal is accepted then I'll have a great reason to demand more time to work on it from my employers.
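To give a taste of the meta-programming involved, here is a minimal sketch of building classes on the fly from a description of an ontology. It is not ActiveSesame's code: the ONTOLOGY hash, the RdfSketch module, and the class and attribute names are all invented, and a real version would get this information from SPARQL results instead of a hard-coded hash.

```ruby
# Hypothetical ontology description: class name => attribute names.
ONTOLOGY = {
  "Patient"   => ["name", "birth_date"],
  "Procedure" => ["code", "description"]
}

module RdfSketch
  # Builds one Ruby class per ontology class, with a reader and a
  # writer for every attribute, and registers it under this module.
  def self.build_classes(ontology)
    ontology.each do |class_name, attributes|
      klass = Class.new do
        attributes.each do |attribute|
          define_method(attribute) { instance_variable_get("@#{attribute}") }
          define_method("#{attribute}=") { |value| instance_variable_set("@#{attribute}", value) }
        end
      end
      const_set(class_name, klass)
    end
  end
end

RdfSketch.build_classes(ONTOLOGY)

patient = RdfSketch::Patient.new
patient.name = "Sam"
puts patient.name                                        # prints Sam
puts RdfSketch::Procedure.new.respond_to?(:description)  # prints true
```

Class.new, define_method, and const_set do all the work here; saving, finding, and blank nodes from the feature list are left out entirely.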