Saturday, January 12, 2008

Ruby Meta-Programming: a bread and butter guide (Part 1)

I was on the Rails forum the other day and someone was looking for Ruby meta-programming articles. I suggested "The Ruby Way," a fantastic book with a good section on meta-programming, but I was surprised I had nothing but a book to suggest. So I asked myself, "Self, why hasn't PageRank stumbled across a grand and all-encompassing meta-programming guide for Ruby?" I have a guess: there aren't any. Ruby is, for the most part, Smalltalk reborn, but its hype has only come through the growing Rails community. Rails programmers, at least the new ones (who I think are the vast majority and will be for another 2 years or so), learn Rails and forget to learn Ruby. Ruby is not the cool thing to them. Plus, many come from a Java background, so they don't even know what higher order procedures are. To them Lambda is the Ultimate unknown. It's not their fault though... I blame their parents. So I'm going to take some time to give some introductory examples and explanations of Ruby meta-programming.

Higher Order Procedures: For those still writing Java/JSP/PHP/ASP code in Ruby, a higher order procedure is a procedure that takes a procedure as an argument or returns a procedure. Read that carefully. I did not say it takes the result of evaluating a procedure. This is not a higher order procedure:



def not_higher_order(arg1)
  return performed_function(arg1)
end

not_higher_order(different_performed_function("blah"))


The above passes the return value of different_performed_function("blah") into not_higher_order, which returns the return value of performed_function(arg1). Higher order procedures use procedures themselves (in Ruby, Proc objects) as the argument and/or return value, not the values they produce when executed. So here is an example of a higher order procedure you must have used in Ruby. It's already built into Enumerable objects:


[1,2,3,4].collect {|i| i * 2 } # Returns [2,4,6,8]
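
To make both halves of the definition concrete, here is a minimal sketch of a method that takes a procedure as an argument and another that returns one. The names apply_twice and multiplier are made up for illustration; they are not part of Ruby.

```ruby
# Takes a procedure as an argument (apply_twice is a hypothetical name).
def apply_twice(procedure, value)
  procedure.call(procedure.call(value))
end

# Returns a procedure (multiplier is a hypothetical name).
def multiplier(factor)
  lambda { |n| n * factor }
end

double = multiplier(2)
puts apply_twice(double, 5)                  # prints 20
puts [1, 2, 3, 4].collect(&double).inspect   # prints [2, 4, 6, 8]
```

The last line hands the same Proc object to collect in place of a literal block, which is the connection the next paragraph makes.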


Blocks are Ruby syntactic sugar for using higher order procedures. You use them all the time in Ruby, which is why you can do so much with Ruby so quickly without extensive knowledge of and experience with a massive standard library. Every time you use a block you create an anonymous (unnamed) procedure and pass it into the called procedure (collect in this case) as an argument. As you will mostly use blocks instead of Proc.new or lambda, we will gloss over what those are at this point. However, you should look them up to get a better understanding of how procedures are first-class objects in Ruby. Let's build our first higher order procedure, modeled on an existing one that you might not know about: inject:


# Monkey patching Enumerable (will explain shortly)
module Enumerable
  # the ampersand captures the block being passed as a Proc object
  def inject(memo, &block)
    # run the enumerable's each method and reset memo
    # with the evaluation of the passed block
    self.each {|enumerated_item| memo = yield(memo, enumerated_item) }
    return memo
  end
end

puts [1,2,3,4].inject(0) {|memo,item| memo + item } # prints 10 (the result of 0+1+2+3+4)
puts ["sam","likes","to","eat","bunnies"].inject("The person ") {|memo, item| memo + item} # prints "The person samlikestoeatbunnies"
puts [1,2,3,4].inject(8) {|memo,item| item % 2 == 0 ? memo + item : memo} # prints 14 (8+2+4)


What we just did is rebuild the inject method. We reopened the Enumerable module, which is "mixed in" to Array, Hash, Range, etc., and overwrote the already existing inject method. For the calls above, the original and what we wrote behave the same way (the built-in version also lets you omit the initial value). Then we used the & in front of the argument named block to have the Ruby interpreter capture the block passed to the method as a Proc object. Strictly speaking, yield works whether or not the method declares &block; the explicit parameter documents that a block is expected and gives you an object you could call or pass along. (Side note: you can also build a procedure with lambda or Proc.new and pass it as an ordinary argument to any method. But often you'll only need/want one, and a block is a nice clean way to do it.) Then we called the each method and used the yield keyword within it. yield takes as its arguments the values you wish to pass to the block. You then catch those values in the block within the pipes: |memo, item|. The variable names in the pipes don't matter, just like in a collect or each statement. They are how you reference those values within the block itself.
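
As a rough sketch of the relationship between yield and an explicit & parameter, the two methods below behave the same way. The method names are invented for this example.

```ruby
# Version 1: the block is implicit and invoked with yield.
def twice_with_yield
  yield(1)
  yield(2)
end

# Version 2: &block captures the same block as a Proc object,
# which can be called, stored, or handed on to another method.
def twice_with_proc(&block)
  block.call(1)
  block.call(2)
end

twice_with_yield { |n| puts n * 10 }   # prints 10, then 20
twice_with_proc  { |n| puts n * 10 }   # prints 10, then 20

# An existing lambda can stand in for a literal block via &.
add = lambda { |memo, item| memo + item }
puts [1, 2, 3, 4].inject(0, &add)      # prints 10
```

The & at the call site converts a Proc back into a block, so the same procedure can be reused with any method that expects one.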

If you've never done anything like this before, take some time to experiment. See if you can generate the Fibonacci sequence and pass each number to an anonymous procedure (block). If you want to cheat, there is an example of how to do this in the Ruby Pickaxe (Programming Ruby).
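
If you'd rather check your attempt against something, here is one possible shape for it (spoiler warning). It is only a sketch, and the method name fib_up_to is my own choice here, not anything built in.

```ruby
# Yields each Fibonacci number up to and including limit to the block.
def fib_up_to(limit)
  a, b = 1, 1
  while a <= limit
    yield a
    a, b = b, a + b   # slide the pair one step along the sequence
  end
end

fib_up_to(50) { |n| print n, " " }   # prints 1 1 2 3 5 8 13 21 34
puts
```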

Higher order procedures are hugely important. If you are a Rails programmer, there is no better way to DRY up your views than by building block helpers and using concat and capture. If you plan to build a domain specific language (DSL) for some strange task in Ruby or Rails, blocks are a must-have so that you don't introduce 5000 extra keywords to Ruby and end up with documentation that needs a forklift to be moved. For a more in-depth look at why and where you might want to use higher order procedures in any language, check out the first chapter of "Structure and Interpretation of Computer Programs," also available as a video lecture.
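
To give a feel for the block helper idea without dragging Rails in, here is a plain-Ruby sketch. wrap_in is a hypothetical helper that just builds a string; it is not a Rails method and does not use concat or capture.

```ruby
# Hypothetical helper: wraps whatever the block returns in an HTML tag.
def wrap_in(tag, css_class = nil)
  attrs = css_class ? %( class="#{css_class}") : ""
  "<#{tag}#{attrs}>#{yield}</#{tag}>"
end

html = wrap_in(:div, "sidebar") do
  wrap_in(:h2) { "Recent posts" } + wrap_in(:p) { "Nothing yet." }
end
puts html
# prints <div class="sidebar"><h2>Recent posts</h2><p>Nothing yet.</p></div>
```

The markup that repeats lives in one method, and each caller supplies only the part that varies, as a block.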

ActiveSesame: In Progress

Working in the world of medical informatics, it's impossible to avoid ontology hype. Through the '90s the answer to every problem of disparate medical terminology between institutions, departments, and IT systems was coding. Let's give everything its own unique code and it will all be okay.
return coding == failure? build_ontologies : save_the_world!
It failed, adding yet another layer of complexity to an already confused industry. So the world started building ontologies. RDF (the Resource Description Framework) and OWL (the Web Ontology Language, renamed from WOL when Tim Finin said wol was a stinky name and suggested swapping letters) are all the rage. Web 3.0! Well, the Cyc project has been working since the '80s on a true ontology for the world... Ever hear of the Psych project? Well, that should tell you something... But the thing that irks me about the situation isn't the dreamers. We need those senior academics... maybe. What is missing is good tools for using ontologies in programs. So a while back I convinced my boss that I should do something about it and started work on ActiveSesame. I wanted a Ruby Gem that would be the ActiveRecord of triple stores, with the obvious bonus: ActiveSesame would have to read the ontology and build Ruby objects on the fly. Why? Isn't ActiveRDF out there? The sad answer to that question is that I couldn't get ActiveRDF to connect to a triple store. I tried for a few days and gave up. Now, that could be my fault entirely, and I'm excited about what the ActiveRDF cats are doing, but it's good to have a few projects to choose from. And as of today the ActiveRDF docs still bite. So here is a feature list of what I'm shooting for before this year's RailsConf.

ActiveSesame Feature List:
  • Connect to and Interact with the AllegroGraph triple store via the Sesame Protocol
    • Make SPARQL queries
    • Add Triples to the store
  • Build Ruby Objects based on SPARQL xml return data
    • Build Classes whose instances are RDF individuals
      • Build dynamically or Declared as a Model allowing application specific extensions
    • Instances to include methods for all RDF attributes with a Domain of the class of the individual
    • When an attribute's range is an RDF class and not a literal, build new Ruby Object(s) based on cardinality rules
    • Handle Blank nodes
  • Save RDF Objects to the triple store
    • New Objects can be saved with .save
      • Include a uniqueness check
    • Objects already in the triple store can be updated
      • Only the attributes which have been changed will be updated
  • Abstract common SPARQL Queries into .find method
    • Grab classes by namespace (MyRDFClass.find(:first))
    • Build RDF xml to datatype methods to make find_by_sparql easier to use
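
To sketch what "build Ruby objects on the fly" can look like in plain Ruby: the example below creates classes and accessor methods at runtime from a hash. Everything in it is hypothetical (the ONTOLOGY hash stands in for data a SPARQL query might return, and OntologyModels is a made-up module); it is not ActiveSesame's actual code or API.

```ruby
# Hypothetical stand-in for data that would come back from a SPARQL query:
# class name => attribute names whose domain is that class.
ONTOLOGY = {
  "Patient"   => ["name", "birth_date"],
  "Physician" => ["name", "specialty"]
}

module OntologyModels
  # Builds one Ruby class per ontology class, with a reader and a writer
  # for each attribute, and registers it as a constant in this module.
  def self.build(ontology)
    ontology.each do |class_name, attributes|
      klass = Class.new do
        attributes.each do |attribute|
          define_method(attribute) { instance_variable_get("@#{attribute}") }
          define_method("#{attribute}=") { |value| instance_variable_set("@#{attribute}", value) }
        end
      end
      const_set(class_name, klass)
    end
  end
end

OntologyModels.build(ONTOLOGY)
patient = OntologyModels::Patient.new
patient.name = "Sam"
puts patient.name   # prints Sam
```

Class.new, define_method, and const_set are all standard Ruby; the design choice here is simply to drive them from data instead of writing each class by hand.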

So it's a big list, but I have a major chunk of it done. Certainly enough to want to show it off a bit at RailsConf. It wasn't terribly hard, actually. Ruby has some excellent meta-programming features. I say excellent, but really they should be in every language. The ideas behind them are not new. I've been hammered with a lot of other things at work recently and haven't been able to work on it for a while, but if my RailsConf proposal is accepted then I'll have a great reason to demand more time from my employers to work on it.