Futuretech - Starfish

Posted by tobi — 10:41 PM Aug 18

Lucas Carlson talks about his exciting new distributed application approach, dubbed Starfish, which is essentially an 80/20 implementation of Google's phenomenally clever MapReduce technology: 20% of the work (or less) for 80% of the effect.
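For readers new to the idea, here is a toy, single-process sketch of the general map/reduce pattern in plain Ruby. It is not Starfish's code and not Google's implementation. `map_reduce`, the sample log lines and the status-code mapper are hypothetical, chosen only to show the two phases: a mapper turns each record into key/value pairs, and a reducer combines all values that share a key.

```ruby
# Toy illustration of the map/reduce idea (hypothetical helper, not Starfish's API).
# mapper:  record -> array of [key, value] pairs
# reducer: (key, values) -> combined value for that key
def map_reduce(records, mapper, reducer)
  pairs = records.flat_map { |record| mapper.call(record) }
  pairs.group_by(&:first).each_with_object({}) do |(key, kv_pairs), result|
    result[key] = reducer.call(key, kv_pairs.map(&:last))
  end
end

# Hypothetical log lines: method, path, HTTP status
log_lines = [
  'GET /index.html 200',
  'GET /missing 404',
  'POST /login 200',
  'GET /about 200'
]

# Map: emit [status_code, 1] for every line
mapper  = ->(line) { [[line.split.last, 1]] }
# Reduce: sum the counts for each status code
reducer = ->(_status, counts) { counts.sum }

counts = map_reduce(log_lines, mapper, reducer)
puts counts.inspect
```

In a distributed system the map calls run on many machines and the grouping happens in between; here everything runs in one process, which keeps the shape of the technique visible.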

A distributed log file parser can look as simple as this:


    server do |map_reduce|
      map_reduce.type = File
      map_reduce.input = "/tmp/big_log_file"
      map_reduce.queue_size = 1000 # how many lines of the file to buffer at a time
      map_reduce.lines_per_client = 100 # how many lines each client will process at a time
      map_reduce.rescan_when_complete = true
    end

    client do |line|
      if line =~ /some_regex/
        logger.info(line)
      end
    end

Save it as statistics.rb and run:


    $ starfish statistics.rb

The server then buffers the log file (affectionately called the big_ass_file) 1000 lines at a time and hands it out in batches of 100 lines to any number of clients, which in turn filter the data with a regexp and report their findings back to the server. The server can then act on this newfound wisdom, perhaps by updating your customers' statistics or by warning abusive ones. Many sizable data-mining tasks should be accomplishable with this strategy.
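To make that flow concrete, here is a single-process simulation in plain Ruby. It does not use Starfish: threads stand in for the clients and a `Queue` stands in for the network. `distribute` and its keyword arguments are hypothetical names that merely mirror `queue_size` and `lines_per_client` from the config above, and `/ERROR/` stands in for `/some_regex/`.

```ruby
require 'stringio'

# Hypothetical simulation of the server/client split; not Starfish's implementation.
def distribute(io, pattern, queue_size:, lines_per_client:, clients:)
  work    = Queue.new # batches of lines waiting for a client
  results = Queue.new # matching lines reported back to the "server"

  # "Clients": pull batches until they receive the :done stop signal
  workers = Array.new(clients) do
    Thread.new do
      while (batch = work.pop) != :done
        batch.each { |line| results << line if line =~ pattern }
      end
    end
  end

  # "Server": read queue_size lines at a time, then split each buffer
  # into lines_per_client batches for the clients
  io.each_line.each_slice(queue_size) do |buffer|
    buffer.each_slice(lines_per_client) { |batch| work << batch }
  end
  clients.times { work << :done } # one stop signal per client
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

# Fake log: every 250th line is an error
log = StringIO.new((1..5000).map { |i| i % 250 == 0 ? "ERROR line #{i}\n" : "ok line #{i}\n" }.join)

matches = distribute(log, /ERROR/, queue_size: 1000, lines_per_client: 100, clients: 4)
puts "#{matches.size} matching lines"
```

Because clients finish in no particular order, the matches come back unordered; a real server would aggregate or sort them before acting on them.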

At this point the library only works with ActiveRecord data sets, but Array and File support, as demonstrated above, are in the pipeline.

Update: Starfish 1.1 is now available with the file support mentioned above.
