We’re still actively developing Wukong. The newest version is available via Git from GitHub:
$ git clone https://github.com/infochimps-labs/wukong.git
A gem is available from RubyGems.org (formerly Gemcutter):
$ sudo gem install wukong
(Don’t use the gems.github.com version; it’s way out of date.)
Alternatively, you can download the project as a zip or tar archive.
1. Allow Wukong to discover where his elephant friend lives by setting the $HADOOP_HOME environment variable:
   $ export HADOOP_HOME="/usr/local/share/hadoop"
2. Add wukong’s bin/ directory to your $PATH if you’d like to use the wutils.
(see also: Ruby Hadoop Quickstart)
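Steps 1 and 2 together come down to a few lines you could add to your shell startup file (for example ~/.bashrc) so they persist across sessions. This is a sketch assuming a Bourne-style shell such as bash, the Hadoop path from step 1, and a wukong clone at $HOME/wukong; that clone location is an assumption, so substitute wherever you actually put it.

```shell
# Step 1: tell Wukong where Hadoop is installed (path from step 1; adjust to your install).
export HADOOP_HOME="/usr/local/share/hadoop"

# Step 2: put wukong's bin/ directory on your PATH.
# Assumption: wukong was cloned into $HOME/wukong; change this to your actual checkout.
export PATH="$HOME/wukong/bin:$PATH"
```

Prepending (rather than appending) the bin/ directory means the wukong scripts win if another tool on your PATH happens to share a name.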
Wukong was primarily developed for Hadoop, and we think it’s the best way to use Hadoop (it’s certainly the most fun!).
Even if you have a bunch of machines with spare cycles, lots of RAM, and a shared filesystem… do yourself a favor and start out using the Cloudera AMIs on Amazon’s EC2 cloud. Hadoop has an overwhelming number of fiddly little parameters, and you’ll be glad to have some experience as a user before you take on server setup. If it’s still mid-to-late 2009 when you read this, ignore prudence and jump straight to Hadoop 0.20. The payoff: a) it’s more fun; b) it’s much more robust (trust me, at “v0.20” you want to live on the bleeding edge); and c) you won’t have to suffer through migrating your HDFS two weeks after setup.
To set up Hadoop, your best bet is the Cloudera AMIs on Amazon’s EC2 compute cloud:
EC2 means anyone with a $10 bill can rent a 10-machine cluster with 1TB of distributed storage for 8 hours.
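The $10 figure is simple arithmetic on the numbers above: it implies paying no more than 12.5 cents per machine-hour. Treat this only as a back-of-the-envelope check of the stated figures, not a price quote; EC2 pricing has changed since this was written.

```latex
10~\text{machines} \times 8~\text{hours} = 80~\text{machine-hours},
\qquad
\frac{\$10}{80~\text{machine-hours}} = \$0.125~\text{per machine-hour}
```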
AWS Elastic MapReduce saves you the trouble of even setting up a cluster: click, bam, there it is.
Phil Ripperger has prepared a Ruby Hadoop Quickstart explaining how to get started with Wukong, Hadoop and the Amazon Elastic MapReduce cloud — it’s better than anything we could put here. Thanks Phil!
If you have a local cluster, or just want to experiment with a single-machine install, check out the Cloudera packages for both Debian/Ubuntu-based and Redhat/RPM-based Linux systems.
I’ve braindumped some random notes on configuring and using Hadoop over here.
Many people also use Wukong in non-Hadoop environments: anywhere you can stream data records, you can unleash its monkey power.
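To make “stream data records” concrete: the Hadoop Streaming model that Wukong builds on is a mapper reading lines on stdin, a sort that brings identical keys together, and a reducer that walks each run of keys. The sketch below is plain shell using tr, sort, and awk, not Wukong’s own API, and the `wordcount` function name is hypothetical. It only shows the shape of the map | sort | reduce pipeline that works on any record stream, with or without a cluster.

```shell
# Plain-shell illustration of the map | sort | reduce streaming pattern.
# Reads text on stdin; writes tab-separated "word<TAB>count" records on stdout.
wordcount() {
  # map: split input into words, then emit one "word<TAB>1" record per word
  tr -s '[:space:]' '\n' |
    awk 'NF { print $0 "\t1" }' |
    # shuffle: group identical keys together, as Hadoop's sort phase does
    LC_ALL=C sort |
    # reduce: input is sorted by key, so sum each run of identical keys
    awk -F '\t' '
      NR > 1 && $1 != key { print key "\t" sum; sum = 0 }
      { key = $1; sum += $2 }
      END { if (NR > 0) print key "\t" sum }'
}

# Usage: any source of line-oriented records can feed the pipeline.
printf 'the cat\nthe dog\n' | wordcount
```

The reducer never holds more than one key in memory because it relies on the sort step to deliver each key’s records contiguously; that is the same contract a streaming reducer gets from Hadoop.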
Please see the usage notes for more!