December 20, 2013

Apache BigTop (Hadoop ecosystem installation)

Installing all the Hadoop and related software by hand is a painful experience. For a development environment, BigTop is probably the easiest option. The project site describes it as:
    Bigtop  
    Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem.

    The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc...) developed by a community with a focus on the system as a whole, rather than individual projects.
Installing all the Hadoop and related software can take days or even weeks: Hadoop/YARN, ZooKeeper, Pig, Hive, HBase, Mahout, Whirr, Oozie, Sqoop, Hue, Flume, and more. BigTop makes it possible to install most of them quickly (in about an hour or less).

You just need to make sure you meet all the requirements and follow the steps correctly. I've done this on CentOS 6.3 64-bit, and it worked fine.

Requirements - http://bigtop.apache.org/
Install Steps - https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.6.0

The latest version is 0.7.0, while the install guide above is written for 0.6.0, so I pointed the steps at the 0.7.0 repository instead:

  1. sudo wget -O /etc/yum.repos.d/bigtop.repo http://bigtop.s3.amazonaws.com/releases/0.7.0/redhat/6/x86_64/bigtop.repo
  2. sudo yum install hadoop\* flume-\* mahout\* oozie\* whirr-\* hive\* hue\*
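The two steps above can be wrapped in a small script. This is a minimal sketch, assuming bash on a single CentOS 6 x86_64 machine. `BIGTOP_VERSION`, `DRY_RUN` and the `run` helper are hypothetical names added for illustration, and the repository URL layout is only known to hold for 0.7.0. By default it only prints the commands so you can review them before anything touches the system.

```shell
#!/usr/bin/env bash
# Minimal sketch of the two install steps for a single CentOS 6 x86_64 box.
# BIGTOP_VERSION, DRY_RUN and run() are hypothetical helpers for illustration;
# the repo URL layout is the one used for 0.7.0 and may differ in other releases.
set -euo pipefail

BIGTOP_VERSION="${BIGTOP_VERSION:-0.7.0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them
REPO_URL="http://bigtop.s3.amazonaws.com/releases/${BIGTOP_VERSION}/redhat/6/x86_64/bigtop.repo"

run() {
  # Echo the command line; execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1: drop the Bigtop repo definition into yum's config directory.
run sudo wget -O /etc/yum.repos.d/bigtop.repo "$REPO_URL"

# Step 2: install the packages. Quoting the patterns hands the wildcards to
# yum instead of letting the shell expand them against files in the cwd.
run sudo yum install 'hadoop*' 'flume-*' 'mahout*' 'oozie*' 'whirr-*' 'hive*' 'hue*'
```

Saved as, say, `bigtop-install.sh` (a hypothetical file name), running it plainly shows the commands, and `DRY_RUN=0 ./bigtop-install.sh` executes them. Quoting the patterns has the same effect as the backslash escapes in step 2.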
I haven't tested this thoroughly, and it only covers a single-server setup. For a cluster, I'm still going to install things manually, since I need to know how the systems are structured and which configuration files are involved and where they live anyway. For development, though, this seems to be the way to go for a quick setup.
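Since the install wasn't tested thoroughly, one quick sanity check is to compare the installed packages against the components you asked for. A minimal sketch, assuming bash: `missing_components` is a hypothetical helper, and matching on a name prefix is a simplification that treats any package starting with the component name as "installed".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each requested component that has no installed
# package whose name starts with it. Package names are read from stdin,
# one per line (for example from: rpm -qa --qf '%{NAME}\n').
set -euo pipefail

missing_components() {
  local installed comp
  installed=$(cat)   # slurp the list of installed package names
  for comp in "$@"; do
    # Prefix match: "hive" is satisfied by "hive" or any "hive-..." package.
    if ! grep -q "^${comp}" <<<"$installed"; then
      echo "$comp"
    fi
  done
}
```

Usage: `rpm -qa --qf '%{NAME}\n' | missing_components hadoop flume mahout oozie whirr hive hue` prints one line per component with no matching package, and nothing if everything is present.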

