Friday, May 23, 2014

Hadoop Cluster setup using Cloudera on Ubuntu 12.04.1

4-Node Cluster Setup for Testing @Azuga
Hardware configuration:
                4 Dell machines with the following hardware configuration:
1) Memory: 8 GiB
2) Processor: Intel Core i5-3340S CPU @ 2.80GHz x 4
3) Disk: 1 TB
4) All machines must be on the same network and able to ping each other.
Software:
Ubuntu 12.04.1 desktop, 64-bit (ubuntu-12.04.1-desktop-amd64; the i386 image is 32-bit)
OpenSSH client and server
Oracle JDK 7 installer
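The i386 Ubuntu images are 32-bit and the amd64 images are 64-bit, and the Cloudera packages are built for 64-bit systems, so it is worth confirming the architecture on each machine before going further. A minimal sketch, assuming a Bash shell; the function name `is_64bit` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an architecture string
# (as printed by `uname -m`) denotes a 64-bit x86 system.
is_64bit() {
  case "$1" in
    x86_64|amd64) echo yes ;;   # 64-bit x86
    *)            echo no  ;;   # e.g. i386/i686 (32-bit) or anything else
  esac
}

# Usage: check the current machine.
echo "64-bit: $(is_64bit "$(uname -m)")"
```

If this prints "64-bit: no", reinstall the machine from the amd64 image before continuing.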

Pre-installation steps:
1) Install Ubuntu 12.04.1 desktop (64-bit) on all the machines.
            Use the same user name and password on all the machines.
            Ex: user/pass: azuga/azuga
2) Log in to the Linux box with user/password "azuga/azuga".
3) Update Ubuntu:
            $ sudo apt-get update
            (this refreshes the package lists; run "sudo apt-get upgrade" as well to install the available updates)
            Then restart the machine.
4) Give each machine a static IP address and a host name, and record them in /etc/hosts (in "IP hostname" form, as in step 7):
            $ sudo gedit /etc/hosts
            master  : 192.168.10.91
            slave1  : 192.168.10.92
            slave2  : 192.168.10.93
            slave3  : 192.168.10.94  ..... etc.

            $ sudo gedit /etc/hostname

            Replace the file's contents with the machine's name:
            on the master machine, just "master";
            on slave1, just "slave1"; likewise for the others.
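Writing /etc/hostname can also be scripted, which helps when the same steps are repeated on every node. A minimal sketch, assuming Bash; the function name `set_hostname_file` is hypothetical, and its name check is a simplified one (letters, digits and hyphens only).

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a single host name into a hostname file.
# Defaults to /etc/hostname; pass another path to try it safely.
set_hostname_file() {
  local name=$1 file=${2:-/etc/hostname}
  # Reject empty names and names containing anything other than
  # letters, digits and hyphens (a simplified check).
  case "$name" in
    ''|*[!A-Za-z0-9-]*) echo "invalid host name: '$name'" >&2; return 1 ;;
  esac
  printf '%s\n' "$name" > "$file"
}
```

Run it as root, for example `sudo bash -c "$(declare -f set_hostname_file); set_hostname_file slave1"`; `sudo hostname slave1` then applies the new name to the running system without waiting for the restart.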



5) Install the OpenSSH client and server so the machines can be accessed with PuTTY and WinSCP.

$ sudo apt-get install openssh-client (all machines)
$ sudo apt-get install openssh-server (all machines)
Note: after installation, all the machines can be reached from Windows through PuTTY and WinSCP.
6) Set up passwordless sudo access on all the machines.
      $ sudo visudo          -- opens /etc/sudoers and checks its syntax before saving
-- modify the user and group entries as below (a group name needs a leading %)
      root  ALL=(ALL:ALL) NOPASSWD:ALL
      azuga   ALL=(ALL:ALL) NOPASSWD:ALL

      %sudo    ALL=(ALL:ALL) NOPASSWD:ALL
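Instead of editing /etc/sudoers itself, the same rule can go into a drop-in file under /etc/sudoers.d (Ubuntu's default sudoers includes that directory), checked with `visudo -c` before it is installed, so a typo cannot lock you out of sudo. A minimal sketch, assuming Bash and root privileges; the function names and the drop-in file name `hadoop-nopasswd` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a passwordless-sudo rule for a user,
# or for a group when the name is given with a leading '%'.
nopasswd_rule() {
  printf '%s ALL=(ALL:ALL) NOPASSWD:ALL\n' "$1"
}

# Hypothetical helper: install the rule as a drop-in file, but only if
# visudo accepts its syntax. Must run as root.
install_nopasswd_dropin() {
  local who=$1 dest=${2:-/etc/sudoers.d/hadoop-nopasswd}
  local tmp status=0
  tmp=$(mktemp) || return 1
  nopasswd_rule "$who" > "$tmp"
  if visudo -c -f "$tmp" > /dev/null; then
    # 0440 (read-only, owned by root) is the mode sudo expects for these files.
    install -m 0440 "$tmp" "$dest" || status=1
  else
    echo "sudoers syntax check failed; nothing installed" >&2
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Usage: `sudo bash -c "$(declare -f nopasswd_rule install_nopasswd_dropin); install_nopasswd_dropin azuga"`.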

7) Update the hosts file on all the machines.
Keep the default localhost entry and its IP on the master node (it is required for the PostgreSQL installation).
      $ sudo nano /etc/hosts
      192.168.10.91 master
      192.168.10.92 slave1
      192.168.10.93 slave2
      192.168.10.94 slave3
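Because the same entries go into /etc/hosts on every machine, a small idempotent helper avoids duplicate lines when the step is re-run. A minimal sketch, assuming Bash and awk; the function name `add_host_entry` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP<TAB>NAME" to a hosts file unless NAME is
# already mapped there. Defaults to /etc/hosts; pass another path to test.
add_host_entry() {
  local ip=$1 name=$2 file=${3:-/etc/hosts}
  # awk exits 0 if some non-comment line lists NAME as a host name, else 1.
  if awk -v n="$name" '
      /^[ \t]*#/ { next }
      { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
      END { exit !found }' "$file" 2>/dev/null; then
    return 0    # already mapped: leave the file untouched
  fi
  printf '%s\t%s\n' "$ip" "$name" >> "$file"
}

# Usage (as root, on every machine); $entry is split into IP and NAME on purpose:
# for entry in "192.168.10.91 master" "192.168.10.92 slave1" \
#              "192.168.10.93 slave2" "192.168.10.94 slave3"; do
#   add_host_entry $entry
# done
```

The helper treats any existing mapping of a name as present, including the `127.0.1.1 <hostname>` line that the Ubuntu installer adds, so check that line by hand.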
8) Set up passwordless SSH access from the master to all the machines
 -- generate a key pair and add the public key to all the slave machines,
so the master node can access all the slave machines without a password.
      -- generate the key
      $ ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
      -- for SSH access to the master itself
      $ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
      -- for SSH access to the slave machines   username:<user on the slave machine> slave-hostname:<slave host>
      $ ssh-copy-id -i $HOME/.ssh/id_rsa.pub username@slave-hostname

Examples:
ssh-copy-id -i $HOME/.ssh/id_rsa.pub azuga@master
ssh-copy-id -i $HOME/.ssh/id_rsa.pub azuga@slave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub azuga@slave2
ssh-copy-id -i $HOME/.ssh/id_rsa.pub azuga@slave3
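The per-host commands above can be looped, and the result checked with `ssh -o BatchMode=yes`, which fails instead of prompting when key login does not work. A minimal sketch, assuming Bash; the function names and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one public key to several hosts for one user.
# With DRY_RUN=1 it only prints the commands it would run.
copy_key_to_all() {
  local pubkey=$1 user=$2 host
  shift 2
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo ssh-copy-id -i "$pubkey" "$user@$host"
    else
      ssh-copy-id -i "$pubkey" "$user@$host" || echo "copy failed: $host" >&2
    fi
  done
}

# Hypothetical helper: confirm that key-based login works without a prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless() {
  local user=$1 host
  shift
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
      echo "ok: $host"
    else
      echo "needs a password or is unreachable: $host"
    fi
  done
}
```

Usage: `copy_key_to_all "$PUBKEY" azuga master slave1 slave2 slave3`, then `check_passwordless azuga master slave1 slave2 slave3`, where `$PUBKEY` (hypothetical) holds the path of the .pub file written by ssh-keygen in this step.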

9) Install JDK 7 (an internet connection is required for the steps below):
                a) $ sudo add-apt-repository ppa:webupd8team/java
                b) $ sudo apt-get update && sudo apt-get install oracle-java7-installer
                c) $ sudo gedit /etc/environment
                                Append the JDK bin directory to the existing PATH line: PATH="...existing entries...:/usr/lib/jvm/java-7-oracle/bin"
                                Append to the end of the file:   JAVA_HOME=/usr/lib/jvm/java-7-oracle
                d) $ source /etc/environment
                                (this applies the settings to the current shell only; log out and back in for new sessions)
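/etc/environment is read as plain KEY=value lines (no variable expansion), so hand edits are easy to get wrong. The helper below replaces an existing definition of one variable, or appends it, and leaves every other line untouched. A minimal sketch, assuming Bash; the function name `set_env_var` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY="VALUE" in an environment-style file,
# replacing an existing KEY= line or appending one if none exists.
set_env_var() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp) || return 1
  # Keep every line except an existing definition of KEY.
  grep -v "^${key}=" "$file" > "$tmp" 2>/dev/null
  printf '%s="%s"\n' "$key" "$value" >> "$tmp"
  # Copy back with cat so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: `sudo bash -c "$(declare -f set_env_var); set_env_var /etc/environment JAVA_HOME /usr/lib/jvm/java-7-oracle"`. PATH can be set the same way, but the value must be the complete path list, since /etc/environment does not expand `$PATH`.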


10) Hadoop installation:
                Download the Cloudera Manager one-click installer.
$ mkdir cloudera && cd cloudera                              -- create a folder for the installer
$ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
$ chmod u+x cloudera-manager-installer.bin                                -- make the file executable
$ sudo ./cloudera-manager-installer.bin                              -- the installation starts here

When it finishes, the installer shows the browser URL for choosing the CDH packages.
Open the browser at:
                http://localhost:7180/  -- opens the Cloudera Manager login page (user/pass: admin/admin)
Follow the default selections and, when asked, enter the host names of the master and all the slaves for the cluster.
Select the user 'azuga' instead of 'root' (this user needs the passwordless sudo set up in step 6), then continue.
Add the required services to the cluster and its nodes, then restart all the services.

...... Installation completed successfully ......

12) Enable the Oozie web console:
                $ cd /var/lib/oozie/                         -- go to the Oozie dir
                $ sudo wget http://extjs.com/deploy/ext-2.2.zip                          -- download ext-2.2.zip
                $ sudo jar -xvf ext-2.2.zip                             -- extract ext-2.2.zip
Then restart the Oozie service and open the Oozie console:
http://localhost:11000/oozie/
Example of running a job:
oozie job -oozie http://master:11000/oozie  -config /home/azuga/hadoop/examples/apps/ssh/job.properties -run

To use MySQL for the Oozie server database, change the Oozie database settings in the Cloudera Manager console so they point to MySQL and the Oozie DB user,
and install the JDBC driver jar in the Oozie lib directory (and in the parcel's Sqoop lib directory):
$ sudo cp mysql-connector-java-5.1.27-bin.jar /var/lib/oozie/
$ sudo cp /home/azuga/cloudera/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/sqoop/lib
To run Oozie commands without passing -oozie each time, set OOZIE_URL:
                $ export OOZIE_URL="http://master:11000/oozie"
Example of running a job:
oozie job -config /home/cloudera/examples/apps/sqoop/job.properties -run
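The MySQL JDBC driver in this step goes into more than one lib directory (Oozie and the Sqoop lib of the parcel). A small helper can copy it to each directory and report any path that does not exist, which catches a mistyped parcel version. A minimal sketch, assuming Bash; the function name `install_jar` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one jar into several library directories,
# skipping (and reporting) any directory that does not exist.
install_jar() {
  local jar=$1 dir status=0
  shift
  [ -f "$jar" ] || { echo "no such jar: $jar" >&2; return 1; }
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      cp "$jar" "$dir"/ || status=1
    else
      echo "skipping missing directory: $dir" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage (as root): `install_jar mysql-connector-java-5.1.27-bin.jar /var/lib/oozie /opt/cloudera/parcels/CDH-4.6.0-1.cdh4.6.0.p0.26/lib/sqoop/lib`. A non-zero exit status means at least one copy did not happen.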
13) Install the MySQL connector for Sqoop.
Download the MySQL Connector/J jar and copy it to the Sqoop lib directory:
$ sudo cp mysql-connector-java-5.1.27-bin.jar /opt/cloudera/parcels/CDH-4.5.0-1.cdh4.5.0.p0.30/lib/sqoop/lib/

Note: If you installed Sqoop via Cloudera Manager using parcels, copy the .jar file to $HADOOP_CLASSPATH instead of /usr/lib/sqoop/lib/.
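The parcel directory name contains the exact CDH version (this page uses both CDH-4.5.0 and CDH-4.6.0 parcels), so a hard-coded path can silently point at the wrong version. Globbing the parcels directory lists whatever Sqoop lib directories actually exist. A minimal sketch, assuming Bash and the default parcels root /opt/cloudera/parcels; the function name `parcel_sqoop_libs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the Sqoop lib directories of all CDH parcels
# found under a parcels root (default /opt/cloudera/parcels).
parcel_sqoop_libs() {
  local root=${1:-/opt/cloudera/parcels} dir
  for dir in "$root"/CDH-*/lib/sqoop/lib; do
    # If nothing matches, the glob stays literal and this test fails.
    if [ -d "$dir" ]; then
      printf '%s\n' "$dir"
    fi
  done
}

# Usage: copy the connector into every parcel's Sqoop lib directory.
# parcel_sqoop_libs | while read -r lib; do
#   sudo cp mysql-connector-java-5.1.27-bin.jar "$lib"/
# done
```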