
Wednesday, December 26, 2012

Install Sqoop and Hbase on macbook pro OS X 10.8.2

Apache Sqoop helps transfer data between hadoop and datastores (such as relational databases like Oracle, DB2, and a bunch of others). Read more about sqoop here:
http://sqoop.apache.org/

If you are just getting started with hadoop you may want to refer to my earlier posts regarding installing hadoop:
http://springandgrailsmusings.blogspot.com/2012/12/install-hadoop-111-on-macbook-pro-os-x.html
and installing hive:
http://springandgrailsmusings.blogspot.com/2012/12/installing-hive-on-on-macbook-pro-os-x.html

As I mentioned in my previous posts, Homebrew provides a simple way to install tools like this, in this case sqoop.

Open a terminal and install sqoop with this command:
brew install sqoop

Homebrew takes care of installing all related dependencies for you, which for sqoop are hbase and zookeeper.



Your terminal output should be similar to this:

$ brew install sqoop
==> Installing sqoop dependency: hbase
==> Downloading http://www.apache.org/dyn/closer.cgi?path=hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
==> Best Mirror http://www.poolsaboveground.com/apache/hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
######################################################################## 100.0%
==> Caveats
Requires Java 1.6.0 or greater.

You must also edit the configs in:
  /usr/local/Cellar/hbase/0.94.2/libexec/conf
to reflect your environment.

For more details:
  http://wiki.apache.org/hadoop/Hbase
==> Summary
/usr/local/Cellar/hbase/0.94.2: 3086 files, 115M, built in 3.9 minutes
==> Installing sqoop dependency: zookeeper
==> Downloading http://www.apache.org/dyn/closer.cgi?path=zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
==> Best Mirror http://www.fightrice.com/mirrors/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
######################################################################## 100.0%
/usr/local/Cellar/zookeeper/3.4.5: 193 files, 12M, built in 18 seconds
==> Installing sqoop
==> Downloading http://apache.mirror.iphh.net/sqoop/1.4.2/sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz
######################################################################## 100.0%
==> Caveats
Hadoop, Hive, HBase and ZooKeeper must be installed and configured
for Sqoop to work.
==> Summary
/usr/local/Cellar/sqoop/1.4.2: 60 files, 4.4M, built in 24 seconds



Now you are all set to use sqoop to work with any supported data store.
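As a quick illustration, a typical import of a relational table into HDFS looks something like the sketch below. The JDBC URL, database, table, and username are hypothetical placeholders; adjust them for your environment, and make sure the matching JDBC driver jar is on sqoop's classpath.

```shell
# Hypothetical example: import a MySQL table into HDFS.
# mydb, employees, myuser, and the target dir are placeholders.
sqoop import \
  --connect jdbc:mysql://localhost/mydb \
  --username myuser -P \
  --table employees \
  --target-dir /user/me/employees
```

The -P flag prompts for the password interactively rather than putting it on the command line.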
Have fun.


Installing hive 0.9 on macbook pro OS X 10.8.2



If you are reading this post, I assume you are interested in getting started with hive on your MacBook and already have hadoop installed. For details on installing hadoop, please refer to my post here:
http://springandgrailsmusings.blogspot.com/2012/12/install-hadoop-111-on-macbook-pro-os-x.html

Again, Homebrew provides an easy way to get hive on your Mac.
Run this from your mac terminal:
> brew install hive

Brew installs hive on your Mac, and you will see output similar to the one below:
==> Downloading http://www.apache.org/dyn/closer.cgi?path=hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
==> Best Mirror http://apache.claz.org/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
######################################################################## 100.0%
==> Caveats
Hadoop must be in your path for hive executable to work.
After installation, set $HIVE_HOME in your profile:
  export HIVE_HOME=/usr/local/Cellar/hive/0.9.0/libexec

You may need to set JAVA_HOME:
  export JAVA_HOME="$(/usr/libexec/java_home)"
==> Summary
/usr/local/Cellar/hive/0.9.0: 276 files, 25M, built in 13 seconds

Export HIVE_HOME and JAVA_HOME from your terminal, as prompted:

export HIVE_HOME=/usr/local/Cellar/hive/0.9.0/libexec
export JAVA_HOME="$(/usr/libexec/java_home)"

Now you can start hive as follows:

/usr/local/Cellar/hive/0.9.0/bin/hive


You should be all set at this point to work with hive.
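As a quick smoke test (assuming hadoop is up and running), you can create and drop a throwaway table from the hive prompt; the table name here is just an example:

```
hive> CREATE TABLE smoke_test (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE smoke_test;
```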
Hope this helps.


Sunday, December 23, 2012

Install Hadoop 1.1.1 on macbook pro OS X 10.8.2

I recently installed hadoop on my new MacBook, and here are the steps I followed to get it working.
I write this with the hope that someone might find it useful.

First up, there are a couple of very nice posts regarding this which helped me get it done:
http://ragrawal.wordpress.com/2012/04/28/installing-hadoop-on-mac-osx-lion
http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/
http://geekiriki.blogspot.com/2011/10/flume-and-hadoop-on-os-x.html
I mainly followed these three (mixing steps from a couple of them) to get my installation working.

I used Homebrew to install hadoop:
brew install hadoop

I enabled Remote Login on my Mac (System Preferences > Sharing) and created an RSA key using ssh-keygen.
Finally, I tested that I was able to ssh by running ssh localhost.
I used RSA, but DSA can be used for ssh as well.
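Those ssh steps boil down to something like this; the key path shown is the default, so skip the first command if you already have a key there:

```shell
# generate a passwordless RSA key (only if ~/.ssh/id_rsa does not exist yet)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# should now log you in without a password prompt
ssh localhost
```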

This is how my conf files look (located in the /usr/local/Cellar/hadoop/1.1.1/libexec/conf folder).
The links provided above detail these; I have not made any changes of my own except for the hadoop install dir.



core-site.xml



<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Note: I had to create two folders, as the original poster indicates, like this:


mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
NOTE: change dfs.replication according to your needs.



mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9010</value>
</property>
</configuration>


In conf/hadoop-env.sh, find the line
# export HADOOP_OPTS=-server
and add this line below it:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
(This works around a Kerberos SCDynamicStore realm error when starting hadoop on OS X.)





Format the Hadoop Namenode using:
hadoop namenode -format





Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/start-all.sh

Run
ps ax | grep hadoop | wc -l
If you see 6 as the output (the five hadoop daemons plus the grep process itself), you are all set.
If not, check the logs at:
ls /usr/local/Cellar/hadoop/1.1.1/libexec/logs/

Health can be checked at http://localhost:50070/dfshealth.jsp
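A few basic HDFS commands are another quick way to confirm the cluster is usable; the directory name below is just an example:

```shell
hadoop fs -ls /                   # list the HDFS root
hadoop fs -mkdir /tmp/smoketest   # create a scratch directory
hadoop fs -rmr /tmp/smoketest     # remove it again (hadoop 1.x syntax)
```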

You can run one of the bundled examples like this:
cd /usr/local/Cellar/hadoop/1.1.1/libexec
Then run this command:
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100

You should see output similar to the following


Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/12/23 16:31:00 INFO mapred.FileInputFormat: Total input paths to process : 10
12/12/23 16:31:00 INFO mapred.JobClient: Running job: job_201212231524_0003
12/12/23 16:31:01 INFO mapred.JobClient:  map 0% reduce 0%
12/12/23 16:31:04 INFO mapred.JobClient:  map 20% reduce 0%
12/12/23 16:31:06 INFO mapred.JobClient:  map 40% reduce 0%
12/12/23 16:31:08 INFO mapred.JobClient:  map 60% reduce 0%
12/12/23 16:31:09 INFO mapred.JobClient:  map 80% reduce 0%
12/12/23 16:31:11 INFO mapred.JobClient:  map 100% reduce 0%
12/12/23 16:31:12 INFO mapred.JobClient:  map 100% reduce 26%
12/12/23 16:31:18 INFO mapred.JobClient:  map 100% reduce 100%
12/12/23 16:31:19 INFO mapred.JobClient: Job complete: job_201212231524_0003
12/12/23 16:31:19 INFO mapred.JobClient: Counters: 27
12/12/23 16:31:19 INFO mapred.JobClient:   Job Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Launched reduce tasks=1
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16432
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Launched map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     Data-local map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13728
12/12/23 16:31:19 INFO mapred.JobClient:   File Input Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Read=1180
12/12/23 16:31:19 INFO mapred.JobClient:   File Output Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Written=97
12/12/23 16:31:19 INFO mapred.JobClient:   FileSystemCounters
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_READ=226
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_READ=2560
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=267335
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/12/23 16:31:19 INFO mapred.JobClient:   Map-Reduce Framework
12/12/23 16:31:19 INFO mapred.JobClient:     Map output materialized bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Map input records=10
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce shuffle bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Spilled Records=40
12/12/23 16:31:19 INFO mapred.JobClient:     Map output bytes=180
12/12/23 16:31:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=1931190272
12/12/23 16:31:19 INFO mapred.JobClient:     Map input bytes=240
12/12/23 16:31:19 INFO mapred.JobClient:     Combine input records=0
12/12/23 16:31:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1380
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input records=20
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input groups=20
12/12/23 16:31:19 INFO mapred.JobClient:     Combine output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Map output records=20
Job Finished in 19.303 seconds
Estimated value of Pi is 3.14800000000000000000

Hope this helps.






Thursday, October 4, 2012

Using logback and slf4j in grails application

Recently I configured a grails application with logback.
Logback is better than log4j as mentioned here: http://logback.qos.ch/reasonsToSwitch.html

The changes to a grails application are these:

We need to exclude log4j in BuildConfig.groovy.
We can also specify where the logback configuration file is located.


In BuildConfig.groovy

   

// inherit Grails' default dependencies
inherits("global") {
    ...
    // excludes 'ehcache'
    excludes 'grails-plugin-log4j'
}

dependencies {
    ...
    compile 'ch.qos.logback:logback-classic:1.0.6'
}

// last line in the file, to specify the config file location
this.classLoader.rootLoader.addURL(new File("${basedir}/grails-app/conf").toURI().toURL())
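With the classpath pointing at grails-app/conf, a minimal logback.groovy placed in that folder could look like the sketch below; the appender name, log pattern, and root level are just examples.

```groovy
import ch.qos.logback.classic.encoder.PatternLayoutEncoder
import ch.qos.logback.core.ConsoleAppender

// console appender with a simple pattern -- adjust to taste
appender('STDOUT', ConsoleAppender) {
    encoder(PatternLayoutEncoder) {
        pattern = '%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n'
    }
}
root(INFO, ['STDOUT'])
```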

Once the changes above are made, we can use an annotation to log information.

Using logging in your controller or service is as simple as adding the @Slf4j annotation and calling the appropriate log method.


Here is an example:

import groovy.util.logging.Slf4j

@Slf4j
class XXXController {

    def yyyy() {
        log.debug "yyyy called."
        ...
        render view: "myview"
    }
}
We were able to use @Slf4j in services, controllers, and even Spock integration tests.

Hope this helps you configure Slf4j and logback in your grails application.