
Wednesday, December 26, 2012

Install Sqoop and Hbase on macbook pro OS X 10.8.2

Apache Sqoop helps transfer data between Hadoop and data stores (such as relational databases like Oracle, DB2, and a number of others). Read more about Sqoop here:
http://sqoop.apache.org/

If you are just getting started with hadoop you may want to refer to my earlier posts regarding installing hadoop:
http://springandgrailsmusings.blogspot.com/2012/12/install-hadoop-111-on-macbook-pro-os-x.html
and installing hive:
http://springandgrailsmusings.blogspot.com/2012/12/installing-hive-on-on-macbook-pro-os-x.html

As I mentioned in my previous posts, Homebrew provides a simple way to install almost anything, in this case Sqoop.

Open a terminal and install sqoop with this command:
brew install sqoop

Homebrew takes care of installing all related dependencies for you, which for Sqoop are HBase and ZooKeeper.



Your terminal output should be similar to this:

$ brew install sqoop
==> Installing sqoop dependency: hbase
==> Downloading http://www.apache.org/dyn/closer.cgi?path=hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
==> Best Mirror http://www.poolsaboveground.com/apache/hbase/hbase-0.94.2/hbase-0.94.2.tar.gz
######################################################################## 100.0%
==> Caveats
Requires Java 1.6.0 or greater.

You must also edit the configs in:
  /usr/local/Cellar/hbase/0.94.2/libexec/conf
to reflect your environment.

For more details:
  http://wiki.apache.org/hadoop/Hbase
==> Summary
/usr/local/Cellar/hbase/0.94.2: 3086 files, 115M, built in 3.9 minutes
==> Installing sqoop dependency: zookeeper
==> Downloading http://www.apache.org/dyn/closer.cgi?path=zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
==> Best Mirror http://www.fightrice.com/mirrors/apache/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
######################################################################## 100.0%
/usr/local/Cellar/zookeeper/3.4.5: 193 files, 12M, built in 18 seconds
==> Installing sqoop
==> Downloading http://apache.mirror.iphh.net/sqoop/1.4.2/sqoop-1.4.2.bin__hadoop-1.0.0.tar.gz
######################################################################## 100.0%
==> Caveats
Hadoop, Hive, HBase and ZooKeeper must be installed and configured
for Sqoop to work.
==> Summary
/usr/local/Cellar/sqoop/1.4.2: 60 files, 4.4M, built in 24 seconds



Now you are all set to use sqoop to work with any supported data store.
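As a quick smoke test, you can ask the freshly installed sqoop for its version. A small sketch (the guard is just a precaution in case your shell hasn't picked up the Homebrew bin directory yet):

```shell
# Smoke test: print the Sqoop version Homebrew just installed.
# If sqoop is not on the PATH yet, open a new terminal or check
# that /usr/local/bin is in your PATH.
if command -v sqoop >/dev/null 2>&1; then
  sqoop version
else
  echo "sqoop not found on PATH"
fi
```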
Have fun.


Installing hive 0.9 on macbook pro OS X 10.8.2



If you are reading this post, I assume you are interested in getting started with Hive on your MacBook and already have Hadoop installed. For details on installing Hadoop, please refer to my post here:
http://springandgrailsmusings.blogspot.com/2012/12/install-hadoop-111-on-macbook-pro-os-x.html

Again, Homebrew provides an easy way to get Hive on your Mac.
Run this from your mac terminal:
> brew install hive

Brew will install Hive on your Mac, and you will see output similar to the one below:
==> Downloading http://www.apache.org/dyn/closer.cgi?path=hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
==> Best Mirror http://apache.claz.org/hive/hive-0.9.0/hive-0.9.0-bin.tar.gz
######################################################################## 100.0%
==> Caveats
Hadoop must be in your path for hive executable to work.
After installation, set $HIVE_HOME in your profile:
  export HIVE_HOME=/usr/local/Cellar/hive/0.9.0/libexec

You may need to set JAVA_HOME:
  export JAVA_HOME="$(/usr/libexec/java_home)"
==> Summary
/usr/local/Cellar/hive/0.9.0: 276 files, 25M, built in 13 seconds

Export HIVE_HOME and JAVA_HOME from your terminal, as prompted:

export HIVE_HOME=/usr/local/Cellar/hive/0.9.0/libexec
export JAVA_HOME="$(/usr/libexec/java_home)"
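To avoid re-typing these in every new terminal, you can append them to your shell profile. A sketch (~/.bash_profile is the default login profile on OS X; adjust if you use a different shell):

```shell
# Persist the Hive environment variables across terminal sessions.
cat >> "$HOME/.bash_profile" <<'EOF'
export HIVE_HOME=/usr/local/Cellar/hive/0.9.0/libexec
export JAVA_HOME="$(/usr/libexec/java_home)"
EOF
```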

Now you can start hive as follows:

/usr/local/Cellar/hive/0.9.0/bin/hive
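You can also run a quick non-interactive sanity check with hive -e, which executes a single HiveQL statement and exits (guarded here in case hive isn't on your PATH):

```shell
# Run a trivial HiveQL statement without entering the interactive shell.
if command -v hive >/dev/null 2>&1; then
  hive -e "show tables;"
else
  echo "hive not found on PATH; use the full Cellar path shown above"
fi
```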


You should be all set at this point to work with hive.
Hope this helps.


Sunday, December 23, 2012

Install Hadoop 1.1.1 on macbook pro OS X 10.8.2

I recently installed Hadoop on my new MacBook, and here are the steps I followed to get it working.
I write this with the hope that someone might find it useful.

First up, there are a few very nice posts on this topic that helped me get it done:
http://ragrawal.wordpress.com/2012/04/28/installing-hadoop-on-mac-osx-lion
http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/
http://geekiriki.blogspot.com/2011/10/flume-and-hadoop-on-os-x.html
I mainly followed these three (mixing steps from a couple of them) to get my installation working.

First up, I used Homebrew to install Hadoop:
brew install hadoop

I enabled Remote Login on my Mac (System Preferences > Sharing) and created an RSA key using ssh-keygen.
Finally, I tested that I was able to ssh in by running ssh localhost.
I used RSA, but DSA can be used for ssh as well.
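For reference, the usual passwordless-ssh setup looks like this (a sketch; skip the keygen step if you already have a key, and use -t dsa if you prefer DSA):

```shell
# Generate an RSA key with an empty passphrase if none exists yet,
# then authorize it for ssh logins to this machine.
mkdir -p ~/.ssh
if [ ! -f ~/.ssh/id_rsa ]; then
  ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
fi
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```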

This is how my conf files look (located in the /usr/local/Cellar/hadoop/1.1.1/libexec/conf folder).
The links provided above detail these; I have not made any changes of my own except for the Hadoop install dir change.



core-site.xml



<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Note: I had to create two folders, as the original poster indicates, like this:


mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Note: change dfs.replication according to your needs.



mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>


In hadoop-env.sh (in the same conf folder), find the line
# export HADOOP_OPTS=-server
and add this line below it (it works around the Kerberos "Unable to load realm info from SCDynamicStore" error on OS X):
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"





Format the Hadoop Namenode using:
hadoop namenode -format





Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/start-all.sh

Run:
ps ax | grep hadoop | wc -l
If you see 6 as the output, you are all set.
If not, check the logs at:
ls /usr/local/Cellar/hadoop/1.1.1/libexec/logs/
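The count of 6 comes from the five Hadoop 1.x daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker) plus the grep process itself. A small variation stops grep from matching its own command line, so a fully started single-node cluster shows exactly 5:

```shell
# The [h] character class means grep's own command line ("grep [h]adoop")
# no longer matches the pattern, so only real hadoop processes are counted.
ps ax | grep "[h]adoop" | wc -l
```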

Health can be checked at http://localhost:50070/dfshealth.jsp

You can run one of the bundled examples like this:
cd /usr/local/Cellar/hadoop/1.1.1/libexec
Run this command:
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100

You should see output similar to the following


Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/12/23 16:31:00 INFO mapred.FileInputFormat: Total input paths to process : 10
12/12/23 16:31:00 INFO mapred.JobClient: Running job: job_201212231524_0003
12/12/23 16:31:01 INFO mapred.JobClient:  map 0% reduce 0%
12/12/23 16:31:04 INFO mapred.JobClient:  map 20% reduce 0%
12/12/23 16:31:06 INFO mapred.JobClient:  map 40% reduce 0%
12/12/23 16:31:08 INFO mapred.JobClient:  map 60% reduce 0%
12/12/23 16:31:09 INFO mapred.JobClient:  map 80% reduce 0%
12/12/23 16:31:11 INFO mapred.JobClient:  map 100% reduce 0%
12/12/23 16:31:12 INFO mapred.JobClient:  map 100% reduce 26%
12/12/23 16:31:18 INFO mapred.JobClient:  map 100% reduce 100%
12/12/23 16:31:19 INFO mapred.JobClient: Job complete: job_201212231524_0003
12/12/23 16:31:19 INFO mapred.JobClient: Counters: 27
12/12/23 16:31:19 INFO mapred.JobClient:   Job Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Launched reduce tasks=1
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16432
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Launched map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     Data-local map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13728
12/12/23 16:31:19 INFO mapred.JobClient:   File Input Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Read=1180
12/12/23 16:31:19 INFO mapred.JobClient:   File Output Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Written=97
12/12/23 16:31:19 INFO mapred.JobClient:   FileSystemCounters
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_READ=226
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_READ=2560
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=267335
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/12/23 16:31:19 INFO mapred.JobClient:   Map-Reduce Framework
12/12/23 16:31:19 INFO mapred.JobClient:     Map output materialized bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Map input records=10
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce shuffle bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Spilled Records=40
12/12/23 16:31:19 INFO mapred.JobClient:     Map output bytes=180
12/12/23 16:31:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=1931190272
12/12/23 16:31:19 INFO mapred.JobClient:     Map input bytes=240
12/12/23 16:31:19 INFO mapred.JobClient:     Combine input records=0
12/12/23 16:31:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1380
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input records=20
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input groups=20
12/12/23 16:31:19 INFO mapred.JobClient:     Combine output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Map output records=20
Job Finished in 19.303 seconds
Estimated value of Pi is 3.14800000000000000000

Hope this helps.






Thursday, October 4, 2012

Using logback and slf4j in grails application

Recently I configured a grails application with logback.
Logback is better than log4j as mentioned here: http://logback.qos.ch/reasonsToSwitch.html

The changes needed in a grails application are these:

We need to exclude log4j in BuildConfig.groovy.
We can also specify where the logback configuration file is located.


In BuildConfig.groovy

   

// inherit Grails' default dependencies
inherits("global") {
    ...
    // excludes 'ehcache'
    excludes 'grails-plugin-log4j'
}

dependencies {
    ...
    compile 'ch.qos.logback:logback-classic:1.0.6'
}

// last line in the file: put the logback config file location on the classpath
this.classLoader.rootLoader.addURL(new File("${basedir}/grails-app/conf").toURI().toURL())

Once the changes above are made, we can use an annotation to log information.

Using logging in your controller or service is as simple as applying the @Slf4j annotation and calling the right log method.


Here is an example:

import groovy.util.logging.Slf4j

@Slf4j
class XXXController {

    def yyyy() {
        log.debug "yyyy called."
        ...
        render view: "myview"
    }
}
We were able to use @Slf4j in services, controllers, and even Spock integration tests.

Hope this helps you in configuring Slf4j and logback in your grails application.

Monday, February 21, 2011

Registering new bean classes in grails application programmatically

One of the amazing goodies of grails is the ability to extend your applications: registering new classes programmatically is a breeze.
As the grails guru Burt points out here, it requires only minimal effort:

//imports needed (from spring-beans)
import org.springframework.beans.factory.config.BeanDefinition
import org.springframework.beans.factory.support.GenericBeanDefinition

//first create a bean definition
def myBeanDef = new GenericBeanDefinition()
//set the bean class
myBeanDef.setBeanClass(YourBeanClass)
//set the scope
myBeanDef.setScope(BeanDefinition.SCOPE_SINGLETON)
//set other properties as needed

//register it under a name
grailsApplication.mainContext.registerBeanDefinition "someName", myBeanDef

Programming in grails is really a pleasure.

Saturday, February 12, 2011

Using Amazon SES from grails application


Thursday, January 27, 2011

Getting user credentials in a grails application using Spring Security


Getting user credentials in a grails application using Spring Security is pretty easy.

If you followed the instructions for the spring security plugin, the LoginController takes care of storing the user credentials for you.
To retrieve the stored credentials, you need to do the following:
1. Import SecurityContextHolder:
import org.springframework.security.core.context.SecurityContextHolder as SCH

2. Get the principal:
SCH.context?.authentication?.principal

3. Once you get hold of the principal object, you can retrieve and use any of the credentials, for example the username:

SCH.context?.authentication?.principal?.username


Update: Thanks to Burt for pointing out how I can make this easier.

I can use the injected springSecurityService by declaring
def springSecurityService

and then calling
springSecurityService.principal
to get the principal.




Tuesday, January 11, 2011

Removing unwanted jar dependencies in the grails built war file

Peter Ledbrook shared a neat way of removing a runtime dependency from the grails-built war file. For example, if we wanted to exclude hsqldb:
    grails.project.dependency.resolution = {
        inherits("global") {
            if (Environment.current == Environment.PRODUCTION) {
                exclude "hsqldb"
            }
        }
        ...
    }

Friday, December 24, 2010

Grails template driven emailing with mail plugin


In this post I will walk through how to configure email using the mail plugin.
1. Install the mail plugin in your project:
grails install-plugin mail
2. Add the activation and mail jars to the lib directory.
I use these jars:
activation-1.1.1
mail-1.4.3
3. In resources.groovy, add a mailSender definition and configure it:
    mailSender(org.springframework.mail.javamail.JavaMailSenderImpl) {
        host = 'smtp.gmail.com'
        port = 465
        username = 'xxx@gmail.com'
        password = 'xxx'
        javaMailProperties = ['mail.smtp.auth': 'true',
                'mail.smtp.socketFactory.port': '465',
                'mail.smtp.socketFactory.class': 'javax.net.ssl.SSLSocketFactory',
                'mail.smtp.socketFactory.fallback': 'false']
    }
    
    
4. Add an email template (if needed).
Create a directory grails-app/views/emailtemplates and add an email.gsp with the markup you want - a sample is below:
<%@ page contentType="text/html" %>
<html>
<head><title>${mytitle}</title></head>
<body><h1>${mymail}</h1></body>
</html>
The contentType of the gsp should be changed based on your needs.
5. In your controller, add the emailing code:
mailService.sendMail {
    to      "xxx@yahoo.com"
    from    "xxx@gmail.com"
    subject "Email test"
    body(
        view: "/emailtemplates/email",
        model: [mytitle: "My title", mymail: "Test Email"]
    )
}
            
If you don't need a template to render from, just supply the body of the email as a string.

Tuesday, December 21, 2010

Grails and JSON:Consuming JSON data via multi-part form upload

JSON is a concise format that is extremely popular for data exchange.
Most webservices these days consume JSON as input for processing instead of XML, as it is more lightweight (depending on the domain).

Here is an example of consuming a JSON file uploaded via a web form in your grails web application.

Grails as usual provides fantastic support for JSON parsing.
Parsing JSON input and creating domain objects out of it is very easy with grails.

I have a domain object like this:
class Message {
    String name
    String content

    static constraints = {
        name(nullable: false)
        content(nullable: false)
    }
}

Here is a sample JSON input file that can be used for creating multiple domain objects

File:Messages.json
[
{"name":"ann","content":"What a Wonderful World"},
{"name":"nandhu","content":"Wonderful World"}
]

Note that the JSON representation is very simple.
Key-value pairs separated by ":" for each property and enclosed in {} constitute one object.
Objects separated by commas and enclosed in [] constitute an array of like objects.
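A quick way to sanity-check a hand-written JSON file before uploading is to run it through the json.tool module of the Python interpreter that ships with OS X (a sketch; strict parsers reject mistakes like trailing commas that might otherwise slip through):

```shell
# Write the sample file and validate it: json.tool exits non-zero
# and prints an error if the JSON is malformed.
cat > /tmp/Messages.json <<'EOF'
[
{"name":"ann","content":"What a Wonderful World"},
{"name":"nandhu","content":"Wonderful World"}
]
EOF
PY=$(command -v python || command -v python3)
"$PY" -m json.tool < /tmp/Messages.json
```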

In order to upload the JSON file, we need a multipart-form-upload gsp, as shown below:

//form multipart upload code
<div>
    <g:form method="post" action="postRest" enctype="multipart/form-data">
        <input type="file" name="file"/>
        <input type="submit"/>
    </g:form>
</div>

This form posts to an action called "postRest" on the same controller.
A different controller can be targeted by adding a controller="name" attribute.

Now, finally, you need to add the controller action.
There you first get the JSON representation.
Then you create a domain object and fill it with data from the JSON representation.
Finally, you save this object.
Make sure you check the newly saved object to ensure there aren't any errors.

// requires: import grails.converters.JSON at the top of the controller
def postRest = {
    println request.getFile("file").inputStream.text

    def jsonArray = JSON.parse(request.getFile("file").getInputStream(), "UTF-8")
    println "jsonArray=${jsonArray}"
    jsonArray.each {
        println it
        Message m = new Message()
        bindData(m, it)
        m.save(flush: true)
        // hasErrors() is the correct check; m.errors is never null
        if (m.hasErrors()) {
            m.errors.allErrors.each { println it }
        }
    }
}

In the next post I will blog on how to expose this as a webservice.

Monday, December 20, 2010

Grails testing support enhanced:full emulation of GORM

Grails testing support has been enhanced to provide a full emulation of GORM.
Peter Ledbrook (author of Grails in Action) has written a post on how to get this in your project.
Pretty easy config and powerful features added.
Grails Rocks!

The config changes needed, in short, are:


1.Add to repositories:
mavenRepo "http://maven.springframework.org/milestone" 
mavenRepo "http://snapshots.repository.codehaus.org" 


2. Add the test dependency:
test("org.grails:grails-datastore-gorm-test:1.0.0.M1") {
    excludes "persistence-api", "slf4j-simple", "commons-logging"
}

3. Add @Mixin(DatastoreUnitTestMixin) to your test class and
add a call to disconnect() in your tearDown().


Wednesday, December 8, 2010

Grails and multi tenancy - Multi tenancy setup with grails




I was playing with a multi tenancy setup with grails.
While there were some examples around, I found most of them were old and weren't updated for recent grails versions.
I ran into a bunch of errors when I tried to use the multitenancy plugin with spring security,
and had to try a few things before I was able to correct all the errors and get this working.
Here is a step by step guide to setting up multitenancy for grails based apps.

Create a simple domain class:
grails create-domain-class com.helloworld.Message

Install the spring security plugin:
grails install-plugin spring-security-core

Create the user and role classes:
grails s2-quickstart org.racetrack User Role

Install the multi-tenant plugin:
grails install-plugin multi-tenant

Install the Multi-Tenant Spring Security Integration plugin:
grails install-plugin multi-tenant-spring-security

Add to Config.groovy:
tenant {
    mode = "multiTenant" // "singleTenant" OR "multiTenant"
    datasourceResolver.type = "db"
    resolver {
        type = "request"
        resolver.request.dns.type = "db"
        request.dns.type = "db"
    }
}


Annotate the domain classes that are to be tenant-aware, after importing the MultiTenant annotation:
import com.infusion.tenant.groovy.compiler.MultiTenant

@MultiTenant

Add to the User class:
/** for multi-tenant-acegi plugin **/
Integer userTenantId

and add the following alongside the existing constraints:
userTenantId(min: 0)

Create a dns map table:
grails create-dns-map

Now, with all the setup in place, you should be able to run your app with multitenancy enabled.