
Sunday, December 23, 2012

Install Hadoop 1.1.1 on MacBook Pro, OS X 10.8.2

I recently installed Hadoop on my new MacBook, and here are the steps I followed to get it working.
I'm writing this in the hope that someone might find it useful.

First up, there are a couple of very nice posts on this topic that helped me get it done:
http://ragrawal.wordpress.com/2012/04/28/installing-hadoop-on-mac-osx-lion
http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/
http://geekiriki.blogspot.com/2011/10/flume-and-hadoop-on-os-x.html
I mainly followed these three (mixing steps from a couple of them) to get my installation working.

First, I used Homebrew to install Hadoop:
brew install hadoop

I enabled Remote Login on my Mac (System Preferences > Sharing) and created an RSA key using ssh-keygen.
Finally, I tested that I was able to ssh by running ssh localhost.
I used an RSA key, but a DSA key can be used for SSH as well.
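The SSH setup above can be sketched as follows. This is a minimal sketch, assuming you don't already have a key at ~/.ssh/id_rsa (skip the first command if you do):

```shell
# Generate a passphrase-less RSA key (only if one doesn't already exist)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify passwordless login works (Remote Login must be enabled in
# System Preferences > Sharing for this to succeed)
ssh localhost
```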

This is how my conf files look (located in the /usr/local/Cellar/hadoop/1.1.1/libexec/conf folder).
The links above cover these in detail; I have made no changes of my own except for the Hadoop install directory.



core-site.xml



<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Note: I had to create two folders, as the original poster indicates, like this:


mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
NOTE: change dfs.replication according to your needs (1 is fine for a single-node setup).



mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>


In conf/hadoop-env.sh, find the line
# export HADOOP_OPTS=-server
and add this line below it (it works around the "Unable to load realm info from SCDynamicStore" Kerberos error Hadoop hits on OS X; the realm values come from the posts linked above):
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"





Format the Hadoop NameNode using:
hadoop namenode -format





Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/start-all.sh

Run
ps ax | grep hadoop | wc -l
If you see 6 as output, you are all set: that is the five Hadoop daemons (NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker) plus the grep process itself.
If not, check the logs at
ls /usr/local/Cellar/hadoop/1.1.1/libexec/logs/
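As a rough first pass over those logs, something like this surfaces any errors (a sketch, assuming the Homebrew install path above; the failing daemon's log usually names the cause):

```shell
# Scan the daemon logs for errors/exceptions and show the most recent hits
grep -iE "error|exception" /usr/local/Cellar/hadoop/1.1.1/libexec/logs/*.log | tail -n 20
```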

Health can be checked at http://localhost:50070/dfshealth.jsp
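If you prefer the command line over a browser, a quick curl check against the same URL works too (my own addition, not from the original posts; 50070 is the default NameNode web UI port in Hadoop 1.x):

```shell
# Prints the HTTP status code; 200 means the NameNode web UI is up
curl -fsS -o /dev/null -w "%{http_code}\n" http://localhost:50070/dfshealth.jsp
```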

You can run one of the bundled examples. Change into the install directory:
cd /usr/local/Cellar/hadoop/1.1.1/libexec
Then run this command:
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100

You should see output similar to the following:


Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/12/23 16:31:00 INFO mapred.FileInputFormat: Total input paths to process : 10
12/12/23 16:31:00 INFO mapred.JobClient: Running job: job_201212231524_0003
12/12/23 16:31:01 INFO mapred.JobClient:  map 0% reduce 0%
12/12/23 16:31:04 INFO mapred.JobClient:  map 20% reduce 0%
12/12/23 16:31:06 INFO mapred.JobClient:  map 40% reduce 0%
12/12/23 16:31:08 INFO mapred.JobClient:  map 60% reduce 0%
12/12/23 16:31:09 INFO mapred.JobClient:  map 80% reduce 0%
12/12/23 16:31:11 INFO mapred.JobClient:  map 100% reduce 0%
12/12/23 16:31:12 INFO mapred.JobClient:  map 100% reduce 26%
12/12/23 16:31:18 INFO mapred.JobClient:  map 100% reduce 100%
12/12/23 16:31:19 INFO mapred.JobClient: Job complete: job_201212231524_0003
12/12/23 16:31:19 INFO mapred.JobClient: Counters: 27
12/12/23 16:31:19 INFO mapred.JobClient:   Job Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Launched reduce tasks=1
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16432
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Launched map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     Data-local map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13728
12/12/23 16:31:19 INFO mapred.JobClient:   File Input Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Read=1180
12/12/23 16:31:19 INFO mapred.JobClient:   File Output Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Written=97
12/12/23 16:31:19 INFO mapred.JobClient:   FileSystemCounters
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_READ=226
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_READ=2560
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=267335
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/12/23 16:31:19 INFO mapred.JobClient:   Map-Reduce Framework
12/12/23 16:31:19 INFO mapred.JobClient:     Map output materialized bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Map input records=10
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce shuffle bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Spilled Records=40
12/12/23 16:31:19 INFO mapred.JobClient:     Map output bytes=180
12/12/23 16:31:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=1931190272
12/12/23 16:31:19 INFO mapred.JobClient:     Map input bytes=240
12/12/23 16:31:19 INFO mapred.JobClient:     Combine input records=0
12/12/23 16:31:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1380
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input records=20
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input groups=20
12/12/23 16:31:19 INFO mapred.JobClient:     Combine output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Map output records=20
Job Finished in 19.303 seconds
Estimated value of Pi is 3.14800000000000000000

Hope this helps.





