
Sunday, December 23, 2012

Install Hadoop 1.1.1 on MacBook Pro, OS X 10.8.2

I recently installed Hadoop on my new MacBook, and here are the steps I followed to get it working.
I'm writing this in the hope that someone might find it useful.

First up, there are a couple of very nice posts on this topic that helped me get it done:
http://ragrawal.wordpress.com/2012/04/28/installing-hadoop-on-mac-osx-lion
http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/
http://geekiriki.blogspot.com/2011/10/flume-and-hadoop-on-os-x.html
I mainly followed these three (mixing steps from a couple of them) to get my installation working.

First, I used Homebrew to install Hadoop:
brew install hadoop

I enabled Remote Login on my Mac (System Preferences > Sharing) and created an RSA key using ssh-keygen.
Finally, I tested that I was able to ssh by running ssh localhost.
I used an RSA key, but a DSA key can be used for SSH as well.
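The SSH setup above can be sketched as follows. This is a minimal sketch, assuming you don't already have a key at ~/.ssh/id_rsa (skip the first command if you do):

```shell
# Generate a passphrase-less RSA key (only if one doesn't already exist)
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Authorize the key for logins to this machine
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify passwordless login works (Remote Login must be enabled in
# System Preferences > Sharing for this to succeed)
ssh localhost
```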

This is how my conf files look (located in the /usr/local/Cellar/hadoop/1.1.1/libexec/conf folder).
The links above cover these in detail; I have made no changes of my own except for the Hadoop install directory.



core-site.xml



<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Note: I had to create two folders, as the original poster indicates, like this:


mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
NOTE: change dfs.replication according to your needs (1 is fine for a single-node setup).



mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>


In conf/hadoop-env.sh, find the line
# export HADOOP_OPTS=-server
and add this line below it (it works around the "Unable to load realm info from SCDynamicStore" Kerberos error Hadoop hits on OS X; the realm values come from the posts linked above):
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"





Format the Hadoop NameNode using:
hadoop namenode -format





Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/start-all.sh

Run
ps ax | grep hadoop | wc -l
If you see 6 as output, you are all set: that is the five Hadoop daemons (NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker) plus the grep process itself.
If not, check the logs at
ls /usr/local/Cellar/hadoop/1.1.1/libexec/logs/
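As a rough first pass over those logs, something like this surfaces any errors (a sketch, assuming the Homebrew install path above; the failing daemon's log usually names the cause):

```shell
# Scan the daemon logs for errors/exceptions and show the most recent hits
grep -iE "error|exception" /usr/local/Cellar/hadoop/1.1.1/libexec/logs/*.log | tail -n 20
```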

Health can be checked at http://localhost:50070/dfshealth.jsp
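If you prefer the command line over a browser, a quick curl check against the same URL works too (my own addition, not from the original posts; 50070 is the default NameNode web UI port in Hadoop 1.x):

```shell
# Prints the HTTP status code; 200 means the NameNode web UI is up
curl -fsS -o /dev/null -w "%{http_code}\n" http://localhost:50070/dfshealth.jsp
```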

You can run one of the bundled examples. Change into the install directory:
cd /usr/local/Cellar/hadoop/1.1.1/libexec
Then run this command:
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100

You should see output similar to the following:


Number of Maps  = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/12/23 16:31:00 INFO mapred.FileInputFormat: Total input paths to process : 10
12/12/23 16:31:00 INFO mapred.JobClient: Running job: job_201212231524_0003
12/12/23 16:31:01 INFO mapred.JobClient:  map 0% reduce 0%
12/12/23 16:31:04 INFO mapred.JobClient:  map 20% reduce 0%
12/12/23 16:31:06 INFO mapred.JobClient:  map 40% reduce 0%
12/12/23 16:31:08 INFO mapred.JobClient:  map 60% reduce 0%
12/12/23 16:31:09 INFO mapred.JobClient:  map 80% reduce 0%
12/12/23 16:31:11 INFO mapred.JobClient:  map 100% reduce 0%
12/12/23 16:31:12 INFO mapred.JobClient:  map 100% reduce 26%
12/12/23 16:31:18 INFO mapred.JobClient:  map 100% reduce 100%
12/12/23 16:31:19 INFO mapred.JobClient: Job complete: job_201212231524_0003
12/12/23 16:31:19 INFO mapred.JobClient: Counters: 27
12/12/23 16:31:19 INFO mapred.JobClient:   Job Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Launched reduce tasks=1
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16432
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient:     Launched map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     Data-local map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13728
12/12/23 16:31:19 INFO mapred.JobClient:   File Input Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Read=1180
12/12/23 16:31:19 INFO mapred.JobClient:   File Output Format Counters
12/12/23 16:31:19 INFO mapred.JobClient:     Bytes Written=97
12/12/23 16:31:19 INFO mapred.JobClient:   FileSystemCounters
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_READ=226
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_READ=2560
12/12/23 16:31:19 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=267335
12/12/23 16:31:19 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/12/23 16:31:19 INFO mapred.JobClient:   Map-Reduce Framework
12/12/23 16:31:19 INFO mapred.JobClient:     Map output materialized bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Map input records=10
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce shuffle bytes=280
12/12/23 16:31:19 INFO mapred.JobClient:     Spilled Records=40
12/12/23 16:31:19 INFO mapred.JobClient:     Map output bytes=180
12/12/23 16:31:19 INFO mapred.JobClient:     Total committed heap usage (bytes)=1931190272
12/12/23 16:31:19 INFO mapred.JobClient:     Map input bytes=240
12/12/23 16:31:19 INFO mapred.JobClient:     Combine input records=0
12/12/23 16:31:19 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1380
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input records=20
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce input groups=20
12/12/23 16:31:19 INFO mapred.JobClient:     Combine output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Reduce output records=0
12/12/23 16:31:19 INFO mapred.JobClient:     Map output records=20
Job Finished in 19.303 seconds
Estimated value of Pi is 3.14800000000000000000

Hope this helps.





