I recently installed Hadoop on my new MacBook, and here are the steps I followed to get it working.
I write this with the hope that someone might find this useful.
First up, there are a couple of very nice posts on this topic that helped me get it done.
http://ragrawal.wordpress.com/2012/04/28/installing-hadoop-on-mac-osx-lion
http://dennyglee.com/2012/05/08/installing-hadoop-on-osx-lion-10-7/
http://geekiriki.blogspot.com/2011/10/flume-and-hadoop-on-os-x.html
I mainly followed these three (mixing steps from a couple of them) to get my installation working.
To start, I used Homebrew to install Hadoop:
brew install hadoop
Next, I enabled Remote Login on my Mac (under System Preferences > Sharing) and created an RSA key using ssh-keygen. Finally, I tested that I was able to SSH in by running ssh localhost. I used RSA, but a DSA key works for SSH as well.
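If you haven't set up passwordless SSH to localhost before, a minimal sketch looks like this (assuming the default key path and an empty passphrase; adjust to taste):
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost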
This is how my conf files look (located in the /usr/local/Cellar/hadoop/1.1.1/libexec/conf folder). The links above cover these in detail; I have not made any changes of my own except for the Hadoop install directory.
core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Note: I had to create two folders, as the original poster indicates:
mkdir /usr/local/Cellar/hadoop/hdfs
mkdir /usr/local/Cellar/hadoop/hdfs/tmp
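Alternatively, a single mkdir -p creates both folders at once:
mkdir -p /usr/local/Cellar/hadoop/hdfs/tmp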
hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Note: change dfs.replication according to your needs; 1 is fine for a single-node setup.
mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9010</value>
  </property>
</configuration>
In hadoop-env.sh (in the same conf folder), find the line # export HADOOP_OPTS=-server and add this line below it; it works around the "Unable to load realm info from SCDynamicStore" Kerberos error that Hadoop otherwise logs on OS X:
export HADOOP_OPTS="-Djava.security.krb5.realm=OX.AC.UK -Djava.security.krb5.kdc=kdc0.ox.ac.uk:kdc1.ox.ac.uk"
Format the Hadoop NameNode using:
hadoop namenode -format
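Formatting is a one-time step before the first start; re-running it later will wipe whatever is in HDFS, so don't repeat it on an existing installation.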
Start Hadoop by running the script:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/start-all.sh
Run:
ps ax | grep hadoop | wc -l
If you see 6 as the output, you are all set: that is the five Hadoop daemons (NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker) plus the grep process itself.
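Alternatively, the jps tool that ships with the JDK lists running Java processes by name, which makes for a cleaner check:
jps
You should see NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker in its output (alongside Jps itself).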
If either check comes up short, look in the logs:
ls /usr/local/Cellar/hadoop/1.1.1/libexec/logs/
HDFS health can be checked at http://localhost:50070/dfshealth.jsp
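The JobTracker has a similar web UI; in Hadoop 1.x it normally lives at http://localhost:50030/jobtracker.jsp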
You can run one of the bundled examples like this:
cd /usr/local/Cellar/hadoop/1.1.1/libexec
hadoop jar /usr/local/Cellar/hadoop/1.1.1/libexec/hadoop-examples-1.1.1.jar pi 10 100
The two arguments are the number of maps and the number of samples per map. You should see output similar to the following:
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/12/23 16:31:00 INFO mapred.FileInputFormat: Total input paths to process : 10
12/12/23 16:31:00 INFO mapred.JobClient: Running job: job_201212231524_0003
12/12/23 16:31:01 INFO mapred.JobClient: map 0% reduce 0%
12/12/23 16:31:04 INFO mapred.JobClient: map 20% reduce 0%
12/12/23 16:31:06 INFO mapred.JobClient: map 40% reduce 0%
12/12/23 16:31:08 INFO mapred.JobClient: map 60% reduce 0%
12/12/23 16:31:09 INFO mapred.JobClient: map 80% reduce 0%
12/12/23 16:31:11 INFO mapred.JobClient: map 100% reduce 0%
12/12/23 16:31:12 INFO mapred.JobClient: map 100% reduce 26%
12/12/23 16:31:18 INFO mapred.JobClient: map 100% reduce 100%
12/12/23 16:31:19 INFO mapred.JobClient: Job complete: job_201212231524_0003
12/12/23 16:31:19 INFO mapred.JobClient: Counters: 27
12/12/23 16:31:19 INFO mapred.JobClient: Job Counters
12/12/23 16:31:19 INFO mapred.JobClient: Launched reduce tasks=1
12/12/23 16:31:19 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16432
12/12/23 16:31:19 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/12/23 16:31:19 INFO mapred.JobClient: Launched map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient: Data-local map tasks=10
12/12/23 16:31:19 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=13728
12/12/23 16:31:19 INFO mapred.JobClient: File Input Format Counters
12/12/23 16:31:19 INFO mapred.JobClient: Bytes Read=1180
12/12/23 16:31:19 INFO mapred.JobClient: File Output Format Counters
12/12/23 16:31:19 INFO mapred.JobClient: Bytes Written=97
12/12/23 16:31:19 INFO mapred.JobClient: FileSystemCounters
12/12/23 16:31:19 INFO mapred.JobClient: FILE_BYTES_READ=226
12/12/23 16:31:19 INFO mapred.JobClient: HDFS_BYTES_READ=2560
12/12/23 16:31:19 INFO mapred.JobClient: FILE_BYTES_WRITTEN=267335
12/12/23 16:31:19 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=215
12/12/23 16:31:19 INFO mapred.JobClient: Map-Reduce Framework
12/12/23 16:31:19 INFO mapred.JobClient: Map output materialized bytes=280
12/12/23 16:31:19 INFO mapred.JobClient: Map input records=10
12/12/23 16:31:19 INFO mapred.JobClient: Reduce shuffle bytes=280
12/12/23 16:31:19 INFO mapred.JobClient: Spilled Records=40
12/12/23 16:31:19 INFO mapred.JobClient: Map output bytes=180
12/12/23 16:31:19 INFO mapred.JobClient: Total committed heap usage (bytes)=1931190272
12/12/23 16:31:19 INFO mapred.JobClient: Map input bytes=240
12/12/23 16:31:19 INFO mapred.JobClient: Combine input records=0
12/12/23 16:31:19 INFO mapred.JobClient: SPLIT_RAW_BYTES=1380
12/12/23 16:31:19 INFO mapred.JobClient: Reduce input records=20
12/12/23 16:31:19 INFO mapred.JobClient: Reduce input groups=20
12/12/23 16:31:19 INFO mapred.JobClient: Combine output records=0
12/12/23 16:31:19 INFO mapred.JobClient: Reduce output records=0
12/12/23 16:31:19 INFO mapred.JobClient: Map output records=20
Job Finished in 19.303 seconds
Estimated value of Pi is 3.14800000000000000000
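When you are done, the matching stop script shuts all the daemons down:
/usr/local/Cellar/hadoop/1.1.1/libexec/bin/stop-all.sh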
Hope this helps.