Just 15 steps are required to set up Hadoop locally and run MapReduce jobs. Follow the steps below.
- Check the JDK by typing `java -version` in the terminal:

      java version "1.8.0_65"
      Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
      Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
- Set up passwordless SSH:

      ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
      cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- Download the Hadoop 2.7.1 binary from the Apache Hadoop releases page.
- Untar the binary (`tar zxvf hadoop-2.7.1.tar.gz`) and place it inside a folder.
- Edit your `.bash_profile` (`vi ~/.bash_profile`) and add the lines below:

      export JAVA_HOME=$(/usr/libexec/java_home)
      export HADOOP_PREFIX=/Users/<root>/<your_folder>/hadoop-2.7.1
      export HADOOP_HOME=$HADOOP_PREFIX
      export HADOOP_COMMON_HOME=$HADOOP_PREFIX
      export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
      export HADOOP_HDFS_HOME=$HADOOP_PREFIX
      export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
      export HADOOP_YARN_HOME=$HADOOP_PREFIX
      export PATH=$PATH:$HADOOP_PREFIX/bin
      export PATH=$PATH:$HADOOP_PREFIX/sbin

- Check whether Hadoop is installed by typing the following in the terminal; it should display Hadoop version information:

      cd $HADOOP_PREFIX
      bin/hadoop version
- Open core-site.xml inside hadoop-2.7.1/etc/hadoop/ and add the property below:

      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:8020</value>
          <description>NameNode URI</description>
        </property>
      </configuration>

- Open hdfs-site.xml inside hadoop-2.7.1/etc/hadoop/ and add the properties below:
      <configuration>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/datanode</value>
          <description>Paths on the local filesystem for DataNode blocks.</description>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:///Users/<root>/<your_folder>/hadoop-2.7.1/hdfs/namenode</value>
          <description>Path on the local filesystem for the NameNode namespace and transaction logs.</description>
        </property>
      </configuration>

- Open mapred-site.xml inside hadoop-2.7.1/etc/hadoop/ and add the properties below:
      <configuration>
        <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property>
        <property> <name>mapreduce.map.cpu.vcores</name> <value>1</value> </property>
        <property> <name>mapreduce.reduce.cpu.vcores</name> <value>1</value> </property>
        <property> <name>mapreduce.map.memory.mb</name> <value>512</value> </property>
        <property> <name>mapreduce.reduce.memory.mb</name> <value>1024</value> </property>
        <property> <name>mapreduce.reduce.java.opts</name> <value>-Xmx618m</value> </property>
        <property> <name>mapreduce.map.java.opts</name> <value>-Xmx384m</value> </property>
        <property> <name>mapreduce.jobtracker.address</name> <value>local</value> </property>
        <property> <name>yarn.app.mapreduce.am.resource.mb</name> <value>1024</value> </property>
        <property> <name>yarn.app.mapreduce.am.command-opts</name> <value>-Xmx618m</value> </property>
      </configuration>

- Open yarn-site.xml inside hadoop-2.7.1/etc/hadoop/ and add the properties below:
      <configuration>
        <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property>
        <property> <name>yarn.nodemanager.resource.memory-mb</name> <value>4096</value> <description>Total RAM available to all containers on a node</description> </property>
        <property> <name>yarn.nodemanager.resource.cpu-vcores</name> <value>4</value> <description>Total number of CPU cores available to all containers on a node</description> </property>
        <property> <name>yarn.scheduler.minimum-allocation-mb</name> <value>128</value> <description>Minimum RAM per container</description> </property>
        <property> <name>yarn.scheduler.maximum-allocation-mb</name> <value>2048</value> <description>Maximum RAM allocated to a container</description> </property>
        <property> <name>yarn.scheduler.minimum-allocation-vcores</name> <value>1</value> <description>Minimum cores allocated to a container</description> </property>
        <property> <name>yarn.scheduler.maximum-allocation-vcores</name> <value>2</value> <description>Maximum cores allocated to a container</description> </property>
      </configuration>

- Format the NameNode by typing the following in the terminal:

      $HADOOP_HOME/bin/hdfs namenode -format
- Start Hadoop. Go to the hadoop-2.7.1/ directory and type:

      sbin/start-all.sh

- Run `jps`; you should see that the services below have been started:
      14340 DataNode
      14452 SecondaryNameNode
      14660 NodeManager
      14712 RunJar
      14569 ResourceManager
      14251 NameNode
      14765 Jps

- Check the NameNode UI: http://localhost:50070
- Check YARN: http://localhost:8088
- Now run a MapReduce job from inside the hadoop-2.7.1/ directory:

      bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar randomwriter out
- To stop Hadoop, type:

      sbin/stop-all.sh
After comparing different guides on the internet, I ended up writing my own version, based on the Hadoop official guide, using a manual download. If you prefer Homebrew, that works just as well; there is no difference in the configuration between the two methods except the file directories. Here I extend the official guide with more details in case you need them.
Also, this guide is part 1 of my Hadoop tutorial. It covers setting up the pseudo-distributed mode on a single-node cluster. I will explain the HDFS configuration and command lines in part 2 of the tutorial.
1. Required software
1) Java
Run the following command in a terminal:
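    java -version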
If Java is already installed, you can see a similar result like:
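    java version "1.8.0_65"
    Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)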
If not, the terminal will prompt you for installation or you can download Java JDK here.
2) SSH
First enable Remote Login in System Preferences -> Sharing.
Now check that you can ssh to the localhost without a passphrase:
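    ssh localhost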
If you cannot ssh to localhost without a passphrase, execute the following commands:
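    ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
    cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys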
2. Get a Hadoop distribution
You can download it from Apache Download Mirror.
3. Prepare to start the Hadoop cluster
1) Unpack the downloaded Hadoop distribution.
2) Run the following command to figure out where your Java home directory is:
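    /usr/libexec/java_home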
You can see a result like:
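    /Library/Java/JavaVirtualMachines/jdk1.8.0_65.jdk/Contents/Home

(The exact path and version will vary with your installation.)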
3) In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:
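    # set to the root of your Java installation
    export JAVA_HOME=$(/usr/libexec/java_home)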
4) Try the following command:
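    bin/hadoop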
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Standalone mode
- Pseudo-distributed mode
- Fully-distributed mode
We will go through pseudo-distributed mode and run a MapReduce job on YARN here. In this mode, Hadoop runs on a single node and each Hadoop daemon runs in a separate Java process.
4. Configuration
Edit the following config files in your Hadoop directory:
1) etc/hadoop/core-site.xml:
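Following the official guide, which uses port 9000 (the steps earlier on this page use 8020 instead; either works as long as you are consistent):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>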
2) etc/hadoop/hdfs-site.xml:
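The official single-node setup uses a replication factor of 1. You can also point dfs.namenode.name.dir and dfs.datanode.data.dir at local folders, as in the steps earlier on this page.

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>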
3) etc/hadoop/mapred-site.xml:
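    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>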
4) etc/hadoop/yarn-site.xml:
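    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>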
5. Execution
1) Format and start HDFS and YARN
Format the filesystem:
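    bin/hdfs namenode -format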
Start NameNode daemon and DataNode daemon:
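    sbin/start-dfs.sh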
Now you can browse the web interface for the NameNode at http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
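    bin/hdfs dfs -mkdir /user
    bin/hdfs dfs -mkdir /user/<username>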
Start ResourceManager daemon and NodeManager daemon:
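    sbin/start-yarn.sh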
Browse the web interface for the ResourceManager at http://localhost:8088/
2) Test the example code that comes with the Hadoop distribution
Copy the input files into the distributed filesystem:
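    bin/hdfs dfs -put etc/hadoop input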
Run some of the examples provided:
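    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'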
This example counts the words starting with 'dfs' in the input.
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
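    bin/hdfs dfs -get output output
    cat output/*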
or View the output files on the distributed filesystem:
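    bin/hdfs dfs -cat output/*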
The output lists each matched word with its count.
3) Stop YARN and HDFS
When you're done, stop the daemons with:
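    sbin/stop-yarn.sh
    sbin/stop-dfs.sh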