How to install Hadoop on Ubuntu
This post is a record of how I installed Hadoop on Ubuntu. I am running Ubuntu 22.04.1 in VMware Workstation.
Usually, people create a separate account for Hadoop for security and efficiency. However, since I am using this Ubuntu installation only for study, I didn't create a separate account and just used my original root account.
FYI: The directory layout of my Ubuntu home for Hadoop is:
home
|
|------nahyeon
       |
       |------hdoop
              |
              |------hadoop-2.8.5
              |------tmpdata
1. Install openjdk on Ubuntu
Run the following commands:
$ sudo apt update
$ sudo apt install openjdk-8-jdk -y
2. ssh setup (to enable passwordless ssh communication)
Run the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost
3. Download Hadoop
I downloaded Hadoop 2.8.5 from this link:
I placed hadoop-2.8.5.tar.gz in /home/nahyeon/hdoop and extracted it with this command:
$ tar xzf hadoop-2.8.5.tar.gz
4. Install Hadoop
*This post shows how to install Hadoop in pseudo-distributed mode.
We need to add configuration to six files: .bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
1. .bashrc
$ vi ~/.bashrc
(current path doesn't matter)
Insert these lines at the bottom of .bashrc:
export HADOOP_HOME=/home/nahyeon/hdoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Then, type this command to apply the settings above:
$ source ~/.bashrc
2. hadoop-env.sh
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi hadoop-env.sh
Replace the original line 25 (export JAVA_HOME=${JAVA_HOME}) with the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
* you can find the path of java with:
$ which javac
$ readlink -f /usr/bin/javac
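For example, the JAVA_HOME value is simply the readlink output with the trailing /bin/javac removed. A small sketch, using the sample path from this post (substitute your own readlink output):

```shell
# Derive the JAVA_HOME value from the javac path reported by readlink -f.
# The sample path below is the one used in this post.
javac_path=/usr/lib/jvm/java-8-openjdk-amd64/bin/javac
java_home="${javac_path%/bin/javac}"   # strip the trailing /bin/javac
echo "$java_home"                      # -> /usr/lib/jvm/java-8-openjdk-amd64
```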
3. core-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi core-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/nahyeon/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
*tmpdata is a directory that the user has to create
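Since hadoop.tmp.dir points at this user-made directory, create it before formatting the namenode. A minimal sketch; HDOOP_BASE is an assumed variable defaulting to ~/hdoop here, while the post itself uses /home/nahyeon/hdoop:

```shell
# Create the directory configured as hadoop.tmp.dir in core-site.xml.
# HDOOP_BASE is an assumption; adjust it to your own layout.
HDOOP_BASE="${HDOOP_BASE:-$HOME/hdoop}"
mkdir -p "$HDOOP_BASE/tmpdata"
```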
4. hdfs-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi hdfs-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/nahyeon/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/nahyeon/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
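The namenode and datanode directories referenced above also have to exist before the first format. A sketch along the same lines (HDOOP_BASE is an assumption, defaulting to ~/hdoop; the post uses /home/nahyeon/hdoop):

```shell
# Create the HDFS metadata and block storage directories from hdfs-site.xml.
# HDOOP_BASE is an assumption; adjust it to your own layout.
HDOOP_BASE="${HDOOP_BASE:-$HOME/hdoop}"
mkdir -p "$HDOOP_BASE/dfsdata/namenode" "$HDOOP_BASE/dfsdata/datanode"
```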
5. mapred-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi mapred-site.xml (the file may instead be named mapred-site.xml.template; the name varies by Hadoop distribution)
Type the following code at the bottom:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
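Hadoop 2.x distributions ship only mapred-site.xml.template, so if mapred-site.xml is missing, copy the template first. The sketch below demonstrates the copy in a scratch directory so it is safe to run anywhere; on a real install, run the cp line inside $HADOOP_HOME/etc/hadoop:

```shell
# Demonstrate creating mapred-site.xml from the shipped template.
# A scratch directory stands in for etc/hadoop so this is safe to run anywhere.
demo="${TMPDIR:-/tmp}/hadoop-conf-demo"
mkdir -p "$demo"
touch "$demo/mapred-site.xml.template"   # stand-in for the file Hadoop 2.x ships
[ -f "$demo/mapred-site.xml" ] || cp "$demo/mapred-site.xml.template" "$demo/mapred-site.xml"
ls "$demo"
```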
6. yarn-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi yarn-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>0.0.0.0:8032</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
5. Execute Hadoop
Initialize name node:
(current path: home/nahyeon/hdoop/hadoop-2.8.5/bin)
$ hdfs namenode -format
Start hadoop cluster:
(current path: home/nahyeon/hdoop/hadoop-2.8.5/sbin) Note that the path has changed!
$ start-dfs.sh
$ start-yarn.sh
Instead of these two separate commands, you can just use:
$ start-all.sh
* How to stop:
$ stop-dfs.sh
$ stop-yarn.sh
or
$ stop-all.sh
With the jps command, you can check whether all processes started correctly. For this pseudo-distributed setup, you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager listed (along with Jps itself).
Now, you can finally connect to Hadoop server!
Open these URLs in Firefox on Ubuntu:
http://localhost:8088 (YARN ResourceManager UI)
http://localhost:50070 (NameNode UI)
** The port number 50070 is for Hadoop 2.x! In Hadoop 3.x the NameNode UI moved to port 9870, so check the documentation for your Hadoop version to find the right port.
Also, if you see "Unable to connect", try these:
1. Check whether you typed localhost:8088 into Firefox on Ubuntu (if you installed Hadoop on Ubuntu like me).
2. Check the firewall. The commands below use firewall-cmd, which requires firewalld; note that Ubuntu ships ufw by default, where the equivalent would be sudo ufw allow 8088/tcp. You can open the port with:
$ sudo firewall-cmd --permanent --zone=public --add-port=8088/tcp
success
$ sudo firewall-cmd --reload
success
References:
1. Overall information: https://spidyweb.tistory.com/214
2. Firewall: https://chulkang.tistory.com/33





