How to install Hadoop on Ubuntu
This post is a record of how I installed Hadoop on Ubuntu. I am running Ubuntu 22.04.1 in VMware Workstation.
Usually, people create a separate account for Hadoop for security and efficiency. However, since I am using this Ubuntu installation only for study, I didn't create a separate account and just used my original root account.
FYI: The directory layout of my Ubuntu home for Hadoop is:
home
|
|------nahyeon
       |
       |------hdoop
              |
              |------hadoop-2.8.5
              |------tmpdata
1. Install openjdk on Ubuntu
Run the following commands:
$ sudo apt update
$ sudo apt install openjdk-8-jdk -y
2. ssh setup (to enable passwordless ssh communication)
Run the following commands:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost
3. Download Hadoop
I downloaded Hadoop 2.8.5 from this link:
I placed hadoop-2.8.5.tar.gz in /home/nahyeon/hdoop and extracted it with this command:
$ tar xzf hadoop-2.8.5.tar.gz
4. Install Hadoop
*This post shows how to install Hadoop in pseudo-distributed mode.
We need to add configuration to six files: .bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
1. .bashrc
$ vi ~/.bashrc
(current path doesn't matter)
Insert these lines at the bottom of .bashrc:
export HADOOP_HOME=/home/nahyeon/hdoop/hadoop-2.8.5
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Then, type this command to apply the settings above:
$ source ~/.bashrc
2. hadoop-env.sh
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi hadoop-env.sh
Replace the original line 25 (export JAVA_HOME=${JAVA_HOME}) with the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
* you can find the path of java with:
$ which javac
$ readlink -f /usr/bin/javac
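For example, the JAVA_HOME value is simply the readlink output with the trailing /bin/javac removed. A small sketch, using the sample path from this post (substitute your own readlink output):

```shell
# Derive the JAVA_HOME value from the javac path reported by readlink -f.
# The sample path below is the one used in this post.
javac_path=/usr/lib/jvm/java-8-openjdk-amd64/bin/javac
java_home="${javac_path%/bin/javac}"   # strip the trailing /bin/javac
echo "$java_home"                      # -> /usr/lib/jvm/java-8-openjdk-amd64
```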
3. core-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi core-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/nahyeon/hdoop/tmpdata</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
*tmpdata is a directory that the user has to create
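Since hadoop.tmp.dir points at this user-made directory, create it before formatting the namenode. A minimal sketch; HDOOP_BASE is an assumed variable defaulting to ~/hdoop here, while the post itself uses /home/nahyeon/hdoop:

```shell
# Create the directory configured as hadoop.tmp.dir in core-site.xml.
# HDOOP_BASE is an assumption; adjust it to your own layout.
HDOOP_BASE="${HDOOP_BASE:-$HOME/hdoop}"
mkdir -p "$HDOOP_BASE/tmpdata"
```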
4. hdfs-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi hdfs-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/nahyeon/hdoop/dfsdata/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/nahyeon/hdoop/dfsdata/datanode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
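The namenode and datanode directories referenced above also have to exist before the first format. A sketch along the same lines (HDOOP_BASE is an assumption, defaulting to ~/hdoop; the post uses /home/nahyeon/hdoop):

```shell
# Create the HDFS metadata and block storage directories from hdfs-site.xml.
# HDOOP_BASE is an assumption; adjust it to your own layout.
HDOOP_BASE="${HDOOP_BASE:-$HOME/hdoop}"
mkdir -p "$HDOOP_BASE/dfsdata/namenode" "$HDOOP_BASE/dfsdata/datanode"
```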
5. mapred-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi mapred-site.xml (the file may instead be named mapred-site.xml.template; the name varies by Hadoop distribution)
Type the following code at the bottom:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
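Hadoop 2.x distributions ship only mapred-site.xml.template, so if mapred-site.xml is missing, copy the template first. The sketch below demonstrates the copy in a scratch directory so it is safe to run anywhere; on a real install, run the cp line inside $HADOOP_HOME/etc/hadoop:

```shell
# Demonstrate creating mapred-site.xml from the shipped template.
# A scratch directory stands in for etc/hadoop so this is safe to run anywhere.
demo="${TMPDIR:-/tmp}/hadoop-conf-demo"
mkdir -p "$demo"
touch "$demo/mapred-site.xml.template"   # stand-in for the file Hadoop 2.x ships
[ -f "$demo/mapred-site.xml" ] || cp "$demo/mapred-site.xml.template" "$demo/mapred-site.xml"
ls "$demo"
```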
6. yarn-site.xml
(current path: home/nahyeon/hdoop/hadoop-2.8.5/etc/hadoop)
$ vi yarn-site.xml
Type the following code at the bottom:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>0.0.0.0</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>0.0.0.0:8032</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
<property>
<name>yarn.acl.enable</name>
<value>0</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
5. Execute Hadoop
Initialize name node:
(current path: home/nahyeon/hdoop/hadoop-2.8.5/bin)
$ hdfs namenode -format
Start hadoop cluster:
(current path: home/nahyeon/hdoop/hadoop-2.8.5/sbin) Note that the path has changed!
$ start-dfs.sh
$ start-yarn.sh
Instead of these two separate commands, you can just use:
$ start-all.sh
* How to stop:
$ stop-dfs.sh
$ stop-yarn.sh
or
$ stop-all.sh
With the jps command, you can check whether all processes started correctly. For this pseudo-distributed setup, you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager listed (along with Jps itself).
Now, you can finally connect to Hadoop server!
Open these URLs in Firefox on Ubuntu:
http://localhost:8088 (YARN ResourceManager UI)
http://localhost:50070 (NameNode UI)
** The port number 50070 is for Hadoop 2.x! In Hadoop 3.x the NameNode UI moved to port 9870, so check the documentation for your Hadoop version to find the right port.
Also, if you see "Unable to connect", try these:
1. Check whether you typed localhost:8088 into Firefox on Ubuntu (if you installed Hadoop on Ubuntu like me).
2. Check the firewall. The commands below use firewall-cmd, which requires firewalld; note that Ubuntu ships ufw by default, where the equivalent would be sudo ufw allow 8088/tcp. You can open the port with:
$ sudo firewall-cmd --permanent --zone=public --add-port=8088/tcp
success
$ sudo firewall-cmd --reload
success
References:
1. Overall information: https://spidyweb.tistory.com/214
2. Firewall: https://chulkang.tistory.com/33





