Install a Hadoop 2.6.3 cluster on CentOS 6.7
Set up a fully distributed Hadoop 2.6.3 environment on CentOS 6.7 x64; this setup has been tested on DigitalOcean.
- Tip: for deployments on a private network, the passwordless SSH setup can be skipped
- Tip: each NameNode/DataNode should have more than 1 GB of RAM; on machines with only 512 MB the daemons may fail to start
This guide assumes the following hostnames:
Master node (NameNode): m.fredlab.org
Slave nodes (DataNodes): s1.fredlab.org
s2.fredlab.org
s3.fredlab.org
I. Configure passwordless SSH login
1. Generate the key pair (id_rsa and id_rsa.pub) on the master:
ssh-keygen
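If you want to script this step, ssh-keygen can also run non-interactively (a minimal sketch; the empty passphrase is a convenience, not a requirement):
# generate an RSA key pair without any prompts
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa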
2. Upload the keys to the ~/.ssh/ directory on every node, so each node ends up with:
.ssh/
|-- id_rsa
|-- id_rsa.pub
|-- authorized_keys
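Steps 2-4 can also be collapsed into one command per node with ssh-copy-id, which appends the master's public key to the remote authorized_keys and fixes permissions (a sketch, assuming root SSH access to the slave hostnames above; only the public key is installed, which is all that master-to-slave logins need):
for h in s1.fredlab.org s2.fredlab.org s3.fredlab.org; do
  ssh-copy-id root@$h
done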
3. Set the private key permissions to 0600:
chmod 0600 id_rsa
4. Append the public key to authorized_keys:
cat id_rsa.pub >> authorized_keys
5. Configure SSH to log in without the yes/no host-key prompt (optional)
Change the following two lines in /etc/ssh/ssh_config:
StrictHostKeyChecking no
GSSAPIAuthentication no
Then restart the SSH service:
service sshd restart
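To verify the passwordless setup from the master (any remote command will do):
# should print the slave's hostname without a password or yes/no prompt
ssh root@s1.fredlab.org hostname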
II. Configure hostnames
1. Change the hostname:
vim /etc/sysconfig/network
Set HOSTNAME=s1.fredlab.org (the node's hostname); do the same on every other node.
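Note that on CentOS 6 /etc/sysconfig/network is only read at boot; to apply the new name to the running system as well (example for s1):
hostname s1.fredlab.org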
2. Add each hostname and its corresponding IP address to /etc/hosts, in a format like:
104.236.142.235 m.fredlab.org
104.236.143.22 s1.fredlab.org
104.236.143.54 s2.fredlab.org
107.170.224.199 s3.fredlab.org
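To confirm that resolution works on each machine (getent reads /etc/hosts):
getent hosts m.fredlab.org
ping -c 1 s1.fredlab.org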
III. Install the Java JDK
1. Download the Java JDK RPM package:
http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Install it:
rpm -ih jdk-8u72-linux-x64.rpm
3. Verify the Java path and version:
which java
which javac
java -version
4. The default JAVA_HOME is /usr
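This works because the RPM places a java symlink under /usr/bin, so $JAVA_HOME/bin/java resolves. A quick sanity check traces where it actually points:
# follow the /usr/bin/java symlink chain to the real JDK location
readlink -f "$(which java)"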
IV. Install Hadoop 2.6.3
Note: all of the following steps are performed on the master.
1. Download Hadoop 2.6.3:
wget http://www.eu.apache.org/dist/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz
2. Extract and install (install location: /usr/local/hadoop2):
tar zxf hadoop-2.6.3.tar.gz
mv hadoop-2.6.3 /usr/local/hadoop2
3. Create the data and temp directories (locations are flexible, as long as they match the config files):
mkdir /usr/local/hadoop2/tmp
mkdir /usr/local/hadoop2/hdfs
4. Edit the configuration files (located under /usr/local/hadoop2/etc/hadoop/)
Main config file core-site.xml, where m.fredlab.org is the NameNode's hostname:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop2/tmp</value>
    <description>temp dir</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://m.fredlab.org:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
HDFS config file hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop2/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop2/hdfs/data</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>10</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
YARN config file yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
MapReduce config file mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
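Note: the stock Hadoop 2.6.x tarball ships only a template for this file; if mapred-site.xml does not exist yet, create it from the template first:
cd /usr/local/hadoop2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml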
Add the Java path to hadoop-env.sh and yarn-env.sh:
echo "export JAVA_HOME=/usr" >> hadoop-env.sh
echo "export JAVA_HOME=/usr" >> yarn-env.sh
Add the hostname of each slave node to the slaves file, one per line:
s1.fredlab.org
s2.fredlab.org
s3.fredlab.org
V. Copy the configured Hadoop to each slave
Copy the entire /usr/local/hadoop2 directory to every DataNode machine, for example:
scp -r /usr/local/hadoop2 root@s1.fredlab.org:/usr/local/
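Repeat for every slave, or loop over the same hostnames that are in the slaves file (a sketch, assuming root SSH access):
for h in s1.fredlab.org s2.fredlab.org s3.fredlab.org; do
  scp -r /usr/local/hadoop2 root@$h:/usr/local/
done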
VI. Start the Hadoop cluster
1. Format the file system:
/usr/local/hadoop2/bin/hdfs namenode -format
2. Start the cluster.
Start HDFS:
/usr/local/hadoop2/sbin/start-dfs.sh
Start YARN:
/usr/local/hadoop2/sbin/start-yarn.sh
3. Check the file system status:
/usr/local/hadoop2/bin/hdfs dfsadmin -report
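Beyond the report, a quick functional check is to write a small file into HDFS and list it back (the /user/root path here is just an example):
/usr/local/hadoop2/bin/hdfs dfs -mkdir -p /user/root
/usr/local/hadoop2/bin/hdfs dfs -put /etc/hosts /user/root/
/usr/local/hadoop2/bin/hdfs dfs -ls /user/root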
4. Monitor the cluster through the web UIs.
Cluster status (HDFS): http://m.fredlab.org:50070
(or http://master_ip:50070)
Application status (YARN): http://m.fredlab.org:8088
(or http://master_ip:8088)
5. Check the Java processes on each node.
On the NameNode (m.fredlab.org), run jps:
12058 ResourceManager
22298 NameNode
11914 SecondaryNameNode
11180 Jps
On a DataNode (s1.fredlab.org), run jps:
13909 Jps
13494 DataNode
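If all the processes above are present, a small MapReduce job exercises HDFS and YARN end to end; the examples jar below ships with the Hadoop tarball (path assumes the install location used in this guide):
/usr/local/hadoop2/bin/hadoop jar /usr/local/hadoop2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar pi 2 10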
Disclaimer
- Licensed under CC BY-NC 4.0
- Copyright issue feedback: me#imzye.me (replace # with @)
- Not all commands and scripts here have been tested in a production environment; use at your own risk
- No private information is collected here