1. Deployment Planning

1.1 Versions

Software    Version
OS          CentOS Linux release 7.8.2003 (Core)
Hadoop      hadoop-3.2.2
Java        jdk-8u271-linux-x64

1.2 Cluster Layout

hostname      IP             Components
master        172.16.20.200  NameNode, ZKFailoverController
secondmaster  172.16.20.201  NameNode, ZKFailoverController
slave1        172.16.20.202  ZooKeeper, JournalNode, DataNode, NodeManager, ResourceManager
slave2        172.16.20.203  ZooKeeper, JournalNode, DataNode, NodeManager, ResourceManager
slave3        172.16.20.204  ZooKeeper, JournalNode, DataNode, NodeManager, ResourceManager

Node planning notes:

ZooKeeper cluster: needs at least 3 nodes, and the node count must be odd. It can run on any standalone nodes. The NameNodes and ResourceManagers rely on ZooKeeper for active/standby election and failover.

NameNode: needs at least 2 nodes, one active and the rest standby, on any standalone nodes. It manages the HDFS namespace and block mapping, relies on ZooKeeper and ZKFC for high availability and automatic failover, and relies on the JournalNodes for state synchronization.

ZKFailoverController: i.e. ZKFC, started on every NameNode node; it monitors and manages the NameNode state and takes part in failover.

JournalNode: needs at least 3 nodes, and the node count must be odd. It can run on any standalone nodes and synchronizes state between the active and standby NameNodes.

ResourceManager: needs at least 2 nodes, one active and the rest standby, on any standalone nodes. It relies on ZooKeeper for high availability and automatic failover, and handles resource allocation and scheduling.

DataNode: needs at least 3 nodes because the default HDFS replication factor is 3. It can run on any standalone nodes and stores the actual data.

NodeManager: deployed on every DataNode node; it manages and monitors that node's resources.

1.3 Directory Layout

Service                   Directories
Hadoop NameNode           /data1/hadoop/dfs/name, /data2/hadoop/dfs/name, /data3/hadoop/dfs/name
Hadoop DataNode           /data1/hadoop/dfs/data, /data2/hadoop/dfs/data, /data3/hadoop/dfs/data
Hadoop temp directory     /data/hadoop/tmp
ZooKeeper data directory  /data/zookeeper/data/
ZooKeeper log directory   /data/zookeeper/logs/

Each node has a default /data data partition plus 3 data disks, mounted at /data1, /data2, and /data3 respectively.
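
Once the disks are mounted (section 2.3), the expected layout can be confirmed on each node (an optional check, not part of the original steps):

lsblk                               # the three data disks should be visible
df -h /data /data1 /data2 /data3    # each mount point on its own filesystem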

2. Environment Setup

2.1 System Configuration

Hosts file

cat >> /etc/hosts << EOF
172.16.20.200 master m
172.16.20.201 secondmaster sm
172.16.20.202 slave1 s1
172.16.20.203 slave2 s2
172.16.20.204 slave3 s3
EOF
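
A quick sanity check that the aliases resolve on every node (optional):

for h in m sm s1 s2 s3; do ping -c 1 $h; done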

Passwordless SSH login

Perform the same steps on both master and secondmaster:

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
ssh-copy-id -i /root/.ssh/id_rsa.pub root@secondmaster
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave2
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave3
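
To confirm key-based login works from each master (an optional check; every host should print its hostname without prompting for a password):

for h in master secondmaster slave1 slave2 slave3; do ssh root@$h hostname; done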

Raise the limit on open file descriptors

cat >> /etc/security/limits.conf <<EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
EOF
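
After re-logging in, the new limits can be verified per session (an optional check):

ulimit -n   # max open files, expect 65536
ulimit -u   # max user processes, expect 65536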

2.2 Java Environment

mkdir -pv /usr/java/jdk1.8
tar -zxf jdk-8u271-linux-x64.tar.gz
mv jdk1.8.0_271/ /usr/java/jdk1.8/

Edit /etc/profile and append the following:

cat >> /etc/profile << 'EOF'
#JAVA
JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
JRE_HOME=/usr/java/jdk1.8/jdk1.8.0_271/jre
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME JRE_HOME CLASSPATH PATH

EOF
source /etc/profile

## Verify
java -version

Sync the JDK to the other nodes and configure their environment variables (a scripted profile update follows below):

rsync -av /usr/java root@sm:/usr/
rsync -av /usr/java root@s1:/usr/
rsync -av /usr/java root@s2:/usr/
rsync -av /usr/java root@s3:/usr/
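
One way to push the same /etc/profile snippet to the remaining nodes (a sketch; it assumes the host aliases from the hosts file and appends blindly, so run it only once per node):

# Append the Java environment block on each remaining node
for h in sm s1 s2 s3; do
  ssh root@$h 'cat >> /etc/profile' << 'EOF'
#JAVA
JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
JRE_HOME=/usr/java/jdk1.8/jdk1.8.0_271/jre
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME JRE_HOME CLASSPATH PATH
EOF
done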

2.3 Data Directory Setup

  • Each node has a default /data partition plus 3 HDFS data disks (/dev/sdb, /dev/sdc, /dev/sdd); format them and mount them at /data1, /data2, and /data3 respectively.

Create the directories

# On the master nodes
mkdir -pv /opt/hadoop
mkdir -pv /data/hadoop/tmp
mkdir -pv /{data1,data2,data3}

# On the slave nodes
mkdir -pv /opt/hadoop
mkdir -pv /data/hadoop/tmp
mkdir -pv /{data1,data2,data3}
mkdir -pv /data/zookeeper/{data,logs}

Partition the disks

parted /dev/sdb
# Set the disk label to GPT
mklabel gpt
# Create a single partition from sector 2048 to the end of the disk
mkpart primary 2048s -1
# Review the partition table
print
## Save and quit
quit

Format

mkfs.xfs -L /data1 -f /dev/sdb1

Mount the disks

vim /etc/fstab
# add the following line
LABEL="/data1" /data1 xfs defaults 0 0

## Mount all fstab entries
mount -a

Repeat the same steps for data2 and data3, and likewise on the other nodes; a scripted version for all three disks follows.
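
A non-interactive version of the partition/format/mount steps (a hedged sketch; it assumes the data disks are exactly /dev/sdb, /dev/sdc, /dev/sdd and destroys any existing data on them):

i=1
for d in sdb sdc sdd; do
  parted -s /dev/$d mklabel gpt mkpart primary 2048s 100%   # script-mode partitioning
  mkfs.xfs -L /data$i -f /dev/${d}1                          # label matches the mount point
  echo "LABEL=\"/data$i\" /data$i xfs defaults 0 0" >> /etc/fstab
  i=$((i+1))
done
mount -a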

3. ZooKeeper Cluster Deployment

On slave1

3.1 Download and Extract

Download URL: https://dlcdn.apache.org/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz

tar -zxf apache-zookeeper-3.7.0-bin.tar.gz -C /opt/hadoop/
ln -s /opt/hadoop/apache-zookeeper-3.7.0-bin /usr/local/zookeeper

Configure environment variables on each node by appending to /etc/profile:

cat >> /etc/profile << 'EOF'
#Zookeeper
ZK_HOME=/usr/local/zookeeper
PATH=$ZK_HOME/bin:$PATH
export PATH ZK_HOME

EOF
source /etc/profile

3.2 Edit the Configuration

mkdir -pv /data/zookeeper/{data,logs}
cat > /usr/local/zookeeper/conf/zoo.cfg << EOF
admin.serverPort=10080
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
EOF

3.3 Sync to the Other Nodes

rsync -av /opt/hadoop/apache-zookeeper-3.7.0-bin root@s2:/opt/hadoop/
rsync -av /opt/hadoop/apache-zookeeper-3.7.0-bin root@s3:/opt/hadoop/

Create the symlink on slave2 and slave3:

ln -s /opt/hadoop/apache-zookeeper-3.7.0-bin /usr/local/zookeeper

3.4 Create myid

# Required on every ZooKeeper node: the host configured as server.1 gets echo 1, server.2 gets echo 2, and so on
echo 1 > /data/zookeeper/data/myid
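
The other two nodes can be set remotely from slave1 in one shot (a small convenience sketch, assuming the s2/s3 host aliases):

ssh root@s2 'echo 2 > /data/zookeeper/data/myid'
ssh root@s3 'echo 3 > /data/zookeeper/data/myid'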

3.5 Start ZooKeeper

Start from a shell

zkServer.sh start
## Check status
zkServer.sh status

Manage ZooKeeper with systemd (recommended)

cat > /usr/lib/systemd/system/zookeeper.service << EOF
[Unit]
Description=Zookeeper Service
After=network.target

[Service]
Environment=JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
Type=forking
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop
ExecReload=/usr/local/zookeeper/bin/zkServer.sh restart
Restart=on-failure
TimeoutSec=20

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart zookeeper
systemctl enable zookeeper
zkServer.sh status
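
With all three nodes running, each server's role can also be checked over the client port; srvr is on ZooKeeper's four-letter-word whitelist by default (this check assumes nc is installed):

echo srvr | nc slave1 2181   # look for the Mode: line (leader/follower)
echo srvr | nc slave2 2181
echo srvr | nc slave3 2181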

4. Hadoop Cluster Deployment

4.1 Download and Extract

Download URL: https://dlcdn.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

tar -zxf hadoop-3.2.2.tar.gz -C /opt/hadoop/
ln -s /opt/hadoop/hadoop-3.2.2 /usr/local/hadoop

On the master nodes, configure environment variables by appending to /etc/profile:

cat >> /etc/profile << 'EOF'
#hadoop
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH HADOOP_HOME

EOF
source /etc/profile

On the slave nodes, configure environment variables by appending to /etc/profile:

cat >> /etc/profile << 'EOF'
#hadoop
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export PATH HADOOP_HOME

EOF
source /etc/profile

4.2 Edit the Configuration

cd $HADOOP_HOME/etc/hadoop

4.2.1 hadoop-env.sh

cat >> hadoop-env.sh << 'EOF'
export JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
export HADOOP_PID_DIR=$HADOOP_HOME/tmp/pids
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
EOF

4.2.2 yarn-env.sh

cat >> yarn-env.sh << 'EOF'
export YARN_REGISTRYDNS_SECURE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF

4.2.3 core-site.xml

cat > core-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<!-- Set the HDFS nameservice to mycluster -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
</property>

<property>
<name>io.file.buffer.size</name>
<value>4096</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
EOF

4.2.4 hdfs-site.xml

cat > hdfs-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>

<!-- mycluster has two NameNodes: nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>

<!-- NameNode RPC addresses -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>secondmaster:8020</value>
</property>

<!-- NameNode HTTP addresses -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>secondmaster:9870</value>
</property>

<property>
<name>dfs.replication</name>
<value>3</value>
<description>The replication factor: how many copies of each block the cluster keeps. A higher factor improves redundancy but consumes more storage.</description>
</property>
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>

<!-- NameNode and DataNode working directories (data storage) -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data1/hadoop/dfs/name,file:///data2/hadoop/dfs/name,file:///data3/hadoop/dfs/name</value>
<description>Where the NameNode stores the HDFS namespace metadata</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data1/hadoop/dfs/data,file:///data2/hadoop/dfs/data,file:///data3/hadoop/dfs/data</value>
<description>Physical storage locations of the DataNode blocks</description>
</property>

<!-- Shared storage location for the NameNode edits, i.e. the JournalNode list. URL format: qjournal://host1:port1;host2:port2;host3:port3/journalId. Using the nameservice as the journalId is recommended; the default port is 8485 -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://slave1:8485;slave2:8485;slave3:8485/mycluster</value>
</property>

<!-- Local directory where the JournalNodes store their data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/tmp/dfs/journal</value>
</property>

<!-- Failover proxy provider used by clients to find the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<!-- Fencing methods; multiple methods are separated by newlines, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>

<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>

<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>

<property>
<name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
<value>60000</value>
</property>
</configuration>
EOF

Configuration notes:

dfs.nameservices — the nameservice ID; all NameNodes are configured under the mycluster nameservice
dfs.replication — the number of replicas each DataNode block is stored with; the default is 3
dfs.blocksize — the HDFS block size; the default is 128 MB (used here), with 256 MB common for very large file systems
dfs.namenode.rpc-address — the RPC address of each NameNode
dfs.namenode.http-address — the HTTP status page address of each NameNode
dfs.namenode.name.dir — directories where the NameNode stores the name table (fsimage)
dfs.datanode.data.dir — directories where the DataNode stores blocks
dfs.namenode.shared.edits.dir — the directory on shared storage used by the NameNodes in an HA cluster; the active NameNode writes to it and the standby reads from it to keep the namespace in sync
dfs.journalnode.edits.dir — the directory where each JournalNode stores its edit files
dfs.ha.automatic-failover.enabled — whether automatic failover is enabled
dfs.ha.fencing.methods — fencing guards against split-brain: after the standby switches to active, Hadoop SSHes into the old active node and kills its NameNode process; sshfence is the SSH-based method
dfs.ha.fencing.ssh.private-key-files — the location of the SSH private key used for fencing
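
After editing, individual keys can be spot-checked with hdfs getconf (an optional sanity check, run once the configuration is in place):

$HADOOP_HOME/bin/hdfs getconf -confKey dfs.nameservices            # expect: mycluster
$HADOOP_HOME/bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster  # expect: nn1,nn2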

4.2.5 mapred-site.xml

cat > mapred-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapreduce.jobhistory.address</name>
<value>0.0.0.0:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>0.0.0.0:19888</value>
</property>
</configuration>
EOF

Configuration notes

  • mapreduce.framework.name — run MapReduce on YARN
  • mapreduce.jobhistory.address — the JobHistory server address
  • mapreduce.jobhistory.webapp.address — the JobHistory server web UI address
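
These properties only declare the JobHistory endpoints; neither start-dfs.sh nor start-yarn.sh starts the server itself. If job history is wanted, it can be started separately once the cluster is up (an optional step, not part of the original walkthrough):

# On whichever node should host the JobHistory server
$HADOOP_HOME/bin/mapred --daemon start historyserver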

4.2.6 yarn-site.xml

cat > yarn-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>

<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>

<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2,rm3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>slave1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>slave2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm3</name>
<value>slave3</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>slave1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slave2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm3</name>
<value>slave3:8088</value>
</property>
<property>
<name>hadoop.zk.address</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
EOF

4.2.7 workers

cat > workers << 'EOF'
slave1
slave2
slave3
EOF

4.3 Sync the Configuration

Distribute the Hadoop directory to the other nodes:

rsync -av /opt/hadoop/hadoop-3.2.2 root@sm:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s1:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s2:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s3:/opt/hadoop/

Then create the symlink on each of those nodes:

ln -s /opt/hadoop/hadoop-3.2.2 /usr/local/hadoop

4.4 Initialize Hadoop

4.4.1 Format ZooKeeper

Run on either master node:

$HADOOP_HOME/bin/hdfs zkfc -formatZK

Verify on a ZooKeeper node that a znode named after dfs.nameservices has been created:

zkCli.sh
ls /hadoop-ha
[mycluster]

4.5 Start the Hadoop Components

1. Start the JournalNodes

  • Run on slave1, slave2, and slave3:
$HADOOP_HOME/bin/hdfs --daemon start journalnode

2. Start HDFS

  1. On master, format the NameNode first:
$HADOOP_HOME/bin/hdfs namenode -format
  2. Sync the metadata to secondmaster (i.e. the other NameNode node); an alternative bootstrapStandby sketch follows after step 3:
rsync -av /data1/hadoop root@sm:/data1/
rsync -av /data2/hadoop root@sm:/data2/
rsync -av /data3/hadoop root@sm:/data3/
  3. Start all the HDFS components (NameNode, DataNode, ZKFC, JournalNode):
$HADOOP_HOME/sbin/start-dfs.sh
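
As an alternative to the rsync in step 2, stock Hadoop can bootstrap the standby NameNode directly; run it on secondmaster after the JournalNodes are up (a sketch of the standard approach, not what this walkthrough used):

$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby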

3. Start YARN

$HADOOP_HOME/sbin/start-yarn.sh

5. Verify the Startup State

5.1 Command-line Checks

Master nodes

// jps output
2593 DFSZKFailoverController // monitors and manages the NameNode
2511 NameNode //manages the HDFS namespace and block mapping

Slave nodes

// jps output
5587 NodeManager //node resource management and monitoring
5300 DataNode //data storage
4631 QuorumPeerMain //ZooKeeper process
5208 JournalNode //NameNode state synchronization
5518 ResourceManager //resource allocation and scheduling

Check the NameNode states

hdfs haadmin -getServiceState nn1
active
hdfs haadmin -getServiceState nn2
standby

Force a NameNode state transition:

// Force nn2 to standby. Use sparingly: manual transitions interfere with automatic failover
hdfs haadmin -transitionToStandby -forcemanual nn2

Check the ResourceManager states

yarn rmadmin -getServiceState rm1
standby
yarn rmadmin -getServiceState rm2
standby
yarn rmadmin -getServiceState rm3
active

5.2 Web UI Checks

NameNode pages

master
http://172.16.20.200:9870/

secondmaster
http://172.16.20.201:9870/

ResourceManager page

# Access any ResourceManager node; standby nodes automatically redirect to the active one
http://172.16.20.202:8088/cluster

6. High-Availability Verification

6.1 NameNode Failover

Open the NameNode pages on master and secondmaster at the same time: master should show active and secondmaster standby.

Kill the NameNode process on master, reload the pages, and check that secondmaster is now active; a sketch of the test follows.
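
A minimal sketch of this test (it assumes jps output like the listing in section 5.1):

# On master: kill the active NameNode
kill -9 $(jps | awk '$2 == "NameNode" {print $1}')
# From any node: confirm nn2 took over
hdfs haadmin -getServiceState nn2   # expect: active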

6.2 ResourceManager Failover

Kill the ResourceManager process on the active (leader) node, then query any remaining ResourceManager node (other than the one whose process was killed) and check that a new node has become active, for example:
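
For example (assuming rm3 was active, as in section 5.1):

# On slave3: kill the active ResourceManager
kill -9 $(jps | awk '$2 == "ResourceManager" {print $1}')
# From a surviving node: confirm a new active was elected
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2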