1. Deployment Plan
1.1 Versions

| Software | Version |
| --- | --- |
| OS | CentOS Linux release 7.8.2003 (Core) |
| hadoop | hadoop-3.2.2 |
| JAVA | jdk-8u271-linux-x64 |
1.2 Cluster Plan

| hostname | IP | Components |
| --- | --- | --- |
| master | 172.16.20.200 | NameNode, ZKFailoverController |
| secondmaster | 172.16.20.201 | NameNode, ZKFailoverController |
| slave1 | 172.16.20.202 | Zookeeper, JournalNode, DataNode, NodeManager, ResourceManager |
| slave2 | 172.16.20.203 | Zookeeper, JournalNode, DataNode, NodeManager, ResourceManager |
| slave3 | 172.16.20.204 | Zookeeper, JournalNode, DataNode, NodeManager, ResourceManager |
Node planning notes:
- Zookeeper cluster: needs at least 3 nodes and an odd number of them; can run on any standalone nodes. The NameNodes and ResourceManagers rely on Zookeeper for active/standby election and failover.
- NameNode: needs at least 2 nodes, one active and the rest standby, deployable on any standalone nodes. It manages the HDFS namespace and block mapping, relies on Zookeeper and ZKFC for high availability and automatic failover, and relies on the JournalNodes for state synchronization.
- ZKFailoverController: i.e. ZKFC, started on every NameNode node; it monitors and manages the NameNode state and takes part in failover.
- JournalNode: needs at least 3 nodes and an odd number of them; can run on any standalone nodes. It synchronizes state between the active and standby NameNodes.
- ResourceManager: needs at least 2 nodes, one active and the rest standby, deployable on any standalone nodes; relies on Zookeeper for high availability and automatic failover, and handles resource allocation and scheduling.
- DataNode: needs at least 3 nodes because the default HDFS replication factor is 3; can run on any standalone nodes and stores the actual data.
- NodeManager: deployed on every DataNode node; handles node resource management and monitoring.
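Once the later sections are complete, this layout can be sanity-checked with a small loop that lists the Java daemons on every host. This is only a sketch: it assumes the passwordless root SSH set up in section 2.1 and that jps (from the JDK installed in section 2.2) resolves in a non-interactive shell.

```
# List the running Hadoop/Zookeeper daemons on each planned node.
for host in master secondmaster slave1 slave2 slave3; do
  echo "=== ${host} ==="
  ssh root@"${host}" 'jps | grep -v Jps'
done
```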
1.3 Directory Layout

| Service | Directories |
| --- | --- |
| hadoop namenode | /data1/hadoop/dfs/name, /data2/hadoop/dfs/name, /data3/hadoop/dfs/name |
| hadoop datanode | /data1/hadoop/dfs/data, /data2/hadoop/dfs/data, /data3/hadoop/dfs/data |
| hadoop temp directory | /data/hadoop/tmp |
| zookeeper data directory | /data/zookeeper/data/ |
| zookeeper log directory | /data/zookeeper/logs/ |

Each node has one default data partition mounted at /data, plus 3 data disks mounted at /data1, /data2 and /data3.
2. Environment Setup
2.1 System Configuration
Hosts file

```
cat >> /etc/hosts << EOF
172.16.20.200 master m
172.16.20.201 secondmaster sm
172.16.20.202 slave1 s1
172.16.20.203 slave2 s2
172.16.20.204 slave3 s3
EOF
```
Passwordless SSH login
Run the same commands on both master and secondmaster:

```
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@master
ssh-copy-id -i /root/.ssh/id_rsa.pub root@secondmaster
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave2
ssh-copy-id -i /root/.ssh/id_rsa.pub root@slave3
```
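A quick check that key-based login works before moving on; a minimal sketch that assumes the host aliases from the hosts file above:

```
# Each command should print the remote hostname without asking for a password.
for host in master secondmaster slave1 slave2 slave3; do
  ssh -o BatchMode=yes root@"${host}" hostname
done
```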
Raise the maximum number of open file descriptors

```
cat >> /etc/security/limits.conf << EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
EOF
```
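The new limits only take effect for sessions opened after the change, so verify from a fresh login shell:

```
# Check the effective limits in a new session; both should report 65536.
ulimit -n
ulimit -u
```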
2.2 Java Environment

```
mkdir -pv /usr/java/jdk1.8
tar -zxf jdk-8u271-linux-x64.tar.gz
mv jdk1.8.0_271/ /usr/java/jdk1.8/
```
Append the following to /etc/profile:

```
cat >> /etc/profile << 'EOF'
JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
JRE_HOME=/usr/java/jdk1.8/jdk1.8.0_271/jre
CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export JAVA_HOME JRE_HOME CLASSPATH PATH
EOF
source /etc/profile
java -version
```
Sync the JDK to the other nodes and configure the same environment variables there:

```
rsync -av /usr/java root@sm:/usr/
rsync -av /usr/java root@s1:/usr/
rsync -av /usr/java root@s2:/usr/
rsync -av /usr/java root@s3:/usr/
```
2.3 Data Directory Configuration
- Each node has one default /data partition plus 3 HDFS data disks /dev/sdb, /dev/sdc and /dev/sdd; format them and mount them at /data1, /data2 and /data3.

Create the directories

```
mkdir -pv /opt/hadoop
mkdir -pv /data/hadoop/tmp
mkdir -pv /{data1,data2,data3}
mkdir -pv /data/zookeeper/{data,logs}
```
Disk partitioning

```
parted /dev/sdb
mklabel gpt
mkpart primary 2048s -1
print
quit
```
Format

```
mkfs.xfs -L /data1 -f /dev/sdb1
```
Mount the disk

```
vim /etc/fstab
# add the following line
LABEL="/data1" /data1 xfs defaults 0 0

mount -a
```

Repeat the same steps for data2 and data3 (on /dev/sdc and /dev/sdd), and likewise on every other node; a scripted version is sketched below.
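The per-disk steps can also be scripted. The sketch below assumes the two remaining disks really are /dev/sdc and /dev/sdd, that they are empty, and that they should become /data2 and /data3; it drives parted non-interactively instead of the interactive session shown above.

```
# Partition, format and mount the remaining data disks (assumed empty).
i=2
for dev in /dev/sdc /dev/sdd; do
  parted -s "${dev}" mklabel gpt mkpart primary 2048s 100%
  mkfs.xfs -L "/data${i}" -f "${dev}1"
  echo "LABEL=\"/data${i}\" /data${i} xfs defaults 0 0" >> /etc/fstab
  i=$((i + 1))
done
mount -a
df -h /data1 /data2 /data3
```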
3. Zookeeper Cluster Deployment
On the slave1 node
3.1 Download and Extract
Download URL: https://dlcdn.apache.org/zookeeper/zookeeper-3.7.0/apache-zookeeper-3.7.0-bin.tar.gz

```
tar -zxf apache-zookeeper-3.7.0-bin.tar.gz -C /opt/hadoop/
ln -s /opt/hadoop/apache-zookeeper-3.7.0-bin /usr/local/zookeeper
```
Configure the environment variables on every node by appending the following to /etc/profile:

```
cat >> /etc/profile << 'EOF'
# Zookeeper
ZK_HOME=/usr/local/zookeeper
PATH=$ZK_HOME/bin:$PATH
export PATH ZK_HOME
EOF
source /etc/profile
```
3.2 Edit the Configuration

```
mkdir -pv /data/zookeeper/{data,logs}
cat > /usr/local/zookeeper/conf/zoo.cfg << EOF
admin.serverPort=10080
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
EOF
```
3.3 Sync the Installation

```
rsync -av /opt/hadoop/apache-zookeeper-3.7.0-bin root@s2:/opt/hadoop/
rsync -av /opt/hadoop/apache-zookeeper-3.7.0-bin root@s3:/opt/hadoop/
```

Create the symlink on each of those nodes:

```
ln -s /opt/hadoop/apache-zookeeper-3.7.0-bin /usr/local/zookeeper
```
3.4 Create the myid File

```
echo 1 > /data/zookeeper/data/myid
```
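The myid value must match the server.N entry for that host in zoo.cfg, so on the other two nodes:

```
# On slave2
echo 2 > /data/zookeeper/data/myid
# On slave3
echo 3 > /data/zookeeper/data/myid
```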
3.5 Start Zookeeper
Start from a shell terminal

```
zkServer.sh start
zkServer.sh status
```
Manage the Zookeeper service with systemd (recommended)

```
cat > /usr/lib/systemd/system/zookeeper.service << EOF
[Unit]
Description=Zookeeper Service

[Service]
Environment=JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
Type=forking
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop
TimeoutSec=20
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl restart zookeeper
systemctl enable zookeeper
zkServer.sh status
```
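With all three nodes running, the cluster should report one leader and two followers. A quick loop to check this from slave1; a sketch that assumes passwordless SSH and that JAVA_HOME resolves in a non-interactive shell on each node:

```
# Expect "Mode: leader" on one node and "Mode: follower" on the other two.
for host in slave1 slave2 slave3; do
  echo "=== ${host} ==="
  ssh root@"${host}" '/usr/local/zookeeper/bin/zkServer.sh status'
done
```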
4. Hadoop Cluster Deployment
4.1 Download and Extract
Download URL: https://dlcdn.apache.org/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

```
tar -zxf hadoop-3.2.2.tar.gz -C /opt/hadoop/
ln -s /opt/hadoop/hadoop-3.2.2 /usr/local/hadoop
```
On the master nodes, append the following to /etc/profile:

```
cat >> /etc/profile << 'EOF'
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH HADOOP_HOME
EOF
source /etc/profile
```
On the slave nodes, append the following to /etc/profile:

```
cat >> /etc/profile << 'EOF'
HADOOP_HOME=/usr/local/hadoop
PATH=$HADOOP_HOME/bin:$PATH
export PATH HADOOP_HOME
EOF
source /etc/profile
```
4.2 Edit the Configuration

```
cd $HADOOP_HOME/etc/hadoop
```
4.2.1 hadoop-env.sh

```
cat >> hadoop-env.sh << 'EOF'
export JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
export HADOOP_PID_DIR=$HADOOP_HOME/tmp/pids
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
EOF
```
4.2.2 yarn-env.sh

```
cat >> yarn-env.sh << 'EOF'
export YARN_REGISTRYDNS_SECURE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF
```
4.2.3 core-site.xml

```
cat > core-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
</configuration>
EOF
```
4.2.4 hdfs-site.xml

```
cat > hdfs-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>secondmaster:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>master:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>secondmaster:9870</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of replicas of each block in the cluster; higher values give better redundancy but use more storage.</description>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data1/hadoop/dfs/name,file:///data2/hadoop/dfs/name,file:///data3/hadoop/dfs/name</value>
    <description>Where the NameNode stores the HDFS namespace metadata.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data1/hadoop/dfs/data,file:///data2/hadoop/dfs/data,file:///data3/hadoop/dfs/data</value>
    <description>Physical storage locations for data blocks on each DataNode.</description>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop/tmp/dfs/journal</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
  <property>
    <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
    <value>60000</value>
  </property>
</configuration>
EOF
```
Configuration notes:
- dfs.nameservices: logical name of the nameservice; all NameNodes are grouped under the nameservice mycluster.
- dfs.replication: number of replicas each block is stored with on the DataNodes; the default is 3.
- dfs.blocksize: HDFS block size; 256 MB is common for large file systems, the default is 128 MB.
- dfs.namenode.rpc-address: RPC address of each NameNode.
- dfs.namenode.http-address: HTTP status page address of each NameNode.
- dfs.namenode.name.dir: directories where the NameNode stores the name table (fsimage).
- dfs.datanode.data.dir: directories where the DataNode stores blocks.
- dfs.namenode.shared.edits.dir: shared-storage directory used by the NameNodes in an HA cluster; the active NameNode writes to it and the standby reads from it to keep the namespace in sync.
- dfs.journalnode.edits.dir: directory where the JournalNodes store edit files.
- dfs.ha.automatic-failover.enabled: whether automatic failover is enabled.
- dfs.ha.fencing.methods: fencing prevents split-brain; after the standby switches to active, Hadoop tries to SSH to the old active node over the internal network and kill its NameNode process. sshfence is the SSH-based method.
- dfs.ha.fencing.ssh.private-key-files: location of the SSH private key used for fencing.
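Once the configuration has been synced to a node (section 4.3) and the Hadoop binaries are on the PATH, a couple of illustrative hdfs getconf queries can confirm that the HA keys resolve as intended:

```
hdfs getconf -confKey dfs.nameservices            # expect: mycluster
hdfs getconf -confKey dfs.ha.namenodes.mycluster  # expect: nn1,nn2
hdfs getconf -namenodes                           # expect: master secondmaster
```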
4.2.5 mapred-site.xml

```
cat > mapred-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
EOF
```
Configuration notes
- mapreduce.framework.name: run MapReduce on YARN.
- mapreduce.jobhistory.address: address of the job history server.
- mapreduce.jobhistory.webapp.address: address of the job history server web UI.
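Note that mapred-site.xml only declares where the job history server listens; start-dfs.sh and start-yarn.sh do not start that daemon. If job history is wanted, it can be started separately on one node (an optional extra, not part of the original steps):

```
# On the node chosen to host the history server (e.g. master)
mapred --daemon start historyserver
jps | grep JobHistoryServer
```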
4.2.6 yarn-site.xml

```
cat > yarn-site.xml << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2,rm3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>slave1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>slave2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm3</name>
    <value>slave3</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>slave1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>slave2:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm3</name>
    <value>slave3:8088</value>
  </property>
  <property>
    <name>hadoop.zk.address</name>
    <value>slave1:2181,slave2:2181,slave3:2181</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
EOF
```
4.2.7 workers

```
cat > workers << 'EOF'
slave1
slave2
slave3
EOF
```
4.3 Sync the Configuration
Distribute the installation and configuration to the other nodes:

```
rsync -av /opt/hadoop/hadoop-3.2.2 root@sm:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s1:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s2:/opt/hadoop/
rsync -av /opt/hadoop/hadoop-3.2.2 root@s3:/opt/hadoop/
```

On each of those nodes, run:

```
ln -s /opt/hadoop/hadoop-3.2.2 /usr/local/hadoop
```
4.4 Initialize Hadoop
4.4.1 Format ZooKeeper
Run on either master node:

```
$HADOOP_HOME/bin/hdfs zkfc -formatZK
```

Verify on a ZK node that a znode named after dfs.nameservices has been created:

```
zkCli.sh
ls /hadoop-ha
[mycluster]
```
4.5 Start the Hadoop Components
1. Start the JournalNodes
- Run on slave1, slave2 and slave3:

```
$HADOOP_HOME/bin/hdfs --daemon start journalnode
```

2. Start HDFS
- First format the NameNode on the master node:

```
$HADOOP_HOME/bin/hdfs namenode -format
```
- Sync the metadata to the secondmaster node (i.e. the other NameNode node); an alternative is noted after the commands:

```
rsync -av /data1/hadoop root@sm:/data1/
rsync -av /data2/hadoop root@sm:/data2/
rsync -av /data3/hadoop root@sm:/data3/
```
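For reference, HDFS also has a built-in way to seed the standby NameNode instead of rsync-ing the metadata directories: with the freshly formatted NameNode already running on master, run the following on secondmaster. Treat it as an alternative to the rsync above, not an extra step.

```
# Run on secondmaster only, with the NameNode on master already started.
$HADOOP_HOME/bin/hdfs namenode -bootstrapStandby
```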
- Start the HDFS components (NameNode, DataNode, ZKFC and JournalNode):

```
$HADOOP_HOME/sbin/start-dfs.sh
```

3. Start YARN

```
$HADOOP_HOME/sbin/start-yarn.sh
```
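As an optional smoke test of HDFS and YARN together, the examples jar bundled with the distribution can run a tiny job; the jar path below assumes the default layout of the hadoop-3.2.2 tarball.

```
# Submit a small MapReduce job; it should finish with an estimate of pi.
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar pi 2 10
```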
5. Verify the Startup Status
5.1 Command-line Checks
Master nodes

```
# jps output
2593 DFSZKFailoverController   # monitors and manages the NameNode
2511 NameNode                  # manages the HDFS namespace and block mapping
```

Slave nodes

```
# jps output
5587 NodeManager        # node resource management and monitoring
5300 DataNode           # data storage
4631 QuorumPeerMain     # Zookeeper process
5208 JournalNode        # syncs state between the active and standby NameNodes
5518 ResourceManager    # resource allocation and scheduling
```
Check the NameNode states

```
hdfs haadmin -getServiceState nn1
active
hdfs haadmin -getServiceState nn2
standby
```
Force a NameNode state transition:

```
# Force nn2 into standby. Use sparingly: it interferes with automatic failover.
hdfs haadmin -transitionToStandby -forcemanual nn2
```
Check the ResourceManager states

```
yarn rmadmin -getServiceState rm1
standby
yarn rmadmin -getServiceState rm2
standby
yarn rmadmin -getServiceState rm3
active
```
5.2 Web UI Checks
NameNode pages

```
master        http://172.16.20.200:9870/
secondmaster  http://172.16.20.201:9870/
```

ResourceManager page

```
# Visiting any ResourceManager node redirects to the active (leader) node
http://172.16.20.202:8088/cluster
```
6. High Availability Verification
6.1 NameNode Verification
Open the NameNode pages on master and secondmaster at the same time; master should show active and secondmaster standby.
Kill the NameNode process on master, then reload the pages and check whether secondmaster has become active; a scripted version of this check is sketched below.
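A minimal command-line sketch of the same check, run on master, assuming jps is on the PATH:

```
# Kill the active NameNode on master
kill -9 $(jps | awk '$2 == "NameNode" {print $1}')
# After a few seconds, nn2 (secondmaster) should report active
hdfs haadmin -getServiceState nn2
```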
6.2 ResourceManager Verification
Kill the ResourceManager process that is currently active (the leader), then visit any of the remaining ResourceManager nodes and check whether requests are redirected to the new active node. A command-line version follows.
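A command-line version of the same test, assuming the active ResourceManager was identified with the rmadmin commands from section 5.1 (rm3 on slave3 in that example output):

```
# On the node currently running the active ResourceManager (e.g. slave3)
kill -9 $(jps | awk '$2 == "ResourceManager" {print $1}')
# From any node, one of the remaining ResourceManagers should now report active
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```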