1. Deployment Plan

For the Hadoop HA cluster deployment, see: Hadoop 3.X Distributed High-Availability Cluster Deployment

1.1 Versions

Software            Version
Operating system    CentOS Linux release 7.8.2003 (Core)
Java                jdk-8u271-linux-x64
Hadoop              hadoop-3.2.2
Scala               scala-2.12.15
Spark               spark-3.1.2-bin-hadoop3.2

1.2 Cluster Layout

hostname      IP             Components
master        172.16.20.200  NameNode  Spark-Master
secondmaster  172.16.20.201  NameNode  Spark-Master
slave1        172.16.20.202  Zookeeper  DataNode  NodeManager  Spark-Worker
slave2        172.16.20.203  Zookeeper  DataNode  NodeManager  Spark-Worker
slave3        172.16.20.204  Zookeeper  DataNode  NodeManager  Spark-Worker

2. Environment Configuration

2.1 Configure the Scala Environment

  • The same steps apply to all nodes

Download and extract

Download URL: https://downloads.lightbend.com/scala/2.12.15/scala-2.12.15.tgz

tar -zxf scala-2.12.15.tgz
mv scala-2.12.15 /usr/local/scala

Configure environment variables

cat >> /etc/profile << 'EOF'
#SCALA
SCALA_HOME=/usr/local/scala
PATH=$SCALA_HOME/bin:$PATH
export PATH SCALA_HOME

EOF
source /etc/profile

Verify

scala -version
Scala code runner version 2.12.15 -- Copyright 2002-2021, LAMP/EPFL and Lightbend, Inc.

Sync the Scala installation to the remaining nodes and configure their environment variables (a looped sketch for the profile entries follows the rsync commands below)

rsync -av /usr/local/scala root@sm:/usr/local/
rsync -av /usr/local/scala root@s1:/usr/local/
rsync -av /usr/local/scala root@s2:/usr/local/
rsync -av /usr/local/scala root@s3:/usr/local/
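The rsync above only copies the Scala files; the /etc/profile entries still have to be added on each remote node. A minimal sketch, assuming the same ssh aliases used above (sm, s1, s2, s3) resolve to the other nodes and root ssh access is available:

# Append the same SCALA entries on each remote node
for host in sm s1 s2 s3; do
  ssh root@"$host" 'cat >> /etc/profile' << 'EOF'
#SCALA
SCALA_HOME=/usr/local/scala
PATH=$SCALA_HOME/bin:$PATH
export PATH SCALA_HOME
EOF
done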

3. Spark Cluster Deployment

3.1 Download and Extract

Download page: http://spark.apache.org/downloads.html

tar -zxf spark-3.1.2-bin-hadoop3.2.tgz -C /opt/hadoop/
ln -s /opt/hadoop/spark-3.1.2-bin-hadoop3.2 /usr/local/spark

Configure the environment variables on every node by appending the following to /etc/profile:

cat >> /etc/profile << 'EOF'
#SPARK
SPARK_HOME=/usr/local/spark
PATH=$SPARK_HOME/bin:$PATH
export PATH SPARK_HOME

EOF
source /etc/profile
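As an optional sanity check (not part of the original steps), confirm the Spark client scripts now resolve from PATH and report the expected build:

# Should print the Spark 3.1.2 / Scala 2.12 version banner
spark-submit --version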

3.2 Modify the Configuration

cd $SPARK_HOME/conf

spark-env.sh

mkdir -pv /data/spark
cat > spark-env.sh << 'EOF'
export JAVA_HOME=/usr/java/jdk1.8/jdk1.8.0_271
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=slave1:2181,slave2:2181,slave3:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_LOCAL_DIRS=/data/spark
export SPARK_DRIVER_MEMORY=4g
export SPARK_WORKER_CORES=4
EOF
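spark.deploy.recoveryMode=ZOOKEEPER is what makes the two Masters highly available: both register with the ZooKeeper ensemble, which elects one leader and keeps recovery state under /spark. A small pre-flight sketch to confirm the ensemble is reachable before starting the Masters (it assumes the ZooKeeper "ruok" four-letter command is allowed; newer ZooKeeper releases restrict it via 4lw.commands.whitelist):

# Each ZooKeeper node should answer "imok"
for zk in slave1 slave2 slave3; do
  printf '%s: ' "$zk"
  echo ruok | nc "$zk" 2181
  echo
done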

workers

cat > workers << EOF
slave1
slave2
slave3
EOF
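start-all.sh logs in to every host listed in workers over ssh, so key-based (passwordless) ssh from the master to the workers must already be in place; this is normally set up during the Hadoop deployment referenced at the top. A quick check sketch:

# Each command should print the remote hostname without asking for a password
for host in slave1 slave2 slave3; do
  ssh "$host" hostname
done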

3.3 Sync the Configuration

rsync -av /opt/hadoop/spark-3.1.2-bin-hadoop3.2 root@sm:/opt/hadoop/
rsync -av /opt/hadoop/spark-3.1.2-bin-hadoop3.2 root@s1:/opt/hadoop/
rsync -av /opt/hadoop/spark-3.1.2-bin-hadoop3.2 root@s2:/opt/hadoop/
rsync -av /opt/hadoop/spark-3.1.2-bin-hadoop3.2 root@s3:/opt/hadoop/

Then create the symlink on each of the other nodes (a looped variant is sketched after the command below):

ln -s /opt/hadoop/spark-3.1.2-bin-hadoop3.2 /usr/local/spark
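If the same ssh aliases used for rsync are available, the symlink can be created on all remote nodes in one pass; this sketch is equivalent to running the command by hand on each node:

# -sfn makes the loop safe to re-run (an existing link is replaced)
for host in sm s1 s2 s3; do
  ssh root@"$host" 'ln -sfn /opt/hadoop/spark-3.1.2-bin-hadoop3.2 /usr/local/spark'
done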

3.4 Start the Cluster

Start/stop the whole Spark cluster from the master node:

$SPARK_HOME/sbin/start-all.sh
$SPARK_HOME/sbin/stop-all.sh

Start/stop the standby Master individually on the secondmaster node:

$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/stop-master.sh
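With both Masters running, applications should be submitted with a master URL that lists both Masters, so the driver registers with whichever one is currently ALIVE and can fail over. A minimal smoke test using the bundled SparkPi example (the examples jar name below is assumed from the spark-3.1.2-bin-hadoop3.2 distribution):

spark-submit \
  --master spark://master:7077,secondmaster:7077 \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.1.2.jar 100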

4. Verify the Startup Status

4.1 Check from the Command Line

Check the ZooKeeper data

zkCli.sh
ls /spark
[leader_election, master_status]

Check with jps

master node

// Output of jps
15928 Master

slave node

// Output of jps
11907 Worker

4.2 Check the Web UI

Visit port 8080 on master and on secondmaster to view the Spark master web page.

master: Status: ALIVE

secondmaster: Status: STANDBY
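The same status can be read without a browser: the standalone Master web UI also serves a JSON view of the master page (the /json/ path below matches stock Spark 3.x, but treat it as an assumption for other builds):

curl -s http://master:8080/json/ | grep '"status"'          # expect "ALIVE"
curl -s http://secondmaster:8080/json/ | grep '"status"'    # expect "STANDBY"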

5. High-Availability Verification

Stop the Master process on the master node, then open the secondmaster Spark page and check whether its status has switched to ALIVE.
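One concrete way to run this check; note that ZooKeeper-based failover is not instant and typically takes on the order of one to two minutes, so the standby is polled rather than read once:

# On the master node: stop the active Master (or kill the Master PID shown by jps)
$SPARK_HOME/sbin/stop-master.sh

# From any node: poll the former standby; after a minute or two it should report ALIVE
for i in $(seq 1 12); do
  curl -s http://secondmaster:8080/json/ | grep '"status"'
  sleep 10
done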