(1)Software versions: CentOS 7.0; JDK 8u141; Hadoop 2.7.3; Scala 2.11.8; Spark 2.2.0.
(2)IP addresses: 192.168.106.128 (master node); 192.168.106.129 (slave node); 192.168.106.130 (slave node).
Configure the server IP address as follows:
vim /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static
ONBOOT=yes             # activate the NIC at system boot
IPADDR=192.168.106.xxx
GATEWAY=192.168.106.2
NETMASK=255.255.255.0
DNS1=192.168.106.2
NM_CONTROLLED=no       # if set to yes, changes take effect immediately
service network restart
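After restarting the network service, it is worth confirming that the static address actually took effect; a minimal check (assuming the interface is ens33, as above) is:
ip addr show ens33        # the configured IPADDR should be listed
ping -c 3 192.168.106.2   # the gateway should respond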
(3)Change the hostname: hostnamectl set-hostname "hostname".
(4)Edit the /etc/hosts file and add:
192.168.106.128 Master
192.168.106.129 Slave1
192.168.106.130 Slave2
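Once /etc/hosts has been updated on every node, name resolution can be checked with a quick ping (hostnames as defined above):
ping -c 1 Slave1
ping -c 1 Slave2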
2. Configure passwordless SSH login
(1)yum install openssh-server;yum install openssh-clients
(2)ssh-keygen -t rsa -P ''
(3)sudo vim /etc/ssh/sshd_config, as follows:
RSAAuthentication yes                      # enable RSA authentication
PubkeyAuthentication yes                   # enable public/private key pair authentication
AuthorizedKeysFile .ssh/authorized_keys    # path to the public key file
(4)Copy the public keys
scp ssw@slave1:~/.ssh/id_rsa.pub ~/.ssh/slave1_rsa.pub
scp ssw@slave2:~/.ssh/id_rsa.pub ~/.ssh/slave2_rsa.pub
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/slave1_rsa.pub >> ~/.ssh/authorized_keys
cat ~/.ssh/slave2_rsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys ssw@slave1:~/.ssh/
scp ~/.ssh/authorized_keys ssw@slave2:~/.ssh/
(5)sudo chmod 700 ~/.ssh; sudo chmod 600 ~/.ssh/authorized_keys; service sshd restart
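With authorized_keys distributed and the permissions set, passwordless login can be verified from the Master (user ssw and hostnames as used above):
ssh ssw@Slave1 hostname    # should print Slave1 without asking for a password
ssh ssw@Slave2 hostname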
Note: step (4) is performed only on the Master; all other steps are performed on Master, Slave1, and Slave2.
3. Java and Scala environment setup
(1)Java environment setup
Edit /etc/profile as follows:
export JAVA_HOME=/usr/local/jdk1.8.0_141
export JRE_HOME=/usr/local/jdk1.8.0_141/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$PATH
(2)Scala environment setup
Edit /etc/profile as follows:
export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
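After reloading the profile, both runtimes can be verified (the expected versions assume the JDK 8u141 and Scala 2.11.8 installs above):
source /etc/profile
java -version     # should report java version "1.8.0_141"
scala -version    # should report Scala code runner version 2.11.8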
4. Hadoop environment setup
(1)Edit /etc/profile as follows:
export HADOOP_HOME=/opt/hadoop-2.7.3/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
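A quick sanity check that the Hadoop binaries are on the PATH (assuming the layout above):
source /etc/profile
hadoop version    # should report Hadoop 2.7.3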
(2)Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh as follows:
export JAVA_HOME=/usr/local/jdk1.8.0_141
(3)Edit $HADOOP_HOME/etc/hadoop/slaves as follows:
Slave1
Slave2
(4)Edit $HADOOP_HOME/etc/hadoop/core-site.xml as follows:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.7.3/tmp</value>
  </property>
</configuration>
(5)Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml as follows:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop-2.7.3/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop-2.7.3/hdfs/data</value>
  </property>
</configuration>
(6)Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
</configuration>
(7)Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml as follows:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
</configuration>
(8)Configure Slave1 and Slave2
scp -r /opt/hadoop-2.7.3 root@Slave1:/opt
scp -r /opt/hadoop-2.7.3 root@Slave2:/opt
(9)Start the Hadoop cluster
sudo chmod -R a+w /opt/hadoop-2.7.3
hadoop namenode -format
/opt/hadoop-2.7.3/sbin/start-all.sh
(10)Check whether the cluster started successfully
Master: SecondaryNameNode; ResourceManager; NameNode. Slave: NodeManager; DataNode.
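A minimal way to confirm this is to run jps (bundled with the JDK) on each node and then try a simple HDFS operation from the Master; the directory name below is only an example:
jps                       # on Master: NameNode, SecondaryNameNode, ResourceManager
jps                       # on Slave1/Slave2: DataNode, NodeManager
hadoop fs -mkdir /test    # create a directory in HDFS
hadoop fs -ls /           # the new directory should be listed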
5. Spark environment setup
(1)Edit /etc/profile as follows:
export SPARK_HOME=/opt/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin
(2)Edit $SPARK_HOME/conf/spark-env.sh as follows:
export JAVA_HOME=/usr/local/jdk1.8.0_141
export SCALA_HOME=/usr/local/scala-2.11.8
export HADOOP_HOME=/opt/hadoop-2.7.3
export SPARK_HOME=/opt/spark-2.2.0
export HADOOP_CONF_DIR=/opt/hadoop-2.7.3/etc/hadoop
export SPARK_MASTER_IP=192.168.106.128
export SPARK_DRIVER_MEMORY=1G
(3)Edit $SPARK_HOME/conf/slaves as follows:
Master
Slave1
Slave2
(4)Configure Slave1 and Slave2
scp -r /opt/spark-2.2.0 root@Slave1:/opt
scp -r /opt/spark-2.2.0 root@Slave2:/opt
(5)Start the Spark cluster
/opt/spark-2.2.0/sbin/start-all.sh
(6)Check whether the cluster started successfully
Master: Master; Worker. Slave: Worker.
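Besides jps, a quick end-to-end check is to open the standalone master's web UI and submit the bundled SparkPi example (7077 is Spark's default master port and 8080 its default web UI port; the examples jar path matches a stock Spark 2.2.0 layout):
# Web UI: http://192.168.106.128:8080 should list three workers
/opt/spark-2.2.0/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.106.128:7077 \
  /opt/spark-2.2.0/examples/jars/spark-examples_2.11-2.2.0.jar 10
# The driver output should end with a line such as "Pi is roughly 3.14..."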