
Hadoop High Availability

    The NameNode is the core of HDFS, and HDFS is in turn the core component of Hadoop, so the NameNode is critical to the whole cluster. If the NameNode machine goes down, the cluster becomes unusable, and if the NameNode's data is lost, the data of the entire cluster is lost. Since the NameNode's data is also updated frequently, making the NameNode highly available is essential.

    The official documentation offers two solutions:

    A. HDFS with NFS   //the NFS storage is still a single point of failure

    B. HDFS with QJM   //run two NameNodes, one active and one standby; the standby only receives data and does not serve it

        Note: QJM stands for Quorum Journal Manager. It consists of JournalNodes (JNs), usually an odd number of them. Each JournalNode exposes a simple RPC interface through which the NameNode reads and writes the EditLog on the JN's local disk. When writing the EditLog, the NameNode writes to all JournalNodes in parallel, and the write is considered successful as long as N/2+1 nodes succeed, following the quorum idea of the Paxos protocol (for example, with 3 JournalNodes a write only needs 2 acknowledgements, so one JN may be down). Another very important point: at any moment there can be only one active NameNode, otherwise cluster operations become chaotic, the two NameNodes end up with two different data states, and data may be lost or left in an inconsistent state. This situation is commonly called "split-brain" (node-to-node communication is cut off and different DataNodes see different active NameNodes). As far as the JNs are concerned, only one NameNode is allowed to be the writer at any time; during failover, the former standby node takes over all of the active node's duties and becomes responsible for writing log records to the JNs. This mechanism prevents the other NameNode from continuing to act as if it were still active.

  (If you ever see two active NameNodes, kill one of them straight away; that still preserves half of the data.)

Architecture diagram: [image]

Hadoop high availability deployment

      ALL: configure /etc/hosts

          Note: ALL means every machine in the cluster; a quick name-resolution check is sketched right after the host list below.

          192.168.1.10 nn01

          192.168.1.20 nn02

          192.168.1.11 node1

          192.168.1.12 node2

          192.168.1.13 node3
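
          A quick way to confirm that every machine resolves these names (a small sketch; the loop just walks the host list above):

              for h in nn01 nn02 node1 node2 node3; do
                  ping -c 1 $h    # each name should resolve and answer from every host
              done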

    ALL: stop all Hadoop and Kafka services; only ZooKeeper keeps running

    ALL: reinitialize the HDFS cluster by deleting /var/hadoop/*

    nn02: disable SSH host key checking and deploy the public/private key pair

        StrictHostKeyChecking no

        scp nn01:/root/.ssh/id_rsa /root/.ssh/

        scp nn01:/root/.ssh/authorized_keys /root/.ssh/

        Sync /var/hadoop to all hosts (see the loop sketch below)
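
        One way to push the directory to every other host, run from the machine that holds the up-to-date copy (a sketch; it relies on the root SSH trust set up in this step):

            for h in nn02 node1 node2 node3; do
                rsync -aSH /var/hadoop/ ${h}:/var/hadoop/    # same directory tree on every node
            done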

    Configure core-site.xml on nn01

          <configuration>

                <property>

                        <name>fs.defaultFS</name>

                        <value>hdfs://nsdcluster</value>

                </property>

                <property>

                        <name>hadoop.tmp.dir</name>

                        <value>/var/ha</value>

                </property>

                <property>

                        <name>ha.zookeeper.quorum</name>

                        <value>node1:2181,node2:2181,node3:2181</value>

                </property>

                <property>

                        <name>hadoop.proxyuser.lxy.groups</name>

                        <value>*</value>

                </property>

                <property>

                        <name>hadoop.proxyuser.lxy.hosts</name>

                        <value>*</value>

                </property>

          </configuration>

 

    Configure hdfs-site.xml on nn01  //the SecondaryNameNode has no role in an HA setup, so it is not used here

          <configuration>

            <property>

              <name>dfs.replication</name>

              <value>2</value>

            </property>

            <property>

              <name>dfs.nameservices</name>

                     //the nameservice (cluster name) covering the two NameNodes is nsdcluster

              <value>nsdcluster</value>

            </property>

            <property>

              <name>dfs.ha.namenodes.nsdcluster</name>

                 //declare the two NameNodes of the cluster as nn1 and nn2; these are logical names that must be used consistently below, not the hostnames of the two NameNode machines!

              <value>nn1,nn2</value>

            </property>

 

            <property>

              <name>dfs.namenode.rpc-address.nsdcluster.nn1</name>

                  //RPC ports for nn1 and nn2 (nn1/nn2 are the logical names declared above; the value must point at the real host)

              <value>nn01:8020</value>

            </property>

            <property>

              <name>dfs.namenode.rpc-address.nsdcluster.nn2</name>

              <value>nn02:8020</value>

            </property>

 

            <property>

              <name>dfs.namenode.http-address.nsdcluster.nn1</name>

                 //HTTP (web UI) ports for nn1 and nn2

              <value>nn01:50070</value>

            </property>

            <property>

              <name>dfs.namenode.http-address.nsdcluster.nn2</name>

              <value>nn02:50070</value>

            </property>

 

            <property>

              <name>dfs.namenode.shared.edits.dir</name>

              <value>qjournal://node1:8485;node2:8485;node3:8485/nsdcluster</value>

            </property>

 

          <property>

            <name>dfs.journalnode.edits.dir</name>

            <value>/var/ha/journal</value>

          </property>

 

          <property>

            <name>dfs.client.failover.proxy.provider.nsdcluster</name>

            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

          </property>

 

          <property>

            <name>dfs.ha.fencing.methods</name>

            <value>sshfence</value>

          </property>

 

          <property>

            <name>dfs.ha.fencing.ssh.private-key-files</name>

            <value>/root/.ssh/id_rsa</value>

          </property>

 

          <property>

            <name>dfs.ha.automatic-failover.enabled</name>

            <value>true</value>

          </property>

        </configuration>
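
          Note that the sshfence method configured above logs in to the other NameNode over SSH with the listed private key and uses the fuser command to kill the stale process, so fuser must be available on both nn01 and nn02. A minimal sketch, assuming a yum-based system:

              yum -y install psmisc    # run on nn01 and nn02; psmisc provides fuser, which sshfence needs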

 

    nn01上配置yarn-site.xml

            <configuration>

        <!-- Site specific YARN configuration properties -->

            <property>

                <name>yarn.nodemanager.aux-services</name>

                <value>mapreduce_shuffle</value>

            </property>

            <property>

                <name>yarn.resourcemanager.ha.enabled</name>

                <value>true</value>

            </property>

            <property>

                <name>yarn.resourcemanager.ha.rm-ids</name>

                <value>rm1,rm2</value>

            </property>

            <property>

                <name>yarn.resourcemanager.recovery.enabled</name>

                <value>true</value>

            </property>

            <property>

                <name>yarn.resourcemanager.store.class</name>

                <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

            </property>

            <property>

                <name>yarn.resourcemanager.zk-address</name>

                <value>node1:2181,node2:2181,node3:2181</value>

            </property>

            <property>

                <name>yarn.resourcemanager.cluster-id</name>

                <value>yarn-ha</value>

            </property>

            <property>

                <name>yarn.resourcemanager.hostname.rm1</name>

                <value>nn01</value>

            </property>

            <property>

                <name>yarn.resourcemanager.hostname.rm2</name>

                <value>nn02</value>

            </property>

        </configuration>

 

Initialize and start the cluster

    nodeX means node1, node2, and node3

ALL:  sync the configuration to every machine in the cluster

nn01: initialize the ZKFC state in ZooKeeper  ./bin/hdfs zkfc -formatZK
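
            To confirm the format worked, look for the /hadoop-ha znode in ZooKeeper (a sketch; the zkCli.sh path depends on where ZooKeeper is installed):

                /usr/local/zookeeper/bin/zkCli.sh -server node1:2181 ls /hadoop-ha    # should list nsdcluster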

nodeX:  start the journalnode service

            ./sbin/hadoop-daemon.sh start journalnode
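
            Each of node1, node2, and node3 should now show a JournalNode process:

                jps    # look for a JournalNode entry on every nodeX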

nn01: format the namenode  ./bin/hdfs  namenode  -format

nn02: copy the namenode data from nn01 to the local /var/hadoop/dfs

         rsync -aSH nn01:/var/hadoop/dfs/ /var/hadoop/dfs/

nn01: initialize the shared edits on the JNs (JournalNodes)

        ./bin/hdfs namenode -initializeSharedEdits

nodeX: stop the journalnode service

           ./sbin/hadoop-daemon.sh stop journalnode

 

Start the cluster

nn01: ./sbin/start-all.sh

nn02: ./sbin/yarn-daemon.sh start resourcemanager
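
A quick sanity check after both commands, roughly matching the role layout configured above:

    jps    # nn01/nn02: NameNode, DFSZKFailoverController, ResourceManager
           # node1-3:   DataNode, NodeManager, JournalNode (plus ZooKeeper's QuorumPeerMain)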

 

Check the cluster status (one of nn1/nn2 should report active and the other standby; the same goes for rm1/rm2)

./bin/hdfs haadmin -getServiceState nn1  

./bin/hdfs haadmin -getServiceState nn2

./bin/yarn rmadmin -getServiceState rm1

./bin/yarn rmadmin -getServiceState rm2

 

./bin/hdfs dfsadmin -report

./bin/yarn  node  -list

 

Access the cluster:

./bin/hadoop  fs -ls  /

./bin/hadoop  fs -mkdir hdfs://nsdcluster/input
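
Because clients address the logical nameservice (nsdcluster) instead of a single NameNode, the same paths keep working across a failover. A small usage sketch (the file copied is just an example):

./bin/hadoop  fs -put /etc/hosts hdfs://nsdcluster/input/
./bin/hadoop  fs -ls  hdfs://nsdcluster/input/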

 

Verify high availability by stopping the active NameNode and ResourceManager

./sbin/hadoop-daemon.sh stop namenode

./sbin/yarn-daemon.sh stop resourcemanager
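
To watch the failover, first check which node is active, stop the daemons on that node with the commands above, then ask the survivor again (a sketch that assumes nn1/rm1 were the active ones):

./bin/hdfs haadmin -getServiceState nn2    # should report active shortly after nn1 is stopped
./bin/yarn rmadmin -getServiceState rm2    # should report active once rm1 is gone
./bin/hadoop  fs -ls hdfs://nsdcluster/    # data stays reachable through the nameservice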

 

Restore the node

./sbin/hadoop-daemon.sh start namenode

./sbin/yarn-daemon.sh start resourcemanager
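
The recovered NameNode and ResourceManager rejoin as standby; automatic failover does not fail back on its own. A quick check, assuming nn1/rm1 were the ones restarted:

./bin/hdfs haadmin -getServiceState nn1    # expected: standby
./bin/yarn rmadmin -getServiceState rm1    # expected: standby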
