Hadoop Cluster Setup


Hadoop Configuration

Versions: VMware Workstation Pro 16.2, Hadoop 2.7, CentOS 7

Enter the hadoop/etc/hadoop directory.

Configure core-site.xml

    <configuration>
        <property>
            <!-- Use the HA nameservice ID (defined in hdfs-site.xml) rather than
                 a single NameNode host, so that clients can fail over -->
            <name>fs.defaultFS</name>
            <value>hdfs://mycluster</value>
        </property>
        <property>
            <!-- Base directory for Hadoop's temporary files -->
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/tmp</value>
        </property>
        <property>
            <!-- ZooKeeper ensemble used for automatic failover -->
            <name>ha.zookeeper.quorum</name>
            <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
        </property>
        <property>
            <!-- ZooKeeper session timeout in milliseconds -->
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>10000</value>
        </property>
    </configuration>

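After editing core-site.xml, you can sanity-check that Hadoop resolves the value you expect (assuming the Hadoop bin directory is on your PATH):

```shell
# Print the filesystem URI that Hadoop actually resolves from core-site.xml
hdfs getconf -confKey fs.defaultFS
```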
Configure hdfs-site.xml

    <configuration>
        <property>
            <!-- Number of block replicas -->
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/hadoop/namenode/data</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/hadoop/datanode/data</value>
        </property>
        <property>
            <!-- Logical name of the HA nameservice -->
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <property>
            <!-- IDs of the two NameNodes in the nameservice -->
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>hadoop01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>hadoop02:8020</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>hadoop01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>hadoop02:50070</value>
        </property>
        <property>
            <!-- JournalNode quorum that stores the shared edit log -->
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/mycluster</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/home/hadoop/journalnode/data</value>
        </property>
        <property>
            <!-- Fence a failed NameNode over SSH during failover -->
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_rsa</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>
        <property>
            <!-- Client-side class that locates the active NameNode -->
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
    </configuration>

Configure yarn-site.xml

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <!-- Enable log aggregation -->
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <!-- Keep aggregated logs for one day -->
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>86400</value>
        </property>
        <property>
            <!-- Enable ResourceManager HA -->
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>my-yarn-cluster</value>
        </property>
        <property>
            <!-- IDs of the two ResourceManagers -->
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>hadoop02</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>hadoop03</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address.rm1</name>
            <value>hadoop02:8088</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address.rm2</name>
            <value>hadoop03:8088</value>
        </property>
        <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
        </property>
        <property>
            <!-- Recover running applications after an RM restart -->
            <name>yarn.resourcemanager.recovery.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.resourcemanager.store.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
        </property>
    </configuration>

Configure mapred-site.xml

    <configuration>
        <property>
            <!-- Run MapReduce jobs on YARN -->
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

Configure slaves, listing every worker host (one per line):

hadoop01
hadoop02
hadoop03

Distribute the installation

# Copy the Hadoop installation to hadoop02
scp -r <Hadoop install path>  hadoop02:<install path>
# Copy the Hadoop installation to hadoop03
scp -r <Hadoop install path>  hadoop03:<install path>
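The copy can also be scripted against the worker list so the host names stay in one place (a sketch; `/opt/hadoop` is an assumed install path, adjust to your own layout):

```shell
# Push the install directory to every node listed in slaves except this one
HADOOP_HOME=/opt/hadoop          # assumed install location
for host in $(grep -v "$(hostname)" "$HADOOP_HOME/etc/hadoop/slaves"); do
    scp -r "$HADOOP_HOME" "$host:$(dirname "$HADOOP_HOME")"
done
```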

Initialization

Run the NameNode format command on hadoop01:

hdfs namenode -format
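For an HA setup like this one, the format step only succeeds once the JournalNodes are reachable, and the standby NameNode and the ZooKeeper failover state each need their own one-time initialization. A typical first-run sequence (a sketch; it assumes ZooKeeper is installed on all three hosts and the Hadoop bin/sbin directories are on PATH) looks like:

```shell
# 1. Start ZooKeeper on hadoop01, hadoop02, and hadoop03
zkServer.sh start

# 2. Start a JournalNode on each of hadoop01, hadoop02, and hadoop03
hadoop-daemon.sh start journalnode

# 3. On hadoop01: format the NameNode, then start it
hdfs namenode -format
hadoop-daemon.sh start namenode

# 4. On hadoop02: copy the formatted metadata from the active NameNode
hdfs namenode -bootstrapStandby

# 5. On hadoop01: initialize the HA state in ZooKeeper (one time only)
hdfs zkfc -formatZK
```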

Start the cluster

From the ${HADOOP_HOME}/sbin directory on hadoop01, start Hadoop. The related services on hadoop02 and hadoop03 will be brought up as well:

# Start the HDFS daemons
start-dfs.sh
# Start the YARN daemons
start-yarn.sh
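One caveat: in Hadoop 2.x, `start-yarn.sh` launches a ResourceManager only on the machine where it is run. With rm1 and rm2 mapped to hadoop02 and hadoop03 in yarn-site.xml above, each ResourceManager is normally started on its own host:

```shell
# Run on hadoop02, and again on hadoop03, to bring up both ResourceManagers
yarn-daemon.sh start resourcemanager
```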

Check the cluster

Use jps on each node to check the service processes; if they match the figure, the configuration succeeded.

In a browser, visit <hostname>:50070 to open the HDFS NameNode Web UI.
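To confirm that automatic failover is wired up correctly, you can also query the HA role of each NameNode and ResourceManager, using the nn1/nn2 and rm1/rm2 IDs from the configuration above; in each pair, one should report active and the other standby:

```shell
# Query NameNode HA states
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Query ResourceManager HA states
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
```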