Hadoop Installation and Common Commands


I. Outline

1. HDFS cluster environment setup

2. Common issues

3. HDFS shell command usage

II. Cluster Environment Setup

Download: https://hadoop.apache.org/releases.html

1. Initialize directories

Create the working directories under /bigdata/hadoop-3.2.2/:

mkdir logs secret hadoop_data hadoop_data/tmp hadoop_data/namenode hadoop_data/datanode

2. Set the default authentication user

vi /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret

root

This uses the simple pseudo-authentication setup, so the accessing user must be specified; see core-site.xml for the related properties. For stronger authentication, use Kerberos. Append ?user.name=root to the Hadoop web URLs.

For example: http://yuxuan01:8088/cluster?user.name=root
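A minimal sketch that writes the secret file directly, assuming the directory layout from step 1 (the path must match hadoop.http.authentication.signature.secret.file in core-site.xml):

echo root > /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret
chmod 600 /bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret    # optional tightening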

3. Update environment variables on every server

vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

export HADOOP_HOME=/bigdata/hadoop-3.2.2

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

source /etc/profile
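A quick check that the variables took effect (hadoop version prints the build info):

echo $HADOOP_HOME
hadoop version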

4. Configure the env scripts

1) Add the JAVA_HOME variable to each of httpfs-env.sh, mapred-env.sh, and yarn-env.sh

Directory: $HADOOP_HOME/etc/hadoop

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

2) Add JAVA_HOME and HADOOP_HOME to hadoop-env.sh

Directory: $HADOOP_HOME/etc/hadoop

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre

export HADOOP_HOME=/bigdata/hadoop-3.2.2
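A sketch that appends the variables to all four env scripts in one pass (same paths as above; appending is equivalent here since the scripts only set variables):

cd $HADOOP_HOME/etc/hadoop
for f in hadoop-env.sh httpfs-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.302.b08-0.el7_9.x86_64/jre' >> "$f"
done
echo 'export HADOOP_HOME=/bigdata/hadoop-3.2.2' >> hadoop-env.sh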

5. Configure the startup users

Add to the top of start-dfs.sh and stop-dfs.sh:

HDFS_DATANODE_USER=root

HDFS_DATANODE_SECURE_USER=root

HDFS_NAMENODE_USER=root

HDFS_SECONDARYNAMENODE_USER=root

YARN_RESOURCEMANAGER_USER=root

YARN_NODEMANAGER_USER=root

Add to the top of start-yarn.sh and stop-yarn.sh:

YARN_RESOURCEMANAGER_USER=root

HADOOP_SECURE_DN_USER=yarn

YARN_NODEMANAGER_USER=root

6. core-site.xml configuration

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://yuxuan01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/bigdata/hadoop-3.2.2/hadoop_data/tmp</value>
  </property>
  <!-- Compression codecs -->
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <!-- HTTP authentication (simple) -->
  <property>
    <name>hadoop.http.filter.initializers</name>
    <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
  </property>
  <property>
    <name>hadoop.http.authentication.type</name>
    <value>simple</value>
  </property>
  <property>
    <name>hadoop.http.authentication.signature.secret.file</name>
    <value>/bigdata/hadoop-3.2.2/secret/hadoop-http-auth-signature-secret</value>
  </property>
  <property>
    <name>hadoop.http.authentication.simple.anonymous.allowed</name>
    <value>false</value>
  </property>
  <!-- Disable HDFS permission checks -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Proxy user -->
  <property>
    <name>hadoop.proxyuser.jack.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.jack.groups</name>
    <value>*</value>
  </property>
  <!-- Trash: keep deleted files for 1440 minutes (one day) -->
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>1440</value>
  </property>
</configuration>
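Once the cluster is up, you can confirm a property was picked up with hdfs getconf, for example:

./bin/hdfs getconf -confKey fs.defaultFS
./bin/hdfs getconf -confKey fs.trash.interval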

7. hdfs-site.xml configuration

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/bigdata/hadoop-3.2.2/hadoop_data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/bigdata/hadoop-3.2.2/hadoop_data/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- SecondaryNameNode address; dfs.secondary.http.address is the deprecated pre-2.x name -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>yuxuan02:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
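With dfs.webhdfs.enabled set to true, the REST API is exposed on the NameNode HTTP port (9870 in Hadoop 3). A quick smoke test, with user.name required by the simple auth configured above:

curl -i "http://yuxuan01:9870/webhdfs/v1/?op=LISTSTATUS&user.name=root"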

8. mapred-site.xml configuration

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- Classpath environment for the MR ApplicationMaster and tasks -->
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=/bigdata/hadoop-3.2.2/etc/hadoop:/bigdata/hadoop-3.2.2/share/hadoop/common/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/common/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/hdfs/*:/bigdata/hadoop-3.2.2/share/hadoop/mapreduce/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn:/bigdata/hadoop-3.2.2/share/hadoop/yarn/lib/*:/bigdata/hadoop-3.2.2/share/hadoop/yarn/*</value>
  </property>
  <!-- Compress map output with LZO (deprecated name; mapreduce.map.output.compress.codec in current releases) -->
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/usr/local/hadoop/lzo/lib</value>
  </property>
  <!-- JVM heap sizes for task containers -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1048m</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1310m</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2620m</value>
  </property>
  <property>
    <name>mapreduce.job.counters.limit</name>
    <value>20000</value>
    <description>Limit on the number of counters allowed per job. The default value is 200.</description>
  </property>
</configuration>
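Once everything is running, a quick way to exercise this MapReduce configuration is the bundled example job (jar path follows the 3.2.2 release layout):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar pi 2 10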

9. yarn-site.xml configuration

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>yuxuan01</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated for containers.</description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>7192</value>
  </property>
  <property>
    <description>The minimum allocation for every container request at the RM, in MBs. Memory requests lower than this won't take effect, and the specified value will get allocated at minimum.</description>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <description>The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.</description>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>7192</value>
  </property>
  <!-- Disable virtual-memory checks so containers are not killed for vmem over-commit -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx2457m</value>
  </property>
</configuration>
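After startup, you can confirm the NodeManagers registered with the ResourceManager and see submitted applications:

yarn node -list           NodeManagers known to the ResourceManager
yarn application -list    applications currently submitted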

10. Configure workers

List the DataNode hosts in this file, one per line; before Hadoop 3 it was named slaves, and it is now named workers. Directory: $HADOOP_HOME/etc/hadoop. An example follows.
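For instance, assuming all three hosts in this guide should run DataNodes (adjust the list to your layout), the workers file would read:

yuxuan01
yuxuan02
yuxuan03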

11. Sync to the other servers

scp -r /bigdata/hadoop-3.2.2/ root@yuxuan02:/bigdata/

scp -r /bigdata/hadoop-3.2.2/ root@yuxuan03:/bigdata/

12. Format HDFS

hdfs namenode -format

13. Start the cluster

./sbin/start-all.sh

jps
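With the layout used in this guide, jps should report roughly the following daemons per host (an expectation derived from the configs above, not literal output):

jps
# yuxuan01: NameNode, ResourceManager (plus DataNode/NodeManager if it is also listed in workers)
# yuxuan02: SecondaryNameNode, DataNode, NodeManager
# yuxuan03: DataNode, NodeManager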

14. Check the web UIs

  1. NameNode UI (the user parameter is required because of the simple auth policy): http://yuxuan01:9870?user.name=root

  2. Job view: http://yuxuan01:8088/cluster?user.name=root

III. Common Issues

1. NameNode fails to start

Check whether the /bigdata/hadoop-3.2.2/hadoop_data/namenode directory exists.

Initialize with the format tool: ./bin/hdfs namenode -format

2. DataNode fails to start

Method 1:

Shut the cluster down before every reformat:

./sbin/stop-all.sh

Then reformat:

./bin/hdfs namenode -format

Finally start:

./sbin/start-all.sh

Method 2:

Remove the /bigdata/hadoop-3.2.2/hadoop_data/namenode directory (the path configured as dfs.namenode.name.dir):

rm -rf /bigdata/hadoop-3.2.2/hadoop_data/namenode

Then reformat:

./bin/hdfs namenode -format

Finally start:

./sbin/start-all.sh

If the DataNode log reports a clusterID mismatch after a reformat, the hadoop_data/datanode directory on every worker has to be cleared as well; the check below shows how to compare the IDs.
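A sketch for confirming the mismatch (paths from this guide's config; clearing the datanode directory destroys its data, so do this only on a test cluster):

grep clusterID /bigdata/hadoop-3.2.2/hadoop_data/namenode/current/VERSION
grep clusterID /bigdata/hadoop-3.2.2/hadoop_data/datanode/current/VERSION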

IV. Common HDFS Shell Commands

http://hadoop.apache.org/docs/r1.2.1/commands-manual.html

User commands and administrator commands (run here via the hadoop script in $HADOOP_HOME/bin; in Hadoop 3 the hdfs dfs and hdfs dfsadmin forms are preferred):

./hadoop                                  list all commands

./hadoop fs -put hadoop /                 upload the local file "hadoop" to the / directory

./hadoop fs -lsr /                        recursively list / (deprecated; use -ls -R)

./hadoop fs -du /                         show file sizes

./hadoop fs -rm /hadoop                   delete a file

./hadoop fs -rmr /hadoop                  recursively delete a directory (deprecated; use -rm -r)

./hadoop fs -mkdir /louis                 create a directory

./hadoop dfsadmin -report                 report file system information and statistics

./hadoop dfsadmin -safemode enter         enter safe mode (read-only)

./hadoop dfsadmin -safemode leave         leave safe mode

./hadoop fsck /louis -files -blocks       check whether files are healthy

What fsck does (see the examples after this list):

1) Check the health of the file system

2) Show which blocks a file occupies

3) Delete a corrupt block

4) Find a missing block
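A few fsck invocations that cover the four uses above (a sketch using standard fsck flags):

./hadoop fsck /louis -files -blocks -locations    show each file's blocks and their DataNode locations
./hadoop fsck / -list-corruptfileblocks           list blocks with corrupt replicas
./hadoop fsck / -move                             move corrupted files to /lost+found
./hadoop fsck / -delete                           delete corrupted files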

hadoop balancer      run the balancer to spread blocks evenly across DataNodes

hadoop archive       archive files, packing many small files together

./hadoop archive -archiveName pack.har -p /louis hadoop archiveDir      create the archive pack.har from file "hadoop" under parent /louis

./hadoop fs -lsr /user/louis/archiveDir/pack.har      list the archive contents

./hadoop fs -cat /user/louis/archiveDir/pack.har/_index      view the archive index
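The packed files also stay readable through the har:// filesystem, so the archive can be used in place (paths follow the example above and assume the archive landed under /user/louis):

./hadoop fs -ls har:///user/louis/archiveDir/pack.har
./hadoop fs -cat har:///user/louis/archiveDir/pack.har/hadoop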
