Shijiazhuang Tiedao University
Lab Report
Course: Cloud Computing Technology and Applications    Class: Xin 1905-2    Student ID: 20194016    Name: Chen Han
Experiment 1: Hadoop Cluster Deployment, HDFS Operations, and MapReduce
I. Objectives
Deploy Hadoop on a cluster built from virtual machines;
Perform HDFS file operations and program against the HDFS file API;
Develop, deploy, and invoke MapReduce parallel programs.
II. Experiment Content
1. Deploying Hadoop on a Virtual Machine Cluster
Build the cluster and deploy Hadoop using VMware, CentOS 7, and Xshell (or SecureCRT); the detailed steps follow the tutorial at
https://www.bilibili.com/video/BV1Kf4y1z7Nw?p=1
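Before moving on, it is worth confirming that the cluster is healthy. A quick sanity check (standard Hadoop commands, assumed here rather than taken from the tutorial):

jps                      # on every node: the expected daemons (NameNode/DataNode, ResourceManager/NodeManager) should be listed
hdfs dfsadmin -report    # on the master: every DataNode should appear as a live node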
2. HDFS File Operations
Verify the HDFS file commands on the distributed file system, as listed below.
hadoop fs [genericOptions]
[-ls <path>]
[-lsr <path>]
[-du <path>]
[-dus <path>]
[-count [-q] <path>]
[-mv <src> <dst>]
[-cp <src> <dst>]
[-rm [-skipTrash] <path>]
[-rmr [-skipTrash] <path>]
[-expunge]
[-put <localsrc> <dst>]
[-copyFromLocal <localsrc> <dst>]
[-moveFromLocal <localsrc> <dst>]
[-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst>]
[-cat <src>]
[-text <src>]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
[-moveToLocal [-crc] <src> <localdst>]
[-mkdir <path>]
[-setrep [-R] [-w] <rep> <path>]
[-touchz <path>]
[-test -[ezd] <path>]
[-stat [format] <path>]
[-tail [-f] <file>]
[-chmod [-R] <MODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]   // change the owner of a file; -R applies recursively to everything in a folder; superuser only
[-chgrp [-R] GROUP PATH...]               // change the group a file belongs to; -R applies recursively; superuser only
[-help [cmd]]                             // help text for the commands
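As a concrete check, and to prepare the /user/input directory used by the interface-programming example in 2.1, a short session along these lines can be run (the local file name below is illustrative):

hadoop fs -mkdir -p /user/input           # create the working directory (-p creates parent directories)
hadoop fs -put local.txt /user/input      # upload a local file (illustrative name)
hadoop fs -ls /user/input                 # list the directory
hadoop fs -cat /user/input/local.txt      # show the file contents
hadoop fs -rm /user/input/local.txt       # delete the file again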
2.1 HDFS Interface Programming
Use the HDFS file API to access files in the distributed file system, e.g. to create, modify, and delete them.
Source code:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
public class test {
    public static void main(String[] args) {
        try {
            // Point the client at the HDFS NameNode.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://node01:8020");
            FileSystem fs = FileSystem.get(conf);

            // Create the file and write a short message into it.
            String filename = "hdfs://node01:8020/user/input/test.txt";
            FSDataOutputStream os = fs.create(new Path(filename));
            byte[] buff = "hello world!".getBytes();
            os.write(buff, 0, buff.length);
            // Close the stream so the data is actually flushed to HDFS.
            os.close();
            System.out.println("Create " + filename);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
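The program above only demonstrates creation. Reading the file back and deleting it use the same FileSystem API; a minimal sketch (node01:8020 and the file path are taken from the code above, the class name is illustrative):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TestReadDelete {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://node01:8020");
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/user/input/test.txt");

            // Read the file line by line and print it.
            FSDataInputStream in = fs.open(path);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
            reader.close();

            // Delete the file (second argument: recursive delete, only relevant for directories).
            boolean deleted = fs.delete(path, false);
            System.out.println("Deleted: " + deleted);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}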
Run result:
3. MapReduce Parallel Program Development
3.1 Finding the Yearly Maximum Temperature
The raw data is as follows: one record per line, such as 2000010115, where the first four characters are the year and the digits from position 8 onward are the temperature.
Source code:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Temperature {
/**
* The four generic type parameters are:
* KeyIn - the Mapper's input key: the byte offset where each line starts (0, 11, ...)
* ValueIn - the Mapper's input value: the text of the line
* KeyOut - the Mapper's output key: the "year" parsed from the line
* ValueOut - the Mapper's output value: the "temperature" parsed from the line
*/
static class TempMapper extends
Mapper<LongWritable, Text, Text, IntWritable> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// Sample printed output: Before Mapper: 0, 2000010115
System.out.print("Before Mapper: " + key + ", " + value);
String line = value.toString();
String year = line.substring(0, 4);
int temperature = Integer.parseInt(line.substring(8));
context.write(new Text(year), new IntWritable(temperature));
// Sample printed output: After Mapper:2000, 15
System.out.println(
"======" +
"After Mapper:" + new Text(year) + ", " + new IntWritable(temperature));
}
}
/**
* The four generic type parameters are:
* KeyIn - the Reducer's input key: the "year"
* ValueIn - the Reducer's input value: the "temperature"
* KeyOut - the Reducer's output key: the distinct "year"
* ValueOut - the Reducer's output value: the maximum temperature recorded in that year
*/
static class TempReducer extends
Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values,
Context context) throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
StringBuffer sb = new StringBuffer();
// Take the maximum of the values
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
sb.append(value).append(", ");
}
// Sample printed output: Before Reduce: 2000, 15, 23, 99, 12, 22,
System.out.print("Before Reduce: " + key + ", " + sb.toString());
context.write(key, new IntWritable(maxValue));
// Sample printed output: After Reduce: 2000, 99
System.out.println(
"======" +
"After Reduce: " + key + ", " + maxValue);
}
}
public static void main(String[] args) throws Exception {
// Input path
String dst = "hdfs://node01:8020/user/inputcloud.txt";
// Output path; it must not already exist (not even as an empty directory).
String dstOut = "hdfs://node01:8020/user/outputcloud";
Configuration hadoopConfig = new Configuration();
hadoopConfig.set("fs.hdfs.impl",
org.apache.hadoop.hdfs.DistributedFileSystem.class.getName()
);
hadoopConfig.set("fs.file.impl",
org.apache.hadoop.fs.LocalFileSystem.class.getName()
);
Job job = new Job(hadoopConfig);
job.setJar("/home/hadoop/mapreudce2-1.0-SNAPSHOT.jar");
// If the job is to be packaged as a jar and run that way, the following line is needed
//job.setJarByClass(NewMaxTemperature.class);
// Input and output paths used when the job runs
FileInputFormat.addInputPath(job, new Path(dst));
FileOutputFormat.setOutputPath(job, new Path(dstOut));
// Use the custom Mapper and Reducer as the processing classes for the two phases
job.setMapperClass(TempMapper.class);
job.setReducerClass(TempReducer.class);
// Set the key and value types of the final output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Run the job and wait for it to finish
job.waitForCompletion(true);
System.out.println("Finished");
}
}
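Since the job sets its jar path explicitly with job.setJar, it can be submitted from the master node once the jar has been built and copied there. A minimal sketch, assuming the project is built with Maven and the input file has already been uploaded to /user/inputcloud.txt (the jar name and paths come from the code above):

mvn package
cp target/mapreudce2-1.0-SNAPSHOT.jar /home/hadoop/
hadoop jar /home/hadoop/mapreudce2-1.0-SNAPSHOT.jar Temperature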
Run result:
3.2 Word Count
Create a quickstart project with Maven.
pom.xml
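The pom itself is not reproduced above; a minimal sketch of the part that matters for these examples, assuming the default quickstart layout and a Hadoop 2.x cluster (the version number is an assumption and should match the cluster):

<!-- add inside the quickstart pom.xml -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- assumed version; use the cluster's Hadoop version -->
    <version>2.7.3</version>
  </dependency>
</dependencies>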
Source code:
import java.io.IOException;
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
@Override
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// Split the line on spaces and emit (word, 1) for every word.
String[] words = StringUtils.split(value.toString(), " ");
for(String word:words)
{
context.write(new Text(word), new LongWritable(1));
}
}
}
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
@Override
protected void reduce(Text arg0, Iterable<LongWritable> arg1,
Context context) throws IOException, InterruptedException {
// Sum the counts for this word and emit (word, total).
int sum=0;
for(LongWritable num:arg1)
{
sum += num.get();
}
context.write(arg0,new LongWritable(sum));
}
}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCountRunner {
public static void main(String[] args) throws IllegalArgumentException, IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = new Job(conf);
job.setJarByClass(WordCountRunner.class);
job.setJobName("wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
}
}
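WordCountRunner reads the input and output paths from the command line (args[0] and args[1]), so after packaging, the job can be submitted roughly as follows (the jar name and the HDFS paths here are illustrative, not taken from the report):

hadoop fs -mkdir -p /user/wcinput
hadoop fs -put words.txt /user/wcinput
hadoop jar quick-start-1.0-SNAPSHOT.jar WordCountRunner /user/wcinput /user/wcoutput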
Run result:
III. Experiment Summary
In this experiment I installed CentOS, configured Hadoop, and carried out the tasks following the requirements and the provided code. I first studied the HDFS shell commands, then used Java code to manage HDFS files through the Hadoop API, and then used Java code to find the yearly maximum temperature, packaging the program into a jar, copying it onto the Hadoop cluster, and running it to obtain the expected result. Finally, when the provided word count example was packaged and run, an array index out of bounds error occurred; changing the argument indices used in the main function from 1 and 2 to 0 and 1 made it run successfully.