Simulating Pseudo-Distributed Hadoop Operation on a Single Machine, and Analyzing the Process


Contents

Simulating Pseudo-Distributed Hadoop Operation on a Single Machine
Understanding the Hadoop Runtime via Single-Machine Pseudo-Distributed Operation (1)
Understanding the Hadoop Runtime via Single-Machine Pseudo-Distributed Operation (2)
Understanding the Hadoop Runtime via Single-Machine Pseudo-Distributed Operation (3)

Disclaimer: the material in this document was collected from the Internet and compiled by the author; it is intended for study only.

Yunnan University, Zhou Jun, 2009/11/8

Simulating Pseudo-Distributed Hadoop Operation on a Single Machine

Simulating Hadoop's distributed operation on a single machine ultimately means running several Java processes on the local host. The goal here is to run the WordCount example that ships with Hadoop; the concrete steps are described in detail below. In this mode everything still runs on one machine, but separate Java processes stand in for the various node types of a distributed deployment (NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode). Note the distinctions between these node types:

From the viewpoint of distributed storage, the nodes of a cluster comprise one NameNode and some number of DataNodes, plus a Secondary NameNode acting as a backup for the NameNode. From the viewpoint of distributed applications, the nodes comprise one JobTracker and some number of TaskTrackers: the JobTracker schedules tasks, and the TaskTrackers execute them in parallel. A TaskTracker must run on a DataNode, so that computation stays local to the data; the JobTracker and the NameNode, however, need not share a machine.

Installing Cygwin

1. Download setup.exe from http://cygwin.com/setup.exe, saving it to the desktop for example; running it starts the download and installation.

2. Run the downloaded file; the Cygwin Installer appears.

3. Choose the installation path, the directory for downloaded packages, the connection type (here: Use IE5 Settings) and a download mirror; the list of downloadable packages is then built automatically.

4. When the package-selection screen appears, make sure the package circled in the figure is selected for download, i.e. that 'openssh' is chosen. Alternatively, change the installation type: clicking the cycling icon after the top-level All entry switches the type, making the last word of the All row read Install. If you choose the Install type, the openssh package is in fact already selected. To locate openssh yourself, the network-related packages are listed under the Net category, as the figure shows; scroll down and openssh appears. If a version number is shown in the Current column, the package has been selected for this installation; otherwise it shows Skip, meaning the package will be skipped and not installed.

5. Keep clicking Next to finish the installation.

Configuring Cygwin

After installation (my Cygwin, for example, is under C:\Cygwin\), configure as follows.

Set the environment variables: in the system variables, create a new variable (name: CYGWIN, value: ntsec tty), and edit the Path variable to add c:\cygwin\bin;c:\cygwin\usr\bin, keeping the existing entries.

Configuring the sshd daemon

The configuration uses ssh-host-config, as the figure shows. If it has been configured once before, prompts like those in the figure appear; I answered no to all of them.

Next it asks whether to install sshd as a service: "*** Query: Do you want to install sshd as a service?". I chose to install sshd as a service, and it then prompted:

Enter the value of CYGWIN for the daemon: [ntsec]

I had in fact already set CYGWIN = tty in the environment variables, but that did not seem to take effect; the prompt's message is what counts:

*** ERROR: Only [no] "check_case:strict" "ntsec" "smbntsec" "traverse" allowed.

In other words, only "check_case:strict", "ntsec", "smbntsec" and "traverse" are accepted here for installing the service ([no] presumably meaning none at all). Simply enter ntsec:

*** Query: Enter the value of CYGWIN for the daemon: [ntsec] ntsec
*** Info: The sshd service has been installed under the LocalSystem
*** Info: account (also known as SYSTEM). To start the service now, call
*** Info: `net start sshd' or `cygrunsrv -S sshd'. Otherwise, it
*** Info: will start automatically after the next reboot.
*** Info: Host configuration finished. Have fun!

SHIYANJUN@cbbd2ce9428e48b ~

As the message says, sshd, the daemon process, has been installed on the local system and can now be started.
Starting the sshd daemon

Generating the authorization keys

During the ssh-keygen.exe run an error may occur mentioning c:\program; the fix is to remove the HOME user variable (Advanced -> Environment Variables -> User variables -> HOME). That method works; the steps below can also serve as a reference.

Since in a distributed setup the DataNodes must report task-execution information to the NameNode, requiring password authentication on every access to the NameNode would be a nuisance. So what follows is the configuration for passwordless authentication; see my other articles for more.

Generate the RSA key with:

$ ssh-keygen

(This failed for me at first; resolved by removing Advanced -> Environment Variables -> User variables -> HOME, as above.) Generation proceeds as shown in the figure. At this step:

Enter file in which to save the key (/home/SHIYANJUN/.ssh/id_rsa):

just press Enter: per the default, the generated RSA key is saved in /home/SHIYANJUN/.ssh/id_rsa, for authenticating communication between nodes.

Continuing, it asks for a passphrase:

Enter passphrase (empty for no passphrase):

Press Enter again, and be sure to do so: the empty passphrase is what later spares the nodes interactive authentication during their communication, letting them authenticate directly with the public key. (In fact DSA authentication is used here; RSA can authenticate too, as you will see below.) The RSA key mainly serves to secure the information exchanged between nodes. If generation ends as in the figure, the RSA key was created correctly.

Next generate the DSA key with:

$ ssh-keygen -t dsa

Generation resembles the RSA case, as the figure shows. Then add the DSA public key to the authorization file authorized_keys:

$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

As the figure shows, this prints nothing. After the configuration is finished, test it with ssh localhost.

Hadoop

Download and install Hadoop.

Configuration

First configure Hadoop:

1. conf/hadoop-env.sh must at minimum set JAVA_HOME. Opening it you see:

# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

Remove the comment marker on the second line and set JAVA_HOME to its value on your machine; my modified lines read:

# The java implementation to use. Required.
export JAVA_HOME="D:\Program Files\Java\jdk1.6.0_07"

Note: if your JDK installation directory contains spaces, the value must be wrapped in double quotes or an error is reported; and make sure this is the JDK the system is actually using.

2. Only conf/hadoop-site.xml needs modifying. By default hadoop-site.xml is not configured at all; when running standalone, jobs execute with the basic options from hadoop-default.xml. Change hadoop-site.xml to the following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

The parameter fs.default.name gives the NameNode's IP address and port. Its default, file:///, means the local file system and is for standalone non-distributed mode; here we point it at the NameNode running on localhost. The parameter mapred.job.tracker gives the JobTracker's IP address and port. Its default, local, runs the JobTracker and TaskTracker inside one and the same Java process, again for standalone mode; here we point it at the JobTracker running on localhost (a separate Java process acting as the JobTracker). The parameter dfs.replication sets how many times each HDFS block is replicated, providing data redundancy; in typical production systems it is set to 3.

Running the example

1. Start Cygwin, and start ssh with:

$ net start sshd

as shown in the figure.

2. Prepare the data files for Hadoop to process. I use hadoop-0.16.4, copied straight into G:\; my Cygwin is installed in G:\Cygwin. Create an input directory under G:\hadoop-0.16.4 and put a few .txt files into it. I prepared 7, each containing English words separated by spaces; since we are running the WordCount example, you will see later how much text went in.

3. Run. Switch into G:\hadoop-0.16.4:

$ cd ../../cygdrive/g/hadoop-0.16.4

(cygdrive, under the Cygwin root, maps directly onto the Windows logical drive letters.)

The job uses HDFS, Hadoop's distributed file system, so the next thing to do is to format that file system:

$ bin/hadoop namenode -format

Formatting runs as shown in the figure. Now the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker should be started, with this command:

$ bin/start-all.sh

(This failed for me at first; resolved. Probably a JDK problem.) Startup proceeds as in the figure. Once it completes, ps -ef should show 5 newly started Java processes. If you did not configure the passwordless ssh authentication above, or configured it but entered a passphrase, then every single process started will prompt for the password; imagine that with any larger number of processes.

Start Cygwin; it begins in home/yourname, as the figure shows. Switch to the root directory, enter G:\hadoop-0.16.4, and create a data input directory input-dir, as shown. Then open G:\hadoop-0.16.4\input and create a few text files there; I created 3, input-a.txt, input-b.txt and input-c.txt, with the following contents:

input-a.txt: as after append actor as as add as apache as after add as
input-b.txt: bench be bench believe background bench be block
input-c.txt: cafe cat communications connection cat cat cat cust cafe

Next, copy the files prepared in the local input directory into the input directory in HDFS, so the distributed file system manages the data files awaiting processing:

$ bin/hadoop dfs -put ./input input

(I hit an error here: Hadoop reported "Name node is in safe mode"; resolved with bin/hadoop dfsadmin -safemode leave.) If the command prints no output, the copy succeeded.

Only now can Hadoop's bundled WordCount example be run. Submit the job and set it running with:

$ bin/hadoop jar hadoop-0.16.4-examples.jar wordcount input output

The last two arguments are the data input directory and the output directory for the processed results. Note that the output directory must not already exist under your G:\hadoop-0.16.4, or an error is reported.
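The wordcount program invoked above ships inside hadoop-0.16.4-examples.jar. Its logic is close in spirit to the minimal sketch below, written against the classic org.apache.hadoop.mapred API. This is an illustrative reconstruction, not the bundled source, and a few setter names shifted between early releases (very early versions used JobConf.setInputPath instead of FileInputFormat.setInputPaths), so treat it as a sketch:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class WordCount {

      // Map phase: split each line on whitespace and emit (word, 1).
      public static class Map extends MapReduceBase
          implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            output.collect(word, ONE);          // e.g. ("apache", 1)
          }
        }
      }

      // Reduce phase (also used as the combiner): sum the counts per word.
      public static class Reduce extends MapReduceBase
          implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
          int sum = 0;
          while (values.hasNext()) {
            sum += values.next().get();
          }
          output.collect(key, new IntWritable(sum));  // e.g. ("apache", 1826)
        }
      }

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);   // pre-aggregate on the map side
        conf.setReducerClass(Reduce.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // "input"
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // "output"
        JobClient.runJob(conf);  // submit and block, printing map/reduce progress
      }
    }

Running the reducer as a combiner on the map side is why, in the job counters traced later, thousands of combiner input records collapse into only a couple of dozen reduce input records.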
The run proceeds as in the figure, which shows in considerable detail how far the WordCount tool has progressed through the job.

Finally, inspect the processed results with:

$ bin/hadoop dfs -cat output/part-00000

# Or copy the files from HDFS to the local file system and view them there:
$ bin/hadoop dfs -get output output
$ cat output/*

# Or view them directly:
$ bin/hadoop dfs -cat output/*

# Stop the Hadoop processes:
$ bin/stop-all.sh

The output appears as in the figure. Finally, stop the Hadoop processes:

$ bin/stop-all.sh

as shown in the figure. That is the whole procedure.

Troubleshooting

(1) After $ bin/start-all.sh starts the Hadoop processes, 5 Java processes are running, and five pid files are created under /tmp recording their process IDs. From these five files you can learn which Java process the namenode, datanode, secondary namenode, jobtracker and tasktracker each correspond to. When Hadoop seems to misbehave, first check whether these 5 Java processes are still running normally.

(2) Use the web interfaces. http://localhost:50030 shows the JobTracker's running state; http://localhost:50060 shows the TaskTracker's; http://localhost:50070 shows the NameNode and the state of the whole distributed file system, and lets you browse the files in the distributed file system, the logs, and so on.

(3) Inspect the log files under ${HADOOP_HOME}/logs: the namenode, datanode, secondary namenode, jobtracker and tasktracker each have a corresponding log file, and every job run has a corresponding log file too. Analyzing these logs helps locate the cause of a failure.
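In connection with point (2), the three web addresses can also be probed from code when a scripted health check is handy. A minimal sketch using only the JDK; the class name WebUiCheck is mine, and the ports are the ones listed above:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class WebUiCheck {
      // Ports from the text: 50030 JobTracker, 50060 TaskTracker, 50070 NameNode.
      private static final String[] URLS = {
        "http://localhost:50030", "http://localhost:50060", "http://localhost:50070"
      };

      public static void main(String[] args) throws Exception {
        for (String u : URLS) {
          try {
            HttpURLConnection conn = (HttpURLConnection) new URL(u).openConnection();
            conn.setConnectTimeout(2000);  // fail fast if the daemon is down
            System.out.println(u + " -> HTTP " + conn.getResponseCode());
          } catch (Exception e) {
            System.out.println(u + " -> unreachable (" + e.getMessage() + ")");
          }
        }
      }
    }

A connection error or non-200 status for one of the three usually means the corresponding daemon did not start; the pid files and logs from points (1) and (3) then tell you which one.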
Understanding the Hadoop Runtime via Single-Machine Pseudo-Distributed Operation (1)

The previous section succeeded in running Hadoop's WordCount example in simulated pseudo-distributed mode. The commands used in that process, in summary:

$ cd ../../cygdrive/g/hadoop-0.16.4
$ bin/hadoop namenode -format
$ bin/start-all.sh
$ bin/hadoop dfs -put ./input input
$ bin/hadoop jar hadoop-0.16.4-examples.jar wordcount input output
$ bin/hadoop dfs -cat output/part-00000
$ bin/stop-all.sh

To deepen the understanding of this process, and to get to know HDFS, let's execute the example once more, probing at each step what work was actually done.

1. Formatting HDFS

First format HDFS:

$ bin/hadoop namenode -format

and follow the information printed during the run:

cygpath: cannot create short name of g:\hadoop-0.16.4\logs
08/09/21 17:32:43 INFO dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
08/09/21 17:32:44 INFO fs.FSNamesystem: fsOwner=SHIYANJUN,None,root,Administrators,Users
08/09/21 17:32:44 INFO fs.FSNamesystem: supergroup=supergroup
08/09/21 17:32:44 INFO fs.FSNamesystem: isPermissionEnabled=true
08/09/21 17:32:45 INFO dfs.Storage: Storage directory \tmp\hadoop-SHIYANJUN\dfs\name has been successfully formatted.
08/09/21 17:32:45 INFO dfs.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at cbbd2ce9428e48b/192.168.151.201
************************************************************/

The first line, "cygpath: cannot create short name of g:\hadoop-0.16.4\logs", means the log output directory was looked up and not found; simply create a logs directory inside g:\hadoop-0.16.4.

The first STARTUP_MSG block above records the startup information: what is starting (NameNode); the host (cbbd2ce9428e48b/192.168.151.201); the command-line arguments ([-format]); the version (0.16.4); and the build (http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008).

Evidently, formatting the Hadoop distributed file system (HDFS) is done by starting a NameNode to carry out the initialization work; moreover, once HDFS has been formatted, the NameNode process terminates again.

The second group shows information about the FS and the DFS, that is, the local file system and the Hadoop distributed file system. Before initializing HDFS, the NameNode inspects the current system configuration, including the FS's fsOwner, the supergroup, and whether permissions are enabled, and then carries out the HDFS format.

Per the output, the directory \tmp\hadoop-SHIYANJUN\dfs\name has been formatted; checking the local file system, the generated DFS and its contents appear as in the figure. Indeed a \tmp\hadoop-SHIYANJUN\dfs\name was initialized: here \tmp\hadoop-SHIYANJUN\dfs is the HDFS storage, and the NameNode corresponds to the \tmp\hadoop-SHIYANJUN\dfs\name directory. Under it are two directories containing several new files: the \current directory holds edits, fsimage, fstime and VERSION, and the \image directory holds one fsimage. All of them matter to the NameNode in the distributed computation later.

2. Starting the Hadoop processes

Start the Hadoop processes with:

$ bin/start-all.sh

This startup has a lot of work to do and takes comparatively long. It prints:

starting namenode, logging to /cygdrive/g/hadoop-0.16.4/bin/../logs/hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.out
localhost: starting datanode, logging to /cygdrive/g/hadoop-0.16.4/bin/../logs/hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.out
localhost: starting secondarynamenode, logging to /cygdrive/g/hadoop-0.16.4/bin/../logs/hadoop-SHIYANJUN-secondarynamenode-cbbd2ce9428e48b.out
starting jobtracker, logging to /cygdrive/g/hadoop-0.16.4/bin/../logs/hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.out
localhost: starting tasktracker, logging to /cygdrive/g/hadoop-0.16.4/bin/../logs/hadoop-SHIYANJUN-tasktracker-cbbd2ce9428e48b.out

From this startup information you can see that 5 processes were started, namenode, datanode, secondarynamenode, jobtracker and tasktracker, and that their startup information was logged; the generated log files are shown in the figure. The logs correspond exactly to the five processes, and the five processes' log files (with the .log extension) have recorded the startup information. Take a look at the NameNode's log:

2008-09-21 18:10:27,812 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:10:28,375 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-21 18:10:28,421 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: localhost/127.0.0.1:9000
2008-09-21 18:10:28,437 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-21 18:10:28,640 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=SHIYANJUN,None,root,Administrators,Users
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-21 18:10:31,062 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 2266 msecs
2008-09-21 18:10:31,078 INFO org.apache.hadoop.fs.FSNamesystem: Leaving safemode after 2282 msecs
2008-09-21 18:10:31,078 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2008-09-21 18:10:31,078 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2008-09-21 18:10:31,093 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2008-09-21 18:10:31,359 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:10:31,546 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:10:31,546 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:10:31,546 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:10:33,015 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@94cb8b
2008-09-21 18:10:33,281 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:10:33,296 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2008-09-21 18:10:33,296 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@15c62bc
2008-09-21 18:10:33,296 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
2008-09-21 18:10:33,359 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-21 18:10:33,390 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2008-09-21 18:10:33,781 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2008-09-21 18:10:33,968 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2008-09-21 18:10:33,968 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting
2008-09-21 18:10:57,312 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-1069829945-192.168.151.201-50010-1221991857296
2008-09-21 18:10:57,328 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
2008-09-21 18:11:47,250 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /tmp/hadoop-SHIYANJUN/mapred/system because it does not exist
2008-09-21 18:11:47,250 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0
2008-09-21 18:16:27,640 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 127.0.0.1
2008-09-21 18:16:27,640 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 5 Total time for transactions(ms): 0 Number of syncs: 3 SyncTimes(ms): 62
2008-09-21 18:16:30,171 INFO org.apache.hadoop.fs.FSNamesystem: Roll FSImage from 127.0.0.1
2008-09-21 18:16:30,171 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0
If anything abnormal occurs during startup, the log files are where to look for the reason the startup failed.

You can also view the currently running processes with:

$ ps -ef

The result, as the figure shows, confirms that 5 Java processes have now been started.

You can also see a file in_use.lock in the tmp\hadoop-SHIYANJUN\dfs\name directory, which is to say the NameNode has started, is currently in use, and has the directory locked.

Under \tmp there is now one more directory, \hadoop-SYSTEM, Hadoop's system directory; the directory structure generated at this point appears in the figure. Here tmp\hadoop-SYSTEM\dfs\data corresponds to the DataNode, and tmp\hadoop-SYSTEM\dfs\namesecondary corresponds to the secondarynamenode node. Some files were also generated in certain tmp\hadoop-SYSTEM subdirectories; I will not go into them here.

3. Copying the local data files to be processed into HDFS

Use:

$ bin/hadoop dfs -put ./input input

Now, besides the two files dncp_block_verification.log.curr and VERSION, the directory tmp\hadoop-SYSTEM\dfs\data\current has gained many files whose names begin with "blk": these are the files from the local input directory, copied into HDFS and stored in block form, with metadata generated alongside. The files in tmp\hadoop-SYSTEM\dfs\data\current are now:

 Directory of G:\tmp\hadoop-SYSTEM\dfs\data\current

2008-09-21  18:10                    .
2008-09-21  18:10                    ..
2008-09-21  18:10               167  VERSION
2008-09-21  18:10               480  dncp_block_verification.log.curr
2008-09-21  18:39                87  blk_7287293315123774920.meta
2008-09-21  18:39            10,109  blk_7287293315123774920
2008-09-21  18:39                23  blk_3670615963974276357.meta
2008-09-21  18:39             1,957  blk_3670615963974276357
2008-09-21  18:39                23  blk_125370523583213471.meta
2008-09-21  18:39             1,987  blk_125370523583213471
2008-09-21  18:39                23  blk_-8983105898459096464.meta
2008-09-21  18:39             1,957  blk_-8983105898459096464
2008-09-21  18:39                23  blk_-6348337313643072566.meta
2008-09-21  18:39             1,985  blk_-6348337313643072566
2008-09-21  18:39                23  blk_-140532538463136818.meta
2008-09-21  18:39             1,957  blk_-140532538463136818
2008-09-21  18:39                23  blk_2961784518440227574.meta
2008-09-21  18:39             1,957  blk_2961784518440227574
              16 File(s)        22,781 bytes
               2 Dir(s)  2,220,326,912 bytes free

I had prepared 7 .txt files in the local input directory; in the copy to HDFS, each .txt file came to correspond to one of 7 block files in HDFS (that is, in the tmp\hadoop-SYSTEM\dfs\data\current directory), and the contents of each block file are exactly the contents of the corresponding local .txt file. Opening blk_125370523583213471, for instance, shows:

apache apache bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache hash bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash shirdrn bash apache apache bash bash apache apache bash bash shirdrn apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache fax bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache find apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash bash apache apache bash
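As an aside, the same copy can be driven from Java through Hadoop's FileSystem API instead of the dfs -put shell command. A minimal sketch; the class name PutInput is mine, and it assumes hadoop-site.xml (with fs.default.name = hdfs://localhost:9000) is on the classpath so that FileSystem.get returns the HDFS client rather than the local file system:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutInput {
      public static void main(String[] args) throws Exception {
        // Configuration loads hadoop-default.xml and hadoop-site.xml from
        // the classpath, so fs.default.name resolves to hdfs://localhost:9000.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Same effect as: bin/hadoop dfs -put ./input input
        fs.copyFromLocalFile(new Path("./input"), new Path("input"));
        fs.close();
      }
    }

Under the hood this triggers exactly the NameSystem.allocateBlock and block-transfer activity analyzed in the logs later.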
4. Launching the job

Start execution with:

$ bin/hadoop jar hadoop-0.16.4-examples.jar wordcount input output

The execution prints:

08/09/21 18:54:11 INFO mapred.FileInputFormat: Total input paths to process : 7
08/09/21 18:54:13 INFO mapred.JobClient: Running job: job_200809211811_0001
08/09/21 18:54:14 INFO mapred.JobClient: map 0% reduce 0%
08/09/21 18:54:34 INFO mapred.JobClient: map 28% reduce 0%
08/09/21 18:54:45 INFO mapred.JobClient: map 28% reduce 4%
08/09/21 18:54:46 INFO mapred.JobClient: map 42% reduce 4%
08/09/21 18:54:47 INFO mapred.JobClient: map 57% reduce 4%
08/09/21 18:54:52 INFO mapred.JobClient: map 57% reduce 9%
08/09/21 18:54:57 INFO mapred.JobClient: map 85% reduce 19%
08/09/21 18:55:02 INFO mapred.JobClient: map 100% reduce 19%
08/09/21 18:55:07 INFO mapred.JobClient: map 100% reduce 28%
08/09/21 18:55:11 INFO mapred.JobClient: map 100% reduce 100%
08/09/21 18:55:12 INFO mapred.JobClient: Job complete: job_200809211811_0001
08/09/21 18:55:12 INFO mapred.JobClient: Counters: 12
08/09/21 18:55:12 INFO mapred.JobClient:   Job Counters
08/09/21 18:55:12 INFO mapred.JobClient:     Launched map tasks=7
08/09/21 18:55:12 INFO mapred.JobClient:     Launched reduce tasks=1
08/09/21 18:55:12 INFO mapred.JobClient:     Data-local map tasks=7
08/09/21 18:55:12 INFO mapred.JobClient:   Map-Reduce Framework
08/09/21 18:55:12 INFO mapred.JobClient:     Map input records=7
08/09/21 18:55:12 INFO mapred.JobClient:     Map output records=3649
08/09/21 18:55:12 INFO mapred.JobClient:     Map input bytes=21909
08/09/21 18:55:12 INFO mapred.JobClient:     Map output bytes=36511
08/09/21 18:55:12 INFO mapred.JobClient:     Combine input records=3649
08/09/21 18:55:12 INFO mapred.JobClient:     Combine output records=21
08/09/21 18:55:12 INFO mapred.JobClient:     Reduce input groups=7
08/09/21 18:55:12 INFO mapred.JobClient:     Reduce input records=21
08/09/21 18:55:12 INFO mapred.JobClient:     Reduce output records=7

When execution completes, tmp\hadoop-SYSTEM\dfs\data\current contains one more block file; here it is blk_6547411606566553711, with its corresponding metadata file blk_6547411606566553711.meta. Opening blk_6547411606566553711 shows:

apache 1826
baketball 1
bash 1813
fax 2
find 1
hash 1
shirdrn 5

There it is: the final result.

The run, of course, also logs its information, and the log files are both large and detailed:

G:\>tree G:\hadoop-0.16.4\logs /A /F
Folder PATH listing
Volume serial number is D275-ECF3
G:\HADOOP-0.16.4\LOGS
|   hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.out
|   hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log
|   hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.out
|   hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.log
|   hadoop-SHIYANJUN-secondarynamenode-cbbd2ce9428e48b.out
|   hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.out
|   hadoop-SHIYANJUN-secondarynamenode-cbbd2ce9428e48b.log
|   hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.log
|   hadoop-SHIYANJUN-tasktracker-cbbd2ce9428e48b.out
|   hadoop-SHIYANJUN-tasktracker-cbbd2ce9428e48b.log
|
+---history
|       JobHistory.log
|       1221994453046_job_200809211811_0001
|       job_200809211811_0001_conf.xml
|
\---userlogs
    +---task_200809211811_0001_m_000000_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_m_000001_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_r_000000_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_m_000002_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_m_000003_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_m_000004_0
    |       stdout
    |       stderr
    |       syslog
    |
    +---task_200809211811_0001_m_000005_0
    |       stdout
    |       stderr
    |       syslog
    |
    \---task_200809211811_0001_m_000006_0
            stdout
            stderr
            syslog
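The result just shown can also be read back from Java, mirroring bin/hadoop dfs -cat output/part-00000. A minimal sketch under the same classpath assumption as the earlier put example; the class name CatOutput is mine:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CatOutput {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Same effect as: bin/hadoop dfs -cat output/part-00000
        Path part = new Path("output/part-00000");
        BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(part)));
        String line;
        while ((line = in.readLine()) != null) {
          System.out.println(line);  // e.g. "apache\t1826"
        }
        in.close();
        fs.close();
      }
    }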
5. Stopping the Hadoop processes

Use:

$ bin/stop-all.sh

The shutdown prints:

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode

Understanding the Hadoop Runtime via Single-Machine Pseudo-Distributed Operation (2)

The previous part traced the run of the WordCount example. Here the main approach is to look through the logs, see what details they reveal about executing such a job, and draw some conclusions; analyzing logs is in fact a rather good method in itself. As soon as the Hadoop processes were started, i.e. as soon as the start command ran, startup information for each process was already being logged (under the G:\hadoop-0.16.4\logs directory).

Below, the logs are analyzed roughly in the order in which the five processes start, with one heading per process.

I. NameNode startup

First, the NameNode's log file hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log:

2008-09-21 18:10:27,812 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:10:28,375 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
2008-09-21 18:10:28,421 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: localhost/127.0.0.1:9000
2008-09-21 18:10:28,437 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2008-09-21 18:10:28,640 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=SHIYANJUN,None,root,Administrators,Users
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-21 18:10:30,828 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-21 18:10:31,062 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 2266 msecs
2008-09-21 18:10:31,078 INFO org.apache.hadoop.fs.FSNamesystem: Leaving safemode after 2282 msecs
2008-09-21 18:10:31,078 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 0 racks and 0 datanodes
2008-09-21 18:10:31,078 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks
2008-09-21 18:10:31,093 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
2008-09-21 18:10:31,359 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:10:31,546 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:10:31,546 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:10:31,546 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:10:33,015 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@94cb8b
2008-09-21 18:10:33,281 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:10:33,296 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
2008-09-21 18:10:33,296 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@15c62bc
2008-09-21 18:10:33,296 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
2008-09-21 18:10:33,359 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-21 18:10:33,390 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9000: starting
2008-09-21 18:10:33,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9000: starting
2008-09-21 18:10:33,781 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9000: starting
2008-09-21 18:10:33,968 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9000: starting
2008-09-21 18:10:33,968 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000: starting

From this log analysis, the NameNode did the following work during this phase:

1. Started the NameNode node on this host (specifically: localhost/127.0.0.1:9000). In a true distributed environment the NameNode and the DataNodes cannot be on the same host, but the NameNode may share a host with the JobTracker.

2. After the NameNode was started, it had plenty of initialization work to do: it loaded the FSImage file from the local file system FS, which can be seen under tmp\hadoop-SHIYANJUN\dfs\name\image.

3. Recorded the state of HDFS: org.apache.hadoop.dfs.StateChange is used to capture HDFS state.

4. Registered the local FS, and started the HDFS container and initialized its context. At the same time it started a web server on this host and opened a listening socket (0.0.0.0:50070). Since the NameNode node is set up on this host, it must receive the necessary information from the DataNode nodes while tasks are being processed.

5. Established IPC channels so that the processes can communicate: the NameNode's host and the DataNodes' hosts must talk to each other, and a DataNode host must establish an IPC connection when it registers to take part in data processing. The NameNode listens on localhost/127.0.0.1:9000, the port we configured in hadoop-site.xml:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>

The property fs.default.name corresponds to the NameNode.

II. DataNode startup

With IPC established, the DataNode node starts. Its log, hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.log, contains:

2008-09-21 18:10:52,453 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:10:53,718 INFO org.apache.hadoop.dfs.Storage: Storage directory \tmp\hadoop-SYSTEM\dfs\data is not formatted.
2008-09-21 18:10:53,718 INFO org.apache.hadoop.dfs.Storage: Formatting ...
2008-09-21 18:10:54,328 INFO org.apache.hadoop.dfs.DataNode: Registered FSDatasetStatusMBean
2008-09-21 18:10:54,328 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
2008-09-21 18:10:54,343 INFO org.apache.hadoop.dfs.DataNode: Balancing bandwith is 1048576 bytes/s
2008-09-21 18:10:54,671 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:10:54,875 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:10:54,906 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:10:54,906 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:10:56,125 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@1f0aecc
2008-09-21 18:10:56,390 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:10:56,421 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075
2008-09-21 18:10:56,421 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@5ddb6e
2008-09-21 18:10:57,343 INFO org.apache.hadoop.dfs.DataNode: New storage id DS-1069829945-192.168.151.201-50010-1221991857296 is assigned to data-node 127.0.0.1:50010
2008-09-21 18:10:57,375 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
2008-09-21 18:10:57,609 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010In DataNode.run, data = FSDataset{dirpath='g:\tmp\hadoop-SYSTEM\dfs\data\current'}
2008-09-21 18:10:57,609 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of 3401027msec Initial delay: 60000msec

During startup the DataNode did the following:

1. Before the DataNode node starts, it logs startup information very similar to the NameNode's.

2. It formats the DataNode's storage, tmp\hadoop-SYSTEM\dfs\data. Here the DataNode corresponds to the data directory under tmp\hadoop-SYSTEM\dfs, whereas the NameNode node corresponds to the tmp\hadoop-SHIYANJUN\dfs\name directory. In a true distributed computing environment they are not on the same host; the simulation separates them here through hadoop-SHIYANJUN versus hadoop-SYSTEM.

3. It registers, starts its server, initializes the context, and opens a listening socket (0.0.0.0:50075).

4. It starts the DataNode proper and initializes its file-system directory information: FSDataset{dirpath='g:\tmp\hadoop-SYSTEM\dfs\data\current'}.

While registering, the DataNode sends a request to the NameNode and waits for the NameNode's response; following the timestamps in the NameNode's log file hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log you can see:

2008-09-21 18:10:57,312 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 127.0.0.1:50010 storage DS-1069829945-192.168.151.201-50010-1221991857296
2008-09-21 18:10:57,328 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010

That is, the DataNode above has started and wants to register; the registration request from 127.0.0.1:50010 refers to this DataNode, and at the same time the NameNode adds a new node to the default rack (/default-rack/127.0.0.1:50010).

Afterwards the DataNode promptly submits its status reports to the NameNode; following the timestamps in the DataNode's log hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.log:

2008-09-21 18:10:58,718 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 0 blocks got processed in 15 msecs
2008-09-21 18:39:20,000 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 0 blocks got processed in 16 msecs

Two reports were made, both indicating that no data blocks (Blocks) have been obtained for processing yet.

III. SecondaryNameNode startup

Once the DataNode has started, the SecondaryNameNode's log, hadoop-SHIYANJUN-secondarynamenode-cbbd2ce9428e48b.log, shows:

2008-09-21 18:11:23,312 INFO org.apache.hadoop.dfs.NameNode.Secondary: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting SecondaryNameNode
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:11:23,765 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SecondaryNameNode, sessionId=null
2008-09-21 18:11:24,906 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:11:25,312 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:11:25,312 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:11:25,312 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:11:26,765 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@16ea269
2008-09-21 18:11:27,234 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:11:27,312 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50090
2008-09-21 18:11:27,312 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@b754b2
2008-09-21 18:11:27,312 INFO org.apache.hadoop.dfs.NameNode.Secondary: Secondary Web-server up at: 0.0.0.0:50090
2008-09-21 18:11:27,328 WARN org.apache.hadoop.dfs.NameNode.Secondary: Checkpoint Directory:\tmp\hadoop-SYSTEM\dfs\namesecondary
2008-09-21 18:11:27,328 WARN org.apache.hadoop.dfs.NameNode.Secondary: Checkpoint Period :3600 secs (60 min)
2008-09-21 18:11:27,328 WARN org.apache.hadoop.dfs.NameNode.Secondary: Log Size Trigger :67108864 bytes (65536 KB)

During startup the SecondaryNameNode did the following:

1. Before starting, it handled some basic information; see STARTUP_MSG in the log.
2. It established IPC, initialized the context, opened a socket, and began listening (0.0.0.0:50090).
3. It created the checkpoint directory \tmp\hadoop-SYSTEM\dfs\namesecondary, and set up the checkpoint period and the log-size trigger (org.apache.hadoop.dfs.NameNode.Secondary: Log Size Trigger).

IV. JobTracker startup

Then it is the JobTracker's turn to start; its log, hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.log, can be analyzed:

2008-09-21 18:11:40,890 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting JobTracker
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:11:41,515 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=9001
2008-09-21 18:11:41,578 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-21 18:11:41,578 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9001: starting
2008-09-21 18:11:41,578 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 9001: starting
2008-09-21 18:11:41,578 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 9001: starting
2008-09-21 18:11:41,593 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9001: starting
2008-09-21 18:11:41,875 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:11:42,031 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:11:42,031 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:11:42,031 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:11:43,265 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@b4be3d
2008-09-21 18:11:43,640 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:11:43,718 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50030
2008-09-21 18:11:43,718 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@90832e
2008-09-21 18:11:43,718 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
2008-09-21 18:11:44,218 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 9001
2008-09-21 18:11:44,218 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
2008-09-21 18:11:47,390 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING

During startup the JobTracker did the following:

1. Before starting, it handled some basic information; see STARTUP_MSG in the log.

2. It established IPC, initialized the context, opened a socket, and began listening (0.0.0.0:50030 for the web server). The JobTracker itself is set to local port 9001, the port we configured in hadoop-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>

3. It started and began running the JobTracker.

With the JobTracker's IPC established, the NameNode's host has already received this information and responded; see the NameNode log hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log at the corresponding time:

2008-09-21 18:11:47,250 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /tmp/hadoop-SHIYANJUN/mapred/system because it does not exist
2008-09-21 18:11:47,250 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0

With the JobTracker started, tasks can now be assigned and scheduled, and the NameNode logs HDFS's state changes. Here the JobTracker and the NameNode are configured on the same host; the JobTracker directory that was created can be seen under tmp\hadoop-SHIYANJUN\mapred\local.
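Both daemon addresses, 9000 for the NameNode and 9001 for the JobTracker, reach the processes through the same configuration mechanism that client code uses. A small sketch (the class name ShowTrackerAddress is mine) that simply prints what any client on this machine would resolve, assuming hadoop-site.xml is on the classpath:

    import org.apache.hadoop.mapred.JobConf;

    public class ShowTrackerAddress {
      public static void main(String[] args) {
        // JobConf extends Configuration, so both values come from
        // hadoop-site.xml, falling back to hadoop-default.xml.
        JobConf conf = new JobConf();
        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
      }
    }

With the configuration from this article it should print hdfs://localhost:9000 and localhost:9001, matching the ports the logs above show the daemons binding to.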
V. TaskTracker startup

Next the TaskTracker starts. Its log file, hadoop-SHIYANJUN-tasktracker-cbbd2ce9428e48b.log, shows:

2008-09-21 18:11:58,718 INFO org.apache.hadoop.mapred.TaskTracker: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting TaskTracker
STARTUP_MSG: host = cbbd2ce9428e48b/192.168.151.201
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.16.4
STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
************************************************************/
2008-09-21 18:11:59,625 INFO org.mortbay.util.Credential: Checking Resource aliases
2008-09-21 18:12:00,890 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2008-09-21 18:12:00,906 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
2008-09-21 18:12:00,906 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
2008-09-21 18:12:01,984 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@b754b2
2008-09-21 18:12:02,187 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
2008-09-21 18:12:02,203 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50060
2008-09-21 18:12:02,203 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@1e3118a
2008-09-21 18:12:02,281 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=TaskTracker, sessionId=
2008-09-21 18:12:02,515 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=TaskTracker, port=4881
2008-09-21 18:12:02,718 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2008-09-21 18:12:02,718 INFO org.apache.hadoop.mapred.TaskTracker: TaskTracker up at: localhost/127.0.0.1:4881
2008-09-21 18:12:02,718 INFO org.apache.hadoop.mapred.TaskTracker: Starting tracker tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881
2008-09-21 18:12:02,718 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 4881: starting
2008-09-21 18:12:02,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 4881: starting
2008-09-21 18:12:02,718 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 4881: starting
2008-09-21 18:12:02,937 INFO org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for all reduce tasks on tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881

During startup the TaskTracker did the following:

1. Before starting, it handled some basic information; see STARTUP_MSG in the log.
2. It established IPC, initialized the context, opened a socket, and began listening (0.0.0.0:50060 for the web server); the TaskTracker itself is set to localhost/127.0.0.1:4881.
3. It started and began running: the TaskTracker launched the map-events fetcher thread for reduce tasks, ready to carry out map processing on the data awaiting treatment.

Idle state

At this point every task-executing process is in a ready state; as soon as the master assigns and schedules tasks, each process begins its work.

Looking at the NameNode's log file hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log, following the timestamps, you can see:

2008-09-21 18:16:27,640 INFO org.apache.hadoop.fs.FSNamesystem: Roll Edit Log from 127.0.0.1
2008-09-21 18:16:27,640 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 5 Total time for transactions(ms): 0 Number of syncs: 3 SyncTimes(ms): 62

The NameNode monitors the current state of the Hadoop system in real time.

By now the SecondaryNameNode is also in operation; its log, hadoop-SHIYANJUN-secondarynamenode-cbbd2ce9428e48b.log, shows, along the time line:

2008-09-21 18:16:28,546 INFO org.apache.hadoop.dfs.NameNode.Secondary: Downloaded file \tmp\hadoop-SYSTEM\dfs\namesecondary\srcimage.tmp size 16 bytes.
2008-09-21 18:16:28,609 INFO org.apache.hadoop.dfs.NameNode.Secondary: Downloaded file \tmp\hadoop-SYSTEM\dfs\namesecondary\edits.tmp size 312 bytes.
2008-09-21 18:16:29,640 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=SHIYANJUN,None,root,Administrators,Users
2008-09-21 18:16:29,656 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
2008-09-21 18:16:29,656 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
2008-09-21 18:16:29,765 INFO org.apache.hadoop.dfs.NameNode.Secondary: Posted URL 0.0.0.0:50070putimage=1&port=50090&machine=192.168.151.201&token=1221992188000
2008-09-21 18:16:30,031 INFO org.apache.hadoop.dfs.NameNode.Secondary: New Image \tmp\hadoop-SYSTEM\dfs\namesecondary\destimage.tmp retrieved by Namenode.
2008-09-21 18:16:30,328 WARN org.apache.hadoop.dfs.NameNode.Secondary: Checkpoint done. Image Size:16 Edit Size:312 New Image Size:292
From the above you can see that the SecondaryNameNode process performed download tasks, fetching 2 files: tmp\hadoop-SYSTEM\dfs\namesecondary\srcimage.tmp and tmp\hadoop-SYSTEM\dfs\namesecondary\edits.tmp, which can be found in the local file system.

In a true distributed computing environment the SecondaryNameNode and the NameNode are not on the same host; in the setting of this WordCount example, the SecondaryNameNode would be on the DataNode's host, and when it requests these file downloads it communicates with the NameNode's host several times, with authentication to be handled along the way.

The NameNode process continues monitoring the Hadoop system's state:

2008-09-21 18:16:30,171 INFO org.apache.hadoop.fs.FSNamesystem: Roll FSImage from 127.0.0.1
2008-09-21 18:16:30,171 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0

Now a command can be entered to start executing a task (the copy task, for instance).

Copying the data files awaiting processing

We run:

$ bin/hadoop dfs -put ./input input

and the copy task begins. First look at the NameNode's log file hadoop-SHIYANJUN-namenode-cbbd2ce9428e48b.log; following the timestamps, you can see:

2008-09-21 18:39:45,281 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 3 Total time for transactions(ms): 0 Number of syncs: 0 SyncTimes(ms): 0
2008-09-21 18:39:45,468 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/b.txt. blk_7287293315123774920
2008-09-21 18:39:45,765 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_7287293315123774920 size 10109
2008-09-21 18:39:46,265 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/c.txt. blk_3670615963974276357
2008-09-21 18:39:46,390 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_3670615963974276357 size 1957
2008-09-21 18:39:46,437 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/d.txt. blk_125370523583213471
2008-09-21 18:39:46,546 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_125370523583213471 size 1987
2008-09-21 18:39:46,593 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/e.txt. blk_-8983105898459096464
2008-09-21 18:39:46,718 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_-8983105898459096464 size 1957
2008-09-21 18:39:46,796 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/f.txt. blk_-6348337313643072566
2008-09-21 18:39:46,890 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_-6348337313643072566 size 1985
2008-09-21 18:39:46,953 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/g.txt. blk_-140532538463136818
2008-09-21 18:39:47,062 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_-140532538463136818 size 1957
2008-09-21 18:39:47,109 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/input/a.txt. blk_2961784518440227574
2008-09-21 18:39:47,203 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_2961784518440227574 size 1957

Seven files were copied in all, in the copy order assigned by the NameNode: b.txt, c.txt, d.txt, e.txt, f.txt, g.txt, a.txt. Throughout the copy, HDFS's state changes are updated promptly; the log above shows the details.

Now look at the DataNode's log information, in hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.log; following the timestamps, you can see:

2008-09-21 18:39:20,000 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 0 blocks got processed in 16 msecs
2008-09-21 18:39:45,562 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_7287293315123774920 src: /127.0.0.1:1854 dest: /127.0.0.1:50010
2008-09-21 18:39:45,609 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:45,750 INFO org.apache.hadoop.dfs.DataNode: Received block blk_7287293315123774920 of size 10109 from /127.0.0.1
2008-09-21 18:39:45,750 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_7287293315123774920 terminating
2008-09-21 18:39:46,281 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_3670615963974276357 src: /127.0.0.1:1856 dest: /127.0.0.1:50010
2008-09-21 18:39:46,281 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:46,390 INFO org.apache.hadoop.dfs.DataNode: Received block blk_3670615963974276357 of size 1957 from /127.0.0.1
2008-09-21 18:39:46,390 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_3670615963974276357 terminating
2008-09-21 18:39:46,437 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_125370523583213471 src: /127.0.0.1:1857 dest: /127.0.0.1:50010
2008-09-21 18:39:46,453 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:46,546 INFO org.apache.hadoop.dfs.DataNode: Received block blk_125370523583213471 of size 1987 from /127.0.0.1
2008-09-21 18:39:46,546 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_125370523583213471 terminating
2008-09-21 18:39:46,609 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-8983105898459096464 src: /127.0.0.1:1858 dest: /127.0.0.1:50010
2008-09-21 18:39:46,609 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:46,718 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-8983105898459096464 of size 1957 from /127.0.0.1
2008-09-21 18:39:46,734 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-8983105898459096464 terminating
2008-09-21 18:39:46,796 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-6348337313643072566 src: /127.0.0.1:1859 dest: /127.0.0.1:50010
2008-09-21 18:39:46,796 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:46,890 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-6348337313643072566 of size 1985 from /127.0.0.1
2008-09-21 18:39:46,890 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-6348337313643072566 terminating
2008-09-21 18:39:46,968 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-140532538463136818 src: /127.0.0.1:1860 dest: /127.0.0.1:50010
2008-09-21 18:39:46,968 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:47,062 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-140532538463136818 of size 1957 from /127.0.0.1
2008-09-21 18:39:47,078 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-140532538463136818 terminating
2008-09-21 18:39:47,125 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_2961784518440227574 src: /127.0.0.1:1861 dest: /127.0.0.1:50010
2008-09-21 18:39:47,125 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:39:47,218 INFO org.apache.hadoop.dfs.DataNode: Received block blk_2961784518440227574 of size 1957 from /127.0.0.1
2008-09-21 18:39:47,234 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_2961784518440227574 terminating
2008-09-21 18:40:50,859 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_7287293315123774920
2008-09-21 18:43:44,187 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3670615963974276357
2008-09-21 18:45:01,328 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8983105898459096464
2008-09-21 18:47:08,421 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-140532538463136818
2008-09-21 18:47:29,421 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_2961784518440227574
2008-09-21 18:47:56,421 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_125370523583213471

Copying the files is chiefly an interaction between the NameNode and DataNode processes, which between them complete the copy of the data awaiting processing and update HDFS's changed state.

Running the WordCount tool

Start the word-frequency counting task with:

$ bin/hadoop jar hadoop-0.16.4-examples.jar wordcount input output

This time, every process except the SecondaryNameNode is already in a running state. As always, the deployment of the task starts from the NameNode:

2008-09-21 18:54:10,750 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001 because it does not exist
2008-09-21 18:54:10,750 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 24 Total time for transactions(ms): 16 Number of syncs: 16 SyncTimes(ms): 237
2008-09-21 18:54:10,984 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001/job.jar. blk_1451674918492452869

after which the JobTracker and TaskTracker enter their working state as well.

The NameNode has to stay in communication with all the other processes in real time, as its log file shows:

2008-09-21 18:54:11,171 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_1451674918492452869 size 84034
2008-09-21 18:54:11,203 INFO org.apache.hadoop.fs.FSNamesystem: Increasing replication for file /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001/job.jar. New replication is 10
2008-09-21 18:54:11,781 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001/job.split. blk_-4098064083959376195
2008-09-21 18:54:11,921 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_-4098064083959376195 size 859
2008-09-21 18:54:12,375 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001/job.xml. blk_-1411929805036254458
2008-09-21 18:54:12,484 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_-1411929805036254458 size 12003
2008-09-21 18:55:09,968 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/SHIYANJUN/output/_temporary/_task_200809211811_0001_r_000000_0/part-00000. blk_6547411606566553711
2008-09-21 18:55:10,250 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50010 is added to blk_6547411606566553711 size 64
2008-09-21 18:55:11,453 INFO org.apache.hadoop.fs.FSNamesystem: Number of transactions: 59 Total time for transactions(ms): 32 Number of syncs: 47 SyncTimes(ms): 1656
2008-09-21 18:55:11,500 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.delete: blk_1451674918492452869 is added to invalidSet of 127.0.0.1:50010
2008-09-21 18:55:11,500 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.delete: blk_-4098064083959376195 is added to invalidSet of 127.0.0.1:50010
2008-09-21 18:55:11,500 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.delete: blk_-1411929805036254458 is added to invalidSet of 127.0.0.1:50010
2008-09-21 18:55:11,515 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001 because it does not exist
2008-09-21 18:55:13,625 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.blockToInvalidate: ask 127.0.0.1:50010 to delete blk_-4098064083959376195 blk_1451674918492452869 blk_-1411929805036254458

Now look at the DataNode's log information, in hadoop-SHIYANJUN-datanode-cbbd2ce9428e48b.log; following the timestamps, you can see:

2008-09-21 18:54:11,031 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_1451674918492452869 src: /127.0.0.1:2349 dest: /127.0.0.1:50010
2008-09-21 18:54:11,031 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:54:11,171 INFO org.apache.hadoop.dfs.DataNode: Received block blk_1451674918492452869 of size 84034 from /127.0.0.1
2008-09-21 18:54:11,171 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_1451674918492452869 terminating
2008-09-21 18:54:11,796 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-4098064083959376195 src: /127.0.0.1:2350 dest: /127.0.0.1:50010
2008-09-21 18:54:11,796 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:54:11,921 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-4098064083959376195 of size 859 from /127.0.0.1
2008-09-21 18:54:11,937 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-4098064083959376195 terminating
2008-09-21 18:54:12,390 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_-1411929805036254458 src: /127.0.0.1:2351 dest: /127.0.0.1:50010
2008-09-21 18:54:12,406 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:54:12,484 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-1411929805036254458 of size 12003 from /127.0.0.1
2008-09-21 18:54:12,500 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_-1411929805036254458 terminating
2008-09-21 18:54:12,734 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-1411929805036254458 to /127.0.0.1
2008-09-21 18:54:13,015 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_1451674918492452869 to /127.0.0.1
2008-09-21 18:54:13,328 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-4098064083959376195 to /127.0.0.1
2008-09-21 18:54:14,718 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-1411929805036254458 to /127.0.0.1
2008-09-21 18:54:15,000 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_1451674918492452869 to /127.0.0.1
2008-09-21 18:54:26,687 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_7287293315123774920 to /127.0.0.1
2008-09-21 18:54:26,687 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_125370523583213471 to /127.0.0.1
2008-09-21 18:54:41,609 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-6348337313643072566 to /127.0.0.1
2008-09-21 18:54:42,703 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_2961784518440227574 to /127.0.0.1
2008-09-21 18:54:53,953 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_3670615963974276357 to /127.0.0.1
2008-09-21 18:54:54,531 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-8983105898459096464 to /127.0.0.1
2008-09-21 18:55:00,765 INFO org.apache.hadoop.dfs.DataNode: 127.0.0.1:50010 Served block blk_-140532538463136818 to /127.0.0.1
2008-09-21 18:55:10,000 INFO org.apache.hadoop.dfs.DataNode: Receiving block blk_6547411606566553711 src: /127.0.0.1:2433 dest: /127.0.0.1:50010
2008-09-21 18:55:10,000 INFO org.apache.hadoop.dfs.DataNode: Datanode 0 forwarding connect ack to upstream firstbadlink is
2008-09-21 18:55:10,250 INFO org.apache.hadoop.dfs.DataNode: Received block blk_6547411606566553711 of size 64 from /127.0.0.1
2008-09-21 18:55:10,265 INFO org.apache.hadoop.dfs.DataNode: PacketResponder 0 for block blk_6547411606566553711 terminating
2008-09-21 18:55:13,968 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-4098064083959376195 file \tmp\hadoop-SYSTEM\dfs\data\current\blk_-4098064083959376195
2008-09-21 18:55:13,968 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_-1411929805036254458 file \tmp\hadoop-SYSTEM\dfs\data\current\blk_-1411929805036254458
2008-09-21 18:55:13,968 INFO org.apache.hadoop.dfs.DataNode: Deleting block blk_1451674918492452869 file \tmp\hadoop-SYSTEM\dfs\data\current\blk_1451674918492452869

These are the details of the data changes on the DataNode node.

The JobTracker and TaskTracker are the busiest of all; they have a great deal of work to do. The following, from the hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.log file, records the JobTracker's task execution:

2008-09-21 18:54:13,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000000
2008-09-21 18:54:13,546 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000000_0' to tip tip_200809211811_0001_m_000000, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:17,406 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000001
2008-09-21 18:54:17,406 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000001_0' to tip tip_200809211811_0001_m_000001, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:22,953 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200809211811_0001_r_000000
2008-09-21 18:54:22,968 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_r_000000_0' to tip tip_200809211811_0001_r_000000, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:33,359 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000002
2008-09-21 18:54:33,359 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000002_0' to tip tip_200809211811_0001_m_000002, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:33,843 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000003
It is the JobTracker and the TaskTracker that are the busiest of all; they do most of the work. The following entries, from the file hadoop-SHIYANJUN-jobtracker-cbbd2ce9428e48b.log, show how the JobTracker drove the job:

2008-09-21 18:54:13,500 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000000
2008-09-21 18:54:13,546 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000000_0' to tip tip_200809211811_0001_m_000000, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:17,406 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000001
2008-09-21 18:54:17,406 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000001_0' to tip tip_200809211811_0001_m_000001, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:22,953 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_200809211811_0001_r_000000
2008-09-21 18:54:22,968 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_r_000000_0' to tip tip_200809211811_0001_r_000000, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:33,359 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000002
2008-09-21 18:54:33,359 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000002_0' to tip tip_200809211811_0001_m_000002, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:33,843 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000003
2008-09-21 18:54:33,843 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000003_0' to tip tip_200809211811_0001_m_000003, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:33,953 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000000_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:33,953 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000000_0' has completed tip_200809211811_0001_m_000000 successfully.
2008-09-21 18:54:34,187 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000001_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:34,187 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000001_0' has completed tip_200809211811_0001_m_000001 successfully.
2008-09-21 18:54:45,953 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000004
2008-09-21 18:54:45,953 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000004_0' to tip tip_200809211811_0001_m_000004, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:46,109 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000002_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:46,109 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000002_0' has completed tip_200809211811_0001_m_000002 successfully.
2008-09-21 18:54:46,234 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000003_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:46,234 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000003_0' has completed tip_200809211811_0001_m_000003 successfully.
2008-09-21 18:54:47,093 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000005
2008-09-21 18:54:47,156 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000005_0' to tip tip_200809211811_0001_m_000005, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:57,093 INFO org.apache.hadoop.mapred.JobInProgress: Choosing cached task tip_200809211811_0001_m_000006
2008-09-21 18:54:57,093 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_200809211811_0001_m_000006_0' to tip tip_200809211811_0001_m_000006, for tracker 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:54:57,140 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000004_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:57,140 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000004_0' has completed tip_200809211811_0001_m_000004 successfully.
2008-09-21 18:54:57,218 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000005_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:54:57,218 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000005_0' has completed tip_200809211811_0001_m_000005 successfully.
2008-09-21 18:55:02,484 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_m_000006_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:55:02,484 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_m_000006_0' has completed tip_200809211811_0001_m_000006 successfully.
2008-09-21 18:55:11,484 INFO org.apache.hadoop.mapred.TaskRunner: Saved output of task 'task_200809211811_0001_r_000000_0' to hdfs://localhost:9000/user/SHIYANJUN/output
2008-09-21 18:55:11,484 INFO org.apache.hadoop.mapred.JobInProgress: Task 'task_200809211811_0001_r_000000_0' has completed tip_200809211811_0001_r_000000 successfully.
2008-09-21 18:55:11,562 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200809211811_0001 has completed successfully.
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000000_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000001_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000002_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000003_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000004_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000005_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_m_000006_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
2008-09-21 18:55:12,562 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'task_200809211811_0001_r_000000_0' from 'tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881'
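Note that every one of the seven map tasks is chosen as a "cached" task, meaning its input split is available on the tracker's own node (with a single machine every split is local, and the job counters at the end of this document confirm "Data-local map tasks:7"), while the lone reduce is picked as a "normal" task. The same progress information can also be queried from the JobTracker instead of read out of its log. A minimal sketch, assuming the job subcommand in 0.16.4 accepts -status as it does in neighboring releases:

$ bin/hadoop job -status job_200809211811_0001

The JobTracker's web interface, listening by default on port 50030 (mapred.job.tracker.http.address in the configuration dump below), presents the same scheduling activity in a browser at http://localhost:50030/.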
The TaskTracker's log records the details of running the WordCount word-frequency job, including task progress information. It can be found in the file hadoop-SHIYANJUN-tasktracker-cbbd2ce9428e48b.log, as shown here:

2008-09-21 18:54:13,640 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000000_0
2008-09-21 18:54:17,437 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000001_0
2008-09-21 18:54:23,140 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_r_000000_0
2008-09-21 18:54:31,859 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000000_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/b.txt:0+10109
2008-09-21 18:54:32,015 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000001_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/d.txt:0+1987
2008-09-21 18:54:32,015 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000001_0 is done.
2008-09-21 18:54:32,140 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000000_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/b.txt:0+10109
2008-09-21 18:54:32,140 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000000_0 is done.
2008-09-21 18:54:32,218 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000001_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/d.txt:0+1987
2008-09-21 18:54:32,218 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000001_0 Ignoring status-update since task is 'done'
2008-09-21 18:54:33,390 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000002_0
2008-09-21 18:54:33,968 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000003_0
2008-09-21 18:54:37,031 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.0% reduce > copy >
2008-09-21 18:54:43,156 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.047619052% reduce > copy (1 of 7 at 0.00 MB/s) >
2008-09-21 18:54:44,796 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000002_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/f.txt:0+1985
2008-09-21 18:54:44,796 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000002_0 is done.
2008-09-21 18:54:45,328 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000003_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/a.txt:0+1957
2008-09-21 18:54:45,343 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000003_0 is done.
2008-09-21 18:54:45,968 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000004_0
2008-09-21 18:54:47,156 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000005_0
2008-09-21 18:54:49,234 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.095238104% reduce > copy (2 of 7 at 0.00 MB/s) >
2008-09-21 18:54:55,390 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.19047621% reduce > copy (4 of 7 at 0.00 MB/s) >
2008-09-21 18:54:56,437 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000004_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/c.txt:0+1957
2008-09-21 18:54:56,453 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000004_0 is done.
2008-09-21 18:54:56,609 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000005_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/e.txt:0+1957
2008-09-21 18:54:56,625 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000005_0 is done.
2008-09-21 18:54:57,093 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_200809211811_0001_m_000006_0
2008-09-21 18:55:01,437 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.19047621% reduce > copy (4 of 7 at 0.00 MB/s) >
2008-09-21 18:55:01,765 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_m_000006_0 1.0% hdfs://localhost:9000/user/SHIYANJUN/input/g.txt:0+1957
2008-09-21 18:55:01,765 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_m_000006_0 is done.
2008-09-21 18:55:07,453 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.28571433% reduce > copy (6 of 7 at 0.00 MB/s) >
2008-09-21 18:55:10,312 INFO org.apache.hadoop.mapred.TaskTracker: task_200809211811_0001_r_000000_0 0.81237113% reduce > reduce
2008-09-21 18:55:10,312 INFO org.apache.hadoop.mapred.TaskTracker: Task task_200809211811_0001_r_000000_0 is done.
2008-09-21 18:55:12,593 INFO org.apache.hadoop.mapred.TaskTracker: Received 'KillJobAction' for job: job_200809211811_0001
2008-09-21 18:55:12,593 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000003_0 done; removing files.
2008-09-21 18:55:12,609 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000000_0 done; removing files.
2008-09-21 18:55:12,609 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000004_0 done; removing files.
2008-09-21 18:55:12,625 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_r_000000_0 done; removing files.
2008-09-21 18:55:12,625 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000006_0 done; removing files.
2008-09-21 18:55:12,640 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000002_0 done; removing files.
2008-09-21 18:55:12,656 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000005_0 done; removing files.
2008-09-21 18:55:12,656 INFO org.apache.hadoop.mapred.TaskRunner: task_200809211811_0001_m_000001_0 done; removing files.

This is how a single Job is executed, from start to finish.
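With the job finished, its results sit in HDFS under the output directory that was passed on the command line, and they can be inspected with the same dfs subcommands used earlier to load the input:

$ bin/hadoop dfs -ls output
$ bin/hadoop dfs -cat output/part-00000

Because this job ran with mapred.reduce.tasks = 1 (see the configuration dump below), there is a single reduce task and hence a single result file, part-00000, holding one word and its count per line.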
Understanding the Hadoop Execution Process via a Single-Machine Pseudo-Distributed Run (3)

Between the moment a job is launched and the moment it completes, Hadoop also writes out some files that are convenient to read afterwards, namely the job's configuration and a record of its execution; studying them deepens one's understanding of Hadoop. In the logs\history directory you will find, for the job that just completed, both its detailed configuration and its execution record.

The first file, job_200809211811_0001_conf.xml, contains the configuration the job ran with, Hadoop's own configuration settings included (a trailing … marks a value that was cut off in the source listing):

mapred.tasktracker.dns.nameserver = default
dfs.replication.interval = 3
mapred.mapper.class = org.apache.hadoop.examples.WordCount…
ipc.client.maxidletime = 120000
mapred.input.dir = hdfs://localhost:9000/user/SHIYANJUN/in…
mapred.submit.replication = 10
dfs.safemode.extension = 30000
mapred.working.dir = hdfs://localhost:9000/user/SHIYANJUN
ipc.server.tcpnodelay = false
dfs.web.ugi = webuser,webgroup
fs.checkpoint.dir = ${hadoop.tmp.dir}/dfs/namesecondary
mapred.job.split.file = /tmp/hadoop-SHIYANJUN/mapred/syste…
dfs.permissions.supergroup = supergroup
dfs.datanode.http.address = 0.0.0.0:50075
mapred.output.compression.type = RECORD
dfs.replication.min = 1
mapred.child.java.opts = -Xmx200m
dfs.datanode.dns.nameserver = default
keep.failed.task.files = false
dfs.http.address = 0.0.0.0:50070
mapred.output.value.class = org.apache.hadoop.io.IntWritab…
io.bytes.per.checksum = 512
mapred.task.tracker.report.address = 127.0.0.1:0
dfs.blockreport.intervalMsec = 3600000
fs.hdfs.impl = org.apache.hadoop.dfs.DistributedFileSystem
dfs.data.dir = ${hadoop.tmp.dir}/dfs/data
fs.ramfs.impl = org.apache.hadoop.fs.InMemoryFileSystem
dfs.block.size = 67108864
fs.hftp.impl = org.apache.hadoop.dfs.HftpFileSystem
fs.checkpoint.period = 3600
user.name = SHIYANJUN
mapred.child.tmp = ./tmp
mapred.local.dir.minspacekill = 0
map.sort.class = org.apache.hadoop.mapred.MergeSorter
hadoop.logfile.count = 10
ipc.client.connection.maxidletime = 1000
mapred.output.dir = hdfs://localhost:9000/user/SHIYANJUN/o…
io.map.index.skip = 0
mapred.tasktracker.expiry.interval = 600000
io.seqfile.lazydecompress = true
mapred.output.compress = false
dfs.datanode.dns.interface = default
mapred.reduce.parallel.copies = 5
fs.checkpoint.size = 67108864
mapred.job.name = wordcount
dfs.max.objects = 0
local.cache.size = 10737418240
mapred.userlog.limit.kb = 0
fs.file.impl = org.apache.hadoop.fs.LocalFileSystem
mapred.task.tracker.http.address = 0.0.0.0:50060
mapred.task.timeout = 600000
fs.kfs.impl = org.apache.hadoop.fs.kfs.KosmosFileSystem
mapred.reducer.class = org.apache.hadoop.examples.WordCoun…
fs.s3.buffer.dir = ${hadoop.tmp.dir}/s3
hadoop.job.ugi = SHIYANJUN,None,root,Administrators,Users
ipc.client.kill.max = 10
mapred.job.tracker.persist.jobstatus.dir = /jobtracker/job…
dfs.client.block.write.retries = 3
dfs.datanode.du.reserved = 0
mapred.reduce.tasks.speculative.execution = true
dfs.replication.max = 512
dfs.balance.bandwidthPerSec = 1048576
mapred.max.tracker.failures = 4
mapred.tasktracker.dns.interface = default
mapred.map.tasks = 7
mapred.job.tracker.persist.jobstatus.hours = 0
fs.s3.sleepTimeSeconds = 10
fs.default.name = localhost:9000
mapred.output.key.class = org.apache.hadoop.io.Text
ipc.client.timeout = 60000
tasktracker.http.threads = 40
hadoop.rpc.socket.factory.class.default = org.apache.hadoo…
mapred.reduce.tasks = 1
dfs.datanode.address = 0.0.0.0:50010
dfs.heartbeat.interval = 3
dfs.replication.considerLoad = true
dfs.default.chunk.view.size = 32768
io.file.buffer.size = 4096
dfs.client.buffer.dir = ${hadoop.tmp.dir}/dfs/tmp
mapred.tasktracker.map.tasks.maximum = 2
mapred.task.profile.maps = 0-2
hadoop.native.lib = true
fs.s3.block.size = 67108864
dfs.replication = 1
mapred.jar = /tmp/hadoop-SHIYANJUN/mapred/local/jobTracker…
mapred.job.tracker.http.address = 0.0.0.0:50030
fs.inmemory.size.mb = 75
mapred.tasktracker.reduce.tasks.maximum = 2
io.compression.codecs = org.apache.hadoop.io.compress.Defa…
io.seqfile.compress.blocksize = 1000000
dfs.namenode.decommission.interval = 300
mapred.job.tracker = localhost:9001
io.skip.checksum.errors = false
mapred.combiner.class = org.apache.hadoop.examples.WordCou…
ipc.server.listen.queue.size = 128
mapred.reduce.max.attempts = 4
fs.s3.maxRetries = 4
fs.trash.interval = 0
mapred.local.dir.minspacestart = 0
dfs.datanode.du.pct = 0.98f
fs.s3.impl = org.apache.hadoop.fs.s3.S3FileSystem
io.seqfile.sorter.recordlimit = 1000000
io.sort.mb = 100
mapred.local.dir = ${hadoop.tmp.dir}/mapred/local
io.sort.factor = 10
mapred.task.profile = false
job.end.retry.interval = 30000
mapred.jobtracker.completeuserjobs.maximum = 100
mapred.task.profile.reduces = 0-2
webinterface.private.actions = false
hadoop.tmp.dir = /tmp/hadoop-${user.name}
mapred.map.output.compression.type = RECORD
mapred.output.compression.codec = org.apache.hadoop.io.com…
mapred.temp.dir = ${hadoop.tmp.dir}/mapred/temp
mapred.map.output.compression.codec = org.apache.hadoop.io…
mapred.compress.map.output = false
mapred.userlog.retain.hours = 24
mapred.system.dir = ${hadoop.tmp.dir}/mapred/system
dfs.safemode.threshold.pct = 0.999f
job.end.retry.attempts = 0
ipc.client.idlethreshold = 4000
mapred.reduce.copy.backoff = 300
mapred.map.tasks.speculative.execution = true
dfs.permissions = true
mapred.inmem.merge.threshold = 1000
dfs.namenode.handler.count = 10
hadoop.logfile.size = 10000000
dfs.namenode.logging.level = info
dfs.secondary.http.address = 0.0.0.0:50090
mapred.job.tracker.persist.jobstatus.active = false
mapred.job.tracker.handler.count = 10
dfs.name.dir = ${hadoop.tmp.dir}/dfs/name
ipc.client.connect.max.retries = 10
mapred.min.split.size = 0
mapred.map.max.attempts = 4
ipc.client.tcpnodelay = false
jobclient.output.filter = FAILED
dfs.df.interval = 60000
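The history copy of the configuration is an ordinary Hadoop configuration file: XML, with one <property> element per setting, each holding a <name> and a <value>. Individual settings can therefore be pulled out from the shell; a small sketch, assuming the logs/history location described above and Cygwin's GNU grep:

$ cd /cygdrive/g/hadoop-0.16.4
$ grep mapred.reduce.tasks logs/history/job_200809211811_0001_conf.xml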
From this file you can see, first of all, the default values in Hadoop's basic configuration, and from there go on to learn how to configure a Hadoop job yourself.

The other file, 1221994453046_job_200809211811_0001, records the details of the entire job, as follows:

Job JOBID="job_200809211811_0001" JOBNAME="wordcount" USER="SHIYANJUN" SUBMIT_TIME="1221994453031" JOBCONF="/tmp/hadoop-SHIYANJUN/mapred/system/job_200809211811_0001/job.xml"
Job JOBID="job_200809211811_0001" LAUNCH_TIME="1221994453375" TOTAL_MAPS="7" TOTAL_REDUCES="1"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000000" TASK_ATTEMPT_ID="task_200809211811_0001_m_000000_0" START_TIME="1221994457390" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000000" TASK_ATTEMPT_ID="task_200809211811_0001_m_000000_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994473281" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000000" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994473281" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:1684,Map-Reduce Framework.Map input bytes:10109,Map-Reduce Framework.Map output bytes:16845,Map-Reduce Framework.Combine input records:1684,Map-Reduce Framework.Combine output records:3"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000001" TASK_ATTEMPT_ID="task_200809211811_0001_m_000001_0" START_TIME="1221994457750" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000001" TASK_ATTEMPT_ID="task_200809211811_0001_m_000001_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994472015" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000001" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994472015" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:331,Map-Reduce Framework.Map input bytes:1987,Map-Reduce Framework.Map output bytes:3312,Map-Reduce Framework.Combine input records:331,Map-Reduce Framework.Combine output records:6"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000002" TASK_ATTEMPT_ID="task_200809211811_0001_m_000002_0" START_TIME="1221994473843" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000002" TASK_ATTEMPT_ID="task_200809211811_0001_m_000002_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994485937" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000002" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994485937" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:330,Map-Reduce Framework.Map input bytes:1985,Map-Reduce Framework.Map output bytes:3306,Map-Reduce Framework.Combine input records:330,Map-Reduce Framework.Combine output records:4"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000003" TASK_ATTEMPT_ID="task_200809211811_0001_m_000003_0" START_TIME="1221994474500" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000003" TASK_ATTEMPT_ID="task_200809211811_0001_m_000003_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994485343" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000003" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994485343" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:326,Map-Reduce Framework.Map input bytes:1957,Map-Reduce Framework.Map output bytes:3262,Map-Reduce Framework.Combine input records:326,Map-Reduce Framework.Combine output records:2"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000004" TASK_ATTEMPT_ID="task_200809211811_0001_m_000004_0" START_TIME="1221994487078" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000004" TASK_ATTEMPT_ID="task_200809211811_0001_m_000004_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994496453" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000004" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994496453" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:326,Map-Reduce Framework.Map input bytes:1957,Map-Reduce Framework.Map output bytes:3262,Map-Reduce Framework.Combine input records:326,Map-Reduce Framework.Combine output records:2"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000005" TASK_ATTEMPT_ID="task_200809211811_0001_m_000005_0" START_TIME="1221994487781" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000005" TASK_ATTEMPT_ID="task_200809211811_0001_m_000005_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994497078" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000005" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994497078" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:326,Map-Reduce Framework.Map input bytes:1957,Map-Reduce Framework.Map output bytes:3262,Map-Reduce Framework.Combine input records:326,Map-Reduce Framework.Combine output records:2"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000006" TASK_ATTEMPT_ID="task_200809211811_0001_m_000006_0" START_TIME="1221994497375" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
MapAttempt TASK_TYPE="MAP" TASKID="tip_200809211811_0001_m_000006" TASK_ATTEMPT_ID="task_200809211811_0001_m_000006_0" TASK_STATUS="SUCCESS" FINISH_TIME="1221994501765" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_m_000006" TASK_TYPE="MAP" TASK_STATUS="SUCCESS" FINISH_TIME="1221994501765" COUNTERS="Map-Reduce Framework.Map input records:1,Map-Reduce Framework.Map output records:326,Map-Reduce Framework.Map input bytes:1957,Map-Reduce Framework.Map output bytes:3262,Map-Reduce Framework.Combine input records:326,Map-Reduce Framework.Combine output records:2"
ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809211811_0001_r_000000" TASK_ATTEMPT_ID="task_200809211811_0001_r_000000_0" START_TIME="1221994464125" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
ReduceAttempt TASK_TYPE="REDUCE" TASKID="tip_200809211811_0001_r_000000" TASK_ATTEMPT_ID="task_200809211811_0001_r_000000_0" TASK_STATUS="SUCCESS" SHUFFLE_FINISHED="1221994509828" SORT_FINISHED="1221994509843" FINISH_TIME="1221994511421" HOSTNAME="tracker_cbbd2ce9428e48b:localhost/127.0.0.1:4881"
Task TASKID="tip_200809211811_0001_r_000000" TASK_TYPE="REDUCE" TASK_STATUS="SUCCESS" FINISH_TIME="1221994511421" COUNTERS="Map-Reduce Framework.Reduce input groups:7,Map-Reduce Framework.Reduce input records:21,Map-Reduce Framework.Reduce output records:7"
Job JOBID="job_200809211811_0001" FINISH_TIME="1221994511484" JOB_STATUS="SUCCESS" FINISHED_MAPS="7" FINISHED_REDUCES="1" FAILED_MAPS="0" FAILED_REDUCES="0" COUNTERS="Job Counters .Launched map tasks:7,Job Counters .Launched reduce tasks:1,Job Counters .Data-local map tasks:7,Map-Reduce Framework.Map input records:7,Map-Reduce Framework.Map output records:3649,Map-Reduce Framework.Map input bytes:21909,Map-Reduce Framework.Map output bytes:36511,Map-Reduce Framework.Combine input records:3649,Map-Reduce Framework.Combine output records:21,Map-Reduce Framework.Reduce input groups:7,Map-Reduce Framework.Reduce input records:21,Map-Reduce Framework.Reduce output records:7"

The record above shows how the Job was split up, which Tasks the Job consisted of, and how those Tasks executed: start and finish times, completion status, counters, and so on.
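Two sanity checks fall straight out of this record. Timing: SUBMIT_TIME="1221994453031" and the job's final FINISH_TIME="1221994511484" differ by 58453 ms, so the whole WordCount run took just under a minute. Counters: the seven maps emitted 3649 records in total (1684 + 331 + 330 + 326 × 4); the combiners collapsed these to 3 + 6 + 4 + 2 + 2 + 2 + 2 = 21 records; and the reduce read exactly those 21 records in 7 groups and wrote 7 output records, one per distinct word. A couple of one-liners for pulling such numbers out of the history file, assuming Cygwin's GNU grep:

$ cd /cygdrive/g/hadoop-0.16.4/logs/history
$ grep -o 'SUBMIT_TIME="[0-9]*"' 1221994453046_job_200809211811_0001
$ grep -o 'FINISH_TIME="[0-9]*"' 1221994453046_job_200809211811_0001 | tail -1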