Week 2 homework: problem reading input files

I have uploaded the log files to the HDFS directory and can see them there, but when I run the hello-world WordCount job it still fails, saying the file cannot be found:
[zhao@z1 week2]$ hadoop fs -ls /homework/week2/input/1/Sep-2013/Sep-2013-0
Found 1 items
-rw-r--r-- 3 zhao supergroup 33338757 2015-09-19 16:23 /homework/week2/input/1/Sep-2013/Sep-2013-0/dongxicheng.org.log.1
The error message is below. One more detail: if I skip the nesting and put a file directly under the input directory, there is no error, yet the homework notes say nested directories are allowed. Can anyone explain?
 
[zhao@z1 week2]$ hadoop jar wordcount.jar WordCount.WordCountDriver /homework/week2/input/1/Sep-2013 /homework/weeke/output/6
15/09/20 01:07:38 INFO client.RMProxy: Connecting to ResourceManager at z1/10.170.122.59:8032
15/09/20 01:07:38 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/20 01:07:39 INFO input.FileInputFormat: Total input paths to process : 10
15/09/20 01:07:39 INFO mapreduce.JobSubmitter: number of splits:10
15/09/20 01:07:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442636737404_0014
15/09/20 01:07:40 INFO impl.YarnClientImpl: Submitted application application_1442636737404_0014
15/09/20 01:07:40 INFO mapreduce.Job: The url to track the job: http://z1:8088/proxy/applicati ... 0014/
15/09/20 01:07:40 INFO mapreduce.Job: Running job: job_1442636737404_0014
15/09/20 01:07:49 INFO mapreduce.Job: Job job_1442636737404_0014 running in uber mode : false
15/09/20 01:07:49 INFO mapreduce.Job: map 0% reduce 0%
15/09/20 01:08:05 INFO mapreduce.Job: Task Id : attempt_1442636737404_0014_m_000001_0, Status : FAILED
Error: java.io.FileNotFoundException: Path is not a file: /homework/week2/input/1/Sep-2013/Sep-2013-1
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:70)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1222)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1210)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1200)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:271)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:238)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:231)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1498)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:302)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:298)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:85)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /homework/week2/input/1/Sep-2013/Sep-2013-1
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:70)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1832)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1812)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1784)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:542)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:362)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:254)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1220)
... 20 more

 

mengmeng - Big Data Engineer

Upvoted by: fish, colincheng

conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true");

(Note: Configuration.set takes two String arguments, so the value has to be the string "true", not a boolean literal; conf.setBoolean(...) with a boolean also works.)

colincheng - Big Data Engineer @ 易宝支付

Upvoted by: 编程小梦, 生命一如夏花

Normally the input directory handed to MapReduce or Hive must not contain subdirectories; if it does, you get exactly this error. Recursive traversal of the input directory was added later, supposedly starting around 0.23 (I am not sure of the exact release, but 0.20 definitely does not support it). By default, mapreduce.input.fileinputformat.input.dir.recursive is false. Set mapreduce.input.fileinputformat.input.dir.recursive=true. It is a client-side parameter, so you can set it either in the MapReduce job itself or in mapred-site.xml, whichever you prefer. You may hit the same problem in Hive, and the same fix applies there.
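For the mapred-site.xml route, a client-side entry like the one below should do it. The property name is taken from the answer above; the surrounding layout is just the standard Hadoop configuration-file convention:

<!-- client-side mapred-site.xml: list input directories recursively by default -->
<property>
  <name>mapreduce.input.fileinputformat.input.dir.recursive</name>
  <value>true</value>
</property>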

奔跑的大象

Upvoted by:

It seems this also solves it: pass a glob as the input path, e.g.
/homework/week2/input/1/Sep-2013/Sep-2013-0/*
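Applied to the failing command above, that means pointing the job at the files instead of the directories. A hypothetical re-run (the output path /homework/week2/output/7 is just a fresh example directory, and the two-level glob covering all ten subdirectories is my extension of the suggestion; quoting keeps the local shell from expanding it):

[zhao@z1 week2]$ hadoop jar wordcount.jar WordCount.WordCountDriver '/homework/week2/input/1/Sep-2013/*/*' /homework/week2/output/7

This works because FileInputFormat expands the glob itself, so every match is already a file and no directory ever reaches a map task.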
