Flume file channel error

I am using Flume with a spoolDir source to pick up data from a monitored directory, an HDFS sink, and a file channel. I chose the file channel because a memory channel loses log data if an error occurs, while a file channel does not, which is why I decided to use file. When I drop in small files (tested <= 40 MB), the data is written to HDFS correctly, but with an 80 MB test file the agent fails with the following error:

ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.SinkRunner$PollingRunner.run:158)  - Unable to deliver event. Exception follows.
java.lang.IllegalStateException: Log is closed
    at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
    at org.apache.flume.channel.file.Log.getFlumeEventQueue(Log.java:591)
    at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.<init>(FileChannel.java:442)
    at org.apache.flume.channel.file.FileChannel.createTransaction(FileChannel.java:359)
    at org.apache.flume.channel.BasicChannelSemantics.getTransaction(BasicChannelSemantics.java:122)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:356)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:745)

My configuration file is:
# Name the components on this agent  
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1  
  
a1.sources.r1.type = spooldir  
a1.sources.r1.spoolDir = D:/work  
a1.sources.r1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder 
a1.sources.r1.deserializer.maxBlobLength=200000000										  
a1.sources.r1.batchSize = 1
a1.sources.r1.fileHeader = true
#a1.sources.r1.fileHeaderKey = fileName
a1.sources.r1.deletePolicy=immediate

a1.sinks.k1.type = hdfs  
a1.sinks.k1.hdfs.path = hdfs://192.168.100.224:9000/home/hadoop/test 
a1.sinks.k1.hdfs.fileType = DataStream  
a1.sinks.k1.hdfs.writeFormat = Text   
a1.sinks.k1.hdfs.batchSize = 1  
a1.sinks.k1.hdfs.rollInterval = 60  
a1.sinks.k1.hdfs.rollCount = 1
a1.sinks.k1.hdfs.rollSize = 100000000
#a1.sinks.k1.hdfs.filePrefix = %{fileName} 
a1.sinks.k1.hdfs.filePrefix = %Y%m%d
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.useLocalTimeStamp = true  
a1.sinks.k1.hdfs.idleTimeout = 60000    
a1.sinks.k1.serializer.appendNewline = false
a1.sinks.k1.hdfs.connect-timeout=80000
a1.sinks.k1.hdfs.callTimeout=120000

# Memory channel kept for reference (commented out); the file channel below is used instead
#a1.channels.c1.type = memory  
#a1.channels.c1.capacity = 1000  
#a1.channels.c1.transactionCapacity = 100  

a1.channels.c1.type=file
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100 
a1.channels.c1.checkpointDir=D:/work/checkpoint
a1.channels.c1.dataDirs=D:/work/datadirs
a1.channels.c1.backupCheckpointDir=D:/work/backupcheckpoint
a1.channels.c1.keep-alive = 3
#a1.channels.c1.write-timeout = 30
#a1.channels.c1.checkpoint-timeout = 600
    
# Bind the source and sink to the channel  
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1  
 

wangxiaolei

I recommend upgrading to Flume 1.7 or later and switching to the Taildir Source, which supports resuming from where it left off after a restart and can monitor multiple files matched by a regular expression. You can still use a file channel, though its performance is somewhat worse than a memory channel; consider trying the Kafka channel for higher fault tolerance.
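
For reference, a minimal sketch of that suggestion, assuming Flume 1.7+; the position file path, file name pattern, Kafka broker address, topic, and consumer group below are placeholders, not values taken from the original question:

# Taildir source: records read positions in a JSON file, so it can resume after a restart
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = D:/work/taildir_position.json
a1.sources.r1.filegroups = f1
# regular expression matching the file names to tail (placeholder pattern)
a1.sources.r1.filegroups.f1 = D:/work/.*log

# Kafka channel for higher fault tolerance (broker address and topic are assumptions)
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = 192.168.100.224:9092
a1.channels.c1.kafka.topic = flume-channel
a1.channels.c1.kafka.consumer.group.id = flume-agent

With a Kafka channel the buffered events live in a Kafka topic, so they survive an agent restart as long as the Kafka cluster stays up.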

fish - Hadooper

Change the channel to memory and see whether the problem goes away.
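
For a quick test, the commented-out memory channel from the original configuration can be swapped back in, for example:

# temporary memory channel for diagnosis only; events are lost if the agent dies
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

If the error disappears with the memory channel, the problem is isolated to the file channel setup rather than the source or the HDFS sink.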
