
flume data ingestion to hdfs

New Contributor

Hi

I keep getting this error:

14 Dec 2016 09:44:03,218 WARN [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:461) - HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://tmp/flumetest/FlumeData.1481737433180.tmp
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:720)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:266)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:541)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:424)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
    at java.util.concurrent.FutureTask.get(FutureTask.java:205)
    at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:713)
    ... 6 more
14 Dec 2016 09:44:08,219 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSSequenceFile.configure:63) - writeFormat = Text, UseRawLocalFileSystem = false
14 Dec 2016 09:44:08,251 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:265) - Creating hdfs://tmp/flumetest/FlumeData.1481737448220.tmp

Below is the Flume agent config, set up through Ambari:

agent.sources = pstream

agent.channels = memoryChannel

agent.channels.memoryChannel.type = memory

agent.sources.pstream.channels = memoryChannel

agent.sources.pstream.type = exec

agent.sources.pstream.command = tail -f /etc/passwd

agent.sinks = hdfsSink

agent.sinks.hdfsSink.type = hdfs

agent.sinks.hdfsSink.channel = memoryChannel

agent.sinks.hdfsSink.hdfs.path = hdfs://tmp/flumetest

agent.sinks.hdfsSink.hdfs.fileType = SequenceFile

agent.sinks.hdfsSink.hdfs.writeFormat = Text

The HDFS path is writable.

Please help.

Thanks

4 REPLIES

Re: flume data ingestion to hdfs

Super Guru

What version of Hadoop, Flume, and JDK are you running?

Could this be a known issue in an older version? What is the timeout set to?

https://issues.apache.org/jira/browse/FLUME-2429

Which user is running Flume, and does that user have write permissions? It looks as if a firewall or permissions may be blocking it.

Try running with debug logging enabled:

-Dflume.root.logger=DEBUG,console

Look at this article: http://www.thecloudavenue.com/2013/03/analyse-tweets-using-flume-hadoop-and.html

Also consider trying the same thing in Apache NiFi.

Re: flume data ingestion to hdfs

New Contributor

What version of Hadoop? Flume? JDK? // Hadoop 2.7.1.2.3; Flume 1.5.2.2.3; JDK 1.8

Older version issues? What is the timeout set to? // It was the default timeout. I also set the timeout to 1000000 but got the same error.
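For reference, the "10000 ms" in the log matches the default of the HDFS sink's hdfs.callTimeout property, which is given in milliseconds and covers HDFS operations such as open, write, flush, and close. A minimal sketch of raising it, assuming the sink name hdfsSink from the config in the question; the value 60000 is only an illustrative choice, not a recommendation from this thread:

```properties
# Allow up to 60 seconds per HDFS call (open, write, flush, close).
# The default is 10000 ms, which is the figure reported in the error.
# 60000 is an illustrative value only.
agent.sinks.hdfsSink.hdfs.callTimeout = 60000
```

If the error message still reports 10000 ms after a change like this, the new setting has probably not reached the running agent. If it reports the larger value and still times out, the open call is likely hanging for some other reason.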

All of the Flume agent configuration was set up through Ambari.

The firewall is turned off and the HDFS directory has 777 permissions.

Any other thoughts?

Re: flume data ingestion to hdfs

Super Guru

What are the local permissions on the Flume directory and the current directory? Can you run tail -f /etc/passwd as that user?

It should be, but is the HDFS server set in the Flume configuration so that it points to the correct server?

Which server does the Flume agent run on? Does it have the Hadoop / HDFS client installed? You can check with hdfs dfs -put /etc/passwd /tmp/flumetest/

Is there anything in the hdfs://tmp/flumetest/FlumeData.1481737448220.tmp file?
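On the question of whether the sink points at the correct server: in a URI written as hdfs://tmp/flumetest, the part right after "//" is the authority, so "tmp" would be read as the NameNode host and "/flumetest" as the path. If that is what is happening here, the open call could hang until the call timeout expires. A sketch of two alternative settings, assuming the sink name hdfsSink from the question; namenode.example.com and port 8020 are placeholders, not values from this thread:

```properties
# Option A: empty authority (three slashes), so the NameNode is taken from
# fs.defaultFS in the Hadoop configuration. Assumption: that configuration
# is on the Flume agent's classpath.
agent.sinks.hdfsSink.hdfs.path = hdfs:///tmp/flumetest

# Option B: name the NameNode explicitly. Host and port are placeholders.
# agent.sinks.hdfsSink.hdfs.path = hdfs://namenode.example.com:8020/tmp/flumetest
```

Note that the hdfs dfs -put check above uses a path with no authority, so it resolves against the default file system and would not exercise the hdfs://tmp/... form.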

Re: flume data ingestion to hdfs

New Contributor

What are the local permissions on the Flume directory and the current directory? Can you run tail -f /etc/passwd as that user? // It runs as the root user.

It should be, but is the HDFS server set in the Flume configuration so that it points to the correct server? // Yes

Which server does the Flume agent run on? Does it have the Hadoop / HDFS client installed? You can check with hdfs dfs -put /etc/passwd /tmp/flumetest/ // The Flume agent runs on every node of the HDFS cluster (3 servers) and is configured through Ambari. I can put the passwd file using hdfs dfs -put /etc/passwd /tmp/flumetest/

hdfs dfs -ls /tmp/flumetest/
Found 1 items
-rw-r--r-- 3 root hdfs 2379 2016-12-14 11:22 /tmp/flumetest/passwd

Is there anything in the hdfs://tmp/flumetest/FlumeData.1481737448220.tmp file? // No