Flume + HDFS IO error + ConnectException

Hi,
I'm working with Cloudera Manager CDH 5.4.2 with Flume installed, and I cannot save the data that I get from Twitter.

When I run the Flume agent, it starts fine but fails when it attempts to write new event data to HDFS.

I got the following error:

INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://192.168.109.6:8020/user/flume/tweets/2015/06/03/06//FlumeData.1433311217583.tmp

WARN org.apache.flume.sink.hdfs.HDFSEventSink: HDFS IO error
java.net.ConnectException: Call From cluster-05.xxxx.com/192.168.109.6 to cluster-05.xxxx.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

My configuration is as follows.

flume-conf.property:

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.109.6:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

I am using the following plugins:

  • flume-sources-1.0-SNAPSHOT.jar
  • twitter4j-core-2.2.6.jar
  • twitter4j-media-support-2.2.6.jar
  • twitter4j-stream-2.2.6.jar

(I replaced the twitter4j-*-3.0.3.jar files with the twitter4j-*-2.2.6.jar versions.)

I also checked the directory as the hdfs user:

hadoop fs -ls /user/flume : 

drwxrwxrwx - flume flume  /user/flume/tweets

 

In core-site.xml (at /hadoop/conf) I added:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>

 

I also ran hadoop dfsadmin -safemode leave as the HDFS user on the host where the Flume agent runs.

I would really appreciate your help with this issue.

Regards,

AR

1 ACCEPTED SOLUTION


I found the solution myself, and I'm leaving it here in case anyone has the same error.

My mistake was that, because I am running in a cluster, the sink path has to point to the Hadoop host. So I changed the address here:

TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.109.6:8020/user/flume/tweets/%Y/%m/%d/%H/

After that, everything ran smoothly.

Thanks
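
For anyone debugging the same "Connection refused" message: one quick way to confirm whether the host and port used in hdfs.path are reachable from the Flume host is to attempt a plain TCP connection. The sketch below is only an illustration. The helper name can_connect is hypothetical, and the address in the usage comment is the one from the post above. A successful connection only shows that something is listening on that port, not that it is the NameNode.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


# Example usage, run on the host where the Flume agent lives.
# The address is taken from the hdfs.path above; replace it with your NameNode:
#   print(can_connect("192.168.109.6", 8020))
```

If this returns False for the address in hdfs.path, the sink is probably pointing at a host or port where no NameNode RPC service is listening.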


3 REPLIES 3

Hi, as you explained above, you changed an address to solve the HDFS IO error, but I do not see any change in the address given in your solution. Can you explain clearly what you did to solve the error?

 

In my case the problem was an incorrect port number. I made sure to use the NameNode port.
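
A small sketch of how one might double-check which host and port a sink path actually points to, so it can be compared against the NameNode address configured for the cluster (for example the value of fs.default.name in core-site.xml). The function name hdfs_endpoint is hypothetical; the two URIs are the ones quoted earlier in this thread.

```python
from urllib.parse import urlsplit


def hdfs_endpoint(uri):
    """Return (host, port) from an hdfs:// URI; port is None if absent."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError("not an hdfs:// URI: " + uri)
    return parts.hostname, parts.port


# Value of TwitterAgent.sinks.HDFS.hdfs.path from the question.
sink = hdfs_endpoint("hdfs://192.168.109.6:8020/user/flume/tweets/%Y/%m/%d/%H/")
# Value of fs.default.name from the core-site.xml in the question.
core = hdfs_endpoint("hdfs://localhost:8020")

print("sink path points to:", sink)
print("core-site.xml points to:", core)
print("same endpoint:", sink == core)
```

A mismatch in host or port between the two is a reasonable first thing to look at when the agent reports a refused connection.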