Created 12-24-2016 08:13 AM
Hi, I am trying to fetch data from Twitter into HDFS, and when I run my flume-ng agent I get the logs below.
Could you please assist me?
16/12/24 00:06:22 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider starting
16/12/24 00:06:22 INFO node.PollingPropertiesFileConfigurationProvider: Reloading configuration file:/etc/flume-ng/conf/flume.conf
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.path
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.path = hdfs://master:8020/user/cloudera/tweets/
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: channel.Memchannel.capacity
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.channel.Memchannel.capacity = 10000
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.channel
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.channel = Memchannel
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.writeFormat
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.writeFormat = Text
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: channel.Memchannel.type
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.channel.Memchannel.type = memory
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.rollCount
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.rollCount = 10000
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: channel.Memchannel.transactionalCapacity
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.channel.Memchannel.transactionalCapacity = 100
16/12/24 00:06:22 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: TwitterAgent
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.batchsize
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.batchsize = 1000
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.rollSize
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.rollSize = 0
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.hdfs.filetype
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.hdfs.filetype = DataStream
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Invalid property specified: sink.HDFS.type
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Configuration property ignored: TwitterAgent.sink.HDFS.type = hdfs
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Agent configuration for 'TwitterAgent' does not contain any valid channels. Marking it as invalid.
16/12/24 00:06:22 WARN conf.FlumeConfiguration: Agent configuration invalid for agent 'TwitterAgent'. It will be removed.
16/12/24 00:06:22 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: []
16/12/24 00:06:22 WARN node.AbstractConfigurationProvider: No configuration found for this host:TwitterAgent
16/12/24 00:06:22 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{} channels:{} }
Created 12-24-2016 05:22 PM
@Praveen PentaReddy Most likely your twitter.conf has properties such as "TwitterAgent.sink.HDFS.channel" where the prefix should be "sinks". There will be several of these properties, and they should all start with TwitterAgent.sinks, not "sink".
If that doesn't work, you may want to post your twitter.conf and flume.env.sh so we can help narrow it down.
I would also strongly consider using NiFi to move data from Twitter to HDFS. Here is a detailed tutorial on doing that: https://community.hortonworks.com/articles/1282/sample-hdfnifi-flow-to-push-tweets-into-solrbanana.h...
Created 12-24-2016 05:53 PM
As Devin said, your flume.conf file must contain the following incorrectly specified properties:
TwitterAgent.sink.HDFS.hdfs.path
TwitterAgent.channel.Memchannel.capacity
TwitterAgent.sink.HDFS.channel
TwitterAgent.sink.HDFS.hdfs.writeFormat
TwitterAgent.channel.Memchannel.type
TwitterAgent.sink.HDFS.hdfs.rollCount
TwitterAgent.channel.Memchannel.transactionalCapacity
TwitterAgent.sink.HDFS.hdfs.batchsize
TwitterAgent.sink.HDFS.hdfs.rollSize
TwitterAgent.sink.HDFS.hdfs.filetype
TwitterAgent.sink.HDFS.type
In each of the above properties you need to use TwitterAgent.sinks.HDFS.... and TwitterAgent.channels.Memchannel.... instead. Because the property syntax is wrong, Flume ignores these properties, so no channel ends up configured for the TwitterAgent agent. The agent is therefore marked as invalid and removed, which is why Flume cannot fetch any data.
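The prefix rule described above can be checked mechanically before starting the agent. The following is a minimal Python sketch and not part of Flume: the function name `find_bad_prefixes` and the sample text are hypothetical, and it assumes the second segment of every key must be one of `sources`, `channels`, `sinks`, or `sinkgroups` (any other second-level keys a given Flume version may accept are not modeled).

```python
# Hypothetical helper, not a Flume API: flags keys whose second segment
# is not one of the plural component groups assumed below.
VALID_GROUPS = {"sources", "channels", "sinks", "sinkgroups"}


def find_bad_prefixes(conf_text):
    """Return property keys whose second segment is not in VALID_GROUPS."""
    bad = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key = line.split("=", 1)[0].strip()
        parts = key.split(".")
        # parts[0] is the agent name; parts[1] must be a plural group name.
        if len(parts) >= 2 and parts[1] not in VALID_GROUPS:
            bad.append(key)
    return bad


# Illustrative sample only; it mixes correct and incorrect prefixes.
sample = """
TwitterAgent.sinks = HDFS
TwitterAgent.sink.HDFS.type = hdfs
TwitterAgent.channel.Memchannel.type = memory
TwitterAgent.sinks.HDFS.channel = Memchannel
"""

print(find_bad_prefixes(sample))
```

Running this against the sample reports the two keys that use the singular `sink` and `channel` prefixes, which are the same kind of keys Flume lists as "Invalid property specified" in the log.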
Created 12-24-2016 08:18 PM
Thanks. After correcting the conf file, this is what I get. However, I still cannot see any data in HDFS.
16/12/24 11:48:34 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/24 11:48:34 INFO conf.FlumeConfiguration: Processing:HDFS
16/12/24 11:48:35 WARN conf.FlumeConfiguration: Could not configure source Twitter due to: No Channels configured for Twitter
org.apache.flume.conf.ConfigurationException: No Channels configured for Twitter
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSources(FlumeConfiguration.java:574)
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:346)
    at org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.access$000(FlumeConfiguration.java:213)
    at org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:127)
    at org.apache.flume.conf.FlumeConfiguration.<init>(FlumeConfiguration.java:109)
    at org.apache.flume.node.PropertiesFileConfigurationProvider.getFlumeConfiguration(PropertiesFileConfigurationProvider.java:189)
    at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:89)
    at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
16/12/24 11:48:35 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [TwitterAgent]
16/12/24 11:48:35 INFO node.AbstractConfigurationProvider: Creating channels
16/12/24 11:48:35 INFO channel.DefaultChannelFactory: Creating instance of channel Memchannel type memory
16/12/24 11:48:35 INFO node.AbstractConfigurationProvider: Created channel Memchannel
16/12/24 11:48:35 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
16/12/24 11:48:35 INFO node.AbstractConfigurationProvider: Channel Memchannel connected to [HDFS]
16/12/24 11:48:35 INFO node.Application: Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1f8dbaa6 counterGroup:{ name:null counters:{} } }} channels:{Memchannel=org.apache.flume.channel.MemoryChannel{name: Memchannel}} }
16/12/24 11:48:35 INFO node.Application: Starting Channel Memchannel
16/12/24 11:48:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: Memchannel: Successfully registered new MBean.
16/12/24 11:48:36 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: Memchannel started
16/12/24 11:48:36 INFO node.Application: Starting Sink HDFS
16/12/24 11:48:36 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: HDFS: Successfully registered new MBean.
16/12/24 11:48:36 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
Created 12-25-2016 07:19 AM
Any suggestions?
Created 12-25-2016 01:59 PM
@Praveen PentaReddy Can you please mark an answer as accepted and then create a new question for your new problem? That way it is indexed, and other people with a similar problem will be able to benefit. Please post your Flume conf file as well. I believe one of the binding properties in it may be off, which would explain why the sink is not working.
Thanks.
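The "No Channels configured for Twitter" warning in the earlier log suggests the source is not bound to any declared channel. One way to check the bindings is sketched below in Python. This is not a Flume tool: the names `parse_props` and `unwired_components` are hypothetical, and the sample configuration is invented for illustration since the actual file was not posted. It assumes a source lists its channels under the plural key `channels`, a sink names its single channel under the singular key `channel`, and that channel names are case-sensitive.

```python
# Hypothetical helpers, not Flume APIs.
def parse_props(conf_text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    props = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def unwired_components(conf_text, agent):
    """Return names of sources and sinks that reference no declared channel."""
    props = parse_props(conf_text)
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for name in props.get(f"{agent}.sources", "").split():
        # A source lists one or more channels under the plural key "channels".
        wired = set(props.get(f"{agent}.sources.{name}.channels", "").split())
        if not wired & declared:
            problems.append(name)
    for name in props.get(f"{agent}.sinks", "").split():
        # A sink reads from one channel under the singular key "channel".
        if props.get(f"{agent}.sinks.{name}.channel", "") not in declared:
            problems.append(name)
    return problems


# Invented sample: the source uses "channel" where "channels" is expected.
sample = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = Memchannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channel = Memchannel
TwitterAgent.sinks.HDFS.channel = Memchannel
"""

print(unwired_components(sample, "TwitterAgent"))
```

In this sample only the source is reported, which would match a log where the sink and channel start but `sourceRunners` stays empty. A mismatch in capitalization, such as `MemChannel` versus `Memchannel`, would be reported the same way.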
Created 12-27-2016 07:30 AM
agent.sources = Twitter
agent.channels = MemChannel
agent.sinks = HDFS

agent.sources.Twitter.type = com.orienit.kalyan.flume.source.KalyanTwitterSource
agent.sources.Twitter.channels = MemChannel
agent.sources.Twitter.consumerKey = xxxx
agent.sources.Twitter.consumerSecret = xxx
agent.sources.Twitter.accessToken = xxxx
agent.sources.Twitter.accessTokenSecret = xxxx
agent.sources.Twitter.keywords = hadoop,spark,kafka,flume,spark streaming,NIFI,Bigdata,hortonworks,oozie,sqoop,hive,mapreduce,pig,scala

agent.sinks.HDFS.type = hdfs
agent.sinks.HDFS.channel = MemChannel
agent.sinks.HDFS.hdfs.path = /flume/tweets/%y/%m/%d/%H/%M
agent.sinks.HDFS.hdfs.fileType = DataStream
agent.sinks.HDFS.hdfs.writeFormat = Text
agent.sinks.HDFS.hdfs.batchSize = 100
agent.sinks.HDFS.hdfs.rollSize = 0
agent.sinks.HDFS.hdfs.rollCount = 100
agent.sinks.HDFS.hdfs.useLocalTimeStamp = true

agent.channels.MemChannel.type = memory
agent.channels.MemChannel.capacity = 1000
agent.channels.MemChannel.transactionCapacity = 100

You can try this configuration, Praveen.
Created 12-27-2016 07:34 AM
What is the difference between the configuration I posted and the one you gave me?