
Not able to flume twitter data in to hdfs

Solved

Contributor

Hello team,

I'm a programming enthusiast. I have downloaded the Twitter stream before, but now I'm not able to. I'm using apache-flume-1.4 with Hadoop 2.3.0 on CDH 5.0.0.

No matter how many times I've tried, it throws the same error:

hadoop@ubuntu:~/hadoop/apache-flume-1.4.0-cdh5.0.0-bin$ ./bin/flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/local.conf -Dflume.root.logger=DEBUG,console


Info: Sourcing environment configuration script /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Including HBASE libraries found via (/home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/bin/hbase) for HBASE access
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
+ exec /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xms100m -Xmx200m -Dcom.sun.management.jmxremote -cp '/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/*:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar:/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/etc/hadoop:/home/ha.....

And the .conf file is as follows:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

TwitterAgent.sources.Twitter.consumerKey = Pw63cpjptT59uT6w
TwitterAgent.sources.Twitter.consumerSecret = n8awrhKf7S576DcILPk5Ddfp1LQUU
TwitterAgent.sources.Twitter.accessToken = 163543326-s0Rqm5y4UC2WV7HPOuiOE9fPZZ56eWO95P
TwitterAgent.sources.Twitter.accessTokenSecret = CLwyJJ1jY4atf7iaiaR96Z1PmVvKF0iOXsP8E

TwitterAgent.sources.Twitter.keywords = hadoop, election, sports, cricket, Big data, Trump

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/tweety
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
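One gotcha in properties files like this: depending on how the file is parsed, whitespace padded around a value (as in the original secret lines, which had runs of spaces after the `=`) can end up inside the value and break OAuth; trailing spaces in particular tend to survive parsing. A quick sanity check, sketched in plain Python (not part of Flume), that flags padded values:

```python
# Flag properties whose values carry leading or trailing whitespace,
# which can silently corrupt credentials in a Flume agent config.
def padded_values(conf_text):
    bad = []
    for line in conf_text.splitlines():
        # Skip comments and lines without a key=value separator.
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        if value != value.strip():
            bad.append(key.strip())
    return bad

sample = "a.consumerKey=abc\na.consumerSecret=    xyz \n"
print(padded_values(sample))  # ['a.consumerSecret']
```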

And flume-env.sh file as follows:

# Environment variables can be set here.

JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
FLUME_CLASSPATH="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"

And the .bashrc file:

export FLUME_HOME="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin"
export PATH="$FLUME_HOME/bin:$PATH"
export FLUME_CLASSPATH="$CLASSPATH:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"

Please let me know which part I'm getting wrong.

Any suggestions are much appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION


Re: Not able to flume twitter data in to hdfs

New Contributor

I was able to flume Twitter feeds in the sandbox after spending a lot of time on it.

The following steps resolved the problem:

1. Added the entry below to the /etc/hosts file:

199.59.148.138 stream.twitter.com

2. Updated the date/time in the sandbox:

sudo ntpdate ntp.ubuntu.com

3. Adjusted the HDFS path to point to port 8020:

TwitterAgent.sinks.HDFS.hdfs.path=hdfs://sandbox.hortonworks.com:8020/user/maria_dev/tweets/%Y/%m/%d/%H/
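The key detail in step 3 is the port: the sink's hdfs.path must use the NameNode's RPC port, which must match fs.defaultFS in core-site.xml (8020 on the Hortonworks sandbox; a plain Apache Hadoop install is often configured for 9000, as in the original config). A quick way to see which host and port a path actually targets:

```python
from urllib.parse import urlparse

# Compare the original sink path with the corrected sandbox path;
# urlparse splits out the host and port the HDFS sink will connect to.
for path in ("hdfs://localhost:9000/tweety",
             "hdfs://sandbox.hortonworks.com:8020/user/maria_dev/tweets/"):
    uri = urlparse(path)
    print(uri.hostname, uri.port)
# localhost 9000
# sandbox.hortonworks.com 8020
```

If the port here disagrees with what the NameNode actually listens on, the sink fails to connect and no tweets land in HDFS.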

4 REPLIES

Re: Not able to flume twitter data in to hdfs

@karthik sai

It looks like you are using the CDH distro, so I would recommend running the same test on an HDP cluster with Flume and letting us know if you still face any issue.

Re: Not able to flume twitter data in to hdfs

Contributor

So, can apache-flume-1.4 still pull the data? Or should I upgrade my Flume to 1.6 or higher?

Re: Not able to flume twitter data in to hdfs

@karthik sai

Hi Karthik, I was suggesting that you install a Hortonworks Hadoop cluster, or a Sandbox machine, along with Flume; running the same Flume example there would help us understand your issue.

Here is the download link of sandbox.

http://hortonworks.com/downloads/#sandbox

