Support Questions


Not able to flume Twitter data into HDFS

Rising Star

Hello team,

I'm a programming enthusiast. I have downloaded the Twitter stream before, but now I'm not able to. I'm using Apache Flume 1.4.0 (CDH 5.0.0) with Hadoop 2.3.0.

No matter how many times I've tried, I get the same result:

hadoop@ubuntu:~/hadoop/apache-flume-1.4.0-cdh5.0.0-bin$ ./bin/flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/local.conf -Dflume.root.logger=DEBUG,console


Info: Sourcing environment configuration script /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Including HBASE libraries found via (/home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/bin/hbase) for HBASE access
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
+ exec /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xms100m -Xmx200m -Dcom.sun.management.jmxremote -cp '/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/*:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar:/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/etc/hadoop:/home/ha.....
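
The agent starts up without an obvious error in the output above. One way to see whether any events are actually reaching HDFS (assuming the hdfs://localhost:9000/tweety sink path from the config below) is to list the target directory while the agent is running:

hadoop fs -ls hdfs://localhost:9000/tweety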

And the .conf file is as follows:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel

TwitterAgent.sources.Twitter.consumerKey = Pw63cpjptT59uT6w
TwitterAgent.sources.Twitter.consumerSecret = n8awrhKf7S576DcILPk5Ddfp1LQUU
TwitterAgent.sources.Twitter.accessToken = 163543326-s0Rqm5y4UC2WV7HPOuiOE9fPZZ56eWO95P
TwitterAgent.sources.Twitter.accessTokenSecret = CLwyJJ1jY4atf7iaiaR96Z1PmVvKF0iOXsP8E

TwitterAgent.sources.Twitter.keywords = hadoop, election, sports, cricket, Big data, Trump

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/tweety
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
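
One detail worth double-checking in the config above: the sink's hdfs.batchSize (1000) is larger than the channel's transactionCapacity (100). A Flume sink takes up to batchSize events in a single channel transaction, so the transaction capacity generally needs to be at least as large as the batch size or the HDFS sink's takes can fail. A sketch of adjusted channel settings:

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
# transactionCapacity should be >= the sink's hdfs.batchSize (1000 here),
# otherwise a single take batch can overflow the channel transaction
TwitterAgent.channels.MemChannel.transactionCapacity = 1000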

And the flume-env.sh file is as follows:

# Environment variables can be set here.

JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"

# Note that the Flume conf directory is always included in the classpath.
FLUME_CLASSPATH="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"

And the .bashrc file:

export FLUME_HOME="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin"
export PATH="$FLUME_HOME/bin:$PATH"
export FLUME_CLASSPATH="$CLASSPATH:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
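
For these exports to take effect in an already-open shell, the file needs to be re-read:

source ~/.bashrc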

Please let me know which part I'm getting wrong.

Any valuable suggestion is much appreciated.

Thanks in advance.

1 ACCEPTED SOLUTION

New Contributor

I was able to flume Twitter feeds in the sandbox after spending a lot of time on this.

The following steps resolved it:

1. Added the entry below to the /etc/hosts file:

199.59.148.138 stream.twitter.com

2. Updated the date and time in the sandbox (a skewed clock can cause the Twitter API to reject OAuth requests):

sudo ntpdate ntp.ubuntu.com

3. Adjusted the HDFS path to point to port 8020 (see the check after these steps):

TwitterAgent.sinks.HDFS.hdfs.path=hdfs://sandbox.hortonworks.com:8020/user/maria_dev/tweets/%Y/%m/%d/%H/
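
As a quick check (assuming the sink path above), listing the target directory should show new files appearing once the agent is running:

hdfs dfs -ls /user/maria_dev/tweets/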


4 REPLIES

Super Guru
@karthik sai

It looks like you are using the CDH distro, so I would recommend running the same test on an HDP cluster with Flume and letting us know if you still face any issue.

Rising Star

So, can apache-flume-1.4 still bring in the data, or should I upgrade my Flume to 1.6 or higher?

Super Guru

@karthik sai

Hi Karthik, I was saying that installing the Hortonworks Hadoop cluster, or perhaps a Sandbox machine, along with Flume and running the same Flume example there would help us understand your issue.

Here is the download link for the Sandbox:

http://hortonworks.com/downloads/#sandbox
