Created 05-23-2016 12:59 PM
Hello team,
I'm a programming enthusiast. I have downloaded Twitter streams before, but now I'm not able to do so. I'm using apache-flume-1.4 on Hadoop 2.3.0 and CDH 5.0.0.
No matter how many times I try, it throws the same error:
hadoop@ubuntu:~/hadoop/apache-flume-1.4.0-cdh5.0.0-bin$ ./bin/flume-ng agent -n TwitterAgent -c conf -f /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/local.conf Dflume.root.logger=DEBUG,console -n TwitterAgent
Info: Sourcing environment configuration script /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/bin/hadoop) for HDFS access
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Including HBASE libraries found via (/home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/bin/hbase) for HBASE access
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hbase-0.96.1.1-cdh5.0.0/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-api-1.7.5.jar from classpath
Info: Excluding /home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar from classpath
+ exec /usr/lib/jvm/java-7-openjdk-amd64/bin/java -Xms100m -Xmx200m -Dcom.sun.management.jmxremote -cp '/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/*:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar:/home/hadoop/hadoop/hadoop-2.3.0-cdh5.0.0/etc/hadoop:/home/ha.....
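One detail in the command as pasted: the logger setting appears as `Dflume.root.logger=DEBUG,console`, without a leading `-`. As far as I know, `flume-ng` only forwards arguments that start with `-D` to the JVM as system properties, so in this form the DEBUG console logging may not take effect and the actual error could stay hidden. Below is a minimal sketch of the same invocation with the `-D` prefix. The `build_flume_cmd` helper is hypothetical, the default `FLUME_HOME` is the path from this post, and the function only prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Assumption: FLUME_HOME defaults to the install path shown in this post.
FLUME_HOME="${FLUME_HOME:-/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin}"

# build_flume_cmd (hypothetical helper): print the agent command on one line.
# It does not start Flume; copy the output or wrap it in eval once it looks right.
build_flume_cmd() {
  printf '%s ' \
    "$FLUME_HOME/bin/flume-ng" agent \
    -n TwitterAgent \
    -c "$FLUME_HOME/conf" \
    -f "$FLUME_HOME/conf/local.conf" \
    -Dflume.root.logger=DEBUG,console
  printf '\n'
}

build_flume_cmd
```

With DEBUG output on the console, the stack trace after the `+ exec` line should show what the agent is actually failing on.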
And the .conf file is as follows:
TwitterAgent.sources= Twitter
TwitterAgent.channels= MemChannel
TwitterAgent.sinks=HDFS
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels=MemChannel
TwitterAgent.sources.Twitter.consumerKey=Pw63cpjptT59uT6w
TwitterAgent.sources.Twitter.consumerSecret= n8awrhKf7S576DcILPk5Ddfp1LQUU
TwitterAgent.sources.Twitter.accessToken=163543326-s0Rqm5y4UC2WV7HPOuiOE9fPZZ56eWO95P
TwitterAgent.sources.Twitter.accessTokenSecret= CLwyJJ1jY4atf7iaiaR96Z1PmVvKF0iOXsP8E
TwitterAgent.sources.Twitter.keywords= hadoop,election,sports, cricket,Big data,Trump
TwitterAgent.sinks.HDFS.channel=MemChannel
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:9000/tweety
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeformat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=1000
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=10000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=10000
TwitterAgent.channels.MemChannel.transactionCapacity=100
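A side note on the config above: the Flume User Guide spells the HDFS sink key `hdfs.writeFormat`, and as far as I know Flume property keys are case-sensitive, so `hdfs.writeformat` would be silently ignored. This is probably not the cause of the error, and it may not matter at all for the `DataStream` file type, but it is cheap to check. The sketch below flags keys that match a known key only when case is ignored. The `check_key_case` and `lower` helpers are hypothetical, and `KNOWN_KEYS` is a partial list covering just the HDFS sink keys used in this post.

```shell
#!/usr/bin/env bash
# Partial, illustrative list of HDFS sink keys as spelled in the Flume User Guide.
KNOWN_KEYS="hdfs.path hdfs.fileType hdfs.writeFormat hdfs.batchSize hdfs.rollSize hdfs.rollCount hdfs.rollInterval"

# lower STRING: print STRING in lowercase.
lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# check_key_case FILE: report keys that match a known key only when case is ignored.
check_key_case() {
  local line key known
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in ''|\#*) continue ;; esac       # skip blank lines and comments
    case "$line" in *=*) ;; *) continue ;; esac    # skip lines without '='
    key="${line%%=*}"
    key="${key//[[:space:]]/}"                      # drop whitespace around the key
    for known in $KNOWN_KEYS; do
      case "$key" in *".$known") continue 2 ;; esac # exact spelling: go to next line
    done
    for known in $KNOWN_KEYS; do
      case "$(lower "$key")" in
        *".$(lower "$known")") echo "${key}: did you mean ...${known}?" ;;
      esac
    done
  done < "$1"
}

# Example: check_key_case /home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/conf/local.conf
```

Run against the config in this post, it should report only the `hdfs.writeformat` line.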
And flume-env.sh file as follows:
# Enviroment variables can be set here.
JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
JAVA_OPTS="-Xms100m -Xmx200m -Dcom.sun.management.jmxremote"
# Note that the Flume conf directory is always included in the classpath.
FLUME_CLASSPATH="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar"
And the .bashrc file:
export FLUME_HOME="/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin"
export PATH="$FLUME_HOME/bin:$PATH"
export FLUME_CLASSPATH="$CLASSPATH:/home/hadoop/hadoop/apache-flume-1.4.0-cdh5.0.0-bin/lib/flume-sources-1.0-SNAPSHOT.jar "
Please let me know which part I'm getting wrong.
Any suggestions would be much appreciated.
Thanks in advance.
Created 07-11-2017 12:27 AM
I was able to stream Twitter feeds with Flume in the sandbox after spending a lot of time on it.
The following steps helped resolve this:
1. Added the entry below to the /etc/hosts file:
199.59.148.138 stream.twitter.com
2. Updated the date and time in the sandbox:
sudo ntpdate ntp.ubuntu.com
3. Adjusted the HDFS path to point to port 8020:
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://sandbox.hortonworks.com:8020/user/maria_dev/tweets/%Y/%m/%d/%H/
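Steps 1 and 3 can be checked from the shell before starting the agent. The sketch below is only an illustration: the helper names `hosts_entry_for`, `hdfs_authority`, and `hdfs_port` are hypothetical, and the IP address in step 1 is the one that worked at the time of this reply, so it may have changed since. For step 2, a likely reason the clock matters is that Twitter's OAuth requests carry a timestamp, and a badly skewed clock tends to get them rejected.

```shell
#!/usr/bin/env bash
# hosts_entry_for FILE NAME: print the IP mapped to NAME in a hosts-style file, if any.
hosts_entry_for() {
  awk -v name="$2" '
    $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
  ' "$1"
}

# hdfs_authority FILE: print host:port from the first hdfs.path property in FILE.
hdfs_authority() {
  sed -n 's|^[^#]*hdfs\.path[[:space:]]*=[[:space:]]*hdfs://\([^/]*\)/.*|\1|p' "$1" | head -n 1
}

# hdfs_port FILE: print only the port of that hdfs.path, or nothing if none is given.
hdfs_port() {
  local authority
  authority="$(hdfs_authority "$1")"
  case "$authority" in
    *:*) echo "${authority##*:}" ;;
  esac
}

# Example usage (the config path is a placeholder for your own file):
#   hosts_entry_for /etc/hosts stream.twitter.com
#   hdfs_port /path/to/flume.conf
```

If `hdfs_port` prints something other than the NameNode port your cluster uses (8020 in the sandbox), the sink will not be able to write even when the Twitter source connects.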
Created 05-23-2016 01:09 PM
It looks like you are using the CDH distro, so I would recommend running the same test on an HDP cluster with Flume. Let us know if you still face any issues.
Created 05-23-2016 01:09 PM
So, can apache-flume-1.4 still bring in the data? Or should I upgrade Flume to 1.6 or higher?
Created 05-23-2016 01:22 PM
Hi Karthik, what I meant was that if you install a Hortonworks Hadoop cluster, or a Sandbox machine, along with Flume and run the same Flume example there, it would help us understand your issue.
Here is the download link for the sandbox.