Sentiment Analysis - empty tweets_text table

Hello everyone,

I'm doing the tutorial and did everything as explained. Now I'm at the point where I have created all the tables: dictionary, time_zone_map and tweets_text. But my tweets_text table is empty. I connected the data flow and I have my data in the Banana dashboard. Which step did I miss? How can I now get my Twitter feeds into my table?

Greetings,

Martin


Expert Contributor

Can you confirm that you have tweet data on HDFS in the /tmp/tweets_staging/ directory? (I believe that is the directory used in the article.) To check, run the following as the hdfs user:

hdfs dfs -ls /tmp/tweets_staging

You also need to run the ADD JAR command in the same session before creating the table (note: the file path might be different on your system):

ADD JAR /usr/hdp/2.5.3.0-37/hive2/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar

Hey,

No, after the normal steps there was no data. Then I saved the data from the Solr Banana dashboard (top right) as a JSON file, uploaded this file to the directory and created the table again. But then it was full of rubbish. However, I did this without the ADD JAR command (it's not in the tutorial at this point).

So now I did it again with the ADD JAR command. I didn't get an error when creating the table, but when I run a SELECT on the table, this is the output:

{"trace":"java.lang.Exception: Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.\n\njava.lang.Exception: Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.\n\tat org.apache.ambari.view.hive2.actor.message.job.FetchFailed.\u003cinit\u003e(FetchFailed.java:28)\n\tat org.apache.ambari.view.hive2.actor.OperationController.fetchResultActorRef(OperationController.java:200)\n\tat org.apache.ambari.view.hive2.actor.OperationController.handleMessage(OperationController.java:135)\n\tat org.apache.ambari.view.hive2.actor.HiveActor.onReceive(HiveActor.java:38)\n\tat akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)\n\tat akka.actor.Actor$class.aroundReceive(Actor.scala:467)\n\tat akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)\n\tat akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)\n\tat akka.actor.ActorCell.invoke(ActorCell.scala:487)\n\tat akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)\n\tat akka.dispatch.Mailbox.run(Mailbox.scala:220)\n\tat akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)\n\tat scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\n\tat scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\n\tat scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\n\tat scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)\n","message":"Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.","status":500}
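The Ambari Hive view returns errors as a single JSON object with trace, message and status fields, which makes long traces hard to read. Below is a minimal Python sketch, assuming that shape, that pulls out the message, the status and the last "Caused by:" line (in a Java stack trace that is normally the innermost, root exception). The function name summarize_hive_view_error is hypothetical and not part of Ambari.

```python
import json


def summarize_hive_view_error(raw: str) -> dict:
    """Return status, message and the most specific cause from an
    Ambari Hive view error response (assumed to be a JSON object with
    "trace", "message" and "status" keys)."""
    payload = json.loads(raw)
    trace_lines = payload.get("trace", "").splitlines()
    # Java prints nested exceptions as "Caused by:" lines; the last one
    # is normally the root cause.
    causes = [line for line in trace_lines if line.startswith("Caused by:")]
    if causes:
        root = causes[-1]
    else:
        # No nested cause: fall back to the first line of the trace.
        root = trace_lines[0] if trace_lines else ""
    return {
        "status": payload.get("status"),
        "message": payload.get("message"),
        "root_cause": root,
    }


if __name__ == "__main__":
    # Shortened, made-up sample in the same shape as the output above.
    sample = ('{"trace":"java.lang.Exception: outer\\n\\tat Foo.bar(Foo.java:1)'
              '\\nCaused by: java.io.IOException: inner\\n",'
              '"message":"outer","status":500}')
    print(summarize_hive_view_error(sample))
```

For the error above there is no "Caused by:" line, so the first line of the trace is returned. Applied to the SerDe error further down in this thread, the last "Caused by:" line is the SerDeException reporting that the row is not a valid JSON object.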

If I try to create the table from the JSON file without the ADD JAR command, this is the result of the SELECT on the table:

{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat org.apache.ambari.view.hive2.actor.HiveActor.onReceive(HiveActor.java:38)\n\tat akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)\n\tat akka.actor.Actor$class.aroundReceive(Actor.scala:467)\n\tat akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)\n\tat akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)\n\tat akka.actor.ActorCell.invoke(ActorCell.scala:487)\n\tat akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)\n\tat akka.dispatch.Mailbox.run(Mailbox.scala:220)\n\tat akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)\n\tat scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\n\tat scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\n\tat scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\n\tat scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)\nCaused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:411)\n\tat org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:233)\n\tat org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:780)\n\tat org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:478)\n\tat org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:692)\n\tat org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1557)\n\tat org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1542)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)\n\tat org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)\n\tat org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)\n\tat org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)\n\tat org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1762)\n\tat org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:406)\n\t... 13 more\nCaused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:412)\n\tat org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:174)\n\tat org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:501)\n\t... 17 more\n","message":"Failed to fetch next batch for the Resultset","status":500}
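The SerDeException above reports that a JSON object text must end with '}', at character 3 of line 1, which is consistent with a file that is pretty-printed or otherwise spread across several lines. A JSON SerDe table over plain text files reads one line as one row, so every line has to be a complete JSON object. Below is a minimal Python sketch, under that assumption, that lists the offending lines of a staging file before you point Hive at it; find_bad_rows is a hypothetical helper name.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every non-blank line that is not a
    standalone JSON object, i.e. not valid newline-delimited JSON."""
    bad: List[Tuple[int, str]] = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            # Blank lines are skipped; this check does not model how the
            # SerDe itself treats them.
            continue
        try:
            value = json.loads(text)
        except json.JSONDecodeError as exc:
            bad.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            bad.append((number, "line is valid JSON but not an object"))
    return bad


if __name__ == "__main__":
    pretty_printed = ["{", '  "id": 1', "}"]      # one object over 3 lines
    one_per_line = ['{"id": 1}', '{"id": 2}']     # what the table expects
    print(find_bad_rows(pretty_printed))
    print(find_bad_rows(one_per_line))
```

To check a real file, copy it out of HDFS with hdfs dfs -get and pass the open file object to find_bad_rows; an empty list means every line parses as a JSON object.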

OK, now I see that I didn't save the data through Banana; it only saves the layout of the dashboard. So it can't work. -_-

But how can I save the data then? Which step in the tutorial is the one that saves the data? Because when I was at this step:

sudo -u hdfs hadoop fs -chown -R maria_dev /tmp/tweets_staging
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/tweets_staging

I got the message that this directory doesn't exist, so I created it. Is this correct?

Expert Contributor

First things first: there should be data being written to /tmp/tweets_staging on HDFS.

If the Banana UI dashboard is working and tweets are showing up in real time, your NiFi flow is working, but not the branch that writes to HDFS. There is also a branch that writes to the OS filesystem. What shows up in Banana comes from the branch that writes into Solr, where the tweets get indexed.

Check the NiFi flow for errors, because Hive will simply create an empty table when you run the command (assuming you do not get the errors above).

Moving on to Hive: are you doing this in the Hive view in Ambari? Can you also sanity-check Hive by creating a dummy table:

create table example1 (ID int, col1 string);

insert into example1 values (1,"hello");

select * from example1;

Is Hive working as expected?

Yes, Hive is working as expected; the example table is OK. I also get the real-time data shown in Banana and in Solr, where I can run queries on it. Only nothing gets written to HDFS. What is the branch that should write the data to the HDFS folder /tmp/tweets_staging?

OK, now I know what you mean. I downloaded the NiFi template from the tutorial. When I check the template now, the PutHDFS processor shows failure, 104 queued.

Expert Contributor

At a guess, the failure is likely caused by the processor not pointing to the right directory, and the permissions might not be correct either. So check this by running the following (as hdfs):

hdfs dfs -mkdir /tmp/tweets_staging

hdfs dfs -chmod 777 /tmp/tweets_staging

You could even test that you can write to HDFS. For example, create a dummy file in /tmp on Linux and then run:

hdfs dfs -put /tmp/dummyfile /tmp/

The PutHDFS processor in NiFi needs to point to the /tmp/tweets_staging/ directory.
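The checks above, plus a final listing, can be collected into a small Python sketch. By default it only prints the commands (dry_run=True); with dry_run=False it runs them through subprocess and stops at the first failure. Assumptions: the script runs as the hdfs user (for example via sudo -u hdfs), the hdfs client is on the PATH, and /tmp/dummyfile is a local test file you created beforehand. The -p and -f flags (create parent directories, overwrite an existing copy) and copying the probe file into the staging directory itself rather than /tmp are additions to the original steps.

```python
import subprocess
from typing import List

# Should match the Directory property of the PutHDFS processor in the NiFi flow.
STAGING_DIR = "/tmp/tweets_staging"


def staging_commands(staging_dir: str, probe_file: str) -> List[List[str]]:
    """Create the staging directory, open up its permissions, then prove that
    a local file can be copied into it and listed back."""
    return [
        ["hdfs", "dfs", "-mkdir", "-p", staging_dir],
        ["hdfs", "dfs", "-chmod", "777", staging_dir],
        ["hdfs", "dfs", "-put", "-f", probe_file, staging_dir + "/"],
        ["hdfs", "dfs", "-ls", staging_dir],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step,
            # e.g. "Permission denied" or "No such file or directory".
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(staging_commands(STAGING_DIR, "/tmp/dummyfile"))
```

Running it in dry-run mode first shows exactly what will be executed; if the -put step fails with "No such file or directory", the local probe file is the first thing to check.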

OK, I will check these points now.

Since I installed NiFi, I also have red symbols and can't restart the services: the SNameNode from HDFS, Falcon, Storm, Ambari Infra and Atlas. Check the image.

Could this also be related to the problem?

13183-2017-03-03-11h20-34.png

OK, I did the 3 steps; the results are in the pictures. The first 2 steps seem OK. At the third, it said "No such file or directory".

13185-2017-03-03-12h18-34.png

13186-2017-03-03-12h09-13.png

13184-2017-03-03-12h12-05.png

I really don't know what to do now.

Where can I find the NiFi logs, or rather, how can I open them?

In the NiFi config under "Advanced nifi-env" it says:

# The directory for NiFi log files
export NIFI_LOG_DIR="{{nifi_log_dir}}"

But how do I get there and open them?
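{{nifi_log_dir}} is a template placeholder that Ambari fills in from its nifi_log_dir setting, so the actual path depends on the installation. Below is a minimal Python sketch for reading the end of the NiFi application log and keeping only ERROR lines, assuming the directory resolves to /var/log/nifi (check the nifi_log_dir value in Ambari on your sandbox) and the standard application log name nifi-app.log. From a shell, tail on the same file does the same job.

```python
from collections import deque
from pathlib import Path
from typing import List

# Assumption: Ambari's nifi_log_dir resolves to /var/log/nifi on this sandbox.
NIFI_LOG_DIR = Path("/var/log/nifi")


def tail(path: Path, lines: int = 50) -> List[str]:
    """Return the last `lines` lines of a text file, keeping only that many
    in memory while reading."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=lines)]


def errors_in(lines: List[str], keyword: str = "ERROR") -> List[str]:
    """Keep only the lines that contain the log level keyword."""
    return [line for line in lines if keyword in line]


if __name__ == "__main__":
    log_file = NIFI_LOG_DIR / "nifi-app.log"
    if log_file.exists():
        for line in errors_in(tail(log_file, 500)):
            print(line)
    else:
        print(f"{log_file} not found; check nifi_log_dir in Ambari")
```

Failures of the PutHDFS processor, such as permission problems, should show up among these ERROR lines together with the processor name.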

Expert Contributor
@Martin van Husen

Can you just check that the /tmp/ directory has the correct permissions:

hdfs dfs -chmod -R 777 /tmp

The components showing in red in Ambari should have no impact on this.

It seems not. The result is now:

chmod changing permissions of /tmp/yarn: permission denied. user=root is not the owner of the inode=yarn.

Now I understand even less. Why yarn?
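The error follows from the HDFS permission model: only the owner of a file or directory, or an HDFS superuser, may change its mode, and the superuser is the identity the NameNode runs as (typically hdfs on HDP), not the Linux root account. A recursive chmod run as root therefore fails on /tmp/yarn, which the message says is owned by yarn. Below is a small Python model of that rule, for illustration only; may_chmod is a hypothetical name, and treating hdfs as the superuser is an assumption about the sandbox. Running the chmod as hdfs, as in the sudo -u hdfs commands earlier in the thread, avoids the problem.

```python
from typing import Iterable, Optional


def may_chmod(user: str, groups: Iterable[str], owner: str,
              superuser: str = "hdfs",
              supergroup: Optional[str] = None) -> bool:
    """Model of the HDFS rule for changing a path's permission bits.

    Allowed for the path's owner, for the superuser (the NameNode's own
    identity, assumed here to be "hdfs") and, if a superuser group is
    configured, for any member of that group. Linux root gets no special
    treatment.
    """
    if user in (owner, superuser):
        return True
    return supergroup is not None and supergroup in set(groups)


if __name__ == "__main__":
    # Ownership taken from the error message: /tmp/yarn belongs to yarn.
    print(may_chmod("root", ["root"], owner="yarn"))   # False
    print(may_chmod("hdfs", ["hadoop"], owner="yarn"))  # True
```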

I found another post with exactly the same problem I have. He changed the core-site.xml file and it worked afterwards:

https://community.hortonworks.com/questions/65057/failed-to-write-to-parent-hdfs-directory.html

Do you know how I can edit this file in the sandbox?

I can't find the path to the file.

Expert Contributor

I would like to make sure you can write to the /tmp/tweets_staging directory.

On Linux, as root:

echo hello > /tmp/hello.txt

As hdfs:

hdfs dfs -put /tmp/hello.txt /tmp/tweets_staging/

Yes, the codec is an issue on certain versions of the sandbox. As per the article, you can remove that string from the parameter in Ambari.
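The reply does not name the parameter or the codec string. Assuming the parameter is a comma-separated list of codec class names, such as Hadoop's io.compression.codecs in core-site.xml, removing one entry while keeping the rest intact can be sketched in Python as follows. The class com.example.UnavailableCodec is a hypothetical placeholder, not the string from the linked post, and the actual change should still be made in the Ambari configuration screen.

```python
from typing import List


def remove_codec(codecs_value: str, codec_class: str) -> str:
    """Return the comma-separated codec list without codec_class.

    Surrounding whitespace and empty entries are dropped; the order of the
    remaining codecs is preserved.
    """
    entries: List[str] = [entry.strip() for entry in codecs_value.split(",")]
    return ",".join(e for e in entries if e and e != codec_class)


if __name__ == "__main__":
    current = ("org.apache.hadoop.io.compress.GzipCodec,"
               "com.example.UnavailableCodec,"  # hypothetical entry to drop
               "org.apache.hadoop.io.compress.DefaultCodec")
    print(remove_codec(current, "com.example.UnavailableCodec"))
```

Keeping the remaining entries in their original order makes the edited value easy to compare with the old one before saving it in Ambari.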
