Created 03-02-2017 03:57 PM
Hello everyone,
I'm doing the tutorial and did everything as explained. I'm now at the point where I have created all the tables: dictionary, time_zone_map and tweets_text. But my tweets_text table is empty. I connected the data flow and I can see my data in the Banana dashboard. Which step did I miss? How do I get my Twitter feeds into my table now?
Greetings,
Martin
Created 03-02-2017 04:05 PM
Can you confirm you have tweet data on HDFS, in the /tmp/tweets_staging/ directory (I believe that is the one used in the article)? To check, run (as the hdfs user):
hdfs dfs -ls /tmp/tweets_staging
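If files are there, it is also worth peeking at the contents; each line should be a standalone JSON object or the SerDe will complain. A quick sanity check (the glob below is just a guess at whatever file names the flow produced, adjust as needed):
hdfs dfs -cat /tmp/tweets_staging/* | head -n 2
# each line printed should start with { and end with }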
You also need to run the ADD JAR command when creating the table (note: the file path might be different on your system):
ADD JAR /usr/hdp/2.5.3.0-37/hive2/lib/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar
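If that exact path does not exist on your sandbox, you can locate the SerDe jar first; this is just a sketch, and the version string in the path will differ per HDP release:
find /usr/hdp -name 'json-serde*jar-with-dependencies.jar' 2>/dev/null
# use the path it prints in the ADD JAR statement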
Created 03-02-2017 04:20 PM
Hey,
no, after the normal steps there was no data. Then I saved the data from the Solr Banana dashboard (top right) as a JSON file, uploaded that file to the directory and created the table again, but then it was full of rubbish. However, I did this without the ADD JAR command (it's not in the tutorial at this point).
So now I did it again with the ADD JAR command. I didn't get an error when creating the table, but when I run a SELECT on the table, this is the output:
{"trace":"java.lang.Exception: Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.\n\njava.lang.Exception: Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.\n\tat org.apache.ambari.view.hive2.actor.message.job.FetchFailed.\u003cinit\u003e(FetchFailed.java:28)\n\tat org.apache.ambari.view.hive2.actor.OperationController.fetchResultActorRef(OperationController.java:200)\n\tat org.apache.ambari.view.hive2.actor.OperationController.handleMessage(OperationController.java:135)\n\tat org.apache.ambari.view.hive2.actor.HiveActor.onReceive(HiveActor.java:38)\n\tat akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)\n\tat akka.actor.Actor$class.aroundReceive(Actor.scala:467)\n\tat akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)\n\tat akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)\n\tat akka.actor.ActorCell.invoke(ActorCell.scala:487)\n\tat akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)\n\tat akka.dispatch.Mailbox.run(Mailbox.scala:220)\n\tat akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)\n\tat scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\n\tat scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\n\tat scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\n\tat scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)\n","message":"Cannot fetch result for job. Job with id: 428 for instance: AUTO_HIVE_INSTANCE has either not started or has expired.","status":500}
Created 03-02-2017 04:28 PM
If I try to create the table from the JSON file without the ADD JAR command, this is the result of a SELECT on the table:
{"trace":"org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\norg.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)\n\tat org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)\n\tat org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)\n\tat org.apache.ambari.view.hive2.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:79)\n\tat org.apache.ambari.view.hive2.actor.HiveActor.onReceive(HiveActor.java:38)\n\tat akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)\n\tat akka.actor.Actor$class.aroundReceive(Actor.scala:467)\n\tat akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)\n\tat akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)\n\tat akka.actor.ActorCell.invoke(ActorCell.scala:487)\n\tat akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)\n\tat akka.dispatch.Mailbox.run(Mailbox.scala:220)\n\tat akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)\n\tat scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)\n\tat scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)\n\tat scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)\n\tat scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)\nCaused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:411)\n\tat org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:233)\n\tat org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:780)\n\tat org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:478)\n\tat org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:692)\n\tat org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1557)\n\tat org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1542)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)\n\tat org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)\n\tat org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)\n\tat org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)\n\tat org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)\n\tat org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1762)\n\tat org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:406)\n\t... 13 more\nCaused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with \u0027}\u0027 at 2 [character 3 line 1]\n\tat org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:412)\n\tat org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:174)\n\tat org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:501)\n\t... 17 more\n","message":"Failed to fetch next batch for the Resultset","status":500}
Created 03-02-2017 04:48 PM
OK, now I see that I didn't save the data through Banana; it only saves the layout of the dashboard, so that can't work.
But how can I save the data then? Which step in the tutorial saves the data? Because when I was at this step:
sudo -u hdfs hadoop fs -chown -R maria_dev /tmp/tweets_staging
sudo -u hdfs hadoop fs -chmod -R 777 /tmp/tweets_staging
I got the message that this directory doesn't exist, so I created the directory myself. Is this correct?
Created 03-02-2017 04:53 PM
First things first: there should be data getting written to /tmp/tweets_staging on HDFS.
If the Banana dashboard is working and you are getting tweets showing up in real time, your NiFi flow is running, but the branch that writes to HDFS is not. (There is also a branch that writes to the OS filesystem.) What is showing up in Banana comes from the branch that writes into Solr and gets indexed.
Check the NiFi flow for errors, as Hive will just create an empty table when you run the CREATE TABLE command (assuming you do not get the errors above).
Moving on to Hive: are you doing this in the Hive view in Ambari? Can you also sanity-check Hive by creating a dummy table:
create table example1 (ID int, col1 string);
insert into example1 values (1, "hello");
select * from example1;
Is Hive working as expected?
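If the Hive view keeps returning the "has either not started or has expired" error, you could also run the same sanity check from the shell via beeline, bypassing the Ambari view. This is only a sketch and assumes the sandbox's default HiveServer2 endpoint (localhost:10000) and the maria_dev user:
beeline -u jdbc:hive2://localhost:10000 -n maria_dev -e "select * from example1;"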
Created 03-02-2017 05:27 PM
Yes, Hive is working as expected; the example table is fine. I also see the real-time data in Banana and in Solr, where I can run queries on it. It just never reaches HDFS. What is this branch that should write the data to the HDFS folder /tmp/tweets_staging?
Created 03-02-2017 05:32 PM
OK, now I know what you mean. I downloaded the NiFi template from the tutorial. When I check the template now and look at the PutHDFS processor, it says failure, 104 queued.
Created 03-02-2017 09:41 PM
At a guess, the failure is likely that the processor is not pointing to the right directory, and the permissions might not be correct. So check this (as the hdfs user):
hdfs dfs -mkdir /tmp/tweets_staging
hdfs dfs -chmod 777 /tmp/tweets_staging
You could even test that you can write to HDFS. For example, create a dummy file in /tmp on Linux, then:
hdfs dfs -put /tmp/dummyfile /tmp/
The PutHDFS processor in NiFi needs to be pointing to the /tmp/tweets_staging/ directory.
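Once the dummy file is in, you can confirm it landed and double-check the staging directory's permissions in one go:
hdfs dfs -ls /tmp/dummyfile
hdfs dfs -ls -d /tmp/tweets_staging
# the directory should show drwxrwxrwx after the chmod 777 above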
Created on 03-03-2017 10:24 AM - edited 08-19-2019 02:01 AM
OK, I will check these points now.
Since I installed NiFi, I also have the red symbols and can't restart the services. It's the SNameNode from HDFS, Falcon, Storm, Ambari Infra and Atlas. Check the image.
Could this also be related to the problem?
Created on 03-03-2017 12:02 PM - edited 08-19-2019 02:01 AM
OK, I did the three steps; the result is in the picture. The first two steps seem OK, but the third one said "No such file or directory".
I really don't know what to do now.
Created 03-03-2017 12:23 PM
Where can I find the NiFi logs, or rather, how can I open them?
In the NiFi config under "advanced nifi-env" it says:
# The directory for NiFi log files
export NIFI_LOG_DIR="{{nifi_log_dir}}"
But how do I get there and open them?
Created 03-03-2017 03:25 PM
Can you just check that the /tmp/ directory has the correct permissions:
hdfs dfs -chmod -R 777 /tmp
The components showing in red in Ambari should have no impact.
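As for the NiFi logs: {{nifi_log_dir}} is just an Ambari template variable; the actual value is shown in Ambari in the NiFi configs. On the sandbox it usually resolves to /var/log/nifi (that location is an assumption, so check the config if yours differs), and the file you want is nifi-app.log:
ls /var/log/nifi/
tail -f /var/log/nifi/nifi-app.log
# look for ERROR lines from the PutHDFS processor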
Created 03-03-2017 03:40 PM
Seems not. The result is now:
chmod changing permissions of /tmp/yarn: permission denied. user=root is not the owner of the inode=yarn.
Now I understand even less. Why yarn?
I found another post with exactly the same problem I have. He changed the core-site.xml file and it worked afterwards.
https://community.hortonworks.com/questions/65057/failed-to-write-to-parent-hdfs-directory.html
Do you know how I can edit this file in the sandbox? I can't find the path to the file.
Created 03-03-2017 09:19 PM
I would like to make sure you can write to the /tmp/tweets_staging directory.
On Linux, as root:
echo hello > /tmp/hello.txt
Then, as the hdfs user:
hdfs dfs -put /tmp/hello.txt /tmp/tweets_staging/
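On the permission-denied/inode=yarn error you hit earlier: the recursive chmod on /tmp touches directories owned by other users (like yarn), which root is not allowed to change in HDFS. It should work if you run it as the hdfs superuser and only against the staging directory, for example:
sudo -u hdfs hdfs dfs -chmod -R 777 /tmp/tweets_staging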
Yes, the codec is an issue on certain versions of the sandbox. As per the article, you can remove the string from that parameter in Ambari.
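On editing core-site.xml in the sandbox: the client copy normally lives under /etc/hadoop/conf/core-site.xml (you can view it with the command below), but since the sandbox is managed by Ambari it is safer to make the change through the HDFS service Configs tab in Ambari and let Ambari restart HDFS, otherwise a manual edit will be overwritten:
cat /etc/hadoop/conf/core-site.xml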