Created 07-26-2017 11:16 AM
about tutorial: https://hortonworks.com/tutorial/analyzing-social-media-and-customer-sentiment-with-apache-nifi-and-...
I have been following above tutorial and everything worked fine but when calculating whether a tweet was positive, neutral, or negative using this next Hive command :
create table IF NOT EXISTS tweets_sentiment stored as orc as select tweet_id, case when sum( polarity ) > 0 then 'positive' when sum( polarity ) < 0 then 'negative' else 'neutral' end as sentiment from l3 group by tweet_id;
I got the following error message:
tanks a million
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1501058816878_0021_1_01, diagnostics=[Task failed, taskId=task_1501058816878_0021_1_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1] at org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:424) at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:183) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:149) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:113) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554) ... 18 more ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1] at org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:424) at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:183) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:149) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:113) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554) ... 18 more ], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1] at org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:424) at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:183) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:149) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:113) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554) ... 18 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) ... 14 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {"tweet_id":890076549655003136,"created_unixtime":1501045760917,"created_time":"Wed Jul 26 05:09:20 +0000 2017","lang":"en","displayname":"2ne1legend21","time_zone":"","msg":"RT hyung_rose_bts BTS_twt The cup couldnt have said it better at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:563) at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83) ... 17 more Caused by: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1] at org.openx.data.jsonserde.JsonSerDe.onMalformedJson(JsonSerDe.java:424) at org.openx.data.jsonserde.JsonSerDe.deserialize(JsonSerDe.java:183) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.readRow(MapOperator.java:149) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.access$200(MapOperator.java:113) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:554) ... 18 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1501058816878_0021_1_01 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1501058816878_0021_1_02, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:2, Vertex vertex_1501058816878_0021_1_02 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (less...)
Created 07-26-2017 02:05 PM
If you read carefully the exception message, it says:
Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1]
This is likely to be caused by some CR-LF characters inside your "msg" object. Hive interprets each line as a full JSON. If you JSON contains newlines, Hive cannot parse it.
Thus you have to clean/reformat you data before being able to analyze it with Hive.
Created 07-26-2017 02:05 PM
If you read carefully the exception message, it says:
Row is not a valid JSON Object - JSONException: Unterminated string at 237 [character 238 line 1]
This is likely to be caused by some CR-LF characters inside your "msg" object. Hive interprets each line as a full JSON. If you JSON contains newlines, Hive cannot parse it.
Thus you have to clean/reformat you data before being able to analyze it with Hive.
Created 07-29-2017 05:33 AM
Dear @Marco Gaido,
I deleted my Json files in hdfs and then I ingested some other tweets in json format. Now, hive is working well.
thanks a lot