Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

HiveSQL gives error, while querying twitter data,

avatar
New Contributor
Im getting error, with below query in Hive, please advise what could be the issue?

SELECT id, cast ( from_unixtime( unix_timestamp(concat( '2016 ', substring(created_at,5,15)), 'yyyy MMM dd hh:mm:ss')) as timestamp) ts, text, username.time_zone FROM mytweets_raw;

 

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1601795655318_0005_1_00, diagnostics=[Task failed, taskId=task_1601795655318_0005_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1601795655318_0005_1_00_000000_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
        ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:973)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
        ... 19 more
Caused by: org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2dbf2d26; line: 1, column: 2]
        at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:179)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:936)
        ... 20 more
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character ('O' (code 79)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: java.io.ByteArrayInputStream@2dbf2d26; line: 1, column: 2]
        at org.codehaus.jackson.JsonParser._constructError(JsonParser.java:1433)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportError(JsonParserMinimalBase.java:521)
        at org.codehaus.jackson.impl.JsonParserMinimalBase._reportUnexpectedChar(JsonParserMinimalBase.java:442)
        at org.codehaus.jackson.impl.Utf8StreamParser._handleUnexpectedValue(Utf8StreamParser.java:2090)
        at org.codehaus.jackson.impl.Utf8StreamParser._nextTokenNotInObject(Utf8StreamParser.java:606)
        at org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:492)
        at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:169)
        ... 21 more


TwitterAgent.sources = Twitter
    TwitterAgent.channels = MemChannel
    TwitterAgent.sinks = HDFS
    
    TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
    TwitterAgent.sources.Twitter.channels = MemChannel
    TwitterAgent.sources.Twitter.consumerKey = xxxxxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.consumerSecret = xxxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.accessToken = xxxxxxxxxxxxxxxx
    TwitterAgent.sources.Twitter.accessTokenSecret = xxxxxxxxxxxxxxxxxxxxx
    
    TwitterAgent.sources.Twitter.keywords = Hortonworks, Hadoop, Big Data, open source, Apache Software Foundation, Cloudera, Impala
    
    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
    TwitterAgent.sinks.HDFS.hdfs.path = /user/cloudera/data/tweets_raw
    
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 600
    
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000
    TwitterAgent.channels.MemChannel.transactionCapacity = 100
    TwitterAgent.channels.MemChannel.byteCapacity=6912212

    Hive DDL
    
    CREATE EXTERNAL TABLE IF NOT EXISTS Mytweets_raw (
       id BIGINT,
       created_at STRING,
       source STRING,
       favorited BOOLEAN,
       retweet_count INT,
       retweeted_status STRUCT<
          text:STRING,
          username:STRUCT<screen_name:STRING,name:STRING>>,
       entities STRUCT<
          urls:ARRAY<STRUCT<expanded_url:STRING>>,
          user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
          hashtags:ARRAY<STRUCT<text:STRING>>>,
       text STRING,
       username STRUCT<
          screen_name:STRING,
          name:STRING,
          friends_count:INT,
          followers_count:INT,
          statuses_count:INT,
          verified:BOOLEAN,
          utc_offset:INT,
          time_zone:STRING>,
       in_reply_to_screen_name STRING
    ) 
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    LOCATION '/user/cloudera/data/tweets_raw';

 

1 REPLY 1

avatar
Master Collaborator

The error seems to indicate that the source JSON is malformed. Check where the data is stored and look at the JSON structure. Each row should be one, self-contained JSON. Please post screenshot here.

 

Also, did you add the necessary jar to Hive: hive-serdes-1.0-SNAPSHOT.jar. I assume you are following this example: https://github.com/cloudera/cdh-twitter-example

 

Finally, you can try a different serDe as shown in this topic: https://community.cloudera.com/t5/Support-Questions/hive-table-error/td-p/127271

 

Or try this solution on stackoverflow: https://stackoverflow.com/questions/32416555/twitter-sentiment-analysis