04-17-2018
02:51 AM
I have managed to solve my problem; it was a silly little mistake I was making. I created the JSON table using:

ADD JAR hdfs://hwmaster01.com/user/root/hive-serdes-1.0-SNAPSHOT.jar;
CREATE TABLE tweets_pqt (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

This worked after putting the above-mentioned JAR file in Cloudera Manager's "Hive Auxiliary JARs Directory" (on an unmanaged machine you can find that under the "hive.aux.jars.path" property in hive-site.xml). Then I created the Parquet table with the same structure as above, with one small change:

ADD JAR hdfs://hwmaster01.com/user/root/hive-serdes-1.0-SNAPSHOT.jar;
CREATE TABLE tweets_pqt (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT<
text:STRING,
user:STRUCT<screen_name:STRING,name:STRING>,
retweet_count:INT>,
entities STRUCT<
urls:ARRAY<STRUCT<expanded_url:STRING>>,
user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
hashtags:ARRAY<STRUCT<text:STRING>>>,
text STRING,
user STRUCT<
screen_name:STRING,
name:STRING,
friends_count:INT,
followers_count:INT,
statuses_count:INT,
verified:BOOLEAN,
utc_offset:INT,
time_zone:STRING>,
in_reply_to_screen_name STRING
)
--ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS PARQUET;

The insert into the Parquet table succeeded the moment I commented out that ROW FORMAT SERDE line.
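For reference, the copy itself is then just a plain INSERT ... SELECT between the two tables. A minimal sketch, assuming the JSON-backed table is named tweets_json (substitute your own JSON table's name):

```sql
-- Copy all rows from the JSON SerDe table into the Parquet table.
-- tweets_json is a hypothetical name for the JSON-backed source table.
-- `user` is backticked in case USER is a reserved word in your Hive version.
INSERT OVERWRITE TABLE tweets_pqt
SELECT id, created_at, source, favorited, retweeted_status,
       entities, text, `user`, in_reply_to_screen_name
FROM tweets_json;
```

Hive handles the nested STRUCT and ARRAY columns transparently here, since both tables declare the identical schema; only the storage format and SerDe differ.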
04-17-2018
02:39 AM
I am having the same issue while selecting and inserting data from a JSON SerDe table into a Parquet table. I also have the required "hive-hcatalog-core-1.1.0-cdh5.14.0.jar" in the hive_aux path (without that JAR the JSON table didn't even read the data properly). I looked into the Spark logs and found this:

Job aborted due to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure: Lost task 0.3 in stage 6.0 (TID 12, hwslave02.com, executor 1): java.lang.RuntimeException: Error processing row: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":985840392984780801,"created_at":"Mon Apr 16 11:20:40 +0000 2018","source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>","favorited":false,"retweeted_status":{"text":"Os dejamos la #OpinionReal\nde @DbenavidesMReal\n\nLeedla amigos! 😉\n\n#HalaMadrid #MadridistaReal \n\nhttps://t.co/pVthhZThxF","user":{"screen_name":"RMadridistaReal","name":"#MadridistaReal"},"retweet_count":15},"entities":{"urls":[{"expanded_url":"http://madridistareal.com/opinionreal-isco-sobresale-en-un-madrid-firme/"}],"user_mentions":[{"screen_name":"RMadridistaReal","name":"#MadridistaReal"},{"screen_name":"DbenavidesMReal","name":"Dani Benavides"}],"hashtags":[{"text":"OpinionReal"},{"text":"HalaMadrid"},{"text":"MadridistaReal"}]},"text":"RT @RMadridistaReal: Os dejamos la #OpinionReal\nde @DbenavidesMReal\n\nLeedla amigos! 😉\n\n#HalaMadrid #MadridistaReal \n\nhttps://t.co/pVthhZThxF","user":{"screen_name":"mariadelmadrid","name":"Carmen Madridista","friends_count":4991,"followers_count":3661,"statuses_count":55872,"verified":false,"utc_offset":7200,"time_zone":"Madrid"},"in_reply_to_screen_name":null}
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:154)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:48)
at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.processNextRecord(HiveMapFunctionResultList.java:27)
at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:95)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
at org.apache.spark.rdd.AsyncRDDActions$$anonfun$foreachAsync$1$$anonfun$apply$15.apply(AsyncRDDActions.scala:120)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
at org.apache.spark.SparkContext$$anonfun$38.apply(SparkContext.scala:2022)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"id":985840392984780801,"created_at":"Mon Apr 16 11:20:40 +0000 2018","source":"<a href=\"http://twitter.com/download/android\" rel=\"nofollow\">Twitter for Android</a>","favorited":false,"retweeted_status":{"text":"Os dejamos la #OpinionReal\nde @DbenavidesMReal\n\nLeedla amigos! 😉\n\n#HalaMadrid #MadridistaReal \n\nhttps://t.co/pVthhZThxF","user":{"screen_name":"RMadridistaReal","name":"#MadridistaReal"},"retweet_count":15},"entities":{"urls":[{"expanded_url":"http://madridistareal.com/opinionreal-isco-sobresale-en-un-madrid-firme/"}],"user_mentions":[{"screen_name":"RMadridistaReal","name":"#MadridistaReal"},{"screen_name":"DbenavidesMReal","name":"Dani Benavides"}],"hashtags":[{"text":"OpinionReal"},{"text":"HalaMadrid"},{"text":"MadridistaReal"}]},"text":"RT @RMadridistaReal: Os dejamos la #OpinionReal\nde @DbenavidesMReal\n\nLeedla amigos! 😉\n\n#HalaMadrid #MadridistaReal \n\nhttps://t.co/pVthhZThxF","user":{"screen_name":"mariadelmadrid","name":"Carmen Madridista","friends_count":4991,"followers_count":3661,"statuses_count":55872,"verified":false,"utc_offset":7200,"time_zone":"Madrid"},"in_reply_to_screen_name":null}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.processRow(SparkMapRecordHandler.java:141)
... 16 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.ParquetHiveRecord
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:149)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:717)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:98)
at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
... 17 more
Driver stacktrace:

The JSON table contains Twitter data. My guess is that, while converting to Parquet, Hive cannot preserve the file's nested schema. Has anyone found a solution to, or the cause of, this problem?