Error in querying Hive table created in parquet format

Hello, I'm getting an error when querying a Hive table (data in Parquet format):

select * from powerpoll_k1 where year=2017 and month=12 and day=11 limit 5;

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 102.0 failed 4 times, most recent failure: Lost task 0.3 in stage 102.0 (TID 20602, msc02-jag-dn-016.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The table DDL is shown below:

CREATE EXTERNAL TABLE `powerpoll_k1`(
  `topic_k` varchar(255),
  `partition_k` int,
  `offset_k` bigint,
  `timestamp_k` timestamp,
  `deviceid` bigint,
  `devicename` varchar(50),
  `deviceip` varchar(128),
  `peerid` int,
  `objectid` int,
  `objectname` varchar(256),
  `objectdesc` varchar(256),
  `oid` varchar(50),
  `pduoutlet` varchar(50),
  `pluginid` int,
  `pluginname` varchar(255),
  `indicatorid` int,
  `indicatorname` varchar(255),
  `format` int,
  `snmppollvalue` varchar(128) COMMENT 'value in kafka avsc',
  `time` double,
  `clustername` varchar(50) COMMENT 'rpp or power',
  `peerip` varchar(50))
COMMENT 'external table at /apps/hive/warehouse/amp.db/sevone_power'
PARTITIONED BY (`year` int, `month` int, `day` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://application/apps/hive/warehouse/amp.db/power'
TBLPROPERTIES ('transient_lastDdlTime' = '1513022286')

Any ideas on what the issue is?

Re: Error in querying Hive table created in parquet format

@bkosaraju, @mqureshi - any ideas on this?

Re: Error in querying Hive table created in parquet format

FYI, I re-checked this, and the issue seems to occur when I include the column deviceid (bigint) in the query.
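
A rough sketch of how this could be narrowed down further. The stack trace shows Dictionary.decodeToLong being called on a PlainIntegerDictionary from OnHeapColumnVector.getLong, which suggests (as an assumption, not something confirmed in this thread) that at least some Parquet files store deviceid as a 32-bit integer while the table declares it as bigint. The columns picked in the first query are arbitrary, and whether changing the reader setting avoids the error here is untested; spark.sql.parquet.enableVectorizedReader is a standard Spark SQL option, but its effect on this table is a guess.

```sql
-- Step 1: run the same partition filter without deviceid.
-- If this succeeds, the failure is tied to that one column.
SELECT topic_k, partition_k, offset_k, devicename
FROM powerpoll_k1
WHERE year = 2017 AND month = 12 AND day = 11
LIMIT 5;

-- Step 2 (assumption): turn off Spark's vectorized Parquet reader,
-- the code path that appears in the stack trace, for this session only.
SET spark.sql.parquet.enableVectorizedReader=false;

-- Step 3: retry the failing column on its own and compare the outcome.
SELECT deviceid
FROM powerpoll_k1
WHERE year = 2017 AND month = 12 AND day = 11
LIMIT 5;
```

If step 3 still fails, or fails with a different exception, comparing the physical type of deviceid in the Parquet file footers against the table DDL would be the next thing to check.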