
Error in querying Hive table created in parquet format

Expert Contributor

Hello - I'm getting an error when querying a Hive table (data in Parquet format):

select * from powerpoll_k1 where year=2017 and month=12 and day=11 limit 5;

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 102.0 failed 4 times, most recent failure: Lost task 0.3 in stage 102.0 (TID 20602, msc02-jag-dn-016.uat.gdcs.apple.com): java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainIntegerDictionary
    at org.apache.parquet.column.Dictionary.decodeToLong(Dictionary.java:52)
    at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getLong(OnHeapColumnVector.java:274)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:246)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$4.apply(SparkPlan.scala:240)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:86)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The table DDL is shown below:

CREATE EXTERNAL TABLE `powerpoll_k1`(
  `topic_k` varchar(255),
  `partition_k` int,
  `offset_k` bigint,
  `timestamp_k` timestamp,
  `deviceid` bigint,
  `devicename` varchar(50),
  `deviceip` varchar(128),
  `peerid` int,
  `objectid` int,
  `objectname` varchar(256),
  `objectdesc` varchar(256),
  `oid` varchar(50),
  `pduoutlet` varchar(50),
  `pluginid` int,
  `pluginname` varchar(255),
  `indicatorid` int,
  `indicatorname` varchar(255),
  `format` int,
  `snmppollvalue` varchar(128) COMMENT 'value in kafka avsc',
  `time` double,
  `clustername` varchar(50) COMMENT 'rpp or power',
  `peerip` varchar(50))
COMMENT 'external table at /apps/hive/warehouse/amp.db/sevone_power'
PARTITIONED BY (`year` int, `month` int, `day` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'hdfs://application/apps/hive/warehouse/amp.db/power'
TBLPROPERTIES ('transient_lastDdlTime' = '1513022286')

Any ideas on what the issue is?
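
For reference, here is a minimal sketch of how the physical Parquet schema could be compared with the table DDL from a Spark SQL session (the partition path below is an assumption based on the LOCATION clause above and would need to be adjusted):

-- Expose the raw Parquet files as a temporary view so Spark infers the
-- schema actually written in the files, then compare it with the metastore.
-- Path is an assumption derived from the table's LOCATION clause.
CREATE TEMPORARY VIEW power_raw
USING parquet
OPTIONS (path 'hdfs://application/apps/hive/warehouse/amp.db/power/year=2017/month=12/day=11');

-- Physical type of deviceid as stored in the files (int vs. bigint)
DESCRIBE power_raw;

-- Type declared in the Hive metastore
DESCRIBE FORMATTED powerpoll_k1;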

2 REPLIES

Expert Contributor

@bkosaraju, @mqureshi - any ideas on this?

Expert Contributor

FYI - I re-checked this, and the issue seems to occur only when I include the column deviceid (bigint) in the query.
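
That would fit the stack trace: decodeToLong being called on a PlainIntegerDictionary usually means the Parquet files store deviceid as a 32-bit INT while the table declares it as bigint. Two things that may be worth trying are sketched below; this is only a sketch, assuming the physical type really is 32-bit (otherwise the files would need to be rewritten with a 64-bit type), and the SET key is the standard Spark 2.x option name:

-- Option 1 (Spark SQL session): fall back to the non-vectorized Parquet reader,
-- which may avoid the vectorized decodeToLong path, at some read-performance cost.
SET spark.sql.parquet.enableVectorizedReader=false;

-- Option 2 (run in Hive): align the declared type with the physical type,
-- assuming the values really fit in 32 bits; CASCADE propagates the change
-- to the existing partitions' metadata.
ALTER TABLE powerpoll_k1 CHANGE COLUMN deviceid deviceid int CASCADE;

-- Re-run the failing query to confirm.
SELECT deviceid FROM powerpoll_k1 WHERE year=2017 AND month=12 AND day=11 LIMIT 5;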
