
Incorrect query result in Spark-Submit

I have a case where a Hive query returns incorrect results when it runs inside a program launched with spark-submit. When I run the same query in Hue, or interactively in spark-shell using HiveContext, the results are correct and the two agree. When the same query runs in the Spark program submitted with spark-submit, the results are wrong. The table has around 248M rows, and its properties are listed below. Is there a particular reason this could happen, and is there a workaround or configuration setting I need to apply?

    Table Type:          MANAGED_TABLE
    Table Parameters:
        auto.purge           true
        parquet.compress     SNAPPY

    # Storage Information
    SerDe Library:       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
    InputFormat:         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
    OutputFormat:        org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
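One difference between execution paths that may be worth ruling out is which Parquet reader Spark uses. The table above is declared with ParquetHiveSerDe, but by default Spark reads Hive Parquet tables with its own built-in Parquet reader instead, controlled by the setting `spark.sql.hive.convertMetastoreParquet` (default `true`). If that setting, or any other Spark config, differs between the spark-shell session and the submitted job, the two can behave differently. The sketch below is a hypothetical diagnostic, not a confirmed cause or fix: it assumes Spark 2.x with PySpark (`SparkSession` rather than `HiveContext`), and the script name `compare_parquet_readers.py`, the function names, and the labels are placeholders. It runs the same query once with each reader so the results can be compared side by side.

```python
import sys

# Spark SQL setting that selects the reader for Hive metastore Parquet tables:
# "true" (Spark's default) = Spark's built-in Parquet reader,
# "false" = the table's own Hive SerDe (ParquetHiveSerDe here).
CONVERT_KEY = "spark.sql.hive.convertMetastoreParquet"


def reader_settings():
    """Return hypothetical (label, value) pairs to try for CONVERT_KEY."""
    return [("spark-parquet-reader", "true"), ("hive-serde", "false")]


def run_query_both_ways(query):
    """Run `query` once per reader setting and collect the rows for each."""
    # Imported here so the helpers above can be used without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("compare-parquet-readers")
        .enableHiveSupport()  # required to read tables from the Hive metastore
        .getOrCreate()
    )
    results = {}
    try:
        for label, value in reader_settings():
            # Runtime SQL conf change; applies to queries analyzed after this call.
            spark.conf.set(CONVERT_KEY, value)
            results[label] = spark.sql(query).collect()
    finally:
        spark.stop()
    return results


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: spark-submit compare_parquet_readers.py '<query>'")
    else:
        for label, rows in run_query_both_ways(sys.argv[1]).items():
            print(label, rows)
```

Usage would be along the lines of `spark-submit compare_parquet_readers.py "SELECT COUNT(*) FROM <table>"`, with the real query substituted. If the two runs disagree, the reader is a likely suspect; if they agree with each other but not with spark-shell, comparing the effective configuration of both sessions (for example `spark.conf.get(...)` in each) is a reasonable next step.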