I am getting incorrect Hive query results from a program run with spark-submit. When I execute the same query in Hue, or from the spark-shell command line using HiveContext, the results are correct and match each other. When I execute the same query from the Spark program with spark-submit, the results are incorrect. The table has around 248M rows, and its properties are listed below. Is there any particular reason for this to happen? Is there a workaround or a config setting that needs to be applied?

Table Type: MANAGED_TABLE
Table Parameters:
  auto.purge          true
  parquet.compress    SNAPPY
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
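Since the table is Parquet-backed, one thing that may be worth ruling out is which Parquet reader the submitted job uses. The sketch below is only a diagnostic idea, not a confirmed fix: it runs the same query twice through HiveContext, once with spark.sql.hive.convertMetastoreParquet set to "true" (Spark's built-in Parquet reader) and once with "false" (the Hive SerDe), and reports whether the two results agree. The script name, the helper names (run_query, results_match, main), and the query passed on the command line are all hypothetical, and it assumes a PySpark version that still ships HiveContext, as in the question.

```python
import sys


def results_match(results):
    """Return True if every reader path produced the same query result."""
    return len(set(results.values())) <= 1


def run_query(sql_context, query, convert_parquet):
    # spark.sql.hive.convertMetastoreParquet is an existing Spark SQL setting:
    # "true" reads the table with Spark's built-in Parquet support,
    # "false" reads it through the Hive SerDe listed in the table properties.
    sql_context.setConf("spark.sql.hive.convertMetastoreParquet", convert_parquet)
    rows = sql_context.sql(query).collect()
    # Convert rows to plain tuples so the two runs can be compared directly.
    # Assumption: the query is an aggregate (e.g. COUNT(*)) with a stable row order.
    return tuple(tuple(row) for row in rows)


def main(query):
    # Imported here so the helpers above can be exercised without a Spark install.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="compare-parquet-readers")
    hive_context = HiveContext(sc)
    results = {flag: run_query(hive_context, query, flag) for flag in ("true", "false")}
    for flag, value in results.items():
        print("convertMetastoreParquet=%s -> %s" % (flag, value))
    print("results match" if results_match(results) else "results differ")
    sc.stop()


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

It could be launched with something like: spark-submit compare_readers.py "SELECT COUNT(*) FROM mydb.mytable" (file, database, and table names are placeholders). If the two runs disagree, that would point toward the reader path rather than the query itself; if they agree, the difference between spark-shell and spark-submit probably lies elsewhere, for example in the configuration each one picks up.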