Created 07-31-2018 02:15 PM
Using spark-sql with Spark 2.2.0, the following query results in an error:
Query (as printed by the Spark exception in the console):
CREATE EXTERNAL TABLE IF NOT EXISTS `databaseName`.`tableName` (some field names . . .) PARTITIONED BY (`tenant` STRING, `year` STRING, `month` STRING, `day` STRING, `hour` STRING, `minute` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '~' LINES TERMINATED BY '
^^^
' STORED AS ORC LOCATION 'hdfs://clusterName:8020/StorageLocation/'
Error: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orc'(line 1, pos 0)
This error does not occur when running the same query as HiveQL in the Hive CLI, in Hive View via Ambari, or even through Hive JDBC. Why does it cause an error in Spark SQL?
Created 08-01-2018 07:18 AM
This validation was intentionally added to Spark in SPARK-15279, because it doesn't make sense to specify delimiters for ORC or Parquet files.
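As a rough sketch of what this implies for the DDL in the question: ORC stores data in its own binary columnar layout, so the delimiter clause can presumably just be dropped. If delimited text is really what is wanted, the table can instead be declared as TEXTFILE, which the error message itself names as the compatible format. The column `some_field`, the table name `tableName_text`, and the location `TextStorageLocation` are hypothetical stand-ins (the original field list is elided), and the two statements are alternatives, not a sequence.

```sql
-- Option A: keep ORC and drop the ROW FORMAT DELIMITED clause.
-- `some_field` is a hypothetical placeholder for the elided field list.
CREATE EXTERNAL TABLE IF NOT EXISTS `databaseName`.`tableName` (
  `some_field` STRING
)
PARTITIONED BY (`tenant` STRING, `year` STRING, `month` STRING,
                `day` STRING, `hour` STRING, `minute` STRING)
STORED AS ORC
LOCATION 'hdfs://clusterName:8020/StorageLocation/';

-- Option B: keep the delimiters and store the data as delimited text.
-- The table name and location here are hypothetical, chosen only so the
-- text table does not collide with the ORC table above.
CREATE EXTERNAL TABLE IF NOT EXISTS `databaseName`.`tableName_text` (
  `some_field` STRING
)
PARTITIONED BY (`tenant` STRING, `year` STRING, `month` STRING,
                `day` STRING, `hour` STRING, `minute` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '~'
STORED AS TEXTFILE
LOCATION 'hdfs://clusterName:8020/TextStorageLocation/';
```

Both statements assume a Spark session with Hive support enabled, as in the spark-sql shell used in the question.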
Created 07-31-2018 03:46 PM
I believe the answer is that the SQL containing
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '~' LINES TERMINATED BY '^^^'
is simply HiveQL that Spark SQL does not support, and it should be unsupported, since the ORC format does not use these delimiters.