Why does ROW FORMAT DELIMITED not work with the ORC format in Spark SQL?

Using spark-sql with Spark 2.2.0, the following query results in an error:

Query (as printed by the Spark exception in the console):

CREATE EXTERNAL TABLE IF NOT EXISTS `databaseName`.`tableName` (some field names . . .) PARTITIONED BY (`tenant` STRING, `year` STRING, `month` STRING, `day` STRING, `hour` STRING, `minute` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '~' LINES TERMINATED BY '

^^^

' STORED AS ORC LOCATION 'hdfs://clusterName:8020/StorageLocation/'

Error: org.apache.spark.sql.catalyst.parser.ParseException: Operation not allowed: ROW FORMAT DELIMITED is only compatible with 'textfile', not 'orc'(line 1, pos 0)

This error does not occur when running the same query as HiveQL from the Hive CLI, in Hive View via Ambari, or even through Hive JDBC. Why does it cause an error in Spark SQL?

1 ACCEPTED SOLUTION

Expert Contributor

This validation was added to Spark intentionally in SPARK-15279, because it does not make sense to specify delimiters for ORC or Parquet files.
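
As a rough illustration of what that implies for the query in the question, the statement below drops the ROW FORMAT DELIMITED clause and keeps everything else. It is only a sketch: the column list is elided in the original question, so `col_a` and `col_b` are hypothetical placeholders, and the database, table, and location names are simply copied from the question.

```sql
-- Sketch only: col_a and col_b are hypothetical stand-ins for the
-- field names that were elided in the original question.
CREATE EXTERNAL TABLE IF NOT EXISTS `databaseName`.`tableName` (
  `col_a` STRING,
  `col_b` INT
)
PARTITIONED BY (`tenant` STRING, `year` STRING, `month` STRING,
                `day` STRING, `hour` STRING, `minute` STRING)
-- No ROW FORMAT DELIMITED clause: ORC is a binary columnar format,
-- so field and line delimiters have no meaning for it.
STORED AS ORC
LOCATION 'hdfs://clusterName:8020/StorageLocation/';
```

If the files at that location are actually delimited text rather than ORC, the error message points to the other option: keep the ROW FORMAT DELIMITED clause and declare the table STORED AS TEXTFILE instead.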

2 REPLIES

I believe the answer is that SQL containing

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '~' LINES TERMINATED BY '^^^'

is simply unsupported HiveQL, and it should be unsupported, because the ORC format does not use these delimiters.
