Support Questions
Find answers, ask questions, and share your expertise
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

Creating ORC table - using new ORC file, hive and a column partitioning

New Contributor

I 'm trying to create an ORC-table, which can store orc-files as 'native' OrcFileFormat. Unfortunately, I 'm having an issue with using partitions and saving orc-files as 'native' OrcFileFormat.

With Partitioning

1. For partitioning, I'm using the 'PARTITION BY COLUMN' clause. This results into the following query:

     day DATE,
     datetime TIMESTAMP,
     description STRING,
     eventtype STRING
PARTITIONED BY (eventtype )

2. Further on, I'm using (Py)Spark to insert data into the table. In the spark context, I added some configurations, such as spark.sql.orc.impl=native.

# Create SparkContext with configurations for new OrcFormat
session_builder = pyspark.sql.SparkSession.builder \
    .enableHiveSupport() \
    .appName("test-orc") \
    .config("spark.sql.hive.convertMetastoreOrc", "true") \
    .config("spark.sql.orc.cache.stripe.details.size", "10000") \
    .config("spark.sql.orc.enabled", "true") \
    .config("spark.sql.orc.filterPushdown", "true") \
    .config("spark.sql.orc.splits.include.file.footer", "true") \
    .config("spark.sql.hive.metastorePartitionPruning", "true") \
    .config("spark.sql.orc.impl","native") \
    .config("spark.sql.orc.enableVectorizedReader", "true")   
spark = session_builder.getOrCreate()

# event_df is a dataframe with eventdata
event_df = spark.sql("select * from tempTable")

3. Next, I'm using the following command to check the OrcFileFormat:

java -jar /opt/orc-test/orc-tools-1.4.3-uber.jar meta hdfs://root/apps/hive/warehouse/livedm.db/events 2> /dev/null | grep Version | head -n1

Which results into:

File Version: 0.12 with HIVE_8732

Unfortunately, orc-files have been stored as 'hive' orcFileFormat

Without Partitioning

When I 'm repeating the earlier steps, without using any partitioning on step 1, the orc-files are successfully being stored as 'native' orcFileFormat. By using the command from step 3, the following are being returned:

File Version: 0.12 with ORC_135


How can I use partitioning (properly) when creating an Orc-table, and store data as 'native ' orcFileFormat?


New Contributor

Solved it by replacing the "USING HIVE OPTIONS (fileFormat 'ORC')"-clause with "USING ORC"-clause.

New Contributor

I´m really wondering about if there something in which works well in cloudera platform without being annoying!