04-29-2019
01:32 PM
Solved it by replacing the "USING HIVE OPTIONS (fileFormat 'ORC')" clause with a "USING ORC" clause.
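For reference, the revised statement would look roughly like the sketch below. It reuses the column list from the question; the helper names build_events_ddl and create_events_table are hypothetical, and running the DDL assumes a SparkSession with Hive support is already available.

```python
# Sketch only: the column list is copied from the question; the helper names
# build_events_ddl and create_events_table are hypothetical.


def build_events_ddl(using_clause="USING ORC"):
    """Return the CREATE TABLE statement with the given USING clause."""
    return (
        "CREATE TABLE livedm.events (\n"
        "    day DATE,\n"
        "    datetime TIMESTAMP,\n"
        "    description STRING,\n"
        "    eventtype STRING\n"
        ")\n"
        f"{using_clause}\n"
        "PARTITIONED BY (eventtype)"
    )


def create_events_table(spark):
    """Run the DDL on an existing SparkSession (assumed to have Hive support)."""
    spark.sql(build_events_ddl())


if __name__ == "__main__":
    # Print the statement so it can be inspected or pasted into a SQL cell.
    print(build_events_ddl())
```

Passing "USING HIVE OPTIONS (fileFormat 'ORC')" as using_clause reproduces the original statement, which makes it easy to compare the two variants side by side.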
04-26-2019
03:32 AM
I'm trying to create an ORC table whose files are written in the 'native' OrcFileFormat. Unfortunately, as soon as the table is partitioned, the ORC files are no longer written in the 'native' OrcFileFormat.

With Partitioning

1. For partitioning, I'm using the PARTITIONED BY clause. This results in the following query:

CREATE TABLE livedm.events (
day DATE,
datetime TIMESTAMP,
description STRING,
eventtype STRING
)
USING HIVE OPTIONS (fileFormat 'ORC')
PARTITIONED BY (eventtype)

2. Next, I'm using (Py)Spark to insert data into the table. On the Spark session, I set several configuration options, such as spark.sql.orc.impl=native.

import pyspark.sql

# Create a SparkSession with configurations for the new ORC format
session_builder = pyspark.sql.SparkSession.builder \
    .enableHiveSupport() \
    .appName("test-orc") \
    .config("spark.sql.hive.convertMetastoreOrc", "true") \
    .config("spark.sql.orc.cache.stripe.details.size", "10000") \
    .config("spark.sql.orc.enabled", "true") \
    .config("spark.sql.orc.filterPushdown", "true") \
    .config("spark.sql.orc.splits.include.file.footer", "true") \
    .config("spark.sql.hive.metastorePartitionPruning", "true") \
    .config("spark.sql.orc.impl", "native") \
    .config("spark.sql.orc.enableVectorizedReader", "true")
spark = session_builder.getOrCreate()
# event_df is a dataframe with eventdata
event_df = spark.sql("select * from tempTable")
event_df.write.mode("append").format("org.apache.spark.sql.execution.datasources.orc").insertInto("livedm.events")

3. Next, I'm using the following command to check the OrcFileFormat:

%sh
java -jar /opt/orc-test/orc-tools-1.4.3-uber.jar meta hdfs://root/apps/hive/warehouse/livedm.db/events 2> /dev/null | grep Version | head -n1

Which results in:

File Version: 0.12 with HIVE_8732

Unfortunately, the ORC files have been stored in the 'hive' OrcFileFormat.

Without Partitioning

When I repeat the earlier steps without any partitioning in step 1, the ORC files are successfully stored in the 'native' OrcFileFormat. The command from step 3 then returns:

File Version: 0.12 with ORC_135

Question

How can I use partitioning (properly) when creating an ORC table and still store the data in the 'native' OrcFileFormat?
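
The check from step 3 can also be scripted. Below is a minimal sketch that parses the "File Version" line printed by the orc-tools meta command. The mapping of HIVE_8732 to 'hive' and ORC_135 to 'native' is only what I observed above, not an official list, and the names WRITER_BY_TAG and writer_from_meta are hypothetical.

```python
import re

# Writer tags as observed in this thread: HIVE_8732 came from the 'hive'
# writer and ORC_135 from the 'native' one. This is an assumption based on
# those two runs; any other tag is returned unchanged.
WRITER_BY_TAG = {"HIVE_8732": "hive", "ORC_135": "native"}


def writer_from_meta(meta_output):
    """Return 'hive', 'native', the raw tag, or None if no version line is found."""
    match = re.search(r"File Version:\s*\S+\s+with\s+(\S+)", meta_output)
    if match is None:
        return None
    tag = match.group(1)
    return WRITER_BY_TAG.get(tag, tag)
```

The function takes the text output of the command from step 3, so it can be fed from a captured shell result or a saved log.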
Labels: Apache Hive