Member since 12-27-2016
Posts: 73
Kudos Received: 34
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 17220 | 03-23-2018 09:21 PM
 | 1006 | 02-05-2018 07:08 PM
 | 4879 | 01-15-2018 07:21 PM
 | 849 | 12-01-2017 06:35 PM
 | 2696 | 03-09-2017 06:21 PM
01-16-2018
04:55 PM
As of now, Apache JIRA shows `Maintenance in progress`, so I cannot give you the direct link. The umbrella ORC JIRA is https://issues.apache.org/jira/browse/SPARK-20901.
01-16-2018
04:54 PM
If you can wait for it, Apache Spark 2.3 will be released with Apache ORC 1.4.1. There are many ORC patches in Hive, and Apache Spark cannot sync them promptly. So, in Apache Spark, we decided to use the latest ORC 1.4.1 library instead of upgrading the Hive 1.2.1 library. From Apache Spark 2.3, Hive ORC tables are converted into ORC data source tables by default and are read with the ORC 1.4.1 library. Not only your issue but also vectorization on ORC is supported. Anyway, again, HDP 2.6.3+ already ships with ORC 1.4.1 and vectorization, too.
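For reference, a minimal sketch of what this looks like once Spark 2.3 is available (the configuration keys `spark.sql.orc.impl` and `spark.sql.orc.enableVectorizedReader` and the table name are assumptions to verify against the 2.3 documentation):
// Spark 2.3 sketch: pick the new ORC 1.4.1-based reader and its vectorized path.
spark.conf.set("spark.sql.orc.impl", "native")
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")
// With convertMetastoreOrc, Hive ORC tables are read as ORC data source tables.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.sql("SELECT COUNT(*) FROM my_hive_orc_table").show()  // hypothetical table name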
01-15-2018
07:21 PM
2 Kudos
Hi, @Rajiv Chodisetti . It's related to HIVE-13232 (fixed in Hive 1.3.0, 2.0.1, 2.1.0), but all Apache Spark releases still use the Hive 1.2.1 library. Could you try HDP 2.6.3+ (2.6.4 is the latest one)? Spark 2.2 in HDP has that fixed Hive library.
01-12-2018
06:33 PM
Unfortunately, it's not supported in HDP 2.5.5. BTW, I'm wondering whether you specifically need Spark 2.1. If you want to download and install it yourself, the latest release is Apache Spark 2.2.1. In addition, Apache Spark 2.3.0 will be released very soon.
01-10-2018
06:17 PM
Hi, @Jerrell Schivers . Unfortunately, yes. It's expected due to the lack of vectorization support. The upcoming Apache Spark 2.3 supports it (https://issues.apache.org/jira/browse/SPARK-16060). However, you can already taste it in HDP 2.6.3 with Spark 2.2. Please refer to the following document. https://community.hortonworks.com/articles/148917/orc-improvements-for-apache-spark-22.html
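To get a quick taste of the difference on HDP 2.6.3, here is a minimal sketch using the `spark.sql.orc.enabled` flag described in that article (the path is a hypothetical placeholder):
// HDP 2.6.3 / Spark 2.2: compare the vectorized ORC reader with the old one.
sql("SET spark.sql.orc.enabled=true")
spark.time(spark.read.format("orc").load("/tmp/orc_data").count)   // new, vectorized reader
sql("SET spark.sql.orc.enabled=false")
spark.time(spark.read.format("orc").load("/tmp/orc_data").count)   // old reader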
01-02-2018
08:56 PM
Let's ping the maintainer of SHC. @wyang, could you help @Eric Hanson?
01-02-2018
08:37 PM
Hi, @Eric Hanson . SHC seems to work for both Spark 1.6.3 and Spark 2.2. Could you share your specific problem with SHC here?
12-05-2017
12:05 AM
I see. Yes, Ranger and Parquet do. I believe you can find a way to meet your requirements!
12-04-2017
06:39 PM
I'm wondering about the use case, because both `spark.sql.groupByOrdinal` and `spark.sql.orderByOrdinal` are true by default in Spark.
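For context, a minimal sketch of what those flags control (the table `t` and its columns are hypothetical):
// With spark.sql.groupByOrdinal=true and spark.sql.orderByOrdinal=true (the defaults),
// the integers below refer to positions in the SELECT list: 1 = gender, 2 = the count.
sql("SELECT gender, COUNT(*) FROM t GROUP BY 1 ORDER BY 2 DESC").show()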
12-04-2017
06:32 PM
Which Spark version do you use, and could you post a short example of your SQL here?
12-04-2017
05:12 PM
1 Kudo
In addition to that, STS (Spark Thrift Server) has supported Spark SQL syntax since v2.0.0. If you want to use Spark SQL syntax with SQL:2003 support, it's a good choice. Also, you can use Spark-specific syntax like `CACHE TABLE`, too.
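For example, a minimal sketch of that Spark-specific syntax, shown from a spark-shell (the table name `t1` is hypothetical); the same statements can be sent to STS over JDBC:
sql("CACHE TABLE t1")                  // Spark-specific: pin the table in memory
sql("SELECT COUNT(*) FROM t1").show()  // served from the cached data
sql("UNCACHE TABLE t1")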
12-01-2017
06:35 PM
Sorry, @Subramaniam Ramasubramanian. You cannot connect to your Spark-shell via JDBC.
11-20-2017
10:11 AM
5 Kudos
1. Introduction
1.1 HDP 2.6.3 provides Apache Spark 2.2 with Apache ORC 1.4
Since Apache Spark 1.4.1, Spark has supported ORC as one of its FileFormats. This article introduces how to use another, faster ORC file format with Apache Spark 2.2 in HDP 2.6.3. First, in order to show how to choose a FileFormat, Section 1.2 shows an example that writes and reads with ORCFileFormat. Section 2 shows a brief performance comparison, and Section 3 explains more use cases and ORC configurations. Section 4 summarizes the ORC-related Apache Spark fixes included in HDP 2.6.3.
1.2 Usage Example: Write and Read with ORCFileFormat
%spark2.spark
// Save 5 rows into an ORC file.
spark.range(5).write.format("orc").mode("overwrite").save("/tmp/orc")
// Read a DataFrame from the ORC file with the existing ORCFileFormat.
spark.read.format("orc").load("/tmp/orc").count
// Read a DataFrame from the ORC file with a new ORCFileFormat.
spark.read.format("org.apache.spark.sql.execution.datasources.orc").load("/tmp/orc").count
res4: Long = 5
res7: Long = 5
2. Performance Comparison
Here, I’ll show you a small and quick performance comparison to show the difference. For a TPC-DS 10TB performance comparison, please refer to the presentation at DataWorks Summit in the Reference section.
2.1 Prepare test data (about 100 million rows)
%spark2.spark
val df = spark.range(200000000).sample(true, 0.5)
df.write.format("orc").mode("overwrite").save("/tmp/orc_100m")
df: org.apache.spark.sql.Dataset[Long] = [id: bigint]
2.2 See the difference in 10 seconds
%spark2.spark
// New ORC file format
spark.time(spark.read.format("org.apache.spark.sql.execution.datasources.orc").load("/tmp/orc_100m").count)
// Old ORC file format
spark.time(spark.read.format("orc").load("/tmp/orc_100m").count)
Time taken: 345 ms
res10: Long = 100000182
Time taken: 3518 ms
res12: Long = 100000182
3. How does it work?
3.1 Vectorization
The new ORC file format in HDP 2.6.3, org.apache.spark.sql.execution.datasources.orc, is faster than the old ORC file format. The performance difference comes from vectorization. Apache Spark has ColumnarBatch and Apache ORC has RowBatch separately. By combining these two vectorization techniques, we achieve the performance gain shown above. Previously, Apache Spark took advantage of its ColumnarBatch format only with Apache Parquet. In addition, the Apache Spark community has been putting effort into SPARK-20901, Feature parity for ORC with Parquet. Recently, with the new Apache ORC 1.4.1 (released October 16th), ORC support in Spark has become more stable and faster.
3.2 Do you want to use the new ORC file format by default? Here is `spark.sql.orc.enabled` for that.
%spark2.spark
sql("SET spark.sql.orc.enabled=true")
spark.time(spark.read.format("orc").load("/tmp/orc_100m").count)
sql("SET spark.sql.orc.enabled=false")
spark.time(spark.read.format("orc").load("/tmp/orc_100m").count)
res13: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 273 ms
res14: Long = 100000182
res16: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 4083 ms
res17: Long = 100000182
3.3 Does it work with SQL, too? Yes, it does!
%spark2.spark
df.write.format("orc").mode("overwrite").saveAsTable("t1")
df.write.format("orc").mode("overwrite").saveAsTable("t2")
sql("SET spark.sql.orc.enabled=true")
spark.time(sql("SELECT COUNT(*) FROM t1").collect)
sql("SET spark.sql.orc.enabled=false")
spark.time(sql("SELECT COUNT(*) FROM t2").collect)
res21: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 404 ms
res22: Array[org.apache.spark.sql.Row] = Array([100000182])
res24: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 4333 ms
res25: Array[org.apache.spark.sql.Row] = Array([100000182])
3.4 How can I create a table using only the new ORC file format?
%spark2.spark
sql("DROP TABLE IF EXISTS o1")
sql("CREATE TABLE o1 USING `org.apache.spark.sql.execution.datasources.orc` AS SELECT * FROM t1")
sql("SET spark.sql.orc.enabled=false")
spark.time(sql("SELECT COUNT(*) FROM o1").collect)
res26: org.apache.spark.sql.DataFrame = []
res27: org.apache.spark.sql.DataFrame = []
res28: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 213 ms
res29: Array[org.apache.spark.sql.Row] = Array([100000182])
3.5 Do you want to read existing Hive tables created with `STORED AS ORC`? Here is `spark.sql.hive.convertMetastoreOrc`.
%spark2.spark
sql("DROP TABLE IF EXISTS h1")
sql("CREATE TABLE h1 STORED AS ORC AS SELECT * FROM t1")
sql("SET spark.sql.hive.convertMetastoreOrc=true")
sql("SET spark.sql.orc.enabled=true")
spark.time(sql("SELECT COUNT(*) FROM h1").collect)
res30: org.apache.spark.sql.DataFrame = []
res31: org.apache.spark.sql.DataFrame = []
res33: org.apache.spark.sql.DataFrame = [key: string, value: string]
res34: org.apache.spark.sql.DataFrame = [key: string, value: string]
Time taken: 227 ms
res35: Array[org.apache.spark.sql.Row] = Array([100000182])
3.6 ORC Configuration
To utilize the new ORC file format, there are a few more ORC configurations you should turn on. The following is a summary of the recommended ORC configurations in HDP 2.6.3 and above (a combined example follows the list).
spark.sql.orc.enabled=true enables the new ORC format to read/write data source tables and files.
spark.sql.hive.convertMetastoreOrc=true enables the new ORC format to read/write Hive tables.
spark.sql.orc.filterPushdown=true enables filter pushdown for the ORC format.
spark.sql.orc.char.enabled=true enables the new ORC format to use CHAR types when reading Hive tables. By default, STRING types are used for performance reasons.
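For convenience, a minimal sketch of turning all of them on from a Spark session (mirroring the reset snippet in the Appendix):
%spark2.spark
// Recommended ORC settings for HDP 2.6.3+ (technical preview).
sql("SET spark.sql.orc.enabled=true")
sql("SET spark.sql.hive.convertMetastoreOrc=true")
sql("SET spark.sql.orc.filterPushdown=true")
sql("SET spark.sql.orc.char.enabled=true")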
4. More features and limitations
4.1 Fixed Apache Spark issues
SPARK-14387 Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc
SPARK-16060 Vectorized Orc Reader
SPARK-16628 OrcConversions should not convert an ORC table represented by MetastoreRelation to HadoopFsRelation if metastore schema does not match schema stored in ORC files
SPARK-18355 Spark SQL fails to read data from a ORC hive table that has a new column added to it
SPARK-19809 NullPointerException on empty ORC file
SPARK-20682 Support a new faster ORC data source based on Apache ORC
SPARK-20728 Make ORCFileFormat configurable between sql/hive and sql/core
SPARK-21422 Depend on Apache ORC 1.4.0
SPARK-21477 Mark LocalTableScanExec’s input data transient
SPARK-21791 ORC should support column names with dot
SPARK-21787 Support for pushing down filters for date types in ORC
SPARK-21831 Remove spark.sql.hive.convertMetastoreOrc config in HiveCompatibilitySuite
SPARK-21912 ORC/Parquet table should not create invalid column names
SPARK-21929 Support ALTER TABLE table_name ADD COLUMNS(..) for ORC data source
SPARK-22146 FileNotFoundException while reading ORC files containing special characters
SPARK-22158 convertMetastore should not ignore table property
SPARK-22300 Update ORC to 1.4.1
4.2 Limitations
Schema evolution and schema merging are not officially supported yet (SPARK-11412). Apache Spark's vectorization can be used with schemas consisting of primitive types; for more complex schemas, Spark falls back to the non-vectorized reader. Old ORC files may contain incorrect information inside TIMESTAMP columns, and filter pushdown will be ignored for those old ORC files.
5. Conclusion
HDP 2.6.3 provides a powerful combination of Apache Spark 2.2 and Apache ORC 1.4.1 as a technical preview. In the Apache Spark community, SPARK-20901, Feature parity for ORC with Parquet, is still an ongoing effort. We are looking forward to seeing more improvements in Apache Spark 2.3.
Reference
ZEPPELIN NOTEBOOK for this article.
PERFORMANCE UPDATES: WHEN APACHE ORC MET APACHE SPARK, DataWorks Summit 2017 Sydney, Sep. 20-21
Appendix - How to reset to default options
%spark2.spark
sql("SET spark.sql.hive.convertMetastoreOrc=false")
sql("SET spark.sql.orc.enabled=false")
sql("SET spark.sql.orc.filterPushdown=false")
sql("SET spark.sql.orc.char.enabled=false")
res36: org.apache.spark.sql.DataFrame = [key: string, value: string]
res37: org.apache.spark.sql.DataFrame = [key: string, value: string]
res38: org.apache.spark.sql.DataFrame = [key: string, value: string]
res39: org.apache.spark.sql.DataFrame = [key: string, value: string]
11-09-2017
04:42 PM
Please create a Hive table on top of those Parquet files. If Hive can access them securely with Ranger, Spark can too, via SPARK-LLAP.
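A minimal sketch of such a table definition (the column list, table name, and location are hypothetical placeholders); run it on the Hive side so that Ranger policies govern the table:
// HiveQL/Spark SQL DDL exposing existing Parquet files as an external table.
sql("""CREATE EXTERNAL TABLE my_parquet_table (id BIGINT, name STRING)
       STORED AS PARQUET
       LOCATION '/data/my_parquet_dir'""")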
11-08-2017
06:39 PM
1 Kudo
Could you try SPARK-LLAP? It uses Hive LLAP and Ranger inside Spark. See: Row/Column-level Security in SQL for Apache Spark
11-01-2017
07:16 PM
Unfortunately, that's Spark 2.1.x behavior; you need to use Hive. BTW, which `ALTER TABLE` statement do you need? In HDP 2.6.3, Spark 2.2 supports `ALTER TABLE ADD COLUMNS` via the following two issues. - https://issues.apache.org/jira/browse/SPARK-19261 - https://issues.apache.org/jira/browse/SPARK-21929
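A minimal sketch of the supported statement on HDP 2.6.3 / Spark 2.2 (table and column names are hypothetical):
// ALTER TABLE ... ADD COLUMNS for data source tables (SPARK-19261), including ORC (SPARK-21929).
sql("ALTER TABLE my_table ADD COLUMNS (new_col STRING)")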
10-05-2017
06:29 PM
1 Kudo
That is a valid warning. The old Hive ORC writer doesn't save the correct schema into ORC files; it writes dummy column names like `_col1`, and you are reading such an old ORC file. If you generate a new ORC file with Hive 2, you will not see that warning.
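You can check this yourself; a minimal sketch (the path is a hypothetical placeholder):
// Reading a file written by the old Hive ORC writer shows dummy physical column
// names such as _col0, _col1 instead of the real Hive column names.
spark.read.format("orc").load("/apps/hive/warehouse/old_orc_table").printSchema()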
09-19-2017
06:29 PM
Hi, @chenhao chenhao . As you can see in the image, the error comes from the Hive side: "Invalid function `get_splits`". You seem to be using an old version of Hive. Please use the correct version of Hive in HDP.
09-15-2017
07:24 PM
If you want to write a single file, could you try repartitioning `new_df` before registering the temp table? new_df.repartition(1).registerTempTable("new_df")... Depending on your situation, you may choose a different number of partitions instead of 1.
09-05-2017
07:04 PM
Hi, @Saurabh Did you do `CREATE FUNCTION`?
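For reference, a minimal sketch of registering a permanent function (the class name and jar path are hypothetical placeholders):
// Registers a permanent UDF in the metastore so later Spark/Hive sessions can see it.
sql("CREATE FUNCTION my_udf AS 'com.example.MyUDF' USING JAR '/path/to/my_udf.jar'")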
08-24-2017
06:45 PM
Regarding your emails, I already answered your questions personally, and the Hortonworks support team can help you with further professional advice and assistance.
08-23-2017
01:09 AM
That's an assembly jar location for Spark `--packages`. If you want a standalone jar for an environment without an internet connection, please look at this: http://repo.hortonworks.com/content/groups/public/com/hortonworks/spark/spark-llap_2.11/1.1.3-2.1/
07-12-2017
07:39 PM
How long does it take? I'm wondering if you could give us the numbers, for example, for your query and a simplified version of it like the following. val q1 = finalDF.groupBy($"Dseq", $"FmNum", $"yrs", $"mnt", $"FromDnsty").agg(...) // Your query
val q2 = finalDF.groupBy($"Dseq", $"FmNum", $"yrs", $"mnt", $"FromDnsty").agg(count($"Dseq"), avg($"Emp"), sum("Ss")) // A simplified version of your query
06-29-2017
06:29 PM
Hi, could you print the partitions like this? A Snappy-compressed Parquet file is splittable by range. Usually, Spark splits a large Snappy-compressed Parquet file into multiple partitions, each bounded by spark.sql.files.maxPartitionBytes. spark.read.parquet("/output/xxx.snappy.parquet").rdd.partitions.foreach(print)
06-23-2017
07:17 PM
Sorry for the late response. Is your cluster connected to the internet so it can download the jar file? What error did you see?
06-19-2017
06:34 PM
In general, `Custom spark2-default`. And `Custom spark2-thrift-sparkconf` if you want to set it up for the Spark Thrift Server.
05-04-2017
08:42 PM
12 Kudos
1. Goal
Security is one of the fundamental features for enterprise adoption. Specifically, for SQL users, row/column-level access control is important. However, when a cluster is used as a data warehouse accessed by various user groups in different ways, such as Apache Spark™ 1.6/2.1 and Apache Hive, it is difficult to guarantee access control in a consistent way. In this article, we show how to use Apache Ranger™ to manage access control policies for Spark SQL. This enhancement is done by integrating Apache Ranger and Apache Spark via `spark-llap`. We also show how to provide finer-grained access control (row/column-level filtering and column masking) to Apache Spark.
2. Key Features
Shared Policies: The data in a cluster can be shared securely and consistently, controlled by the shared access rules between Apache Spark and Apache Hive.
Audits: All security activities can be monitored and searched in a single place, i.e., Apache Ranger.
Resource Management: Each user can use different queues while accessing the securely shared Hive data.
3. Environment
HDP 2.6.1.0 with Spark2, Hive, and Ranger on a Kerberized cluster
SPARK-LLAP: 1.1.3-2.1
4. Assumptions
4.1. HDFS Permission
For all use cases, make sure that the permission of the Hive warehouse is 700, which means normal users are unable to access the secured tables directly. In addition, make sure that `hive.warehouse.subdir.inherit.perms=true`. With this, newly created tables will inherit the 700 permission by default.
$ hadoop fs -ls /apps/hive
Found 1 items
drwx------ - hive hdfs 0 2017-07-10 17:04 /apps/hive/warehouse
4.2. Interactive Query
Hive Interactive Query needs to be enabled.
4.3. Ranger Hive Plugin
The Hive Ranger plugin needs to be enabled.
4.4. User Principals
In this article, we will use two user principals, `billing` and `datascience`. The `billing` principal can access all rows and columns, while the `datascience` principal can access only filtered and masked data. You can use the `hive` and `spark` principals instead.
5. Configurations
5.1. Find the existing configuration of your cluster
First of all, find the following five values from Hive configuration via Ambari for your cluster.
HiveServer2 Interactive Host name
You need a host name instead of the ZooKeeper-based JDBC URL.
Spark Thrift Server Host
This value will be used during running example code.
The value for hive.llap.daemon.service.hosts
The value for hive.zookeeper.quorum
The value for hive.server2.authentication.kerberos.principal
5.2. Setup HDFS
Set the following parameter.
`Custom core-site` → `hadoop.proxyuser.hive.hosts=*`
Setting it to `*` allows all hosts to submit Spark jobs with SPARK-LLAP.
5.3. Setup Hive
Set the following two parameters.
`Custom hive-interactive-site`: these take the same values as `hive.llap.daemon.keytab.file` and `hive.llap.daemon.service.principal` in the same `Custom hive-interactive-site`.
hive.llap.task.keytab.file=/etc/security/keytabs/hive.service.keytab
hive.llap.task.principal=hive/_HOST@EXAMPLE.COM
5.4. Setup Spark2
For Spark2 shells (spark-shell, pyspark, sparkR, spark-sql) and applications, set up the following configurations via Ambari and restart the required services.
`Custom spark2-default`
spark.hadoop.hive.llap.daemon.service.hosts=the value of hive.llap.daemon.service.hosts
spark.hadoop.hive.zookeeper.quorum=The value of hive.zookeeper.quorum
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://YourHiveServer2HostName:10500/
spark.sql.hive.hiveserver2.jdbc.url.principal=The value of hive.server2.authentication.kerberos.principal
5.5. Setup Spark2 Thrift Server
For the Spark2 Thrift Server, set up the following configurations via Ambari and restart the required services.
`Advanced spark2-env` → `spark_thrift_cmd_opts`
--packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
Note that this is one line.
`Custom spark2-thrift-sparkconf` Note that `spark.sql.hive.hiveserver2.jdbc.url` additionally has `;hive.server2.proxy.user=${user}` for impersonation.
spark.hadoop.hive.llap.daemon.service.hosts=the value of hive.llap.daemon.service.hosts
spark.hadoop.hive.zookeeper.quorum=The value of hive.zookeeper.quorum
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://YourHiveServer2HostName:10500/;hive.server2.proxy.user=${user}
spark.sql.hive.hiveserver2.jdbc.url.principal=The value of hive.server2.authentication.kerberos.principal
6. Prepare database
Since normal users are unable to access Hive databases and tables so far due to the HDFS permissions, use the Beeline CLI with the `hive` principal to prepare a database and a table for the users who will use Spark-LLAP. Please note that the following is an example schema setup for the demo scenario in this article. For audit-only scenarios, you don't need to create new databases and tables; instead, you can make all existing databases and tables accessible to all users.
$ kdestroy
$ kinit -kt /etc/security/keytabs/hive.service.keytab hive/`hostname -f`@EXAMPLE.COM
$ beeline -u "jdbc:hive2://YourHiveServerHost:10500/default;principal=hive/_HOST@EXAMPLE.COM" -e "CREATE DATABASE db_spark; USE db_spark; CREATE TABLE t_spark(name STRING, gender STRING); INSERT INTO t_spark VALUES ('Barack Obama', 'M'), ('Michelle Obama', 'F'), ('Hillary Clinton', 'F'), ('Donald Trump', 'M');"
7. Security Policies
To use fine-grained access control, you need to set up Apache Ranger™ policies, which govern Spark and Hive together seamlessly from a single control center.
7.1. Ranger Admin UI
Open the `Ranger Admin UI`. The default login information is `admin/admin`. After login, the `Access Manager` page shows nine service managers. The following screenshot shows that HDFS / Hive / YARN policies exist. Since Spark shares the same policies with Hive, visit `Hive` among the service managers.
On the Hive Service Manager page, there are three tabs corresponding to three types of policies: `Access`, `Masking`, and `Row Level Filter`.
7.2. Example Policies
Let’s make some policies for a user to access some rows and columns.
For examples,
Both the `billing` and `datascience` principals can access the `db_spark` database. Other databases are not allowed by default due to the HDFS permissions.
The `billing` principal can see all rows and columns of the `t_spark` table in the `db_spark` database.
The `datascience` principal can see only the first 4 characters of the `name` field of the `t_spark` table in the `db_spark` database; the rest of the `name` field is masked.
The `datascience` principal can see only the rows matching the `gender='M'` predicate.
7.2.1. Access policy in db_spark database
Name | Table | Column | Select User | Permissions
---|---|---|---|---
spark_access | t_spark | * | billing | Select
spark_access | t_spark | * | datascience | Select
7.2.2. Masking policy in db_spark database
Name | Table | Column | Select User | Access Types | Select Masking Option
---|---|---|---|---|---
spark_mask | t_spark | name | datascience | Select | partial mask: 'show first 4'
7.2.3. Row Level Filter policy in db_spark database
Name | Table | Access Types | Row Level Filter
---|---|---|---
spark_filter | t_spark | Select | gender='M'
7.2.4. HDFS policy for spark
In the HDFS Ranger plugin, add a rule `spark_tmp` to allow all access to `/tmp`.
8. Target Scenarios
Case 1: Secure Spark Thrift Server
A user can access the Spark Thrift Server via beeline or Apache Zeppelin™. First, based on the Kerberos principal, the user can see only the data they are allowed to access.
$ kdestroy
$ kinit billing/billing@EXAMPLE.COM
$ beeline -u "jdbc:hive2://YourSparkThriftServer:10016/db_spark;principal=hive/_HOST@EXAMPLE.COM" -e 'select * from db_spark.t_spark'
+------------------+---------+--+
| name | gender |
+------------------+---------+--+
| Barack Obama | M |
| Michelle Obama | F |
| Hillary Clinton | F |
| Donald Trump | M |
+------------------+---------+--+
$ kdestroy
$ kinit datascience/datascience@EXAMPLE.COM
$ beeline -u "jdbc:hive2://YourSparkThriftServer:10016/db_spark;principal=hive/_HOST@EXAMPLE.COM" -e 'select * from db_spark.t_spark'
+---------------+---------+--+
| name | gender |
+---------------+---------+--+
| Baraxx xxxxx | M |
| Donaxx xxxxx | M |
+---------------+---------+--+
Second, in the case of Zeppelin, a proxy user name is used. You can watch this YouTube demo to see how Zeppelin works. The following example illustrates the usage of a proxy user name with beeline; Zeppelin does the same thing via JDBC under the hood.
$ kdestroy
$ kinit -kt /etc/security/keytabs/hive.service.keytab hive/`hostname -f`@EXAMPLE.COM
$ beeline -u "jdbc:hive2://YourSparkThriftServerHost:10016/db_spark;principal=hive/_HOST@EXAMPLE.COM;hive.server2.proxy.user=billing" -e "select * from db_spark.t_spark"
+------------------+---------+--+
| name | gender |
+------------------+---------+--+
| Barack Obama | M |
| Michelle Obama | F |
| Hillary Clinton | F |
| Donald Trump | M |
+------------------+---------+--+
$ beeline -u "jdbc:hive2://YourSparkThriftServerHost:10016/db_spark;principal=hive/_HOST@EXAMPLE.COM;hive.server2.proxy.user=datascience" -e "select * from db_spark.t_spark"
+---------------+---------+--+
| name | gender |
+---------------+---------+--+
| Baraxx xxxxx | M |
| Donaxx xxxxx | M |
+---------------+---------+--+
Case 2: Shells
A user can run `spark-shell` or `pyspark` like the following. Please note that the user can access their own data sources in addition to the secure data source provided by `spark-llap`. For the next example, log in as the user `spark`.
spark-shell $ kdestroy
$ kinit billing/billing@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 spark-shell --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
scala> sql("select * from db_spark.t_spark").show
+---------------+------+
| name|gender|
+---------------+------+
| Barack Obama| M|
| Michelle Obama| F|
|Hillary Clinton| F|
| Donald Trump| M|
+---------------+------+
$ kdestroy
$ kinit datascience/datascience@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 spark-shell --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
scala> sql("select * from db_spark.t_spark").show
+------------+------+
| name|gender|
+------------+------+
|Baraxx xxxxx| M|
|Donaxx xxxxx| M|
+------------+------+
pyspark $ kdestroy
$ kinit billing/billing@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 pyspark --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
>>> sql("select * from db_spark.t_spark").show()
+---------------+------+
| name|gender|
+---------------+------+
| Barack Obama| M|
| Michelle Obama| F|
|Hillary Clinton| F|
| Donald Trump| M|
+---------------+------+
$ kdestroy
$ kinit datascience/datascience@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 pyspark --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
>>> sql("select * from db_spark.t_spark").show()
+------------+------+
| name|gender|
+------------+------+
|Baraxx xxxxx| M|
|Donaxx xxxxx| M|
+------------+------+
sparkR $ kdestroy
$ kinit billing/billing@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 sparkR --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
> head(sql("select * from db_spark.t_spark"))
name gender
1 Barack Obama M
2 Michelle Obama F
3 Hillary Clinton F
4 Donald Trump M
$ kdestroy
$ kinit datascience/datascience@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 sparkR --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
> head(sql("select * from db_spark.t_spark"))
name gender
1 Baraxx xxxxx M
2 Donaxx xxxxx M
spark-sql $ kdestroy
$ kinit billing/billing@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 spark-sql --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
spark-sql> select * from db_spark.t_spark;
Barack Obama	M
Michelle Obama	F
Hillary Clinton	F
Donald Trump	M
$ kdestroy
$ kinit datascience/datascience@EXAMPLE.COM
$ SPARK_MAJOR_VERSION=2 spark-sql --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --conf spark.sql.hive.llap=true
spark-sql> select * from db_spark.t_spark;
Baraxx xxxxx	M
Donaxx xxxxx	M
Case 3: Applications
A user can submit a Spark job like the following. As in the `spark-shell` scenario, the user can access their own data sources in addition to the secure data source provided by `spark-llap`.
from pyspark.sql import SparkSession
spark = SparkSession \
.builder \
.appName("Spark LLAP SQL Python") \
.enableHiveSupport() \
.getOrCreate()
spark.sql("show databases").show()
spark.sql("select * from db_spark.t_spark").show()
spark.stop()
Launch the app in YARN client mode.
SPARK_MAJOR_VERSION=2 spark-submit --packages com.hortonworks.spark:spark-llap-assembly_2.11:1.1.3-2.1 --repositories http://repo.hortonworks.com/content/groups/public --master yarn --deploy-mode client --conf spark.sql.hive.llap=true spark_llap_sql.py
YARN cluster mode will be supported in the next HDP release. You can download the example at spark_llap_sql.py, too.
Appendix
Support Matrix
Please refer to https://github.com/hortonworks-spark/spark-llap/wiki/7.-Support-Matrix
Known Issues
Warning logs on CancellationException
If you see warning messages like the following in Spark shells, you can turn them off via `sc.setLogLevel` or `conf/log4j.properties`.
scala> sql("select * from db_spark.t_spark").show
...
17/03/09 22:06:26 WARN TaskSetManager: Stage 5 contains a task of very large size (248 KB). The maximum recommended task size is 100 KB.
17/03/09 22:06:27 WARN LlapProtocolClientProxy: RequestManager shutdown with error
java.util.concurrent.CancellationException
...
scala> sc.setLogLevel("ERROR")
scala> sql("select * from db_spark.t_spark").show
...
03-17-2017
06:41 PM
1 Kudo
In Ambari, visit `Spark` -> `Configs` -> `Custom spark-hive-site-override` and add the following for Spark:
hive.mapred.supports.subdirectories=true
Then spark-shell works like the following.
scala> sql("set hive.mapred.supports.subdirectories").show(false)
+-----------------------------------+-----+
|key |value|
+-----------------------------------+-----+
|hive.mapred.supports.subdirectories|true |
+-----------------------------------+-----+
03-17-2017
06:30 PM
1 Kudo
Could you try the following? spark-sql --hiveconf hive.mapred.supports.subdirectories=true