Member since: 12-27-2016
Posts: 73
Kudos Received: 34
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 24394 | 03-23-2018 09:21 PM |
| | 2083 | 02-05-2018 07:08 PM |
| | 8415 | 01-15-2018 07:21 PM |
| | 1897 | 12-01-2017 06:35 PM |
| | 5155 | 03-09-2017 06:21 PM |
08-24-2018
08:29 PM
@Manikandan Jeyabal Are you using official Apache Spark? The new ORC vectorized reader was added in Apache Spark 2.3.0. Please see SPARK-16060.
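As a minimal sketch (not from the original reply; it assumes Spark 2.3+ and the `spark.sql.orc.impl` / `spark.sql.orc.enableVectorizedReader` configuration keys), you could confirm at runtime which ORC reader path is in effect:

```scala
// Illustrative check of the ORC reader settings on Spark 2.3+.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-reader-check").getOrCreate()

println(s"Spark version: ${spark.version}")
println(s"spark.sql.orc.impl = ${spark.conf.get("spark.sql.orc.impl")}")
println(s"spark.sql.orc.enableVectorizedReader = " +
  spark.conf.get("spark.sql.orc.enableVectorizedReader"))
```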
03-26-2018
01:03 AM
Great! Thank you for sharing your experience too. Your summary and understanding are correct. For Hive, the ORC writer and reader in Hive 1.2.1 are quite old, so of course they have some bugs. In general, they will read new data correctly, but for the best performance and safety the latest Hive is recommended; Hive 2.3.0 starts to use Apache ORC. As for the Apache ORC library, Apache Spark 2.3 was released with Apache ORC 1.4.1 for several reasons. Please use the latest one, Apache ORC 1.4.3, if possible. There is a known issue, SPARK-23340.
03-24-2018
01:35 AM
Oh, is it? I'll try to reproduce your situation. Could you share more information about your software stack? Apache Spark 2.3 on Hadoop 2.7 and Kafka? Could you also confirm that you are using the new OrcFileFormat by setting `spark.sql.orc.impl=native`? The above bugs are fixed in the new OrcFileFormat only.
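A minimal sketch (the application name is illustrative; it assumes Spark 2.3) of selecting the new OrcFileFormat by setting `spark.sql.orc.impl=native` when the session is created:

```scala
// Select the new (native) OrcFileFormat in Spark 2.3 at session creation.
// "native" picks the new OrcFileFormat; "hive" falls back to the old reader.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("native-orc-example")            // illustrative app name
  .config("spark.sql.orc.impl", "native")
  .getOrCreate()

// The setting can also be changed per session at runtime.
spark.conf.set("spark.sql.orc.impl", "native")
```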
03-23-2018
09:21 PM
2 Kudos
Although it seems that you are hitting an output format issue, ORC has been tested properly since SPARK-22781. As one example, a `FileNotFoundException` might occur because of an empty DataFrame (SPARK-15474). There were more ORC issues before Apache Spark 2.3; please see SPARK-20901 for the full list.
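For illustration only, a hedged sketch of the empty-DataFrame case mentioned above (SPARK-15474); the schema and output path are made up:

```scala
// Illustrative empty-DataFrame-to-ORC round trip (the SPARK-15474 scenario).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("empty-orc-write").getOrCreate()
import spark.implicits._

val emptyDf = Seq.empty[(Int, String)].toDF("id", "name")

// On affected older versions this kind of empty write/read could surface
// FileNotFoundException; with the native ORC path in Spark 2.3 it round-trips.
emptyDf.write.mode("overwrite").orc("/tmp/empty_orc_example")
spark.read.orc("/tmp/empty_orc_example").show()
```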
03-23-2018
09:16 PM
1 Kudo
Hi, @Sanjay Gurnani Officially, the Apache Spark 2.2.1 Structured Streaming documentation doesn't mention ORC properly; the Apache Spark 2.3 documentation starts to include ORC. - http://spark.apache.org/docs/2.2.1/structured-streaming-programming-guide.html
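As a hedged sketch (the source, output path, and checkpoint location are placeholders), an ORC file sink in a Spark 2.3 Structured Streaming query could look like this:

```scala
// Illustrative Structured Streaming query writing ORC files (Spark 2.3+).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("streaming-orc-sink").getOrCreate()

val input = spark.readStream
  .format("rate")   // built-in test source emitting (timestamp, value) rows
  .load()

val query = input.writeStream
  .format("orc")
  .option("path", "/tmp/streaming_orc_output")
  .option("checkpointLocation", "/tmp/streaming_orc_checkpoint")
  .start()

query.awaitTermination()
```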
02-27-2018
04:10 PM
Hi, @prasad raju
Unfortunately, ORC doesn't support BZip2, so Hive and Spark don't either.
- ORC Source Code
- HIVE-5067
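As a small hedged illustration (the DataFrame and paths are hypothetical), ORC writes in Spark accept only the codecs ORC itself supports, and BZip2 is not among them:

```scala
// Illustrative ORC writes with codecs ORC supports (e.g. snappy, zlib, none);
// there is no bzip2 option for ORC.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("orc-compression-example").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

df.write.option("compression", "snappy").orc("/tmp/orc_snappy")
df.write.option("compression", "zlib").orc("/tmp/orc_zlib")
```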
02-13-2018
04:45 AM
Thank you for confirming.
02-11-2018
09:57 PM
1 Kudo
Hi, @Mai Nakagawa You are using a mismatched jar file, as you saw in your first exception message, because LLAP or Hive classes are not found. This document is about HDP 2.6.1 using Spark 2.1.1. Since HDP 2.6.3, `spark-llap` for Spark 2.2 is built in. Please use it:

$ ls -al /usr/hdp/2.6.3.0-235/spark_llap/spark-llap-assembly-1.0.0.2.6.3.0-235.jar
-rw-r--r-- 1 root root 61306448 Oct 30 02:39 /usr/hdp/2.6.3.0-235/spark_llap/spark-llap-assembly-1.0.0.2.6.3.0-235.jar
02-06-2018
05:22 PM
It's the memory size for a Spark executor (worker), and there is additional overhead per executor on top of it. You need to set a proper value yourself. In a YARN environment, the memory (plus overhead) must be smaller than the YARN container limit; otherwise Spark shows you that error message. It's an application property. For normal Spark jobs, users are responsible for it because each app can set its own `spark.executor.memory` with `spark-submit`. For Spark Thrift Server, admins should manage it properly when they adjust the YARN configuration. For more information, please see http://spark.apache.org/docs/latest/configuration.html#application-properties
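As a rough sketch (the sizes are placeholders; on YARN with Spark 2.x the overhead key is `spark.yarn.executor.memoryOverhead`, renamed `spark.executor.memoryOverhead` in later releases), the relevant application properties look like this, and the heap plus overhead must fit inside the YARN container limit:

```scala
// Illustrative executor sizing; these values are examples, tune for your cluster.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("executor-memory-example")
  .config("spark.executor.memory", "8g")                 // executor heap
  .config("spark.yarn.executor.memoryOverhead", "1024")  // off-heap overhead (MB) on YARN
  .getOrCreate()
```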
02-05-2018
07:08 PM
Hi, @Michael Bronson `spark.executor.memory` seems to be 10240. Please change it in Ambari, under `spark-thrift-conf`.