Member since: 11-16-2015
Posts: 195
Kudos Received: 36
Solutions: 16
05-31-2019
09:29 AM
And I found a solution by pointing job.local.dir to the directory containing the code:

spark = SparkSession \
    .builder \
    .appName('XML ETL') \
    .master("local[*]") \
    .config('job.local.dir', 'file:/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
    .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
    .getOrCreate()

Now everything works.
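For completeness, a minimal sketch of reading an XML file once a session like the one above is available; the input path and rowTag value are hypothetical placeholders, not from the original post:

    from pyspark.sql import SparkSession

    # Same idea as above: pull in spark-xml via spark.jars.packages.
    spark = (SparkSession.builder
             .appName('XML ETL')
             .master('local[*]')
             .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0')
             .getOrCreate())

    # Hypothetical input path and row tag -- adjust to the actual XML layout.
    df = (spark.read
          .format('com.databricks.spark.xml')
          .option('rowTag', 'record')
          .load('file:/path/to/data.xml'))
    df.printSchema()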
04-10-2019
08:31 AM
Hi, any update on this? Has this issue been resolved? If yes, please share the solution, as we are facing the same issue.
04-05-2019
01:03 AM
1 Kudo
Thank you for the great explanation @AutoIN. This solved my problem. Our CDSW cluster has two nodes, a master and a slave. As described, I was able to figure out that the available CPU and memory on the two hosts are unevenly distributed: for example, I can spin up an engine with many vCPUs but little memory, and vice versa. I was simply not aware that a session cannot share resources across nodes. Thank you very much!
03-08-2019
05:10 AM
I am facing the same issue; can anyone please suggest how to resolve it? When running two Spark applications, one remains in the ACCEPTED state while the other is running. What configuration is needed for both to run? Below is my dynamic resource pool configuration. Please help!
07-05-2018
08:34 PM
1 Kudo
@Rod No, it is unsupported (as of this writing) in both CDH 5 and CDH 6: https://www.cloudera.com/documentation/enterprise/6/release-notes/topics/rg_cdh_600_unsupported_features.html#spark ("Spark SQL CLI is not supported").
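As an aside, and only as a hedged illustration (not part of the original reply): the same SQL can still be run programmatically through a SparkSession instead of the unsupported CLI, for example from pyspark2. The table name below is hypothetical.

    from pyspark.sql import SparkSession

    # Run SQL through the SparkSession API rather than the unsupported Spark SQL CLI.
    spark = (SparkSession.builder
             .appName('sql-example')
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("SHOW DATABASES").show()
    spark.sql("SELECT COUNT(*) FROM default.some_table").show()  # hypothetical table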
06-11-2018
09:23 PM
1 Kudo
Just wanted to complete the thread here. This is now documented in the known issues section of the Spark 2.3 documentation, followed by workarounds to mitigate the error. Thx. https://www.cloudera.com/documentation/spark2/latest/topics/spark2_known_issues.html#concept_kgn_j3g_5db

In CDS 2.3 release 2, Spark jobs fail when lineage is enabled because Cloudera Manager does not automatically create the associated lineage log directory (/var/log/spark2/lineage) on all required cluster hosts. Note that this feature is enabled by default in CDS 2.3 release 2.
Implement one of the following workarounds to continue running Spark jobs.
Workaround 1 - Deploy the Spark gateway role on all hosts that are running the YARN NodeManager role
Cloudera Manager only creates the lineage log directory on hosts with Spark 2 roles deployed on them. However, this is not sufficient because the Spark driver can run on any host that is running a YARN NodeManager. To ensure Cloudera Manager creates the log directory, add the Spark 2 gateway role to every cluster host that is running the YARN NodeManager role.
For instructions on how to add a role to a host, see the Cloudera Manager documentation: Adding a Role Instance
Workaround 2 - Disable Spark Lineage Collection
To disable the feature, log in to Cloudera Manager and go to the Spark 2 service. Click Configuration. Search for the Enable Lineage Collection property and uncheck the checkbox to disable lineage collection. Click Save Changes.
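For reference, a minimal sketch of what Workaround 2 amounts to at the application level, assuming the CDS property name spark.lineage.enabled (the setting Cloudera Manager toggles for lineage collection); verify the property against your CDS release before relying on it:

    from pyspark.sql import SparkSession

    # Assumed property name: spark.lineage.enabled -- the setting toggled by
    # Cloudera Manager when lineage collection is disabled (Workaround 2).
    spark = (SparkSession.builder
             .appName('lineage-disabled-example')
             .config('spark.lineage.enabled', 'false')
             .getOrCreate())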
05-02-2018
03:44 AM
1 Kudo
@jirapong this is a known issue which we've recently seen in CDS 2.3. On Spark 2.3 the nativeLoader's (SnappyNativeLoader's) parentClassLoader is now an ExecutorClassLoader, whereas prior to Spark 2.3 the parentClassLoader was a Launcher$ExtClassLoader. This creates an incompatibility with the Snappy version (snappy-java 1.0.4.1) packaged with CDH. We are currently working on a fix for a future release, but there are two workarounds:

1) Use a later version of the Snappy library that works with the above-mentioned class loader change, for example snappy-java-1.1.4. Place the new snappy-java library on a local file system (for example /var/snappy), then run your Spark application with the user classpath options as shown below:

spark2-shell --jars /var/snappy/snappy-java-1.1.4.jar --conf spark.userClassPathFirst=true --conf spark.executor.extraClassPath="./snappy-java-1.1.4.jar"

2) Instead of using Snappy, you can change the compression codec to LZ4 or UNCOMPRESSED (which you've already tested).
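If it helps, a hedged sketch of what workaround 2 can look like in practice, assuming the failure comes from Spark's internal (shuffle/RDD) compression; spark.io.compression.codec is a standard Spark property, but confirm it is the codec actually involved in your failure before switching:

    from pyspark.sql import SparkSession

    # Workaround 2: switch Spark's internal compression away from Snappy.
    # Assumption: the error comes from shuffle/RDD compression; if it comes from
    # file output (e.g. Parquet), that output codec must be changed instead.
    spark = (SparkSession.builder
             .appName('lz4-instead-of-snappy')
             .config('spark.io.compression.codec', 'lz4')
             .getOrCreate())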
05-01-2018
08:26 PM
@Swasg by any chance are you passing the package name to spark-shell? Something like:

spark-shell --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11-2.3.0

The error suggests that the coordinate should be in the form 'groupId:artifactId:version', but in your case it is 'groupId:artifactId-version'. If you are using the package on the command line or somewhere in your configuration, please change it to:

org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.3.0
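Equivalently, and only as an illustration of the corrected coordinate (not from the original reply), the dependency can be declared through spark.jars.packages when building the session, the same property used elsewhere in this thread:

    from pyspark.sql import SparkSession

    # The Maven coordinate must be groupId:artifactId:version (note the final colon).
    spark = (SparkSession.builder
             .appName('kafka-streaming-example')
             .config('spark.jars.packages',
                     'org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.3.0')
             .getOrCreate())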
05-01-2018
05:19 AM
4 Kudos
@rams the error is expected, as the pyspark syntax differs from the Scala syntax. For reference, here are the steps you'd need to query a Kudu table in pyspark2.

Create a Kudu table using impala-shell:

# impala-shell
CREATE TABLE test_kudu (id BIGINT PRIMARY KEY, s STRING)
  PARTITION BY HASH(id) PARTITIONS 2
  STORED AS KUDU;
insert into test_kudu values (100, 'abc');
insert into test_kudu values (101, 'def');
insert into test_kudu values (102, 'ghi');

Launch pyspark2 with the Kudu artifacts and query the table:

# pyspark2 --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
version 2.1.0.cloudera3-SNAPSHOT
Using Python version 2.7.5 (default, Nov 6 2016 00:28:07)
SparkSession available as 'spark'.
>>> kuduDF = spark.read.format('org.apache.kudu.spark.kudu').option('kudu.master', "nightly512-1.xxx.xxx.com:7051").option('kudu.table', "impala::default.test_kudu").load()
>>> kuduDF.show(3)
+---+---+
| id|  s|
+---+---+
|100|abc|
|101|def|
|102|ghi|
+---+---+

For the record, the same thing can be achieved with the following commands in spark2-shell:

# spark2-shell --packages org.apache.kudu:kudu-spark2_2.11:1.4.0
Spark context available as 'sc' (master = yarn, app id = application_1525159578660_0011).
Spark session available as 'spark'.
version 2.1.0.cloudera3-SNAPSHOT
scala> import org.apache.kudu.spark.kudu._
import org.apache.kudu.spark.kudu._
scala> val df = spark.sqlContext.read.options(Map("kudu.master" -> "nightly512-1.xx.xxx.com:7051", "kudu.table" -> "impala::default.test_kudu")).kudu
scala> df.show(3)
+---+---+
| id|  s|
+---+---+
|100|abc|
|101|def|
|102|ghi|
+---+---+
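As a small follow-up sketch (not in the original reply), continuing from the pyspark2 session above: once the Kudu-backed DataFrame is loaded it can also be registered as a temporary view and queried with SQL; the view name below is arbitrary.

    # Register the Kudu-backed DataFrame and query it with Spark SQL.
    kuduDF.createOrReplaceTempView('test_kudu_view')   # arbitrary view name
    spark.sql("SELECT id, s FROM test_kudu_view WHERE id > 100").show()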
04-19-2018
06:49 PM
Thanks a lot! Finally, Sqoop.. 🙂