Member since
11-12-2018
218
Posts
178
Kudos Received
35
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 273 | 08-08-2025 04:22 PM |
| | 342 | 07-11-2025 08:48 PM |
| | 539 | 07-09-2025 09:33 PM |
| | 1081 | 04-26-2024 02:20 AM |
| | 1427 | 04-18-2024 12:35 PM |
08-08-2025
04:22 PM
@Malrashed, If you are using Cloudera Runtime 7.1.9, then you can use either CDS 3.3 or CDS 3.5 Powered by Apache Spark as an add-on service. For more details, you can refer to this document. Please note that CDS 3.3 Powered by Apache Spark 3.3.x and CDS 3.5 Powered by Apache Spark 3.5.x are distributed as parcels (refer here for additional download details). There are no external Custom Service Descriptors (CSDs) for Livy for Spark3 or Spark3, because they are already part of Cloudera Manager 7.11.3. In Cloudera Runtime 7.1.9, Spark 2 is the default; if you need to use Spark 3, it must be added as an add-on service. Note that Spark 2 is deprecated in Cloudera Runtime 7.1.9. Starting with Cloudera Runtime 7.3.x, Spark 3 becomes the default version.
07-11-2025
08:48 PM
@moekraft Starting with Cloudera Runtime 7.3.x, Spark 3 is the default and integrated Spark version, and Spark 2 has been removed and is no longer supported.

>> Does Spark still get installed separately or is it included with the base runtime?

As a result, you do not need to install a separate Spark 3 parcel for CDP Private Cloud Base 7.3.x. The Spark 3 runtime is bundled within the Cloudera Runtime parcel itself, so you won’t find a separate, compatible Spark 3 parcel in the support matrix or parcel repository for this version. To proceed, simply use the Spark service that comes bundled with Cloudera Runtime 7.3.x. After the initial Cloudera Runtime 7.3.x installation, you can use the Add a Spark3 Service wizard to add and configure new service instances directly via Cloudera Manager.

>> Apparently Spark 3 is supported by 7.3.1 and Spark 3.5 by 7.3.1 SP1.

- Cloudera Runtime 7.3.1.100 CHF 1 is bundled with Spark 3.4.x. Please refer to the list of the official component versions for Cloudera Runtime 7.3.1.100 CHF 1.
- Cloudera Runtime 7.3.1.200 SP1 and later are bundled with Spark 3.5.x. Please refer to the list of the official component versions for Cloudera Runtime 7.3.1.200 SP1.

>> Should the support matrix be updated to reflect support for Spark 3?

That matrix is the CDS version support matrix; from Cloudera Runtime 7.3.x onwards you do not need CDS for Spark3, so you can refer to the release notes below: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/private-release-notes/topics/rt-whats-new-spa...
07-09-2025
09:33 PM
1 Kudo
@people Starting with Cloudera Runtime 7.3.x, Spark 3 is the default and integrated Spark version, and Spark 2 has been removed and is no longer supported. As a result, you do not need to install a separate Spark 3 parcel for CDP Private Cloud Base 7.3.x. The Spark 3 runtime is bundled within the Cloudera Runtime parcel itself, so you won’t find a separate, compatible Spark 3 parcel in the support matrix or parcel repository for this version. That’s why the attempt to deploy a separate Spark3 parcel triggers the conflict message you are seeing. To proceed, simply use the Spark service that comes bundled with Cloudera Runtime 7.3.x. After the initial Cloudera Runtime 7.3.x installation, you can use the Add a Spark3 Service wizard to add and configure new service instances directly via Cloudera Manager. For more details, you can refer to the release notes below: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/private-release-notes/topics/rt-whats-new-spark.html
02-27-2025
11:19 PM
@euklidas It looks like there is not enough memory for the join operation, especially when you have billions of rows: toPandas() collects all of that data into a single driver process, which causes these errors. Instead of toPandas(), write the output to a file, or limit how much data you collect, so the driver is not overwhelmed.
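As a hedged sketch of that advice (assuming an existing SparkSession and two hypothetical DataFrames, `orders` and `customers`; names and paths are illustrative only):

```python
# Sketch only: avoid pulling billions of joined rows into the driver.
# `orders` and `customers` are hypothetical DataFrames already loaded in this session.
joined = orders.join(customers, on="customer_id", how="inner")

# Preferred: write the result out from the executors in parallel
# instead of collecting it with toPandas()
joined.write.mode("overwrite").parquet("/tmp/joined_output")

# If you really need a pandas DataFrame on the driver,
# collect only a bounded sample
preview = joined.limit(1000).toPandas()
```

The key point is that the write happens on the executors, while `limit()` bounds what the driver must hold.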
04-26-2024
02:20 AM
1 Kudo
Flume, Storm, Druid, Falcon, Mahout, Ambari, Pig, Sentry, and Navigator have been changed or removed in CDP, with replacement components. Storm can be replaced with Cloudera Streaming Analytics (CSA), powered by Apache Flink. Contact your Cloudera account team for more information about moving from Storm to CSA. You can also refer to Comparing Storm and Flink as well as Migrating from Storm to Flink.
04-18-2024
12:35 PM
1 Kudo
You can refer to the Cloudera CDP documentation, which shows Spark MLlib using one of the Spark example applications. You can also refer to this article on Twitter Sentiment Analysis using Spark ML and Spark Streaming in Scala, and its GitHub source code here.
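As a minimal, hedged sketch of the kind of Spark ML pipeline those examples walk through (the toy data and column names here are made up for illustration, not taken from the linked article):

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()

# Toy labeled text data, standing in for tweets
train = spark.createDataFrame(
    [("great product", 1.0), ("terrible service", 0.0)],
    ["text", "label"],
)

# Tokenize -> hash to feature vectors -> logistic regression,
# the same shape as the standard Spark ML pipeline examples
pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="features"),
    LogisticRegression(maxIter=10),
])
model = pipeline.fit(train)
```

A fitted `PipelineModel` like this can then be applied to a streaming source with `model.transform(...)`, which is the pattern the sentiment-analysis article builds on.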
04-19-2023
02:38 PM
@skasireddy Can you please make sure you have copied hbase-site.xml from the remote HBase cluster to /etc/spark/conf/yarn-conf/ or /etc/spark/conf/ on the edge node from which you are trying to connect your Spark application?
08-05-2022
10:44 PM
Steps for creating a JDBC Hive interpreter: Because the keytab location is not consistent in CDP and changes as services are restarted, copy the keytab to a consistent location. Because we use the proxyuser option with Hive Beeline, the zeppelin user must be configured to allow impersonation of other users. Follow the config steps below to create a JDBC interpreter in Zeppelin.
- Copy the keytab from the current process directory:
# cp $(ls -1drt /var/run/cloudera-scm-agent/process/*-ZEPPELIN_SERVER | tail -1)/zeppelin.keytab /var/tmp
# chown zeppelin:zeppelin /var/tmp/zeppelin.keytab
- Configure core-site to allow proxyuser for zeppelin (CM UI > HDFS > Configuration > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml):
hadoop.proxyuser.zeppelin.hosts=*
hadoop.proxyuser.zeppelin.groups=*
- Restart the required services (the Hadoop and hive_on_tez services must be restarted).
- Configure the interpreter in Zeppelin with the additional properties below:
hive.driver org.apache.hive.jdbc.HiveDriver
hive.proxy.user.property hive.server2.proxy.user
hive.url jdbc:hive2://xxxxxxxxxxxx.com:2181/default;principal=hive/_HOST@XXXXXX.XXXXX.COM;serviceDiscoveryMode=zooKeeper;ssl=true;zooKeeperNamespace=hiveserver2
hive.user hive
zeppelin.jdbc.keytab.location /var/tmp/zeppelin.keytab
zeppelin.jdbc.principal zeppelin/xxxxxxxxxxxxx.com@XXXXXX.XXXXX.COM
- Make sure the hive.url, keytab, and principal configs are set per your environment.
- Create a notebook/paragraph and verify user impersonation and Hive access:
%jdbc(hive) select current_user()
08-02-2022
06:44 PM
Hi @paulo_klein Apache Zeppelin on Cloudera Data Platform supports the following interpreters:
- JDBC (supports Hive, Phoenix)
- OS Shell
- Markdown
- Livy (supports Spark, Spark SQL, PySpark, PySpark3, and SparkR)
- AngularJS
As you would like to create Hive tables using Zeppelin, you can use the JDBC interpreter to access Hive. The %jdbc interpreter supports access to Apache Hive data and connects to Hive via Thrift. For more details, you can refer to this documentation, which describes how to use the Apache Zeppelin JDBC interpreter to access Apache Hive.
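For example, once the interpreter is configured, a Zeppelin paragraph like the one below creates a Hive table through %jdbc (the table name and columns are illustrative only):

```
%jdbc(hive)
CREATE TABLE IF NOT EXISTS demo_table (id INT, name STRING)
```

Each paragraph runs as a separate statement against HiveServer2 over Thrift, so DDL and queries can be mixed across paragraphs in the same notebook.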
08-02-2022
06:34 PM
@Asim- With JDBC you also need HWC for managed tables. Here is the example for Spark2; but, as mentioned earlier, for Spark3 we do not have any way to connect to Hive ACID tables from Apache Spark other than HWC, and it is not yet a supported feature for Spark 3.2 / CDS 3.2 in CDP 7.1.7. Marking this thread closed; if you have any issues related to external tables, kindly start a new Support-Questions thread for better tracking of the issue and documentation. Thanks
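For reference, a hedged sketch of what reading a managed (ACID) table through HWC looks like from PySpark on Spark2; the table name is made up, and this assumes the session was launched with the HWC jar and the cluster-specific HWC connection properties (e.g. the HiveServer2 JDBC URL) already configured:

```python
# Sketch only: requires pyspark launched with the HWC jar and
# spark.sql.hive.hiveserver2.jdbc.url etc. set for your cluster.
from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession `spark`
hive = HiveWarehouseSession.session(spark).build()

# Query a managed ACID table through HWC (hypothetical table name)
df = hive.executeQuery("SELECT * FROM db.managed_acid_table")
df.show()
```

Plain `spark.read.table(...)` cannot see managed ACID tables in this setup, which is why the HWC session is the required path for Spark2.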