About Bharati

Bharati · ‎12-12-2024

Admins can enforce application lifetime SLAs at the service level by setting "yarn.scheduler.capacity.<queue-path>.maximum-application-lifetime" and "yarn.scheduler.capacity.root.<queue-path>.default-application-lifetime" in capacity-scheduler.xml, under CM > YARN > Configuration > Capacity Scheduler Configuration Advanced Configuration Snippet (Safety Valve). Reference: https://blog.cloudera.com/enforcing-application-lifetime-slas-yarn/
How are you setting this parameter? Can you share the entire stack trace of the failure?
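To illustrate the two properties mentioned above, here is a minimal Python sketch that renders them as XML for the safety valve. It assumes the queue path is written in full starting at root (for example root.default), that both values are in seconds, and that the default lifetime may not exceed the maximum. The queue name and the lifetimes in the usage line are hypothetical.

```python
from xml.sax.saxutils import escape


def lifetime_properties(queue_path: str, max_seconds: int, default_seconds: int) -> str:
    """Render both lifetime properties for one queue as <property> elements."""
    if max_seconds <= 0 or default_seconds <= 0:
        raise ValueError("this sketch only handles positive lifetimes")
    if default_seconds > max_seconds:
        raise ValueError("default lifetime cannot exceed the maximum lifetime")
    base = f"yarn.scheduler.capacity.{queue_path}"
    props = [
        (f"{base}.maximum-application-lifetime", max_seconds),
        (f"{base}.default-application-lifetime", default_seconds),
    ]
    lines = []
    for name, value in props:
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{value}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical queue and values: 24 h maximum, 1 h default.
    print(lifetime_properties("root.default", 86400, 3600))
```

The printed snippet can be pasted into the safety valve; the queue configuration then needs to be refreshed or YARN restarted for it to take effect.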

Bharati · ‎10-26-2023

CDP 7.1.7 supports Spark 3.2.3, not Spark 3.3. See https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/cds-3/topics/spark-3-requirements.html

Bharati · ‎08-25-2023

Spark 3.3 can be installed on CDP 7.1.8 and higher. Here are the documents covering the prerequisites and the installation steps:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/cds-3/topics/spark-3-requirements.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.8/cds-3/topics/spark-install-spark-3-parcel.html

Bharati · ‎05-24-2023

At the moment, neither Uber RSS (Remote Shuffle Service) nor Apache Uniffle is supported in CDP. Dynamic resource allocation requires an external shuffle service that runs on each worker node as an auxiliary service of the NodeManager. This service is started automatically; no further steps are needed. Setting spark.shuffle.service.enabled=true enables the external shuffle service, which preserves shuffle files written by executors so that the executors can be deallocated without losing work. It must be enabled if dynamic allocation is enabled.
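As a rough sketch of how the two settings go together, the Python snippet below builds a spark-submit command line with dynamic allocation and the external shuffle service both enabled, and refuses to build one where dynamic allocation is on but the shuffle service is off. The helper name and the application file my_job.py are hypothetical; only the two Spark property names and the --conf flag come from Spark itself.

```python
from typing import Dict, List, Optional

# Settings that need to be enabled together for dynamic allocation on YARN.
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}


def spark_submit_command(app: str, extra_conf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a spark-submit argument list; extra_conf overrides the defaults."""
    conf = {**DYNAMIC_ALLOCATION_CONF, **(extra_conf or {})}
    dynamic = conf.get("spark.dynamicAllocation.enabled") == "true"
    shuffle = conf.get("spark.shuffle.service.enabled") == "true"
    if dynamic and not shuffle:
        raise ValueError("dynamic allocation needs spark.shuffle.service.enabled=true")
    cmd = ["spark-submit"]
    for key in sorted(conf):
        cmd += ["--conf", f"{key}={conf[key]}"]
    cmd.append(app)
    return cmd


if __name__ == "__main__":
    # my_job.py is a placeholder application.
    print(" ".join(spark_submit_command("my_job.py")))
```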

Bharati · ‎10-19-2022

As far as I understand, it should not matter. You can also choose the rolling restart option, so that CM decides the sequence of the broker restarts.

Bharati · ‎05-09-2022

@bluespring You should not delete offline or online partitions, as that may cause data loss or under-replicated partitions. Instead, you can reassign the partitions to new hosts by following the document below: https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/kafka-managing/topics/kafka-manage-cli-reassign-overview.html
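For reference, the reassignment tool described in that document takes a JSON file listing the target replicas for each partition. The Python sketch below writes such a plan by spreading replicas round-robin over a list of broker IDs. The topic name, partition count, and broker IDs are hypothetical, and the round-robin placement is only an illustration; in practice the tool's own plan generation is usually the safer starting point.

```python
import json
from typing import Dict, List


def reassignment_plan(topic: str, partitions: int, brokers: List[int],
                      replication_factor: int) -> Dict:
    """Build a reassignment plan that places replicas round-robin on brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds the number of target brokers")
    entries = []
    for partition in range(partitions):
        # Start each partition on a different broker, then wrap around.
        replicas = [brokers[(partition + i) % len(brokers)]
                    for i in range(replication_factor)]
        entries.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": entries}


if __name__ == "__main__":
    # Hypothetical topic with 3 partitions moved onto brokers 4, 5 and 6.
    print(json.dumps(reassignment_plan("events", 3, [4, 5, 6], 2), indent=2))
```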

Bharati · ‎04-28-2022

@clouderaskme The latest CDP release, 7.1.7, ships with Spark 2.4 as the default version: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/runtime-release-notes/topics/rt-pvc-runtime-component-versions.html
Spark 2.4 supports Python 2.7 and 3.4-3.7: https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/release-guide/topics/cdpdc-os-requirements.html

Bharati · ‎04-28-2022

@clouderaskme Please review the documents below, which detail the requirements for Spark 3.2, 3.1 and 3.0:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/cds-3/topics/spark-3-requirements.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/cds-3/topics/spark-spark-3-requirements.html
https://docs.cloudera.com/cdp-private-cloud-base/7.1.4/cds-3/topics/spark-spark-3-requirements.html
Cloudera Distributed Spark 3.2 requires Python 3.6+ and CDP 7.1.7 or higher.
Cloudera Distributed Spark 3.1 requires Python 3.6+ and CDP 7.1.7 or higher.
Cloudera Distributed Spark 3.0 requires Python 3.4 or higher and CDP 7.1.3, 7.1.4 or 7.1.5.

Bharati · ‎04-19-2022

@Sayed016 Thank you for your question. From the error stack in the CM logs, it looks like the replication is trying to copy the system database (sys) and information_schema. You need to exclude both of them. Add the following exclusion to the Hive replication:
Databases: (?!information_schema|sys\b).+
Tables: [\w].+
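To see what the database pattern lets through, here is a small Python check. It is only an approximation: Cloudera Manager evaluates the expression with its own regex engine, and this sketch assumes the pattern has to match the whole database name. The database names used are hypothetical apart from sys and information_schema.

```python
import re

# Patterns copied from the exclusion above.
DATABASE_FILTER = re.compile(r"(?!information_schema|sys\b).+")
TABLE_FILTER = re.compile(r"[\w].+")


def is_replicated(database: str) -> bool:
    """Return True if the database name passes the filter (whole-name match assumed)."""
    return DATABASE_FILTER.fullmatch(database) is not None


if __name__ == "__main__":
    for name in ["sys", "information_schema", "system_logs", "sales"]:
        print(name, is_replicated(name))
```

The negative lookahead rejects any name that starts with information_schema, while the \b after sys limits that exclusion to the exact name sys, so a database such as system_logs is still replicated.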

Bharati · ‎02-16-2022

Please follow the steps below:
1. SSH to the Cloudera Manager host where the Spark 3 CSDs are deployed.
2. Find the following files and use a file manager (for example mc) or an editor to open them as zip files and edit the contents of "descriptor/service.sdl". Probably the easiest way is to open the jar files with vim:
$ vim /opt/cloudera/csd/SPARK3_ON_YARN-3.2.0.3.2.7170.0-49.jar
$ vim /opt/cloudera/csd/LIVY_FOR_SPARK3-0.6.3000.3.2.7170.0-49.jar
3. In the descriptor/service.sdl files, prefix the version with something that is higher than the CM version number, so instead of:
"version" : "3.0.7110.0",
add:
"version" : "7.5.4.3.0.7110.0",
4. Restart the CM server and wait until it comes back up:
$ service cloudera-scm-server restart
Spark 3 can now be installed. Once installed, deploy the client config and restart all services that have stale configs (most importantly YARN).
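If you would rather script the version edit than do it by hand in vim, something along these lines may work. It is a sketch, not a tested procedure: it assumes the first "version" key in descriptor/service.sdl is the service version, it writes a patched copy instead of editing the jar in place, and the function name and the output path are hypothetical. The "7.5.4." prefix is the one from the example above.

```python
import re
import zipfile

SDL_PATH = "descriptor/service.sdl"
# Captures the value of the first "version" key, e.g. "version" : "3.0.7110.0"
VERSION_RE = re.compile(r'"version"\s*:\s*"([^"]+)"')


def prefix_csd_version(src_jar: str, dst_jar: str, prefix: str) -> str:
    """Copy src_jar to dst_jar with the service.sdl version prefixed.

    Returns the new version string.
    """
    new_version = None
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == SDL_PATH:
                text = data.decode("utf-8")
                match = VERSION_RE.search(text)
                if match is None:
                    raise ValueError("no version key found in " + SDL_PATH)
                new_version = prefix + match.group(1)
                # Replace only the captured version value, leave the rest untouched.
                text = text[:match.start(1)] + new_version + text[match.end(1):]
                data = text.encode("utf-8")
            dst.writestr(item, data)
    if new_version is None:
        raise ValueError(SDL_PATH + " not found in " + src_jar)
    return new_version


if __name__ == "__main__":
    # Writes a patched copy to a hypothetical location; the original jar is not modified.
    print(prefix_csd_version(
        "/opt/cloudera/csd/SPARK3_ON_YARN-3.2.0.3.2.7170.0-49.jar",
        "/tmp/SPARK3_ON_YARN-patched.jar",
        "7.5.4.",
    ))
```

Keep the original jar as a backup outside the CSD directory before putting the patched copy in its place, then continue with the CM server restart.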

Last Visited	‎12-18-2024 11:49 PM

Member Since	‎05-31-2017 12:10 PM
Posts	38
Kudos received	10

Cloudera Community

Re: Is it possible to install spark 3.2 + on CDH 7...

Re: Support for external shuffle services

Re: Delete orphaned kafka partitions

Re: spark 2 and spark 3 on cdp

Re: Hive - Replication issue through Cloudera Mana...

Re: Pyspark yarn-client application lifetime

Re: CDS 3.3 support in CDP 7.1.7

Re: Adding New Kafka Brokers: Question

Re: Installing spark3 (CDS 3.2)