Member since: 08-08-2024
Posts: 49
Kudos Received: 5
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 474 | 09-08-2025 01:12 PM |
| | 545 | 08-22-2025 03:01 PM |
10-06-2025
01:18 PM
Hello @Brenda99, The question is very broad; many things can help improve performance. Some basic recommendations are documented here: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/tuning-spark/topics/spark-admin_spark_tuning.html Take a look at the documentation; it should help. It would also be worth talking with the team in charge of your account to get a deeper performance tuning analysis.
09-18-2025
10:09 AM
Yes, you're right. It looks like Java Kerberos means the applications do not always have an application name we can use here. I was reading about another option that lets the processes fall back from one enctype to another, but that requires "allow_weak_crypto = true", and as you mentioned, that is not possible in your scenario. I'm not sure whether what you need is achievable.
09-15-2025
09:56 PM
Hello @asand3r, Glad to see you on the community. In NiFi you cannot specify those encryption types directly per processor. What comes to mind is configuring them per realm user in krb5.conf, which should work. There you can set them specifically for each realm user, something like this:

[appdefaults]
  hdfs = {
    default_tgs_enctypes = arcfour-hmac-md5 aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
    permitted_enctypes = arcfour-hmac-md5 aes256-cts-hmac-sha1-96 aes128-cts-hmac-sha1-96
  }

This will target any application using a principal with 'hdfs' in its name. You may need to be more specific in some cases, for example by using the full principal name. In your NiFi HDFS processors, you'll need to set the Kerberos Principal property to a value that matches the [appdefaults] section.
09-15-2025
09:44 PM
Hello @Jack_sparrow, That should be possible. You don't need to manually specify partitions or HDFS paths; Spark handles this automatically when you use a DataFrameReader. First, read the source table with "spark.read.table()". Since the table is a Hive partitioned table, Spark will automatically discover and read all 100 partitions in parallel, as long as you have enough executors and cores available, and it builds a logical plan to read the data. Next, repartition the data: to get exactly 10 output partitions and to control the parallelism of the write, use the "repartition(10)" method. This shuffles the data into 10 new partitions, which will be processed by 10 different tasks. Finally, write the table with "write.saveAsTable()", specifying the format with ".format("parquet")". A sketch of these steps is below.
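A minimal PySpark sketch of those three steps, assuming hypothetical database and table names (source_db.partitioned_table, target_db.target_table):

```python
from pyspark.sql import SparkSession

# Hive support is needed so spark.read.table() can see the partitioned Hive table.
spark = (SparkSession.builder
         .appName("repartition-and-write")
         .enableHiveSupport()
         .getOrCreate())

# Spark discovers and reads all partitions of the source table in parallel.
df = spark.read.table("source_db.partitioned_table")

# Shuffle into exactly 10 partitions, then write them out as Parquet.
(df.repartition(10)
   .write
   .format("parquet")
   .mode("overwrite")
   .saveAsTable("target_db.target_table"))
```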
09-11-2025
04:33 PM
Hello @ShellyIsGolden, Glad to see you in our community. Welcome! ChatGPT was not that far off (😆); in fact, that approach makes sense. The PostgreSQL JDBC documentation shows that method is possible: String url = "jdbc:postgresql://localhost:5432/postgres?options=-c%20search_path=test,public,pg_catalog%20-c%20statement_timeout=90000"; https://jdbc.postgresql.org/documentation/use/#connection-parameters Have you tested the JDBC connection outside of NiFi? Maybe with a psql command like this: psql -d "postgresql://myurl:5432/mydatabase?options=-c%20search_path=myschema,public&stringtype=unspecified" Also, check with your PostgreSQL team to see whether that connection string is possible, and do more testing on that side.
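If you want another quick check outside of NiFi and psql, a short Python sketch using psycopg2 exercises the same server-side options mechanism; psycopg2 itself, the host, database, schema, and credentials below are assumptions for illustration, not values from the thread:

```python
import psycopg2

# Connect with libpq-style "options", mirroring the JDBC options parameter above.
conn = psycopg2.connect(
    host="myurl",
    port=5432,
    dbname="mydatabase",
    user="myuser",
    password="mypassword",
    options="-c search_path=myschema,public -c statement_timeout=90000",
)
with conn.cursor() as cur:
    # Confirm the session actually picked up the search_path override.
    cur.execute("SHOW search_path")
    print(cur.fetchone())
conn.close()
```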
09-11-2025
02:37 PM
Hello @ariajesus, Welcome to our community. Glad to see you here. How did you create the resource, as a File Resource or as a Python Environment? Here are the steps for creating it: https://docs.cloudera.com/data-engineering/1.5.4/use-resources/topics/cde-create-python-virtual-env.html
09-11-2025
11:04 AM
Hello @Jack_sparrow, Spark should do this automatically; you can control it with these settings (see the sketch below):
- Input splits are controlled by spark.sql.files.maxPartitionBytes (default 128 MB); a smaller value produces more splits and therefore more parallel tasks.
- spark.sql.files.openCostInBytes (default 4 MB) influences how Spark coalesces small files.
- Shuffle parallelism is controlled by spark.sql.shuffle.partitions (default 200); configure it to around 2-3 times the total executor cores.
Also, make sure df.write.parquet() doesn't collapse everything into only a few files; you can use .repartition(n) to increase the parallelism before writing.
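A minimal sketch of where those settings go, with placeholder paths and a placeholder repartition count:

```python
from pyspark.sql import SparkSession

# Tune input split size, small-file coalescing, and shuffle parallelism up front.
spark = (SparkSession.builder
         .appName("parallelism-tuning")
         .config("spark.sql.files.maxPartitionBytes", str(128 * 1024 * 1024))  # input split size
         .config("spark.sql.files.openCostInBytes", str(4 * 1024 * 1024))      # small-file coalescing cost
         .config("spark.sql.shuffle.partitions", "200")                        # ~2-3x total executor cores
         .getOrCreate())

df = spark.read.parquet("/data/in")

# Increase write parallelism explicitly if the plan ends up with too few partitions.
df.repartition(48).write.mode("overwrite").parquet("/data/out")
```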
09-09-2025
09:26 AM
Hello @Jack_sparrow, Glad to see you again in the forums. 1. Resource allocation is complicated to prescribe, because it depends on many factors: keep in mind how big your data is and how much memory you have on the cluster, and don't forget the overhead, among other things. There is very useful information here: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/tuning-spark/topics/spark-admin-tuning-resource-allocation.html Under that parent section there are more tuning suggestions for each topic. 2. From the second option I understand that you want to read the data separately for each format. That should be possible with something like this:

if input_path.endswith(".parquet"):
    df = spark.read.parquet(input_path)
elif input_path.endswith(".orc"):
    df = spark.read.orc(input_path)
elif input_path.endswith(".txt") or input_path.endswith(".csv"):
    df = spark.read.text(input_path)  # or .csv with options
else:
    raise Exception("Unsupported file format")

Then you can handle each dataset separately. 3. Data movement should avoid going through the driver, to prevent issues and extra work, so collect() or .toPandas() are not the best options. If you want to move data without transformations, distcp is a good option. To write, you can use df.write.mode("overwrite").parquet("ofs://ozone/path/out"). Other suggestions are tuning the partitions with "spark.sql.files.maxPartitionBytes" and changing the compression to snappy using "spark.sql.parquet.compression.codec"; a sketch of that write is below.
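A minimal sketch of point 3, assuming placeholder source and Ozone paths (the service, volume, and bucket names are not from the thread):

```python
from pyspark.sql import SparkSession

# Set the input split size and snappy compression before reading/writing.
spark = (SparkSession.builder
         .appName("copy-to-ozone")
         .config("spark.sql.files.maxPartitionBytes", str(256 * 1024 * 1024))  # larger input splits
         .config("spark.sql.parquet.compression.codec", "snappy")              # snappy output compression
         .getOrCreate())

# The data flows between the executors and storage; nothing is collected to the driver.
df = spark.read.parquet("hdfs:///data/source")
df.write.mode("overwrite").parquet("ofs://ozone-service/volume/bucket/path/out")
```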
09-08-2025
01:12 PM
1 Kudo
Hello @Jack_sparrow, Glad to see you on the Community. As far as I know, df.write cannot be used inside rdd.foreach or rdd.foreachPartition. The reason is that df.write is a driver-side action: it triggers a Spark job, while rdd.foreach and rdd.foreachPartition run on the executors, and executors cannot trigger jobs. Check these references: https://stackoverflow.com/questions/46964250/nullpointerexception-creating-dataset-dataframe-inside-foreachpartition-foreach https://sparkbyexamples.com/spark/spark-foreachpartition-vs-foreach-explained The option that looks like it will work for you is df.write.partitionBy, something like this: df.write.partitionBy("someColumn").parquet("/path/out") A sketch is below.
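A minimal sketch of the partitionBy approach, with a placeholder input path and column name:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioned-write").getOrCreate()
df = spark.read.parquet("/data/in")

# A single driver-side write: Spark splits the output by the partition column,
# so there is no need to call df.write inside foreach/foreachPartition.
df.write.mode("overwrite").partitionBy("someColumn").parquet("/path/out")
```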
08-27-2025
01:26 PM
Hi @MattWho, I think you tagged the wrong person. @yoonli, take a look at @MattWho's update.