Member since: 08-08-2024
Posts: 44
Kudos Received: 2
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 225 | 09-08-2025 01:12 PM
 | 346 | 08-22-2025 03:01 PM
10-18-2025
01:37 PM
Hello @donaldo71, This looks like SQL Server hitting deadlocks because many records are being written at once. You can try a couple of things. First, enable the retry option on the PutSQL or PutDatabaseRecord processor; if retries helped you previously, they could help in this case too. Second, decrease the concurrency and batch sizes to reduce the load on SQL Server. Additionally, on the SQL Server side, if you can, enable row-versioning isolation to reduce locking:

ALTER DATABASE DBNAME SET READ_COMMITTED_SNAPSHOT ON;
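If it helps, here is a small Python sketch that checks whether the database already has READ_COMMITTED_SNAPSHOT enabled before you run the ALTER DATABASE statement. It assumes pyodbc and a placeholder connection string; the server, database name, and credentials are not from this thread and need adjusting to your environment.

```python
import pyodbc

# Placeholder connection string; server, database, and credentials are assumptions.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=DBNAME;"
    "UID=user;PWD=password;TrustServerCertificate=yes"
)
cursor = conn.cursor()

# sys.databases exposes the current row-versioning setting per database
cursor.execute(
    "SELECT is_read_committed_snapshot_on FROM sys.databases WHERE name = ?",
    "DBNAME",
)
row = cursor.fetchone()
print("READ_COMMITTED_SNAPSHOT enabled:", bool(row[0]))
```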
10-16-2025
08:57 PM
Thank you for the quick reply @vafs! As I mentioned above, the truststore I referenced belongs to the NiFi Registry. I had copied the truststore from my NiFi Registry instance into my NiFi instance to eliminate any certificate mismatch, but it still didn't work. I did try re-importing the CA certificate into the truststore as you suggested, but I still hit the same issue when trying to version my flows:

nifi@nifikop-0-node:/opt/nifi/nifi-current$ keytool -importcert -alias 3SCDemo-CA -file /tmp/ca-cert.pem -keystore /tmp/nifi-registry-truststore.jks
Enter keystore password:
Certificate already exists in keystore under alias <ca-cert>
Do you still want to add it? [no]: yes
Certificate was added to keystore
nifi@nifikop-0-node:/opt/nifi/nifi-current$
10-16-2025
09:42 AM
Understood. Hopefully the missing NARs pointed out in the previous update help you figure out the issue.
10-14-2025
11:41 AM
Hello @AlokKumar, Thanks for using Cloudera Community. As I understand it, you need to add one more step to your flow:

HandleHttpRequest -> MergeContent -> ExecuteScript (Groovy) -> HandleHttpResponse

Since you have both JSON fields and files, you're getting multiple FlowFiles, so the extra MergeContent step combines the JSON and the file into a single FlowFile. On MergeContent, set Merge Strategy to "Defragment" and set Correlation Attribute Name to http.request.id, which is unique for each HandleHttpRequest. A minimal sketch of the scripting step is shown below.
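To make the scripting step concrete, here is a minimal sketch. The reply above suggests Groovy, but ExecuteScript also ships a Jython engine, so this sketch is in Python to keep the examples in one language. The session, flowFile, and REL_SUCCESS names are the standard ExecuteScript bindings; the attribute handling itself is an assumed illustration, not code from this thread.

```python
# Minimal ExecuteScript (Jython) sketch. 'session' and 'REL_SUCCESS' are
# bound by NiFi; the attribute logic below is illustrative only.
flowFile = session.get()
if flowFile is not None:
    # http.request.id is the attribute used to correlate the merged fragments
    request_id = flowFile.getAttribute('http.request.id')
    # tag the merged FlowFile so downstream processors can see it passed here (assumed need)
    flowFile = session.putAttribute(flowFile, 'merge.processed', 'true')
    session.transfer(flowFile, REL_SUCCESS)
```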
10-06-2025
01:18 PM
Hello @Brenda99, The question is very broad; there are many things that can help improve performance. Some basic recommendations are documented here: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/tuning-spark/topics/spark-admin_spark_tuning.html Take a look at the documentation; it should be a good starting point. It would also be worth talking with the team in charge of your account for a deeper performance-tuning analysis.
09-21-2025
02:55 AM
I guess my problem has no solution on the NiFi side, and we just need to adjust the HDFS settings to accept other encryption types in addition to arcfour-hmac-md5.
09-15-2025
09:44 PM
Hello @Jack_sparrow, That should be possible. You don't need to manually specify partitions or HDFS paths; Spark handles this automatically when you use a DataFrameReader.

First, read the source table with "spark.read.table()". Since the table is a Hive partitioned table, Spark will automatically discover and read all 100 partitions in parallel, as long as you have enough executors and cores available. Spark then creates a logical plan to read the data.

Next, repartition the data. To ensure you get exactly 10 output partitions and to control the parallelism of the write operation, use the "repartition(10)" method. This shuffles the data into 10 new partitions, which will be processed by 10 separate tasks.

Finally, write the table with "write.saveAsTable()", specifying the format with ".format("parquet")". A sketch of the whole pipeline is shown below.
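Putting those steps together, here is a minimal PySpark sketch; the table names db.source_table and db.target_table are placeholders, not names from this thread.

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets spark.read.table() resolve Hive metastore tables
spark = (
    SparkSession.builder.appName("repartition-and-write")
    .enableHiveSupport()
    .getOrCreate()
)

# 1. Read the partitioned source table; Spark discovers the partitions itself
df = spark.read.table("db.source_table")  # placeholder name

# 2. Shuffle into exactly 10 partitions so the write runs as 10 tasks
df = df.repartition(10)

# 3. Write the result out as a Parquet-backed table
df.write.format("parquet").mode("overwrite").saveAsTable("db.target_table")  # placeholder name
```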
09-15-2025
10:55 AM
Thank you for replying; that's exactly the solution I eventually settled on. Best, Shelly
09-11-2025
03:03 PM
I created the resource as a file, because python-env resources are specifically for managing Python packages via requirements.txt, according to the documentation. Thanks!
09-11-2025
11:04 AM
Hello @Jack_sparrow, Spark should do this automatically; you can control it with these settings:

- Input splits are controlled by spark.sql.files.maxPartitionBytes (default 128 MB). A smaller value produces more splits, and therefore more parallel tasks.
- spark.sql.files.openCostInBytes (default 4 MB) influences how Spark coalesces small files into a single split.
- Shuffle parallelism is controlled by spark.sql.shuffle.partitions (default 200). Configure it to roughly 2-3x your total executor cores.

Also, make sure df.write.parquet() doesn't collapse everything into only a few files; you can call .repartition(n) to increase parallelism before writing. A short sketch follows below.
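As a rough sketch of how those settings fit together in PySpark (the values and paths here are illustrative, not tuned recommendations):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("parallel-parquet-write")
    # smaller max partition size -> more input splits / read tasks (64 MB, illustrative)
    .config("spark.sql.files.maxPartitionBytes", 64 * 1024 * 1024)
    # cost model Spark uses when coalescing many small files into one split (default 4 MB)
    .config("spark.sql.files.openCostInBytes", 4 * 1024 * 1024)
    # shuffle parallelism; aim for roughly 2-3x total executor cores (illustrative)
    .config("spark.sql.shuffle.partitions", 96)
    .getOrCreate()
)

df = spark.read.parquet("/data/input")  # placeholder path

# repartition explicitly so the write produces more than a handful of files
df.repartition(48).write.mode("overwrite").parquet("/data/output")  # placeholder path
```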