Member since: 08-08-2024
Posts: 64
Kudos Received: 10
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 336 | 12-02-2025 08:17 AM |
| | 480 | 11-27-2025 10:02 AM |
| | 755 | 09-08-2025 01:12 PM |
| | 839 | 08-22-2025 03:01 PM |
10-06-2025
01:18 PM
Hello @Brenda99, The question is very broad; there are many things that can help improve performance. Some basic recommendations are documented here: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/tuning-spark/topics/spark-admin_spark_tuning.html Take a look at the documentation; it should help. It is also worth talking with the team in charge of your account for a deeper performance tuning analysis.
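To make the starting point concrete, here is a minimal PySpark sketch of the kind of settings the tuning guide covers (executor sizing, shuffle parallelism, serialization). The values are illustrative assumptions only, not recommendations for any specific workload:

```python
from pyspark.sql import SparkSession

# Illustrative values only; the right numbers depend on your cluster and
# workload, which is why the tuning guide and your account team should
# drive the actual choices.
spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    .config("spark.executor.memory", "8g")           # executor heap size
    .config("spark.executor.cores", "4")             # cores per executor
    .config("spark.sql.shuffle.partitions", "200")   # shuffle parallelism
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)
```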
09-21-2025
02:55 AM
I guess my problem has no solution from the NiFi side, and we just need to correct the HDFS settings to accept other encryption types in addition to arcfour-hmac-md5.
09-15-2025
09:44 PM
Hello @Jack_sparrow That should be possible. You don't need to manually specify partitions or HDFS paths; Spark handles this automatically when you use a DataFrameReader. First, read the source table with "spark.read.table()". Since the table is a Hive partitioned table, Spark will automatically discover and read all 100 partitions in parallel, as long as you have enough executors and cores available. Spark then builds a logical plan to read the data. Next, repartition the data: to ensure exactly 10 output partitions and to control the parallelism of the write operation, use the "repartition(10)" method. This shuffles the data into 10 new partitions, which will be processed by 10 different tasks. Finally, write the table with "write.saveAsTable()", specifying the format with ".format("parquet")".
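A minimal sketch of those three steps, using hypothetical table names that you would replace with your own:

```python
from pyspark.sql import SparkSession

# Hive support is needed so spark.read.table() can resolve the Hive metastore table.
spark = (
    SparkSession.builder
    .appName("copy-partitioned-table")
    .enableHiveSupport()
    .getOrCreate()
)

# 1. Read the partitioned Hive table; Spark discovers all partitions automatically.
df = spark.read.table("source_db.source_table")

# 2. Shuffle into exactly 10 partitions so the write runs as 10 tasks.
# 3. Write the result as a Parquet table.
(
    df.repartition(10)
      .write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("target_db.target_table")
)
```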
09-15-2025
10:55 AM
Thank you for replying, that's the exact solution I eventually settled on. Best, Shelly
09-11-2025
03:03 PM
I created the resource as a file, because the python-env resources are specifically for managing Python packages via requirements.txt, according to the documentation. Thanks!
09-11-2025
11:04 AM
Hello @Jack_sparrow, Spark should handle this automatically, and you can control it with these settings: Input splits are controlled by spark.sql.files.maxPartitionBytes (default 128 MB); a smaller value produces more splits and therefore more parallel tasks. spark.sql.files.openCostInBytes (default 4 MB) influences how Spark coalesces small files. Shuffle parallelism is controlled by spark.sql.shuffle.partitions (default 200); configure it to roughly 2-3 times the total executor cores. Also, make sure df.write.parquet() doesn't collapse everything into only a few files. For that, you can use .repartition(n) to increase the parallelism before writing.
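Putting those settings together, here is a small PySpark sketch; the table name, output path, and partition count are illustrative assumptions, and the config values shown are simply the defaults mentioned above:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("parallelism-sketch")
    .config("spark.sql.files.maxPartitionBytes", "134217728")  # 128 MB input splits
    .config("spark.sql.files.openCostInBytes", "4194304")      # 4 MB small-file open cost
    .config("spark.sql.shuffle.partitions", "200")             # shuffle parallelism
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical source table for illustration.
df = spark.read.table("source_db.source_table")

# Explicitly raise write parallelism (e.g., ~2-3x total executor cores)
# so the output isn't collapsed into a handful of files.
df.repartition(48).write.mode("overwrite").parquet("/tmp/example_output")
```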
09-08-2025
09:01 PM
1 Kudo
Thank you for the response.
09-05-2025
05:16 AM
1 Kudo
@yoonli This thread is growing into multiple queries that are not directly related. Please start a new community question so the information is easier for our community members to follow when they have similar issues. Thank you, Matt
08-25-2025
09:18 PM
That did it! I changed my tracking strategy to "Tracking timestamps" and it now populates the "View State" window. Thank you very much for your assistance!
08-25-2025
05:20 AM
@HoangNguyen Keep in mind that the Apache NiFi Variable Registry no longer exists in Apache NiFi 2.x releases, and there is no further development of the Apache NiFi 1.x versions. NiFi Parameter Contexts, which were introduced in later versions of Apache NiFi 1.x, provide similar capability going forward and should be used instead of the Variable Registry. You will be required to transition to Parameter Contexts in order to move to Apache NiFi 2.x versions. Please help our community grow. If any of the suggestions/solutions provided helped you solve your issue or answer your question, please take a moment to log in and click "Accept as Solution" on one or more of them. Thank you, Matt