Member since
08-08-2024
55
Posts
9
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 40 | 12-02-2025 08:17 AM |
| | 162 | 11-27-2025 10:02 AM |
| | 634 | 09-08-2025 01:12 PM |
| | 695 | 08-22-2025 03:01 PM |
09-11-2025
04:33 PM
Hello @ShellyIsGolden, Glad to see you in our community. Welcome!

ChatGPT was not that wrong (😆); in fact, that makes sense. The PostgreSQL documentation lists that method as possible:

String url = "jdbc:postgresql://localhost:5432/postgres?options=-c%20search_path=test,public,pg_catalog%20-c%20statement_timeout=90000";

https://jdbc.postgresql.org/documentation/use/#connection-parameters

Have you tested the JDBC connection outside of NiFi? Maybe with a psql command like this (quoting the URI so the shell does not interpret the &):

psql -d "postgresql://myurl:5432/mydatabase?options=-c%20search_path=myschema,public&stringtype=unspecified"

Also, check with your PostgreSQL team whether that connection string is possible, and test more on that side.
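If it helps, here is a minimal sketch of building that options string programmatically, so the %20 encoding is never typed by hand. pg_jdbc_url is a hypothetical helper for illustration, not part of the JDBC driver:

```python
from urllib.parse import quote

def pg_jdbc_url(host, port, db, options):
    # Join the "-c key=value" pairs with spaces, then percent-encode so
    # each space becomes %20, as the PostgreSQL JDBC driver expects.
    opts = " ".join(f"-c {k}={v}" for k, v in options.items())
    return f"jdbc:postgresql://{host}:{port}/{db}?options={quote(opts, safe='=,-')}"

url = pg_jdbc_url("localhost", 5432, "postgres",
                  {"search_path": "test,public,pg_catalog",
                   "statement_timeout": "90000"})
print(url)
```

This reproduces the exact URL from the documentation example above, which makes it easy to compare against what NiFi is actually sending.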
09-11-2025
02:37 PM
Hello @ariajesus, Welcome to our community. Glad to see you here. How did you create the resource? As a File Resource or as a Python Environment? Here are the steps to create it: https://docs.cloudera.com/data-engineering/1.5.4/use-resources/topics/cde-create-python-virtual-env.html
09-11-2025
11:04 AM
Hello @Jack_sparrow, Spark should do this automatically; you can control it with these settings:

- Input splits are controlled by spark.sql.files.maxPartitionBytes (default 128 MB). A smaller value produces more splits, and therefore more parallel tasks.
- spark.sql.files.openCostInBytes (default 4 MB) influences how Spark coalesces small files.
- Shuffle parallelism is controlled by spark.sql.shuffle.partitions (default 200). Configure it to around 2-3x your total executor cores.

Also, make sure df.write.parquet() does not collapse everything into only a few files. You can use .repartition(n) to increase parallelism before writing.
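As a rough sketch of that 2-3x rule of thumb (recommended_shuffle_partitions is a hypothetical helper, and the factor is the heuristic from this reply, not a Spark API):

```python
def recommended_shuffle_partitions(num_executors, cores_per_executor, factor=3):
    # Heuristic: set spark.sql.shuffle.partitions to roughly
    # 2-3x the total executor cores available to the job.
    return num_executors * cores_per_executor * factor

# e.g. 10 executors x 4 cores each, factor 3 -> 120 partitions
print(recommended_shuffle_partitions(10, 4))
```

You would then apply the result with something like spark.conf.set("spark.sql.shuffle.partitions", str(n)) before the shuffle-heavy stage.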
09-09-2025
09:26 AM
Hello @Jack_sparrow, Glad to see you again in the forums.

1. Resource allocation is hard to prescribe because it depends on many factors: how big your data is, how much memory the cluster has, the memory overhead, and so on. There is very useful information here: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/tuning-spark/topics/spark-admin-tuning-resource-allocation.html Under that parent section there are more tuning suggestions for each topic.

2. From the second option, I understand that you want to read each data type separately. That should be possible with something like this:

if input_path.endswith(".parquet"):
    df = spark.read.parquet(input_path)
elif input_path.endswith(".orc"):
    df = spark.read.orc(input_path)
elif input_path.endswith(".txt") or input_path.endswith(".csv"):
    df = spark.read.text(input_path)  # or .csv with options
else:
    raise Exception("Unsupported file format")

Then you can handle each dataset in a separate way.

3. Data movement should avoid going through the driver, to avoid issues and extra work, so collect() or .toPandas() are not the best options. If you want to move data without transformations, distcp is a good option. To write, you can use:

df.write.mode("overwrite").parquet("ofs://ozone/path/out")

Other suggestions: tune the partition size with spark.sql.files.maxPartitionBytes and switch the compression codec to snappy via spark.sql.parquet.compression.codec.
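To illustrate the overhead point in (1): by default Spark requests the executor memory plus an off-heap overhead of max(384 MB, 10% of executor memory), so the container YARN grants is larger than spark.executor.memory alone. A minimal sketch of that arithmetic (container_memory_mb is a hypothetical helper, not a Spark API):

```python
def container_memory_mb(executor_memory_mb, overhead_fraction=0.10, min_overhead_mb=384):
    # Default spark.executor.memoryOverhead: max(384 MB, 10% of executor memory).
    overhead = max(min_overhead_mb, int(executor_memory_mb * overhead_fraction))
    return executor_memory_mb + overhead

print(container_memory_mb(8192))   # an 8 GB executor requests a 9011 MB container
print(container_memory_mb(1024))   # a small executor hits the 384 MB floor -> 1408 MB
```

This is why sizing executors at exactly the node memory divided by executor count fails to launch: the overhead has to fit too.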
09-08-2025
01:12 PM
1 Kudo
Hello @Jack_sparrow, Glad to see you on the Community. As far as I know, df.write cannot be used inside rdd.foreach or rdd.foreachPartition. The reason is that df.write is a driver-side action: it triggers a Spark job. rdd.foreach and rdd.foreachPartition run on the executors, and executors cannot trigger jobs. Check these references:

https://stackoverflow.com/questions/46964250/nullpointerexception-creating-dataset-dataframe-inside-foreachpartition-foreach
https://sparkbyexamples.com/spark/spark-foreachpartition-vs-foreach-explained

The option that looks like it will work for you is df.write.partitionBy, something like this:

df.write.partitionBy("someColumn").parquet("/path/out")
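To make the partitionBy suggestion concrete, here is a small plain-Python simulation of the directory layout it produces: one subdirectory per distinct value of the partition column. partition_paths is a hypothetical helper for illustration, not a Spark API:

```python
def partition_paths(rows, column, base="/path/out"):
    # df.write.partitionBy(column).parquet(base) writes one subdirectory
    # per distinct value, named column=value, under the output path.
    return sorted({f"{base}/{column}={row[column]}" for row in rows})

rows = [{"country": "MX"}, {"country": "US"}, {"country": "MX"}]
print(partition_paths(rows, "country"))
```

Each executor writes its own files into the matching column=value directory, so the write stays fully distributed and never funnels data through the driver.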
08-27-2025
01:26 PM
Hi @MattWho, I think you tagged the wrong person. @yoonli, take a look at @MattWho's update.
08-26-2025
11:06 AM
Hello @yoonli, I was checking the configuration and comparing it with other threads, and it looks fine to me. Also, users.xml and authorizations.xml must not already exist for NiFi to generate them. You will need to stop NiFi and then move those files aside:

mv conf/authorizations.xml conf/authorizations.xml.backup
mv conf/users.xml conf/users.xml.backup

Then you can retry. It is also worth checking this thread, which contains a lot of information on this same issue: https://community.cloudera.com/t5/Support-Questions/Untrusted-proxy-error-Authentication-Failed-o-a-n-w-s/m-p/399540
08-22-2025
03:01 PM
Hello @HoangNguyen, If I understand correctly, what you want is not possible. ListFile does not support an incoming FlowFile as a source. To do that, you would need to use the variable registry. Look here:

Display Name: Input Directory
API Name: Input Directory
Description: The input directory from which files to pull files
Supports Expression Language: true (will be evaluated using variable registry only)

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ListFile/index.html

Looking at that, FetchFile will do what you need:

Display Name: File to Fetch
API Name: File to Fetch
Default Value: ${absolute.path}/$(unknown)
Description: The fully-qualified filename of the file to fetch from the file system
Supports Expression Language: true (will be evaluated using flow file attributes and variable registry)

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.FetchFile/index.html
08-22-2025
01:50 PM
Hello @Amry, Strange; you should be able to see something like this: (screenshot) Do you have that button? Can you confirm your CFM version, so I can take a look at that version as well?
08-21-2025
09:28 AM
1 Kudo
Hello @MoJadallah, Sorry that no one has answered so far; we do want to give you a good experience on your trial. I assume you requested your free trial from here: https://www.cloudera.com/products/cloudera-public-cloud-trial.html?internal_keyplay=ALL&internal_campaign=FY25-Q1-GLOBAL-CDP-5-Day-Trial&cid=FY25-Q1-GLOBAL-CDP-5-Day-Trial&internal_link=WWW-Nav-u01 When did you hit that error? Right after sign-up?