Member since: 11-17-2021
Posts: 1149
Kudos Received: 258
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 182 | 04-23-2026 02:02 PM |
| | 557 | 03-17-2026 05:26 PM |
| | 5243 | 11-05-2025 10:13 AM |
| | 876 | 10-16-2025 02:45 PM |
| | 1462 | 10-06-2025 01:01 PM |
05-12-2023
11:58 AM
Thank you for replying. I have tried many different ports, and I am even trying an external ZooKeeper; none of my NiFi nodes can connect to any port I provide them. It's almost as if something is wrong with the code. I have installed ZooKeeper on one of my actual NiFi servers and it starts up immediately on port 2181, which is what leads me to think it's something in the code. The other odd thing I keep seeing is an org.apache.zookeeper.ClientCnxnSocketNetty error: "future isn't success".
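To separate a network/port problem from a NiFi-side problem, it can help to probe the ZooKeeper port directly from the NiFi host. A minimal sketch using ZooKeeper's `ruok` four-letter command (note: in ZooKeeper 3.5+ this command must be enabled via the `4lw.commands.whitelist` setting, so a missing reply does not always mean the server is down):

```python
import socket

def zk_ruok(host: str, port: int = 2181, timeout: float = 5.0) -> bool:
    """Send ZooKeeper's 'ruok' four-letter command over a raw TCP socket.

    A healthy server (with the command whitelisted) replies 'imok'.
    Raises OSError if the port is unreachable, which already tells you
    whether the problem is connectivity or something inside NiFi.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"ruok")
        return sock.recv(4).decode("ascii", errors="replace") == "imok"
```

If this raises a connection error from the NiFi host but works from the ZooKeeper host itself, the issue is network/firewall rather than the NiFi code.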
05-11-2023
12:43 AM
Hi @cotopaul, thanks for your reply. I am familiar with Jolt (a JSON-to-JSON transformation library). I have been thinking of adding the required padding using the functions in Jolt and then using the FreeFormTextRecordSetWriter controller service. This service takes the name of a key in the JSON and produces a file containing only the value; it also keeps the padding added in the previous Jolt transform. I think using 10 UpdateAttribute processors would be tough, and I have multiple fields that need the required padding/empty spaces. Thank you for your answers!
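Before encoding the padding in a Jolt spec, the intended transformation can be sketched in plain Python (the field names and widths below are hypothetical examples, not from the actual flow):

```python
def pad_fields(record: dict, widths: dict, fillchar: str = " ") -> dict:
    """Left-justify each named field to a fixed width by right-padding.

    `widths` maps field name -> target width. Values are stringified
    first, so numeric fields are handled too. Fields absent from the
    record are emitted as pure padding of the target width.
    """
    out = dict(record)
    for field, width in widths.items():
        out[field] = str(out.get(field, "")).ljust(width, fillchar)
    return out

# Hypothetical example: pad 'name' to 10 chars and 'code' to 5 chars.
padded = pad_fields({"name": "abc", "code": 7}, {"name": 10, "code": 5})
```

Whatever produces the padding (Jolt or a script), FreeFormTextRecordSetWriter then only needs to emit the padded values verbatim.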
05-04-2023
10:55 AM
Hi @SAMSAL, thanks for your response. You are right, changing the evaluation mode to "Entire Text" worked. I was in a hurry and didn't try changing it; I assumed that "Always Replace" would do the job. Thank you!
05-04-2023
12:33 AM
1 Kudo
@danielhg1285, while the solution provided by @SAMSAL is better suited for you and more production-ready, you could also try the steps below. They should work if you use a stable statement all the time and you do not need the exact INSERT statement, but only the values that failed to insert.

- Right after RetryFlowFile, add an AttributesToJSON processor and manually list all the columns you want to insert in the Attributes List property. Make sure you use the attribute names from your FlowFile (sql.args.N.value) in the correct order, and set Destination = flowfile-content. This generates a JSON file with all the columns and values that you tried, but failed, to insert.
- After AttributesToJSON, keep your PutFile to save the file locally on your machine, so you can open it whenever and wherever you want 🙂

PS: This may not be the best solution, for the following reasons, but it will get you started:

- You need to know how many columns you insert, and each time a new column is added you will have to modify the AttributesToJSON processor.
- You will not get the exact SQL INSERT/UPDATE statement, but a JSON file containing the column-value pairs, which anybody can easily analyze.
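To make the AttributesToJSON step concrete, here is a sketch of the mapping it would perform, expressed in plain Python. The column names are hypothetical; the sql.args.N.value attribute naming matches what PutSQL-style processors place on the FlowFile:

```python
import json

def failed_insert_to_json(attributes: dict, columns: list) -> str:
    """Mimic the AttributesToJSON step for a failed INSERT.

    Pairs each column name (in INSERT order) with the corresponding
    sql.args.N.value FlowFile attribute, N starting at 1. Missing
    attributes come through as null, which makes gaps easy to spot.
    """
    payload = {
        col: attributes.get(f"sql.args.{i}.value")
        for i, col in enumerate(columns, start=1)
    }
    return json.dumps(payload)
```

The resulting file contains column-value pairs rather than the raw SQL, which is exactly the trade-off described above.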
05-03-2023
10:39 AM
@Abhay_Kumar Welcome to the Cloudera Community! To help you get the best possible solution, I have tagged our Spark experts @Gopinath and @smdas who may be able to assist you further. Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
05-02-2023
10:29 AM
@acasta Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
05-02-2023
10:06 AM
@Amit_barnwal Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks
04-24-2023
07:25 AM
Sorry for the late response. I use Oozie to submit a Spark job.
04-21-2023
03:00 AM
I fixed this issue by running the command on the destination cluster. I think it was caused by the original version being too old to support EC (erasure coding), since it is Hadoop 2.7.5.
04-20-2023
01:10 AM
Thank you @mszurap for your response. I tried the suggested workaround already, and the issue still persists. I agree the table has a lot of partitions, but I am pretty sure the code times out before 5 minutes. I have also tried enforcing the hive-site.xml with the updated timeout, which did not help much either. The only thing that worked was adding spark.catalog.recoverPartitions("orders") before issuing the drop-partition command. I am really not sure why recovering the partitions in the catalog eliminated the metastore warning. Below is the updated code, which works without any warning:

spark.catalog.recoverPartitions("orders")
spark.sql("alter table orders drop if exists partition(year=2023)")
data.write.mode('Overwrite').parquet(hdfsPath)

Any help in understanding the problem would be much appreciated.