Member since 06-24-2021
7 Posts
0 Kudos Received
0 Solutions
11-08-2022 07:47 AM
Hi, I have the following JSON:

{
  "model": "2002",
  "cars_drivers": [
    {
      "id": "1",
      "name": "Mick",
      "is_processed": 0
    },
    {
      "id": "2",
      "name": "John",
      "is_processed": 0
    }
  ]
}

and I have two attributes: attr_id = 1 and attr_name = Mick. I want to set is_processed to 1 in cars_drivers when id = attr_id and name = attr_name, so the expected result is:

{
  "model": "2002",
  "cars_drivers": [
    {
      "id": "1",
      "name": "Mick",
      "is_processed": 1
    },
    {
      "id": "2",
      "name": "John",
      "is_processed": 0
    }
  ]
}
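One way to do this (a sketch, not necessarily the most NiFi-native route) is an ExecuteScript processor with a Jython body that parses the content, flips the flag where both fields match the attributes, and rewrites the flow file. The attribute names attr_id and attr_name come from the question; the rest is standard ExecuteScript boilerplate:

import json
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback

class MarkProcessed(StreamCallback):
    def __init__(self, attr_id, attr_name):
        self.attr_id = attr_id
        self.attr_name = attr_name
    def process(self, inputStream, outputStream):
        data = json.loads(IOUtils.toString(inputStream, StandardCharsets.UTF_8))
        for driver in data.get("cars_drivers", []):
            # flip the flag only when both fields match the attributes
            if driver.get("id") == self.attr_id and driver.get("name") == self.attr_name:
                driver["is_processed"] = 1
        outputStream.write(bytearray(json.dumps(data).encode("utf-8")))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, MarkProcessed(flowFile.getAttribute("attr_id"),
                                                     flowFile.getAttribute("attr_name")))
    session.transfer(flowFile, REL_SUCCESS)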
Labels:
- Apache NiFi
08-10-2022 11:56 AM
Hi, I have a flow file with text content, and I want to remove the last line from it.

Input example:
Aaa
Bbb
Ccc
Footer

Expected output:
Aaa
Bbb
Ccc

How can I accomplish this?
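One approach that avoids scripting is a ReplaceText processor in Regex Replace mode; the settings below are a sketch (NiFi uses Java regex, where $ also matches just before a trailing newline at the end of the text):

ReplaceText
  Replacement Strategy: Regex Replace
  Evaluation Mode: Entire text
  Search Value: \n[^\n]*$
  Replacement Value: (set to empty string)

This strips the final newline plus everything after it. If a file can end with several blank lines, the pattern would need to be widened accordingly.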
Labels:
- Apache NiFi
07-28-2022 09:03 AM
Hi, I have a flow file with huge string content, and I want to create a JSON document that holds this string as the value of a static key.

Example flow file string content: "Example string"
Needed output: {"Static key":"Example string"}

I tried adding the string content to a flow file attribute using ExtractText and then using AttributesToJSON to generate the needed output, but I have a few issues with this approach:
1. The string content size is not under my control, so I don't know how to configure the Maximum Buffer Size and Maximum Capture Group Length.
2. Copying the content into a flow file attribute doubles the flow file's footprint, which will impact performance.
3. Finally, it feels like a workaround :))

Is there another way to solve this?
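One alternative is to skip attributes entirely and rewrite the content in an ExecuteScript processor; a minimal Jython sketch, with the key "Static key" taken from the example above:

import json
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback

class WrapInJson(StreamCallback):
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # json.dumps escapes quotes, backslashes, and newlines in the payload
        outputStream.write(bytearray(json.dumps({"Static key": text}).encode("utf-8")))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, WrapInJson())
    session.transfer(flowFile, REL_SUCCESS)

The string is still materialized in memory once while being wrapped, but there are no regex buffers to size and no duplicate copy in an attribute.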
Labels:
- Apache NiFi
08-21-2021 09:16 AM
Hello, I have a Spark cluster on CDH 6.3.3 with 1 master node and 3 worker nodes. I will read huge amounts of data from an external RDBMS "mssql" using the JDBC driver, and I need to open the port in the firewall. My external RDBMS port is 99766. My problem is: which ports should I open on the Spark side for each node, so I can read and write data using the master and all workers in my Spark application? Thank you.
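By default Spark picks random ephemeral ports for driver/executor RPC and block transfers, which is what makes firewalling awkward. One approach is to pin them in spark-defaults.conf (or via --conf); the port numbers below are arbitrary examples, not defaults:

# spark-defaults.conf — pin the ports Spark would otherwise choose at random
spark.driver.port                40000   # executors connect back to the driver here
spark.driver.blockManager.port   40010   # block manager on the driver side
spark.blockManager.port          40020   # block manager on every executor
spark.port.maxRetries            32      # falls back to 40020..40051 etc. on collision

With those set, the firewall needs these ports (plus the retry range) open between all cluster nodes, and outbound access to the RDBMS port from every node that can run an executor, not just the master, since JDBC reads happen on the executors.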
Labels:
- Apache Spark
- Apache YARN
06-24-2021 06:18 PM
Hello, I want to build a batch-based ETL from an RDBMS "SQL Server" using Apache Spark. My Spark cluster runs as part of the Cloudera installation. My question is: where should I store the ETL job watermark, for example the maximum TIMESTAMP, so the next run fetches only the records with a bigger timestamp? Should I use a Hive table, or is there a better place to store this data so it can be used by the next jobs?
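A Hive table is a common home for this kind of watermark. A minimal PySpark sketch of the pattern; the table, job, and column names (etl_watermark, orders_load, ModifiedAt, warehouse.orders) and the JDBC URL are hypothetical:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("orders-incremental").enableHiveSupport().getOrCreate()

# Hypothetical watermark table: etl_watermark(job_name STRING, last_ts TIMESTAMP)
row = spark.sql("SELECT max(last_ts) AS ts FROM etl_watermark "
                "WHERE job_name = 'orders_load'").first()
last_ts = row["ts"] or "1970-01-01 00:00:00"

# Push the filter down to SQL Server so only new rows cross the network
src = "(SELECT * FROM dbo.Orders WHERE ModifiedAt > '{0}') q".format(last_ts)
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://mssql-host:1433;databaseName=Sales")
      .option("dbtable", src)
      .option("user", "etl_user")
      .option("password", "...")
      .load()
      .cache())  # cached so the max() below does not trigger a second JDBC read

df.write.mode("append").insertInto("warehouse.orders")  # hypothetical target table

# Persist the new high-water mark for the next run
new_ts = df.agg({"ModifiedAt": "max"}).first()[0]
if new_ts is not None:
    spark.sql("INSERT INTO etl_watermark VALUES "
              "('orders_load', CAST('{0}' AS TIMESTAMP))".format(new_ts))

Keeping the watermark in Hive means it lives next to the data, survives job restarts, and is readable by any later Spark or Hive job without an extra service.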
Labels: