Member since 02-01-2022
270 Posts
96 Kudos Received
59 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2051 | 06-12-2024 06:43 AM |
| | 2940 | 04-12-2024 06:05 AM |
| | 2082 | 12-07-2023 04:50 AM |
| | 1241 | 12-05-2023 06:22 AM |
| | 2159 | 11-28-2023 10:54 AM |
05-24-2022
06:02 AM
1 Kudo
@FediMannoubi Below is a basic approach. Assuming both postgres tables are populated with rows per your example, your NiFi flow first needs to get the CSV (there are various ways to do that). Once the contents of the CSV are in a flowfile (I use the GenerateFlowFile processor), you can use a RecordReader-based processor to read the CSV. This allows you to write SQL against the flowfile with QueryRecord to get a single value. For example:

SELECT city_name FROM FLOWFILE

Next, you need to get the city_name value into an attribute; I use EvaluateJsonPath. After that comes an ExecuteSQL processor and its associated DBCP Connection Pool to postgres. In ExecuteSQL your query is:

SELECT city_id FROM CITY WHERE city_name='${city_name}'

At the end of this flow you will have the city_name from the CSV and the city_id from postgres. You can now combine them or use them further downstream to suit your needs. An INSERT is done similarly: once you have the data in flowfiles or attributes, you write an INSERT statement in the same ExecuteSQL processor instead of a SELECT.

My test flow looks like this, but forgive the end, as I did not actually have a postgres database set up. You can find this sample flow [here]. I hope this gets you pointed in the right direction for reading CSV and querying data from a database.
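For readers who want to see the lookup logic outside NiFi, here is a minimal Python sketch of the same two steps: read city_name from the CSV, then look up city_id. It is not the NiFi flow itself. The CSV content, the `note` column, the id value 7, and the use of an in-memory SQLite database in place of postgres are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Made-up CSV content standing in for the flowfile; only city_name matters here.
CSV_TEXT = "city_name,note\nTunis,example row\n"


def first_city_name(csv_text):
    """Return city_name from the first CSV record (the QueryRecord step)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return next(reader)["city_name"]


def lookup_city_id(conn, city_name):
    """Look up city_id with a bound parameter (the ExecuteSQL step)."""
    row = conn.execute(
        "SELECT city_id FROM city WHERE city_name = ?", (city_name,)
    ).fetchone()
    return row[0] if row else None


# In-memory SQLite stands in for the postgres CITY table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT)")
conn.execute("INSERT INTO city (city_id, city_name) VALUES (?, ?)", (7, "Tunis"))

city_name = first_city_name(CSV_TEXT)
city_id = lookup_city_id(conn, city_name)
print(city_name, city_id)
```

The sketch binds city_name as a query parameter instead of pasting it into the SQL text, which sidesteps quoting problems when a name contains an apostrophe.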
05-17-2022
02:43 PM
@joshtheflame
I just wanted to provide a bit more context. The partial page shot you've included above appears to show Cloudera Manager running against a CDH 6.1.0 cluster. CDH 6.1.0 was released in December of 2018. As you no doubt are aware, that was quite a while ago, especially in terms of "internet time". Hopefully you are aware that CDH 6.1.x has reached its End of Support (EoS). You can find the most recent official reminder of the previous announcement that Cloudera Enterprise 6.x reached End of Support (EoS) in 2021 here:
March 2021 Customer Advisory - 2: End of Support for Cloudera Products (CDH/CM 6.x & HDP 3.x).
Cloudera's lifecycle support policies are documented here:
https://www.cloudera.com/legal/policies/support-lifecycle-policy.html
My understanding is that organizations with a valid Cloudera subscription for legacy products such as CDH would have been sent this announcement directly.
If that screenshot represents what your bank is running in production, I would recommend that you reach out to your Cloudera Account team and discuss your upgrade options as soon as possible. You are going to have to upgrade in order to take advantage of any of the offerings mentioned in @steven-matison 's reply earlier.
05-17-2022
12:35 PM
Nice one sir!
05-06-2022
11:50 AM
@Ghilani I agree that record-based processors, which let you work with single FlowFiles containing multiple records, make for more efficient dataflows. In the interim, though, what you are doing here should be possible with a ReplaceText processor using the "Literal Replace" replacement strategy. Here we are searching for the pattern _" and replacing it with just ". If you found this response assisted with your query, please take a moment to log in and click on "Accept as Solution" below this post. Thank you, Matt
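To illustrate what a literal (non-regex) replace of _" with " does to the content, here is a small Python sketch. The sample string and the function name `literal_replace` are hypothetical; in NiFi the ReplaceText processor does this work.

```python
def literal_replace(content, search='_"', replacement='"'):
    """Replace every literal occurrence of `search`; no regex is involved."""
    return content.replace(search, replacement)


# Made-up sample: underscores directly before a quote are removed, others are kept.
sample = '{"first_name_":"x_"}'
cleaned = literal_replace(sample)
print(cleaned)
```

Only an underscore immediately followed by a double quote is affected, so the underscore inside first_name survives.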
04-28-2022
04:54 AM
@Rohan44 maybe you have a typo above, but try: SELECT COUNT(*) FROM FLOWFILE
04-28-2022
04:51 AM
@BlueShangai First, HDP is very outdated; I recommend alternative and newer toolsets. That said, in my previous experience with this sandbox, even a 16 GB machine struggles to deploy the entire cluster. If it does deploy, there will be stability issues with one or more components. When I used a 32 GB machine, the stack came up and was a much more stable sandbox. However, this HDP Sandbox is a giant stack of services that is designed for many machines, so use something like this with caution.
04-26-2022
06:06 AM
@Tokolosk I would recommend that you provide Java's cacerts as the keystore/truststore with the default password (changeit). This will work for most public certs. If you have a custom cert at the SFTP end, you will need to import that cert and create your own keystore/truststore. I always try cacerts first.
04-25-2022
01:31 PM
@Tra The recommended path forward here is to use a JSON Record Reader and a JSON Record Writer with schemas matching your source (reader) and your downstream needs (writer). You want to avoid SplitJson and regex matching against flowfile contents.
04-08-2022
08:10 AM
1 Kudo
Hi All, I was able to get my script tested on my 10-node DEV cluster. Below are the results:
1. All HDP core services started / stopped okay
2. None of the Hive Service Interactive instances started, and hence the Hive service was not marked as STARTED, though HMS and HS2 started okay
3. None of the Spark2_THRIFTSERVER instances started
Can anyone share some thoughts on points 2 and 3? Thanks snm1523
04-08-2022
07:10 AM
@mala_etl Well, I can't set all of them for you, since those depend on your values, not mine, but to achieve the move you need to set:
Completion Strategy: Move File
Move Destination Directory: the directory to move the file to
Create Directory: true to enable, false to disable
Be sure to check the ? for each property; it will explain everything.
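As a rough local analogy for what those three properties express together, here is a Python sketch that moves a file into a destination directory after it has been read, optionally creating the directory first. The function name `move_after_fetch`, the file names, and the error raised when the directory is missing are assumptions of this sketch and do not describe the processor's actual behavior.

```python
import shutil
import tempfile
from pathlib import Path


def move_after_fetch(source, destination_dir, create_directory=True):
    """Move `source` into `destination_dir`, creating the directory if asked to."""
    source = Path(source)
    destination_dir = Path(destination_dir)
    if create_directory:
        destination_dir.mkdir(parents=True, exist_ok=True)
    elif not destination_dir.is_dir():
        # Sketch-only choice: fail loudly when the directory is missing.
        raise FileNotFoundError(f"missing destination: {destination_dir}")
    target = destination_dir / source.name
    shutil.move(str(source), str(target))
    return target


# Demo in a temporary directory with a made-up file name.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.csv"
    src.write_text("a,b\n")
    moved = move_after_fetch(src, Path(tmp) / "processed")
    moved_exists = moved.exists()
    source_gone = not src.exists()

print(moved.name, moved_exists, source_gone)
```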