Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11999 | 08-19-2021 05:45 AM |
| | 3153 | 08-04-2021 05:59 AM |
| | 1543 | 07-22-2021 08:09 AM |
| | 6037 | 07-22-2021 08:01 AM |
| | 5495 | 07-22-2021 07:32 AM |
08-13-2020
04:53 AM
@ang_coder Depending on the number of unique values you need to add, UpdateAttribute + Expression Language will allow you to create flowfile attributes based on the table results, in a manner I would call "manual". These attributes can then be used for routing, or for further manipulating the content (the original database rows) according to your match logic. For example, with ReplaceText you can replace the original value with the original value plus the new value.

Additionally, during your flow you can programmatically change the content of the flowfile to add the new column, using the attribute from above or a fabricated query. In the latter case you would use a RecordReader/RecordWriter with UpdateRecord on your data. In a nutshell, you create a translation of the content that includes adding the new field. This is a common use case for NiFi and there are many different ways to achieve it.

To get a more complete reply that better matches your use case, please provide more information: sample input data, the expected output data, your flow, a template of your flow, and perhaps what you have tried already.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your use case, please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
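To illustrate the ReplaceText variant: assuming an attribute named `new_value` was populated earlier (for example by UpdateAttribute from your table results; the attribute name here is made up), a line-oriented configuration could look roughly like this:

```
ReplaceText
  Evaluation Mode:       Line-by-Line
  Search Value:          (.*)
  Replacement Value:     $1,${new_value}
  Replacement Strategy:  Regex Replace
```

This appends the attribute as an extra column to every line of the content; it is only a sketch of the idea, not a complete flow.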
08-13-2020
04:44 AM
This message is labeled NiFi, so I assume you have NiFi available? In that case, look at finding the right processor for the job; something like ExecuteSQL may be a good starting point.

If your question is purely about how to make Python and MariaDB interact, this may not be the best place to ask it.
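If NiFi is the route you take, a rough sketch of the pieces involved (connection details below are placeholders, not values from your environment) would be:

```
DBCPConnectionPool (controller service)
  Database Connection URL:      jdbc:mariadb://your-host:3306/your_db
  Database Driver Class Name:   org.mariadb.jdbc.Driver
  Database Driver Location(s):  /path/to/mariadb-java-client.jar
  Database User / Password:     as appropriate

ExecuteSQL (processor)
  Database Connection Pooling Service:  the DBCPConnectionPool above
  SQL select query:                     SELECT * FROM your_table
```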
08-12-2020
01:58 PM
The ListSFTP processor does not actually do anything with the files themselves; it just builds a list of the files that exist. Typically that listing would then feed into a FetchSFTP processor, while GetSFTP is the older, self-contained alternative that lists and retrieves files on its own. In the GetSFTP processor you can configure whether the original should be deleted, and by default this does indeed happen. https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.GetSFTP/index.html
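For reference, a minimal sketch of the property that controls this in GetSFTP (host and path are placeholders):

```
GetSFTP
  Hostname:         sftp.example.com
  Remote Path:      /incoming
  Delete Original:  false    (default is true; set to false to leave the source files in place)
```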
07-29-2020
02:27 PM
Thanks, I will think about refining the distinction between Kudu and Druid. Currently I would not want to include the fact that Flink has state as 'storage', but regarding Flink SQL, I may actually write another post later about the ways to interact with/access different kinds of data. (As someone also noticed, Impala is not here either, because it is not a store in itself but works with stored data.)
07-25-2020
01:52 AM
There will be multiple form factors available in the future; for now, I will assume you have an environment that contains one Data Hub with NiFi and one Data Hub with both Impala and Kudu. (The answer still works if all are on the same Data Hub.)
Prerequisites
Data Hubs with NiFi and Impala+Kudu
Permission to access these (e.g. add a processor, create table)
Know your Workload User Name (CDP Management Console > Your name (bottom left) > Profile)
You should have set your Workload Password in the same location
Steps to write data from NiFi to Kudu in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kudu Data Hub Cluster:
Gather the FQDNs of the Kudu masters and the ports used. Go to the Data Hub > click Kudu Master > click Masters.
Combine the RPC addresses together in the following format: host1:port,host2:port,host3:port
Example:
```master1fqdn.abc:7051,master2fqdn.abc:7051,master3fqdn.abc:7051```
In HUE on Impala/Kudu Data Hub Cluster:
Run the following query to create the Kudu table (the little triangle makes it run):
`CREATE TABLE default.micro (id BIGINT, name STRING, PRIMARY KEY(id)) STORED AS KUDU;`
In NiFi GUI:
Ensure that you have some data in NiFi that fits in the table. Use the `GenerateFlowFile` processor.
In Properties, configure the Custom Text to contain the following (copy carefully, or use Shift+Enter for a newline):
```
id, name
1,dennis
```
Select the PutKudu processor and configure it as follows:
- Settings
  - Automatically terminate relationships: tick both success and failure
- Scheduling
  - Run Schedule: 1 sec
- Properties
  - Kudu Masters: the combined list we created earlier
  - Table Name: impala::default.micro
  - Kerberos Principal: your Workload User Name (see prerequisites above)
  - Kerberos Password: your Workload Password
  - Record Reader: Create new service > CSV Reader
  - Kudu Operation Type: UPSERT
Right-click the NiFi canvas > Configure > click the little cogwheel of CSV Reader, set the following property and then Apply: Treat First Line as Header: True
Click the little lightning bolt of CSV Reader > Enable
On the canvas connect your GenerateFlowFile processor to your PutKudu processor and start the flow.
You should now be able to select your table through HUE and see that a single record has been added: `select * from default.micro`
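If everything worked, the query should return the single row defined in the Custom Text above, roughly:

```
id  name
1   dennis
```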
These are the minimal steps; a more extensive explanation can be found in the Cloudera documentation.
Potential refinements
A DESCRIBE EXTENDED on the table also exposes the master hostnames. For me, using those also worked, but I am unsure how safe it is not to explicitly define the ports.
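For reference, that statement, run in HUE against the table created above, would be:

```
DESCRIBE EXTENDED default.micro;
```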
07-23-2020
01:49 PM
1 Kudo
The Cloudera Data Platform (CDP) comes with a wide variety of tools that move data; these are the same in any cloud as well as on-premises. Though there is no formal decision tree, I will summarize the key considerations from my personal perspective. In short, it can be visualized like this:

Steps for finding the right tool to move your data
1. Staying within Hive and SQL queries suffice? > Hive, otherwise
2. No complex operations (e.g. joins)? > NiFi, otherwise
3. Batch? > Spark, otherwise
4. Already have Kafka Streams/Spark Streaming in use? > Kafka Streams/Spark Streaming, otherwise
5. Flink

Some notes:
- If you can choose between NiFi and a more complex solution, use NiFi.
- Use Flink as your streaming engine unless there is a good reason not to; it is the latest generation of streaming engines. Currently, I do not recommend using Flink for batch processing yet, but that will likely change soon.
- I did not include tools like Impala, HBase/Phoenix, and Druid, as their main purpose is accessing data.
- This is a basic decision tree; it should cover most situations, but do not hesitate to deviate if your situation asks for it.

Also see my related article: Choose the right place to store your data.

Full Disclosure & Disclaimer: I am an employee of Cloudera, but this is not part of the formal documentation of the Cloudera Data Platform. It is purely based on my own experience of advising people in their choice of tooling.
06-29-2020
02:45 PM
Too cool man, great work!
02-05-2020
05:38 AM
The issue is resolved now. I have a cluster of 12 nodes, and on some nodes beeline-hs2Connection.xml was not present. Putting the file in the Hive conf directory on each server resolved the issue. Thanks, Mohit
02-04-2020
09:18 AM
Hi all, the above solution fails in one scenario. Scenario: if multiple flow files are processed at the same time and land in the NiFi queue used after the update query (i.e. the PutHiveQL processor that increments processed_file_cnt by one for every flow file), then there is a chance of triggering the next flow multiple times, which is wrong. This is because we first select processed_file_cnt and only then compare processed_file_cnt with input_file_cnt.
01-31-2020
08:49 PM
Hi, my assumption was wrong; the PutSQL processor does execute the update query per flow file.