Member since: 07-19-2018
Posts: 613
Kudos Received: 101
Solutions: 117
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4901 | 01-11-2021 05:54 AM |
| | 3337 | 01-11-2021 05:52 AM |
| | 8645 | 01-08-2021 05:23 AM |
| | 8158 | 01-04-2021 04:08 AM |
| | 36039 | 12-18-2020 05:42 AM |
09-08-2020 12:26 AM
1 Kudo
"It sounds like your testing solution is exceeding the inbound capabilities of the flow tuning (NiFi config, processor/queue config)" Correct assessment. It showed that the pipeline was not properly sized for the amount of data, which led to back pressure in the ingest component.
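For anyone who lands here later, the settings involved are the back-pressure thresholds on each connection (queue). As far as I know these are NiFi's defaults, shown here only as a sketch; size them to your flow rather than the other way around:

```
Back Pressure Object Threshold:     10000   # queued flowfiles before upstream processors are throttled
Back Pressure Data Size Threshold:  1 GB    # queued bytes before upstream processors are throttled
```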
09-04-2020 06:25 AM
@P_Rat98 You need parquet-tools to read Parquet files from the command line; there is no built-in way to view Parquet content in NiFi. https://pypi.org/project/parquet-tools/
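Usage is roughly as follows (a sketch: the file path is hypothetical, and the subcommand names are as I recall them from the project's docs):

```
pip install parquet-tools
parquet-tools show /path/to/example.parquet     # print rows as a table
parquet-tools inspect /path/to/example.parquet  # print schema and row-group metadata
```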
09-04-2020 06:20 AM
@DanMcCray1 Once you have the content from Kafka as a flowfile, your options are not limited to ExecuteScript. Depending on the type of content, consider the following:

- EvaluateJsonPath - if the content is a single JSON object and you need one or more values inside it, this is an easy way to get those values into attributes.
- ExtractText - if the content is text or some raw format, ExtractText lets you regex-match against the content to pull values into attributes.
- QueryRecord with Record Readers and a Record Writer - this is the most recommended method. Assuming your data has structure (text, CSV, JSON, etc.) and/or multiple rows/objects, you can define a reader with a schema and an output format (record writer), and query the results very effectively.

If you do want to work with ExecuteScript, start here (a small sketch follows the links):

https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-2/ta-p/249018
https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-3/ta-p/249148

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
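To get you started, here is a minimal Jython sketch following the cookbook pattern above: it reads a JSON flowfile and copies one value into an attribute. The attribute name `my.value` and the field `someField` are placeholders, not anything from your flow:

```python
import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import InputStreamCallback

# Callback that reads the full flowfile content into a string
class ReadCallback(InputStreamCallback):
    def __init__(self):
        self.text = None
    def process(self, inputStream):
        self.text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

flowFile = session.get()
if flowFile is not None:
    callback = ReadCallback()
    session.read(flowFile, callback)
    data = json.loads(callback.text)
    # Promote one JSON value to a flowfile attribute (names are placeholders)
    flowFile = session.putAttribute(flowFile, 'my.value', str(data.get('someField')))
    session.transfer(flowFile, REL_SUCCESS)
```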
09-02-2020 11:24 PM
Hi everyone, sorry about the confusion. It was late and I was actually looking at the wrong flowfile output, i.e. the top one on the list (oldest) instead of the bottom one (newest). @stevenmatison thank you for your reply and for the effort of making a template.
09-01-2020 09:16 AM
@stevenmatison Thanks for your answer. As my tables are relatively small and only used to duplicate existing data, is there any way to remove the existing folders before importing new data? Regards
08-28-2020 08:52 AM
@P_Rat98 The error above is saying there is an issue with the Schema Name in your record reader or writer. Inside the ConvertRecord properties, click the --> arrow through to the reader/writer and make sure they are configured correctly. You will need to provide the correct schema name (if it already exists as an attribute) or provide the schema text (sketch below). If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
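If you go the Schema Text route, here is a minimal sketch of an Avro schema you could paste into the reader/writer's Schema Text property. The record and field names are placeholders; match them to your actual data:

```json
{
  "type": "record",
  "name": "example_record",
  "fields": [
    { "name": "id",   "type": "long" },
    { "name": "name", "type": ["null", "string"], "default": null }
  ]
}
```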
08-28-2020 08:47 AM
@P_Rat98 You need to set the filename (Object Key) of each Parquet file uniquely to save separate S3 objects. If that processor is configured with just ${filename}, it will overwrite the object on subsequent executions (see the expression sketch below). For the second option: if you have a split in your data flow, the split parts should carry key/value pairs for the split index and the total number of splits. Inspect your queue and list the attributes on the split flowfiles to find them. Use these attributes with MergeContent to merge everything back together into a single flowfile. You need to do this before converting to Parquet, not after. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
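As a sketch, one common way to guarantee uniqueness is an UpdateAttribute processor ahead of PutS3Object that rewrites filename with expression language (the .parquet suffix is an assumption about your flow):

```
filename = ${filename:substringBeforeLast('.')}-${uuid}.parquet
```

Here ${uuid} is the flowfile's uuid attribute, so every execution produces a distinct object key.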
08-27-2020 08:22 AM
1 Kudo
Oh, that's great. Thanks for your response; that answers my question.
08-27-2020 07:04 AM
I'd recommend these customers work with their account team to plan their CDP journey. I've dug into a number of customers facing this and found strategies for migrating/upgrading them to public cloud, on-prem, or the recently released private cloud offering.
08-27-2020 06:24 AM
@derisrayan Your question is impossible to answer without a very detailed inspection of the following:

- NiFi cluster size (number of nodes) and the spec of each node (CPU/RAM/disk)
- The size of the data processed per flowfile
- The number of pieces of data arriving per execution of the flow

After that, the data flow's concurrency and parallelism are tuned to the NiFi cluster's performance capabilities. This comes down to the total NiFi nodes, total cores, the configuration, and how many active threads the cluster can handle (rough arithmetic below). With a well-configured NiFi cluster (3+ nodes) with as much RAM and as many cores as possible, the transaction rates will be quite impressive. Scaling to 5, 10, or 15+ nodes increases this to an impressive, production-ready scale. If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic please comment here or feel free to private message me. If you have new questions related to your Use Case please create a separate topic and feel free to tag me in your post. Thanks, Steven @ DFHZ
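As a rough illustration of the thread math (the 2-4x cores figure is a common community rule of thumb, not an official formula; the node counts are made up):

```
Max Timer Driven Thread Count (per node) ≈ 2-4 × cores on that node
e.g. 3 nodes × 16 cores × 2 ≈ 96 concurrent timer-driven threads cluster-wide
```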