Member since 02-01-2022
270 posts
96 kudos received
59 solutions
My Accepted Solutions
Views | Posted
---|---
2214 | 06-12-2024 06:43 AM
3384 | 04-12-2024 06:05 AM
2287 | 12-07-2023 04:50 AM
1363 | 12-05-2023 06:22 AM
2319 | 11-28-2023 10:54 AM
05-17-2022
12:34 PM
@joshtheflame CDP Private Cloud Base, for on-prem, can be deployed on OpenShift Kubernetes. CDP Public Cloud, on AWS, Azure, or GCP, is fully deployed on Kubernetes using each cloud's own Kubernetes platform. CDP is also hybrid and multi-cloud capable. Check out CDP Private Cloud Base: https://docs.cloudera.com/data-warehouse/1.3.1/openshift-environments/topics/dw-private-cloud-openshift-environments-overview.html and CDP Public Cloud: https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html
05-16-2022
10:43 AM
@IsaacKamalesh Yes. You will need to open a support request to get the correct networking allowed.
05-03-2022
12:35 PM
@Ghilani I believe the solution here is to use record-based processors with a JSON record reader whose schema matches the _ names, and a JSON record writer whose schema matches the new names.
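The point of the record-based approach is that the renaming happens in one pass over the whole flowfile. The plain-Python sketch below only illustrates that transformation; it is not NiFi's API, and the field names and the `FIELD_RENAMES` mapping are hypothetical examples.

```python
import json

# Hypothetical mapping from the original "_" field names to the new names.
# In NiFi this information would live in the reader and writer schemas, not in code.
FIELD_RENAMES = {
    "first_name": "firstName",
    "last_name": "lastName",
}


def rename_fields(records, renames):
    """Return new records whose keys are renamed according to `renames`.

    Keys without an entry in `renames` are kept unchanged.
    """
    return [{renames.get(key, key): value for key, value in record.items()}
            for record in records]


flowfile_content = '[{"first_name": "Ada", "last_name": "Lovelace", "id": 1}]'
records = json.loads(flowfile_content)
print(json.dumps(rename_fields(records, FIELD_RENAMES)))
# [{"firstName": "Ada", "lastName": "Lovelace", "id": 1}]
```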
04-28-2022
05:01 AM
Do not think of the number of concurrent tasks (concurrency) and the run schedule for that processor as relating to request/response timing. Specifically with InvokeHTTP, the request/response time can be anywhere from almost instant to as long as the other end takes to respond. Concurrent tasks increase the number of instances of that processor running at once, usually to help drain a huge queue of flowfiles (thousands, tens of thousands, millions, etc.). Run schedule is how long that one instance stays active (able to process more than one flowfile in sequence). Hope this helps, Steven
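To make the distinction concrete, here is a rough Python analogy (not NiFi code): concurrent tasks are modeled as worker threads pulling from one shared queue, and `fake_invoke_http` is a hypothetical stand-in whose simulated latency does not change with the number of workers. More workers drain the queue sooner, while each worker keeps handling flowfiles in sequence until the queue is empty.

```python
import queue
import threading
import time
from collections import Counter


def fake_invoke_http(flowfile):
    """Hypothetical stand-in for one HTTP request/response round trip."""
    time.sleep(0.01)  # simulated remote response time, independent of concurrency
    return f"response-{flowfile}"


def drain_queue(flowfiles, concurrent_tasks):
    """Drain `flowfiles` using `concurrent_tasks` worker threads.

    Returns the list of responses and a Counter of how many flowfiles
    each worker handled.
    """
    work = queue.Queue()
    for ff in flowfiles:
        work.put(ff)

    responses = []
    handled = Counter()
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                ff = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this worker stops
            result = fake_invoke_http(ff)
            with lock:
                responses.append(result)
                handled[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(concurrent_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return responses, handled


responses, handled = drain_queue(range(20), concurrent_tasks=4)
print(len(responses), dict(handled))
```

With `concurrent_tasks=1` the same code still processes every flowfile, just one after another in a single worker.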
04-28-2022
04:54 AM
@Rohan44 maybe you have a typo above, but try: SELECT COUNT(*) FROM FLOWFILE
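The query is written against the FLOWFILE table, which is how NiFi's QueryRecord processor (assuming that is the processor in question) exposes the incoming record set. The sketch below runs the same statement against an in-memory SQLite table named FLOWFILE with made-up rows, only to show what the query returns; QueryRecord itself uses Apache Calcite, not SQLite.

```python
import sqlite3

# Stand-in for the record set QueryRecord exposes as the FLOWFILE table;
# the columns and rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

# One row with one column: the number of records in the flowfile.
(count,) = conn.execute("SELECT COUNT(*) FROM FLOWFILE").fetchone()
print(count)  # 3
conn.close()
```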
04-28-2022
04:51 AM
@BlueShangai First, HDP is very outdated; I recommend alternative, newer toolsets. That said, in my previous experience with this sandbox, even a 16 GB machine struggles to deploy the entire cluster. If it does deploy, there will be stability issues with one or more components. When I used a 32 GB machine, the stack came up and the sandbox was much more stable. However, the HDP Sandbox is a giant stack of services designed for many machines, so use something like this with caution.
04-26-2022
06:24 AM
@jonay__reyes I think by default you will see the result you are expecting; however, the expected limit of 5 concurrent connections may be a challenge. Let's address your questions first:

1. Does this translate to simply using 1 InvokeHTTP processor configured with 5 "Concurrent Tasks" and that's it?
No. One processor with 5 concurrent tasks provides what are in effect 5 instance copies, and each can run more than one request if there are ample flowfiles queued up. For your use case, I would recommend setting it to 1 and controlling the number of flowfiles upstream.

2. Will the processor wait for the remote endpoint's response before sending the next one?
Yes, if concurrent tasks is set to 1. No, if it is set higher (2+); the requests will then execute in parallel.

3. How does the "Run Schedule" work together with the previous settings (if I had, e.g., 1 sec)?
Run Schedule sets how long a process will operate before a new instance is necessary. If the request/response times are low, this setting lets you push more data through each instance without creating separate processes for each. If the request/response time is high, you can use it to help with long execution. Experiment carefully here.

4. It has been proposed that I split the incoming queue and put 5 InvokeHTTP processors in parallel, each one handling 1/5 of the incoming flowfiles (I'd do the pre-partitioning beforehand with some RouteOnAttribute trick), but I think it's exactly the same outcome as 1. above. Is it?
Correct. There is no reason to do this; avoid duplicating processors.

For concurrent tasks and run schedule adjustments, always experiment in small increments: change one setting at a time, evaluate, and repeat until you find the right balance. I suspect you will not need 5 long-running request/responses in parallel, and that even with default settings your queued flowfiles will execute fast enough to appear "simultaneous".
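The answer about whether the processor waits for the remote response can be illustrated with a small Python analogy (not NiFi code). Threads stand in for concurrent tasks, and `fake_request` is a hypothetical request that sleeps to simulate the remote response time. With one task, no request starts until the previous one has returned; with five, up to five are outstanding at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def max_in_flight(num_requests, concurrent_tasks, latency=0.02):
    """Send `num_requests` simulated requests using `concurrent_tasks` threads
    and return the highest number that were outstanding at the same moment."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_request(_):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(latency)  # simulated wait for the remote endpoint's response
        with lock:
            in_flight -= 1

    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        list(pool.map(fake_request, range(num_requests)))
    return peak


print(max_in_flight(10, concurrent_tasks=1))  # 1: each request waits for the previous response
print(max_in_flight(10, concurrent_tasks=5))  # up to 5 requests outstanding at once
```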
04-26-2022
06:06 AM
@Tokolosk I would recommend providing Java's cacerts as the keystore/truststore with the default password (changeit). This works for most public certs. If you have a custom cert at the SFTP end, you will need to import that cert and create your own keystore/truststore. I always try cacerts first.
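For the custom-cert case, a common approach (assumed here, not taken from the answer above) is to import the server's certificate with the JDK's `keytool`. The Python helper below only locates the default cacerts file (`lib/security/cacerts` on Java 9+, `jre/lib/security/cacerts` on Java 8) and builds the `keytool` command line without running it; the certificate file, alias, and truststore names are hypothetical placeholders.

```python
from pathlib import Path


def find_cacerts(java_home):
    """Return the path to the JDK's default cacerts truststore.

    Java 9+ keeps it at lib/security/cacerts; Java 8 at jre/lib/security/cacerts.
    """
    for rel in ("lib/security/cacerts", "jre/lib/security/cacerts"):
        candidate = Path(java_home) / rel
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no cacerts found under {java_home}")


def keytool_import_cmd(cert_file, alias, keystore, storepass="changeit"):
    """Build (but do not run) a keytool command that imports `cert_file`
    into `keystore` under `alias`; keytool creates the keystore if it is missing."""
    return ["keytool", "-importcert", "-noprompt",
            "-alias", alias, "-file", str(cert_file),
            "-keystore", str(keystore), "-storepass", storepass]


# Hypothetical names for the SFTP server's certificate and the new truststore.
cmd = keytool_import_cmd("sftp-server.crt", "sftp-server", "sftp-truststore.jks")
print(" ".join(cmd))
```

Importing into a separate truststore file, rather than editing the JDK's own cacerts, keeps the change local to the flow that needs it.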
04-25-2022
01:31 PM
@Tra The recommended path forward here is to use a JSON record reader and a JSON record writer, with schemas matching your source (reader) and your downstream needs (writer). You want to avoid SplitJson and regex matching on flowfile contents.
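As a rough illustration of why parsing beats regex matching, the Python sketch below (not NiFi code; the sample content and `writer_fields` are made up) handles the same two records both ways. In NiFi the reader/writer pair plays the role of `json.loads`/`json.dumps`, and the whole array is processed as one record set instead of being split first.

```python
import json
import re

content = '[{"id": 1, "status": "ok"}, {"status":"failed","id":2}]'

# Brittle: a regex written for one layout misses the second record,
# whose key order and spacing differ.
regex_ids = re.findall(r'\{"id": (\d+), "status"', content)

# Record-oriented: parse the whole array once, then select fields by name.
records = json.loads(content)
writer_fields = ["id", "status"]  # hypothetical downstream (writer) schema
output = json.dumps([{f: r.get(f) for f in writer_fields} for r in records])

print(regex_ids)  # ['1']
print(output)     # [{"id": 1, "status": "ok"}, {"id": 2, "status": "failed"}]
```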
04-08-2022
07:10 AM
@mala_etl Well, I can't fill in all of them, since they depend on your values, not mine, but to achieve the move you need to set:
Completion Strategy: Move File
Move Destination Directory: the directory to move the file to
Create Directory: true to enable, false to disable
Be sure to check the ? for each property; it explains everything.
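These property names appear to match NiFi's FetchSFTP processor (assumed to be the one in question). As a local-filesystem analogy only, the Python sketch below shows what the Move File strategy amounts to. `complete_with_move` is a hypothetical helper, and raising an error when the directory is missing and `create_directory` is False is this sketch's behavior, not necessarily NiFi's.

```python
import shutil
import tempfile
from pathlib import Path


def complete_with_move(fetched_file, move_destination, create_directory):
    """Mimic a 'Move File' completion strategy on the local filesystem.

    fetched_file: the file that was just fetched.
    move_destination: directory to move the file to.
    create_directory: if True, create the destination when it is missing;
        if False, this sketch raises FileNotFoundError instead.
    """
    dest_dir = Path(move_destination)
    if not dest_dir.is_dir():
        if not create_directory:
            raise FileNotFoundError(f"{dest_dir} does not exist")
        dest_dir.mkdir(parents=True)
    target = dest_dir / Path(fetched_file).name
    return Path(shutil.move(str(fetched_file), str(target)))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "report.csv"
    src.write_text("a,b\n")
    moved = complete_with_move(src, Path(tmp) / "processed", create_directory=True)
    moved_name, source_left_behind = moved.name, src.exists()
print(moved_name, source_left_behind)  # report.csv False
```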