About DennisJaheruddi

KUnew · 01-27-2020

I tried to use regex but it didn't work. Yes, the "space bt space" pattern did work. I'm thinking of covering the other cases by adding "space bt" and "bt space", in case the word is at the beginning or end of a sentence. Thank you for your help, I really appreciate it!
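A single pattern can cover "bt" in the middle, at the start, and at the end of a sentence. The sketch below assumes a Java-style regex engine (NiFi's ReplaceText, for example, uses Java regular expressions). It relies on the word-boundary token `\b`, which also matches at the very start and end of the input. The class and method names are hypothetical, and "between" is only an illustrative replacement value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WordReplace {
    // \b matches between a word character and a non-word character, and also at
    // the start or end of the input, so "bt" is found at the beginning, in the
    // middle, or at the end of a sentence, but not inside words like "debt".
    private static final Pattern BT = Pattern.compile("\\bbt\\b");

    static String replaceBt(String text, String replacement) {
        // quoteReplacement stops '$' or '\' in the replacement value from being
        // interpreted as group references.
        return BT.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // prints "between is at the start, the middle between, and the end between"
        System.out.println(replaceBt("bt is at the start, the middle bt, and the end bt", "between"));
        // prints the input unchanged: "bt" only appears inside other words here
        System.out.println(replaceBt("words like debt and subtle stay untouched", "between"));
    }
}
```

With this approach, the three separate variants ("space bt space", "space bt", "bt space") collapse into one expression. Note that `\b` treats digits and underscores as word characters, so something like "bt_1" would not match.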

Galapow · 01-27-2020

Problem is solved. The Module Directory field was incorrect.

VijaySankar · 01-23-2020

Will do. Thanks.

MattWho · 01-22-2020

@Alexandros You can accomplish this using ReplaceText with a more complex Java regular expression. ReplaceText is designed to replace every occurrence of the string matched by your Java regular expression with the replacement value, so you are probably seeing your replacement value inserted into your existing content twice. Try using the following Java regular expression, which will match all 3 lines of your content:
.*\n.*?(\d{2}.\d{4}).*?\n.*?(\d{2}.\d{4}).*
Leave your replacement value as you already have it, and make sure Evaluation Mode is still set to Entire text. Hope this helps, Matt
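The "inserted twice" effect and its fix can be reproduced outside NiFi with plain `java.util.regex`. This is a minimal sketch that assumes ReplaceText with Evaluation Mode set to Entire text behaves like `replaceAll` over the whole content. The three-line input and the `$1,$2` replacement value are hypothetical, because the original poster's content and replacement value are not shown.

```java
import java.util.regex.Pattern;

final class WholeContentReplace {
    // Matt's expression as a Java string literal (backslashes doubled).
    // '.' does not match '\n' by default, so each '.*' stays within one line,
    // and the explicit '\n's carry the match across all three lines.
    static final String REGEX = ".*\\n.*?(\\d{2}.\\d{4}).*?\\n.*?(\\d{2}.\\d{4}).*";

    static String replaceEntireText(String content, String replacement) {
        // Every match is replaced, but this pattern spans the whole content,
        // so there is exactly one match and the replacement is emitted once.
        return Pattern.compile(REGEX).matcher(content).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Hypothetical three-line content.
        String content = "Report header\nStart 01.2019 first period\nEnd 12.2020 last period";
        System.out.println(replaceEntireText(content, "$1,$2")); // 01.2019,12.2020

        // For contrast: a pattern matching only the dates matches twice, so the
        // replacement is inserted twice and the surrounding text is kept.
        System.out.println(content.replaceAll("(\\d{2}.\\d{4})", "[$1]"));
    }
}
```

The unescaped `.` between `\d{2}` and `\d{4}` matches any separator (for example `.`, `-` or `/`). Use `\.` instead if only a literal dot should match.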

DennisJaheruddi · 12-30-2019

What to ask before making a data flow

When taking on a new responsibility for designing and maintaining data flows, what are the main questions one should ask to ensure a good outcome? Here I list the key questions for the important topics, as well as an illustration of what typically goes wrong if the questions are not asked.

The most important points if you are under pressure to deliver

Location
The questions: Where is the data, and where should it go (and where can I process it)? And of course: do I have the required access?
The Nightmare: Data is spread across multiple systems, and one of these may not be identified. After you finally figure out which tables you need, you try to start and discover you don't have access. When you finally get the data, you either don't have a compliant place to put it, or you are missing a tool. Finally you have the data, but it is unclear how to get it written to the target. In the end, a 3-day job takes 6 weeks.

Context
The questions: What is the data, and who understands the source and the target?
The Nightmare: You want to supply revenue data during business hours. First you get access to multiple tables, each containing various numbers that might be the revenue. After figuring out which one is the revenue, it turns out you have transactions from multiple time zones, in and out of summer time, which needs to be resolved before moving the data into the target application. Finally, it turns out the target application requires fields not to be NULL, and you have no idea what will happen if you use the wrong default.

Process
The questions: Who makes the specifications and accepts the results? How do you deal with requirements that change (or, as it may be phrased, that you did not understand correctly)? How do you escalate if you are not put in circumstances where you can succeed?
The Nightmare: The requirements are not completely clear. You build something and get feedback that you need to change one thing. After this, you need to change another thing. It is unclear whether these are refinements (from your perspective) or fixes (from their perspective); however, when the deadline is not met, it is clear where the finger will be pointed.

The most important points if you want things to go right

Complexity
The questions: What exactly should the output be, and what exactly needs to be done?
The Nightmare: You build a data flow in NiFi, and near the end the request comes to join two parts of the flow together, or to do some complex windowing. For this kind of requirement you should have considered something like Spark. Now you may need to redo some of the work to keep the flow logical, and introduce Kafka as well, as a buffer in between.

Supplier Commitment
The questions: Who supplies the data? What is the SLA? Will I be informed if the structure changes? Will these changes be available for testing? Is the data supplier responsible for data quality?
The Nightmare: You don't get a commitment, and suddenly your consumers start seeing wrong results. It turns out a column definition was changed and you were not informed. After this, you get a message that one of the smaller sources, which you need to enrich your main source, will be down for 12 hours. So now you will be breaking the service level agreement with your consumers for a reason they may not want to understand.

Nonfunctionals
The questions: How big is the data, and what is the expected throughput? What is the required latency?
The Nightmare: You design and test a flow for 10 messages per second, with buffers to cushion the volatility. You end up receiving 10,000 messages per second, for which you may even need a bigger budget. After your throughput (budget) has been increased significantly, it turns out the buffers are too big and your throughput SLA is not met. Now you can go back to request even more compute capacity.
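The link between buffer size and the latency question under Nonfunctionals can be made concrete with simple arithmetic. For a first-in, first-out buffer drained at a steady rate r, a message arriving behind N queued messages waits roughly N / r seconds. The sketch below is a back-of-the-envelope model under that assumption and ignores bursts and parallel consumers. All numbers and names are hypothetical.

```java
final class BufferLatency {
    // Approximate wait for a message that arrives behind queuedMessages others
    // in a FIFO buffer drained at drainRatePerSecond (steady rate assumed).
    static double waitSeconds(long queuedMessages, double drainRatePerSecond) {
        if (drainRatePerSecond <= 0) {
            throw new IllegalArgumentException("drain rate must be positive");
        }
        return queuedMessages / drainRatePerSecond;
    }

    // Largest number of queued messages that keeps that wait within a latency budget.
    static long maxQueuedMessages(double drainRatePerSecond, double latencyBudgetSeconds) {
        return (long) Math.floor(drainRatePerSecond * latencyBudgetSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical: a buffer holding 1,000,000 messages drained at 10,000 msg/s.
        System.out.println(waitSeconds(1_000_000, 10_000)); // 100.0
        // Hypothetical: a 2 s latency budget at 10,000 msg/s.
        System.out.println(maxQueuedMessages(10_000, 2.0)); // 20000
    }
}
```

Buffers sized "generously" during testing at low rates can therefore translate into minutes of delay once they actually fill up at production rates.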
Of course there are other things to ask, such as requirements to work with specific (legacy) tooling, exact responsibilities per topic or security guidelines to abide by. But typically these are the things I consider to be the most critical and specific to working with data.
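The time-zone problem described under Context (transactions from several zones, in and out of summer time) is usually handled by converting each local timestamp to UTC using the source system's zone before loading. This is a minimal sketch with `java.time`, assuming each record's local time and zone ID are known. The zone `Europe/Amsterdam` and the helper names are hypothetical examples.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

final class NormalizeToUtc {
    // Interpret a wall-clock time in its source zone and convert it to a UTC instant.
    // The zone rules include daylight saving time, so summer and winter offsets
    // are applied automatically. In a DST gap the time is moved later by the length
    // of the gap; in an overlap the earlier offset is chosen.
    static Instant toUtc(LocalDateTime local, String zoneId) {
        return ZonedDateTime.of(local, ZoneId.of(zoneId)).toInstant();
    }

    // True when a local time is ambiguous (occurs twice when the clocks go back)
    // or missing (skipped when the clocks go forward); such rows deserve review.
    static boolean isAmbiguousOrMissing(LocalDateTime local, String zoneId) {
        return ZoneId.of(zoneId).getRules().getValidOffsets(local).size() != 1;
    }

    public static void main(String[] args) {
        String zone = "Europe/Amsterdam"; // hypothetical source zone
        System.out.println(toUtc(LocalDateTime.of(2019, 7, 1, 12, 0), zone));  // 2019-07-01T10:00:00Z
        System.out.println(toUtc(LocalDateTime.of(2019, 12, 1, 12, 0), zone)); // 2019-12-01T11:00:00Z
        System.out.println(isAmbiguousOrMissing(LocalDateTime.of(2019, 10, 27, 2, 30), zone)); // true
    }
}
```

Flagging ambiguous or missing local times, rather than silently picking an offset, keeps the choice visible to whoever understands the source data.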

ClouderaUser777 · 12-30-2019

I found a solution to my problem. A change is required in the "HDP_3.0.1_docker-deploy-scripts_18120587fc7fb\assets\generate-proxy-deploy-script.sh" file. I moved:
[9090]=9090 [9091]=9091
from the section:
tcpPortsHDF=( [2202]=22 [2182]=2181 [4557]=4557 .... )
to the section:
tcpPortsHDP=( [12049]=2049 [2201]=22 [2222]=22 [1100]=1100 .... [9090]=9090 [9091]=9091 ... )
Then I ran "HDP_3.0.1_docker-deploy-scripts_18120587fc7fb\docker-deploy-hdp30.sh" again. Now it works.

AV_1010 · 12-30-2019

Thank you for the input, this was helpful advice!

Fierymech · 12-25-2019

@DennisJaheruddi Thanks very much for making Christmas merrier 🙂 I agree with your statement and have configured the flow accordingly. I am marking your reply as the accepted solution. Great advice, and kudos to you again.

raymond_cui2015 · 12-25-2019

Hi Dennis, Thank you for the reply. I was able to use the ListFile and FetchFile processors to get the files from the target host (such as MySQL log files).

peter_coppens · 12-24-2019

Hello, Thanks for your reaction, Dennis. I was able to continue. The reason I was having issues seemed to be that the Groovy script was a standard class with a static main method. It makes sense that NiFi does not accept that, I guess, but diagnosing such an issue could be made easier, IMO. It was a trial-and-error task, done, as you suggested, by starting from an example. Anyway, I am good now 🙂 Regards, Peter

Last Visited	12-15-2021 03:18 AM

Member Since	01-07-2019 03:54 AM
Posts	220
Kudos received	31

Cloudera Community

Re: How can a Flink program on a Kerberos-enabled cluster connect to a Kafka outside the cluster that does not have authentication enabled

Re: Attribute validation against MSSQL database

Re: Put array with Dates on nifi flowfile

Re: NiFi templates don't include all controller se...

Re: Concatenations of Multiple Attributes in Nifi

Re: Nifi twitter term

Re: Nifi : groovy: 32: unable to resolve class JSc...

Re: Nifi - SelectHive3QL processor is throwing a K...

Re: NIFI - Extract data from flowfile to write a ...

What to ask when becoming responsible for moving d...

Re: Apache NiFi: Processor configuration url issue

Re: NiFi - JSON to MongoDB - problems with array

Re: how to read file content and extract specific ...

Re: Need help to connect MySQL with CaptureChangeM...

Re: ExecuteGroovyScript gives InstantiationExcepti...