Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5047 | 08-19-2021 05:45 AM |
| | 1811 | 08-04-2021 05:59 AM |
| | 879 | 07-22-2021 08:09 AM |
| | 3692 | 07-22-2021 08:01 AM |
| | 3429 | 07-22-2021 07:32 AM |
06-11-2020 01:27 PM

Our installation had the password hash in another table, ambari.user_authentication, so the reset had to be applied there. The statement below shows the fix; note that user_id = 1 was the admin in my case.
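For readability, here is the same statement formatted as SQL. The authentication_key value is the hash from that particular installation, not a universal default:

```sql
-- Reset the stored password hash for the admin account (user_id = 1 here).
UPDATE ambari.user_authentication
SET authentication_key = '538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00'
WHERE user_id = '1';
```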
02-27-2020 11:43 PM

Thanks. I implemented Flume to move the weblogs that were transferred to the Hadoop edge server up to HDFS. Also, due to firewall challenges, security requirements, and the lack of a test environment, I used an alternative solution: the Zena job scheduler transfers the log files from ATM machines and mobile web app logs to the Hadoop edge server. Kafka turned out to be a big challenge since we are using LDAP, so security and authentication issues quickly cropped up. Kudos to your suggestion!
02-05-2020 05:38 AM

The issue is resolved now. I have a cluster of 12 nodes, and on some nodes beeline-hs2Connection.xml was not present. Putting the file in the Hive conf directory on each server resolved the issue. Thanks, Mohit.
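For anyone hitting the same problem, here is a minimal sketch of what such a file can look like. The host, port, and user values are placeholders, and the property names follow Beeline's beeline.hs2.connection.* convention:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Placeholder HiveServer2 endpoint; replace with your own host:port. -->
  <property>
    <name>beeline.hs2.connection.hosts</name>
    <value>hs2-host.example.com:10000</value>
  </property>
  <!-- Placeholder user; adjust or drop depending on your authentication setup. -->
  <property>
    <name>beeline.hs2.connection.user</name>
    <value>hive</value>
  </property>
</configuration>
```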
02-04-2020 09:18 AM

Hi all, the solution above fails in one scenario: if multiple flow files are processed at the same time and land in the NiFi queue that follows the update query (i.e., the PutHiveQL processor that increments processed_file_cnt by one for every flow file), then the next flow may be triggered multiple times, which is wrong. This happens because we first select processed_file_cnt and only then compare it with input_file_cnt, so two concurrent flow files can both see the final count and both pass the comparison.
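To make the race concrete, here is a sketch of one possible interleaving with two concurrent flow files and input_file_cnt = 10; the table name counters is hypothetical:

```sql
-- Flow file A: UPDATE counters SET processed_file_cnt = processed_file_cnt + 1; -- count becomes 9
-- Flow file B: UPDATE counters SET processed_file_cnt = processed_file_cnt + 1; -- count becomes 10
-- Flow file A: SELECT processed_file_cnt FROM counters; -- reads 10, equals input_file_cnt, triggers next flow
-- Flow file B: SELECT processed_file_cnt FROM counters; -- also reads 10, triggers the next flow a second time
```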
01-31-2020 08:49 PM

Hi, my assumption was wrong: the PutSQL processor does execute the update query per flow file.
01-27-2020 09:29 PM

I tried to use a regex but it didn't work. Yes, "space bt space" really did work. I'm thinking of covering the other cases by also adding "space bt" and "bt space" in case the token is at the beginning or end of a sentence. Thank you for your help, really appreciate it!
01-27-2020 08:47 AM

Problem is solved. The Module Directory field was incorrect.
01-23-2020 11:51 AM

Will do. Thanks.
01-22-2020 01:59 PM

@Alexandros You can accomplish this using ReplaceText with a more complex Java regular expression. The ReplaceText processor is designed to replace every occurrence of the string matched by your Java regular expression with the replacement value, so you are probably seeing your replacement value inserted into your existing content twice. Try the following Java regular expression, which will match your entire 3 lines of content: .*\n.*?(\d{2}.\d{4}).*?\n.*?(\d{2}.\d{4}).* Leave your replacement value as you already have it, and make sure Evaluation Mode is still set to "Entire text". Hope this helps, Matt
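As a hedged sketch of how that expression behaves, here it is in plain Java; the three-line sample content is made up for illustration, and in NiFi the same pattern would be applied by ReplaceText in Entire text mode:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceTextRegexDemo {
    public static void main(String[] args) {
        // Hypothetical 3-line content resembling the flow file text.
        String content = "header line\nreading A 12.3456 units\nreading B 78.9012 units";

        // The expression from the post: it spans all three lines and captures
        // one dd.dddd-style number from each of the last two lines.
        Pattern p = Pattern.compile(".*\\n.*?(\\d{2}.\\d{4}).*?\\n.*?(\\d{2}.\\d{4}).*");

        Matcher m = p.matcher(content);
        if (m.matches()) {
            System.out.println(m.group(1)); // prints 12.3456
            System.out.println(m.group(2)); // prints 78.9012
        }
    }
}
```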
12-30-2019 09:42 AM

What to ask before making a Data Flow

When taking on a new responsibility for designing and maintaining data flows, what are the main questions one should ask to ensure a good outcome? Here I list the key questions for the important topics, along with an illustration of what typically goes wrong when the questions are not asked.

The most important points if you are under pressure to deliver

Location

The questions: Where is the data, where should it go, and where can I process it? And of course: do I have the required access?

The Nightmare: Data is spread across multiple systems, and one of them may not even be identified. After you finally figure out which tables you need, you try to start and don't have access. When you finally get the data, you either don't have a compliant place to put it or you are missing a tool. Finally you have the data, but it is unclear how to get it written to the target. In the end, a 3-day job takes 6 weeks.

Context

The questions: What is the data, and who understands the source/target?

The Nightmare: You want to supply revenue data during business hours. First of all, you get access to multiple tables, each containing various numbers that might be the revenue. After figuring out which one is the revenue, it turns out you have transactions from multiple time zones, in and out of summer time, which needs to be solved before moving the data into the target application. Finally, it turns out the target application requires certain fields not to be NULL, and you have no idea what will happen if you use a wrong default.

Process

The questions: Who writes the specifications, and who accepts the results? How do you deal with changing requirements (or, as it may be phrased, "you did not understand them correctly")? How do you escalate if you are not put in circumstances where you can succeed?

The Nightmare: The requirements are not completely clear. You build something and get feedback that you need to change one thing. After this, you need to change another thing. It is unclear whether these are refinements (from your perspective) or fixes (from their perspective); however, when the deadline is not met, it is clear where the finger will be pointed.

The most important points if you want things to go right

Complexity

The questions: What exactly should the output be, and what exactly needs to be done?

The Nightmare: You build a data flow in NiFi, and near the end the request comes to join two parts of the flow together, or to do some complex windowing. With that kind of requirement you should have considered something like Spark; now you may need to redo some of the work to keep the flow logical, and introduce Kafka as a buffer in between as well.

Supplier Commitment

The questions: Who supplies the data? What is the SLA? Will I be informed if the structure changes? Will these changes be available for testing? Is the data supplier responsible for data quality?

The Nightmare: You don't get a commitment, and suddenly your consumers start seeing wrong results. It turns out a column definition was changed and you were not informed. After this you get a message that one of the smaller sources will be down for 12 hours, and you need it to enrich your main source. So now you will be breaking the service level agreement with your consumers for a reason they may not want to understand.

Nonfunctionals

The questions: How big is the data? What is the expected throughput? What is the required latency?

The Nightmare: You design and test a flow with 10 messages per second, with buffers to cushion the volatility. You end up receiving 10,000 messages per second. For this you may even need a bigger budget. After your throughput (and budget) has been increased significantly, it turns out the buffers are too big and your latency SLA is not met. Now you can go back and request even larger compute capacity.

Of course there are other things to ask, such as requirements to work with specific (legacy) tooling, exact responsibilities per topic, or security guidelines to abide by. But these are typically the things I consider the most critical and specific to working with data.