Posts: 609
Kudos Received: 95
Solutions: 114

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 92 | 01-11-2021 05:54 AM |
|  | 73 | 01-11-2021 05:52 AM |
|  | 93 | 01-08-2021 05:23 AM |
|  | 242 | 01-04-2021 04:08 AM |
|  | 499 | 12-18-2020 05:42 AM |
11-12-2020
05:26 AM
@Namitjain This should absolutely work. I would suggest you update the post with information about the exact issue you are having; be sure to include your configuration of the PutS3Object processor. I also recommend not routing failure back to the processor itself during testing. Route it to another processor, or to an output port, instead. When a flowfile goes to failure, inspect it: the attributes on the flowfile often contain information on why it failed.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
11-02-2020
05:15 AM
@P_Rat98 Per our PM discussion, use DetectDuplicate in your flow before sending an email. This rate-limits the number of messages you send, based on your DetectDuplicate configuration. Additionally, when this is linked into your flow and duplicates are auto-terminated, it will drain the flow and stop it from filling up the queue. As suggested, you can also choose to retain the duplicates but move them into a much bigger queue that isn't going to back up the main flow. Then, once you see the email, you can go look at the flow, see which flowfiles were causing issues, and take corrective action. If you really need to monitor the flow for a queue being full, you would need to use the NiFi REST API to check the status of the queue; see the sketch below. That may be more work than it is worth when you can solve it as above much more easily. However, I would recommend you check the API out: it has a lot of capabilities, and I am beginning to use NiFi API calls within my flows to monitor, stop, start, and take actions automatically that would normally require a human doing them in the UI.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
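As a rough illustration of that API approach, here is a minimal sketch in Python that checks one connection's queue depth over the NiFi REST API. The host, connection UUID, and threshold are placeholders, authentication is omitted, and the response layout may vary by NiFi version, so treat it as a starting point rather than a finished monitor:

#!/usr/bin/env python3
# Sketch: alert when a NiFi connection's queue grows too large.
import requests

NIFI_API = "https://localhost:8443/nifi-api"  # placeholder host
CONNECTION_ID = "<connection-uuid>"           # placeholder: copy from the connection's details in the UI
THRESHOLD = 9000                              # placeholder: queue size that should trigger an alert

resp = requests.get(f"{NIFI_API}/connections/{CONNECTION_ID}", verify=False)
resp.raise_for_status()
snapshot = resp.json()["status"]["aggregateSnapshot"]

if snapshot["flowFilesQueued"] >= THRESHOLD:
    # Plug your own notification in here (email, Slack, a PutSlack flow, etc.).
    print(f"Queue backing up: {snapshot['queued']} queued in '{snapshot['name']}'")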
10-29-2020
01:51 PM
Thanks @stevenmatison Do you by chance know the answer to this question: https://community.cloudera.com/t5/Support-Questions/Extract-string-nested-in-JSON-value/m-p/305099 It's probably something very easy, but nothing I have tried works. Valentin
10-29-2020
07:54 AM
Thanks, I am trying some things now to parse the data using a Jolt spec with the JoltTransformJSON processor, which could help me with this issue. Thanks for the help; hopefully I can get things running more smoothly soon. 🙂
10-29-2020
04:46 AM
@amey84 Yes. Although yum install still provides the bundled Postgres, you can choose to install it or another database separately. During ambari-server setup you choose Y here:

Enter advanced database configuration [y/n] (n)? y

The following link will be helpful for more info about Ambari + Postgres: https://docs.cloudera.com/HDPDocuments/Ambari-2.6.1.5/bk_ambari-administration/content/using_ambari_with_postgresql.html

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
10-29-2020
04:39 AM
@Kaur It appears your NiFi node does not have enough system RAM to allow the 2g and 4g settings. I suggest increasing the node to at least 8 GB or 16 GB of system RAM and testing the bootstrap config with 2g/4g or 4g/8g respectively; see the sketch below.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
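For reference, NiFi's heap is set in conf/bootstrap.conf via the JVM arguments below. The 2g/4g values are the example sizing suggested above, not the defaults:

# conf/bootstrap.conf
java.arg.2=-Xms2g   # initial heap
java.arg.3=-Xmx4g   # maximum heap

Restart NiFi after changing these, and keep the maximum heap well below the node's total RAM so the OS and other processes have room.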
10-24-2020
07:20 PM
You are right. Why do I even need a username and password to download it? It doesn't look like open source.
10-20-2020
06:49 AM
Great article! Does Registry also keep track of the user that committed the changes? If yes, does this also work when NiFi is configured to use OIDC instead of client certificates (i.e., different usernames in NiFi than in Registry)?
10-19-2020
01:21 PM
The solution you are looking for is ReplaceText: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.0/org.apache.nifi.processors.standard.ReplaceText/

You can find loads of examples here in the forum with this search: https://community.cloudera.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&q=replaceText

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
10-16-2020
01:53 AM
After I was not able to find a solution that would be easy to implement inside NiFi, I've written a small Perl (yuk) script that can be used to adjust timestamps in a CSV file to ISO8601 format. Maybe it is useful to someone else:

#!/bin/perl -w
# This perl script adds timezone information to timestamps without a
# timezone. All timestamps in the input file that follow the format
# "YYYY-MM-DD HH:MM:SS" are converted to ISO8601 timestamps.
use strict;
use DateTime::Format::Strptime;
my $time_zone = 'Europe/Amsterdam';
my $parser = DateTime::Format::Strptime->new(
    pattern   => '%Y-%m-%d %T',
    time_zone => $time_zone,
);
my $printer = DateTime::Format::Strptime->new(
    pattern   => '%FT%T%z',
    time_zone => $time_zone,
);
while (<>) {
    s/(?<=")(\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d)(?=")/
        my $dt = $parser->parse_datetime($1);
        $printer->format_datetime($dt);
    /ge;
    print;
}
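For anyone wanting to try it: assuming you save it as fix_timestamps.pl (my placeholder name, not anything official), it reads CSV text from stdin or a file argument and writes the converted rows to stdout, leaving the original file untouched:

perl fix_timestamps.pl input.csv > output_iso8601.csv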
10-07-2020
10:57 PM
From what I've investigated, it may be due to the version of mysql-connector-java. The class com.mysql.jdbc.jdbc2.optional.MysqlDataSource is present in version 5, and the one I have installed is version 8. I tried explicitly installing version 5 but got different errors. What I've done to get it working is changing that class to the one available in version 8, which is com.mysql.cj.jdbc.MysqlXADataSource.

For Schema Registry you have to modify 2 files:

/var/lib/ambari-server/resources/mpacks/hdf-ambari-mpack-3.4.1.1-4/common-services/REGISTRY/0.3.0/package/scripts/params.py
/var/lib/ambari-agent/cache/common-services/REGISTRY/0.3.0/package/scripts/params.py

There, change the variable registry_storage_java_class to the value com.mysql.cj.jdbc.MysqlXADataSource. Note: the variable should appear twice.

The same goes for Streaming Analytics; you have to modify 2 files. This time:

/var/lib/ambari-server/resources/mpacks/hdf-ambari-mpack-3.4.1.1-4/common-services/STREAMLINE/0.5.0/package/scripts/params.py
/var/lib/ambari-agent/cache/common-services/STREAMLINE/0.5.0/package/scripts/params.py

There, change the variable streamline_storage_java_class, also to the value com.mysql.cj.jdbc.MysqlXADataSource.

You should then be able to start the services. I just solved this, so I'm not aware if any other errors will show up by using these services.
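For illustration, the edit in each params.py might look like the following. The exact surrounding code differs between mpack versions, so treat this as a sketch of the target lines rather than a verbatim diff:

# params.py (both Schema Registry copies listed above)
# before: registry_storage_java_class = "com.mysql.jdbc.jdbc2.optional.MysqlDataSource"
registry_storage_java_class = "com.mysql.cj.jdbc.MysqlXADataSource"

# params.py (both Streamline copies listed above)
# before: streamline_storage_java_class = "com.mysql.jdbc.jdbc2.optional.MysqlDataSource"
streamline_storage_java_class = "com.mysql.cj.jdbc.MysqlXADataSource"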
10-01-2020
04:24 PM
Hi Steven. Thanks for the quick response. I'm running this HDP cluster on SUSE 12 SP2. This node has 32 GB RAM and is using just 4; free RAM is 27 GB. The YARN configuration is:

ResourceManager Java heap size = 2048
NodeManager Java heap size = 1024
AppTimelineServer Java heap size = 8072

ulimit for the RM process:

core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128615
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

From the RM log file:

2020-09-29 17:15:00,825 INFO scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:getMinimumAllocation(1367)) - Minimum allocation = <memory:1024, vCores:1>
2020-09-29 17:15:00,825 INFO scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:getMaximumAllocation(1379)) - Maximum allocation = <memory:24576, vCores:3>

No matter how much memory is assigned to the RM, it always fails with this Java OOM. What would be a recommended Java memory configuration for the YARN components?
10-01-2020
09:48 AM
@aniket5003 NiFi can be added to Ambari using one of the HDF management packs. Depending on your relationship with Cloudera, you may need to use your account to get the NiFi 1.12.1 management pack. Other versions are out on the open internet (1.9 and below), but the newest versions require a Cloudera username and password to access the repos and artifacts. Once you have a management pack added to Ambari, you should be able to install NiFi and other HDF components in an HDP cluster. Additionally, you can get 1.12.1 from nifi.apache.org and install it outside of the Ambari interface if you need something quick.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-30-2020
08:41 AM
@Elf IMO anything is possible with Ambari. That said, out of the box it would not appear to be possible without some advanced Ambari admin skills. I took a look at the link you provided, and that is an example of how to spin up a single machine with many of the services you may already have in your Ambari cluster. To install Griffin in an Ambari cluster you would need to pick a node, install Griffin plus any missing requirements (services/components not in your cluster), and thoughtfully modify the configuration to use the existing services from the Ambari cluster. For example, feed Griffin your configuration locations for Hadoop, HDFS, Hive, etc., and do NOT follow the sample documentation's directions to install those parts. If you do decide to go down this path, please update here with your progress or create new questions with the specific errors you hit.
09-30-2020
07:04 AM
@ujay Of course. The XML referenced in the link is a template file. Click through, get the raw XML code, and save it to a file. From there you import the template (screenshots were attached to the original post):

1. In the upper navigation, grab the template icon and drag it onto the NiFi canvas. It should automatically choose the last template uploaded.
2. Once the template is on the canvas, click through into the process group it created.
3. You will need to do some work in Controller Services, so check out the notes in the red box.

The flow is an example of how to generate many flowfiles and detect duplicates. Be sure to do some research on the processor to understand how others have worked with it as you begin to integrate this into your own flow. This community is also a great research tool.
09-30-2020
05:08 AM
@praneet Adding the value to the processor (+) is a suitable method; you just need to make sure you get the right string in that field. It's blocked out, but it appears to contain not just the actual token but the token prepended with "Bearer". Try just the token string. One thing I like to do for any API, before I start on the InvokeHTTP configuration, is to use Postman to identify all of the settings required to connect to the API. Once that works, I can definitively ensure that NiFi's InvokeHTTP sends the same request; see the sketch below.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
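Besides Postman, a few lines of Python are another quick way to confirm what the endpoint expects before configuring InvokeHTTP. The URL and token below are placeholders, and whether the header needs the "Bearer " prefix depends on the particular API:

#!/usr/bin/env python3
# Sanity-check the Authorization header outside NiFi.
import requests

token = "<your-token>"                       # placeholder: raw token string
url = "https://api.example.com/v1/resource"  # placeholder endpoint

# Most OAuth-style APIs expect "Bearer <token>"; some custom ones want the bare token.
r = requests.get(url, headers={"Authorization": f"Bearer {token}"})
print(r.status_code, r.text[:200])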
09-30-2020
05:03 AM
@Manoj90 In addition to @PVVK's point, you need to be careful with routing relationships back to the originating processor. During development I like to use an output port that I call End of Line, or EOL1, EOL2, EOL3 as I need more in larger flows. This lets me evaluate anything that goes to failure, retry, etc. Later, once I am certain the flow is working as I need, I either auto-terminate these routes or route them out of my process group to an event notification system. It looks like this (screenshot in the original post: using an output port to hold unneeded routes during testing).

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-30-2020
04:52 AM
@mansu You need to make sure the files and locations have the correct permissions for the nifi user. For example, on Linux:

chown -R nifi:nifi /path/to/files

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-28-2020
09:58 PM
At the end of the day I used both @stevenmatison's and @PVVK's solutions. First I used PVVK's to modify my flowfile into valid JSON; then I was able to use the SplitJson processor to break up the flowfiles cleanly. Since there were multiple 'logEvent' entries in some of the records, I saved off the header items as attributes, then used @stevenmatison's method to break down each logEvent and added the header attributes back to it for processing. Thanks all!
09-28-2020
03:54 PM
@vikrant_kumar24 Based on what you describe, you are either exceeding the capabilities of the outbound network connection from NiFi or of the receiving API. I suspect the latter, but it's hard to tell without seeing the number of nodes and the processor scheduling settings. You can try slowing down the execution of InvokeHTTP to confirm: if its run schedule is set to 0 sec it can create a lot of connections to the endpoint, so try 5, 10, or 15 seconds and re-evaluate. Also, if you have more than one NiFi node, the cluster can create a great deal of concurrent connections.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-28-2020
03:46 PM
@aniket5003 Check out the following post for a few different ways to locate or update the password: https://community.cloudera.com/t5/Support-Questions/Default-password-for-Metron-UI-Version-0-4-0/td-p/199277

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-28-2020
03:41 PM
@surajnag This is a great question. Here are some of the ideas I have used in development and production NiFi flows:

1. During development, route all flow exceptions to an output port. I call these EOL, or End of Line, and increment them with numbers: EOL1, EOL2, EOL3. I use these to hold failure, retry, original, no-retry, etc. outbound connections during testing. As I move to production, some of these may be auto-terminated and some may remain.

2. In production, route ALL major points of failure that are not auto-terminated to an output port called ERROR. In my flows I sometimes have multiple ERROR output ports for different purposes. Depending on the use case, the error is sent to some type of event notification, for example an email or a post to a Slack channel. In other cases errors are routed to a process group that is in a stopped/disabled state; based on the error, I may make some change in the flow and then enable/start that group as a method to REPLAY the flowfile.

3. For the successful end of a flow, I may just auto-terminate, since I want to assume the flow is always on and everything has succeeded if it was not routed as above in #1 or #2. In other cases the bottom or end of the flow is also routed to some type of event notification. Be careful here, though, as you could create tons of notifications depending on your flow.

I hope some of these ideas are helpful.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-26-2020
09:08 PM
Hi Steven, I am able to upload the file to /user/hive/warehouse/ods.db/hive_test/ or /user/hive/warehouse/ods.db/hive_test, but Hue is still unable to import the data. The same syntax ran successfully from the Linux prompt, so I am really confused about why it doesn't work from Hue. Regards, wenfeng
09-25-2020
12:17 PM
Hi Steven, I used @bingo's solution to get NiFi to find my JAVA_HOME. But you mentioned that NiFi does not need this to run. Do you know what the impact is of running NiFi without it knowing where Java is installed?
09-24-2020
08:34 AM
1 Kudo
I solved my problem. In my case, one of the tables had a name starting with an underscore ("_"), because of which two single quotes were automatically added in the path of the HDFS directory where the copy of the file was stored. I changed the name by removing the underscore character, and now I can import the table into the Hive database. I think special characters like that are not easily parsed in Hive or HDFS.
09-24-2020
06:34 AM
1 Kudo
@MKS_AWS There are a few ways to break up JSON within a flowfile (SplitJson, QueryRecord). However, if it's just one giant blob of JSON, you may not find those very useful; perhaps you can share some sample JSON to that effect. Also check out this way to do SNS with up to 2 GB per payload: https://github.com/awslabs/amazon-sns-java-extended-client-lib

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-24-2020
05:56 AM
@ravi_sh_DS This gets a bit high level, so forgive me, as I am not sure how you know which ID to change and what to change it to. That said, your approach could be to use QueryRecord to find the match you want, then update that match with UpdateRecord; a sketch follows below. You can also split the JSON image array with SplitJson, then use UpdateRecord as suggested above. In either method, depending on your use case, when you split the records and process the splits separately you may need to rejoin them downstream. Some older processors useful here are SplitJson, EvaluateJsonPath, UpdateAttribute, and AttributesToJSON, but the record-based QueryRecord/UpdateRecord approach is now preferred, as it lets you do things more dynamically.
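A rough sketch of the QueryRecord half of that approach: QueryRecord evaluates SQL against the flowfile's records, with the records exposed as a table named FLOWFILE, and each dynamic property you add becomes an outbound relationship carrying that query's results. The field name id and the matched value below are made up for illustration:

-- value of a dynamic property named "matched" on QueryRecord (illustrative)
SELECT * FROM FLOWFILE WHERE id = '12345'

Flowfiles with matching records then route to the "matched" relationship, where UpdateRecord can apply the change.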
09-23-2020
04:46 PM
@alex15 I suspect the issue you have is that the nifi user is not able to execute the script. Make sure the user ownership of the file is correct, and also confirm the read/write/execute permissions. In Unix/Linux these are set with chown and chmod.

If this answer resolves your issue or allows you to move forward, please choose to ACCEPT this solution and close this topic. If you have further dialogue on this topic, please comment here or feel free to private message me. If you have new questions related to your Use Case, please create a separate topic and feel free to tag me in your post. Thanks, Steven
09-16-2020
09:38 PM
@stevenmatison Can you explain how to add a driver in a sqoop command, please?
09-15-2020
09:51 AM
1 Kudo
Actually, both replies can be considered valid. I accepted the one that better fits my use case.