Member since 05-07-2020
32 Posts
2 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
960 | 03-15-2022 08:41 AM |
01-25-2023
02:13 AM
We are using the PutHive3Streaming processor to send data from NiFi to Hive. On our busier feeds we are getting LOTS of small delta files, which is causing issues with compaction etc. I have used a series of merges in NiFi to ensure each flowfile contains many thousands of records, but it still creates many delta files. Does anyone have advice on tuning the 'Records Per Transaction' and 'Transactions per Batch' options on the PutHive3Streaming processor? I believe this could help with my issue, but I have had mixed/confusing results from testing, and I haven't found much information on best practice. Has anyone else had similar issues or found adjustments helpful?
Labels: Apache Hive, Apache NiFi
11-02-2022
01:57 AM
Hello all, NiFi Jolt question. Does anyone know if it's possible to convert timestamp formats in Jolt using 'field.value', the way you can with UpdateRecord? I'm aware that I can use an attribute value in Jolt to convert a field, like: "date" : "${timestamp:toDate('yyyy-MM-dd HH:mm:ss'):toNumber()}" but I was wondering if the same conversion can be done in Jolt on incoming field values. I have tried using the default spec as below, but it doesn't change the format. Thanks in advance. Andy
Labels: Apache NiFi
10-31-2022
09:33 AM
@steven-matison I have been trying to get ifElse working for me, but the below gives me an empty string > "" and this gives me null as a string > "null". Is there a way to return null rather than the string "null"?
10-31-2022
02:16 AM
Hi @steven-matison Here is a flowfile example. As you can see, the date field is only present in the array some of the time. What I want to achieve: if the date is present, I need it in epoch; if the date is not present, I need the field to be null (not "null" as a string in quotes) or ignored altogether.
{
  "example": "cloudera",
  "risk_rating": "low",
  "observed": false,
  "sources": [
    {
      "date": "2020-06-12T12:00:00.000Z",
      "unique_id": "some_value",
      "source_name": "cloudera",
      "url": "https://www.cloudera.com",
      "description": "example_log"
    },
    {
      "unique_id": "some_other_value",
      "source_name": "google",
      "url": "https://www.google.com",
      "description": "example_log_without_date"
    }
  ],
  "report_confidence": "confirmed",
  "base_score": 7.8,
  "authentication": "none"
}
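To pin down the behaviour being asked for, here is a minimal Python sketch of the target transformation, written outside NiFi. It is not an UpdateRecord or Jolt configuration; the function names are hypothetical, and it assumes the date always arrives in the `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` form shown in the example and that epoch milliseconds are wanted.

```python
import json
from datetime import datetime, timezone


def iso_to_epoch_millis(value):
    """Convert e.g. '2020-06-12T12:00:00.000Z' to epoch milliseconds (UTC)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000 + parsed.microsecond // 1000


def convert_source_dates(record):
    """Return a copy of the record with every sources[*].date in epoch millis.

    A missing or blank date becomes None, which json.dumps writes as a real
    JSON null rather than the string "null".
    """
    converted = dict(record)
    converted["sources"] = []
    for source in record.get("sources", []):
        source = dict(source)
        raw = source.get("date")
        source["date"] = iso_to_epoch_millis(raw) if raw else None
        converted["sources"].append(source)
    return converted


# Trimmed version of the example flowfile (assumption: only the fields needed here).
record = {
    "example": "cloudera",
    "sources": [
        {"date": "2020-06-12T12:00:00.000Z", "unique_id": "some_value"},
        {"unique_id": "some_other_value"},
    ],
}

result = convert_source_dates(record)
print(json.dumps(result["sources"], indent=2))
```

The second source ends up with `"date": null` in the serialized output, which is the unquoted null described above.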
10-28-2022
02:39 AM
Hello all, I have a slightly annoying problem in NiFi. I have some JSON records with a field 'date' nested under 'sources' that I want to convert to epoch time. Using UpdateRecord I am able to do this with the following config: As you can see, where this field is present it works. However, sometimes this field is not present, and in these cases the record value is blank: To deal with these blank fields I did some searching online and found that 'isBlank' could be used in an UpdateRecord property: This fixes the blank field issue but doesn't convert the field values that are present to epoch, see below: Does anyone know how I can get these two actions to work at the same time? Thanks in advance. Andy
Labels: Apache NiFi
10-14-2022
02:58 AM
Hi @Fredi Can you send a screenshot of the advanced tab rules if possible? I'm a little confused as to what you want to achieve. Cheers
10-03-2022
07:16 AM
@nramanaiah I have been able to run further testing and can confirm that my partitions are purging as expected! Thanks again for the assistance!
09-27-2022
01:53 AM
@nramanaiah I haven't had a chance to do further testing yet, I will let you know ASAP. Thanks again for the help.
09-20-2022
04:46 AM
@nramanaiah I'm still experiencing some issues with this. I have applied metastore.msck.repair.enable.partition.retention=true and restarted, and all looks good as below. I have applied the ALTER TABLE statements to set a retention of 1 day on a test table without error, but when I run a SELECT statement in Beeline I can still see data from last week. Any idea what I'm missing?
09-14-2022
08:40 AM
@nramanaiah thanks very much for the help!
09-13-2022
12:34 AM
Thanks for the reply @nramanaiah. I am unable to find an option 'metastore.msck.repair.enable.partition.retention'. Does it need to be added as a custom option, and if so under which drop-down? Thanks
09-08-2022
01:44 AM
I am trying to set partition retention times on existing Hive managed tables using the following:
ALTER TABLE <table name> SET TBLPROPERTIES ('discover.partitions'='true');
ALTER TABLE <table name> SET TBLPROPERTIES ('partition.retention.period'='1d');
as stated on the page below. However, I am still able to search partitions older than a day, so it appears not to be working. The page does mention that this is for 'external' tables. Can anyone let me know if an 'age off' retention period like this is possible on managed tables? Am I missing any commands etc.? https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.4/using-hiveql/content/hive-set-partition-retention.html Thanks in advance
Labels: Apache Hive, Apache NiFi
09-02-2022
03:04 AM
1 Kudo
@araujo thank you! I had gotten around the issue by using a ReplaceText loop to search for capitals within the keys one at a time and prepend underscores, followed by a Jolt to lowercase all keys. Your script returns the same result, very impressed. Thanks again.
08-22-2022
08:16 AM
I have stolen the spec below from here: https://stackoverflow.com/questions/54696540/jolt-transformation-lowercase-all-keys It works for lowercasing all keys; however, I ideally need the hyphens between the words.
[
  {
    // unwrap the keys and values into literal
    // "key" : "A", "value" : "b"
    "operation": "shift",
    "spec": {
      "*": {
        "$": "&1.key",
        "@": "&1.value"
      }
    }
  },
  {
    "operation": "modify-overwrite-beta",
    "spec": {
      "*": {
        // Now that the original key
        // is on the "right hand side"
        // lowercase it
        "key": "=toLower"
      }
    }
  },
  {
    // pivot back, the now lowercased keys
    "operation": "shift",
    "spec": {
      "*": {
        "value": "@(1,key)"
      }
    }
  }
]
08-22-2022
07:36 AM
Within my NiFi flow I have a number of datasets with various schemas where the keys of the key/value pairs are in camelCase, but I want all incoming keys to be output in snake case. Does anyone know what Jolt spec I can use to achieve this? Example input: { "aRandomFieldName":"Random Value"} Required output: { "a_random_field_name":"Random Value"} I have been struggling to achieve this using Jolt or ReplaceText processors, so any assistance would be great. Thanks in advance, Andy.
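For reference, the key mapping itself can be sketched in a few lines of Python. This is only an illustration of the intended camelCase to snake_case rule, not a Jolt spec; the function names are hypothetical, and it assumes keys contain no acronyms (a key such as `userID` would come out as `user_i_d`).

```python
import re


def camel_to_snake(name):
    # Insert "_" before every capital letter that is not the first character,
    # then lowercase the whole key.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(value):
    # Walk nested dicts and lists so keys at every level are converted.
    if isinstance(value, dict):
        return {camel_to_snake(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


print(snake_case_keys({"aRandomFieldName": "Random Value"}))
```

Values are left untouched; only the keys change.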
Labels: Apache NiFi
03-15-2022
08:41 AM
Hello @Azhar_Shaikh Thanks for the reply. As it turns out, it wasn't a service account problem. We found that the ListS3 output included a 'key' field, and this is what was required in the FetchS3Object processor for 'Object Key'. So the fix I applied was to split the JSON into individual records (SplitJson), pull the keys out as attributes (EvaluateJsonPath), then input ${key} into the FetchS3Object processor. Worked a treat.
03-15-2022
01:34 AM
Hi @VidyaSargur I do not see an option for 'Accept as Solution' below the post, I assume because I didn't ask the original question. Regards
03-11-2022
01:31 AM
Hello, I am having an issue retrieving bucket contents into NiFi using the FetchS3Object processor. I have configured the ListS3 processor to pull in a JSON array containing all the bucket information, and I'm happy that's working OK. Example output: I have configured the FetchS3Object processor with the same Access Key, Secret etc. but I get the following error. I haven't been able to find much online regarding this error. Can anyone see where I'm going wrong? FetchS3Object config: Any assistance would be greatly appreciated. I've seen a lot of questions asked about these two processors but haven't found anything about this particular error. Cheers Andy
Labels: Apache NiFi
03-10-2022
08:46 AM
@araujo Thanks very useful indeed!
03-10-2022
01:34 AM
@araujo Thanks André. What if I wanted to turn the key/value pairs separated by '=' into JSON content, as in the original question? For instance, the following syslog: converted into: [ { "sig" : "0", "arch" : "c000003e", "syscall" : "87" }] I'm aware this can be done using regex to create attributes and then AttributesToJSON, but some of my logs have hundreds of key/value pairs so that's not an option. There must be a way to convert it using record processing, i.e. ConvertRecord?
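As a rough illustration of the conversion being asked about, here is a Python sketch that turns every `key=value` pair in a line into one JSON object, however many pairs there are. It is not a ConvertRecord configuration. The sample line is an assumption (the original syslog sample did not survive in this post), and the sketch assumes values are either unquoted tokens without spaces or double-quoted strings.

```python
import json
import re

# key, "=", then either a double-quoted value or a run of non-space characters
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_key_values(line):
    """Collect all key=value pairs in the line into a dict of strings."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


# Hypothetical input line, chosen to match the output shown in the post.
sample = "sig=0 arch=c000003e syscall=87"

print(json.dumps([parse_key_values(sample)]))
```

Because the pattern is applied repeatedly across the line, no per-key regex or attribute is needed, which is the property that matters for logs with hundreds of pairs.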
03-08-2022
03:10 AM
1 Kudo
Hi, I have the same problem. I am able to get the 'body' into valid JSON but also want the key/value pairs (separated by '=') converted into JSON, and I am having no luck.
01-10-2022
06:21 AM
Thanks Matt!! I hadn't used the advanced tab on the processor until now, worked a treat for my use case. Apologies for the slightly confusing question. I am now able to pick out key words from the hostname attribute to direct them to the correct database. Thanks for the quick response!
01-10-2022
04:15 AM
Hello all, within NiFi's UpdateAttribute processor I am trying to set an attribute called 'hive_database' based on the value of another attribute called 'hostname'. For instance, if: #1: hostname = Mickey Mouse #2: hostname = James Bond I want to use a 'contains' statement (or similar) to set the value of the new attribute 'hive_database': property: hive_database value: ${hostname:contains('Mickey'):<OUTPUT>('cartoon')} <OR> ${hostname:contains('James'):<OUTPUT>('movie')} So that the #1 output would be: property = hive_database, value = cartoon. The parts I'm stuck on are the OUTPUT and OR parts above. Do you know if this would be possible using the UpdateAttribute processor? I am trying to avoid having to use RouteOnAttribute to break out multiple different 'hostname' values to send data to the relevant 'hive_database'. Hope this makes sense; I am unable to share screenshots etc. as the values are sensitive. Any help greatly appreciated.
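The mapping described above amounts to "first matching keyword wins". A small Python sketch of that logic follows, purely to make the intended behaviour concrete; it is not NiFi configuration, and the rule table, function name and the `unknown` fallback are all hypothetical.

```python
# Ordered (keyword, database) rules; the first keyword found in the hostname wins.
RULES = [
    ("Mickey", "cartoon"),
    ("James", "movie"),
]


def hive_database_for(hostname, default="unknown"):
    """Return the database for the first keyword contained in the hostname."""
    for keyword, database in RULES:
        if keyword in hostname:
            return database
    return default


for host in ("Mickey Mouse", "James Bond"):
    print(host, "->", hive_database_for(host))
```

Each rule plays the role of one 'contains' check, and the ordered list plays the role of the OR between them.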
11-25-2021
03:18 AM
Thanks for the reply, is there a way to use Jolt to extract to flowfile attribute?
11-17-2021
07:28 AM
Hi, if one flowfile is processed regularly and the other comes in intermittently, do they even need to be merged? It sounds like the flow is so quiet that a merge is not required.
11-17-2021
07:23 AM
Hello all, I have a NiFi data flow with flowfiles containing multiple JSON records (with exactly the same received time), and I'm trying to extract the timestamp as a flowfile attribute as follows: All the received times match because I'm using a PartitionRecord processor prior to EvaluateJsonPath; I have confirmed they are identical. I am aware that I can split > extract > merge, but I am trying to avoid that due to the high volume of data passing through this flow. Is there any way to extract just the first timestamp for all the records in a single flowfile using EvaluateJsonPath, or another record processor? I am also trying to avoid having to use ExtractText and regex to pull the timestamp, again due to the volume of data in the flow. There is an existing flow working in our production environment using plain text and regex etc. I am trying to redesign it using record processing to improve efficiency. Any assistance welcome. 🙂
- Tags:
- apache nifi
- avoiding regex
- evaluatejsonpath
- extract timestamp
- high volume flow
- json
- multiple records
- NiFi
- PartitionRecord
- RecordProcessing
Labels: Apache NiFi
07-05-2021
07:36 AM
Thanks for the reply. I am pulling the .rpm file into NiFi using an API. How do I convert it to binary? I have not been able to get it to work using InvokeHTTP, so a colleague of mine has written a Python script for me to use in an ExecuteScript processor. This works in his test environment using a test .rpm file he pulled into NiFi using GetFile, as opposed to the API. However, in my environment, using the file from the API, the flowfile does not move through the processor. Again, could that be because it's not binary?
06-07-2021
04:21 AM
Thanks for both replies, I managed to get it working last week the same way as you have shown Matt. Cheers
06-02-2021
11:27 AM
Is there a processor that can detect hidden characters in NiFi? In our test environment I am using an API invoke to deliver a JSON payload, and for added security we want to filter out any hidden and potentially malicious text. I have been trying to use the RouteText, RouteOnContent and ExtractText processors to only allow alphanumeric and punctuation characters through, but can't get the regex to work when looking for 'uncommon' text or characters. Any help would be appreciated. Andy
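To make "hidden characters" concrete, here is a Python sketch that flags control and format characters by Unicode category instead of using an allow-list regex. It is an illustration outside NiFi, not a processor configuration. The function name and sample payload are hypothetical, and it assumes tabs and line breaks are acceptable. Invisible-looking spaces such as U+00A0 fall under category Zs and are not flagged by this check.

```python
import unicodedata

# Whitespace control characters that are normally legitimate in JSON text.
ALLOWED_WHITESPACE = {"\n", "\r", "\t"}


def find_hidden_characters(text):
    """Return (index, code point, category) for each control/format character.

    Unicode categories starting with "C" cover control (Cc), format (Cf),
    private-use (Co), surrogate (Cs) and unassigned (Cn) code points.
    """
    hidden = []
    for index, char in enumerate(text):
        if char in ALLOWED_WHITESPACE:
            continue
        category = unicodedata.category(char)
        if category.startswith("C"):
            hidden.append((index, f"U+{ord(char):04X}", category))
    return hidden


# Hypothetical payload containing a zero-width space and a BEL control character.
payload = '{"name": "ok\u200bvalue\x07"}'

for hit in find_hidden_characters(payload):
    print(hit)
```

A payload for which the function returns an empty list contains none of these characters.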
Labels: Apache NiFi
05-17-2021
06:16 AM
Hello, I have been tasked with POSTing a .rpm file (RedHat package manager) to a server location, this is to automate a manual process that we perform on a weekly basis.
I am able to download the file into NiFi using InvokeHTTP GET. At this stage the 'Content-Type' attribute shows as 'audio/x-pn-realaudio-plugin'; I have read online that this is a common misidentification when dealing with .rpm files. I have then used an IdentifyMimeType processor, which updates the mime type to: application/x-rpm.
The API documentation says I need to POST with /rest/v1/updates as the location and 'file' as a parameter, see below:
I have been given the following Python script that achieves what I am trying to do, but so far I have not been able to replicate it in NiFi.
I have tried numerous different configs of the processor and manually changed the attributes: file, filename, mime.extension, mime.type, Content-Type, Accept-Encoding. I have tried manually adding attributes from the Python script and referencing them from the invoke. I have also tried compressing to gzip, as I was informed the file should be in compressed format, so I thought NiFi had maybe changed it.
So far I have never had anything other than 'no retry' responses.
I currently have set up my invokeHTTP as below:
When I have 'Send Message Body' true - I get the following response:
When I set 'Send Message Body' false - I get:
Any assistance with this at all would be very much appreciated. I have been trying for a number of days with every different option/config I can think of but cannot get it to POST.
The script mentions getting a 'csrftoken' response. Am I able to do this using another InvokeHTTP?
Thanks in advance
Griggsy
- Tags:
- .rpm
- Gzip
- invokehttp
- invokehttp processor
- NiFi
- no-retry
- post
- POST .rpm file
- python
- redhat-package-manager
- Script
Labels: Apache NiFi, NiFi Registry