Member since 12-21-2016 | 16 posts | 0 kudos received | 0 solutions
11-15-2017 09:56 PM
I'm trying to come up with a way to create attributes dynamically. Say I have a flat JSON string of name/value pairs stored in a Cassandra database as a text field: {"A":"1", "B":"2"}. Is there a way to parse this into an attribute A with value 1 and an attribute B with value 2, without explicitly creating an A attribute and then evaluating a JsonPath for it? I haven't had any luck finding a way to set the left-hand side of an UpdateAttribute property dynamically, and I'm wondering if I'm just overlooking something.
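Outside NiFi, the transformation being asked about is simple: parse the flat JSON object and turn each key into an attribute name. The Python sketch below shows only that logic, assuming the text field always holds a single flat object. `flat_json_to_attributes` is a hypothetical helper, not part of NiFi; inside NiFi this kind of logic would likely have to live in a scripted processor such as ExecuteScript.

```python
import json


def flat_json_to_attributes(text: str) -> dict[str, str]:
    """Turn a flat JSON object such as {"A":"1", "B":"2"} into name/value pairs.

    Hypothetical helper. Attribute values are strings, so every value is
    converted to str; nested objects or arrays are rejected because the
    input is assumed to be flat.
    """
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    attributes = {}
    for name, value in data.items():
        if isinstance(value, (dict, list)):
            raise ValueError(f"value for {name!r} is not flat")
        # Keep strings as-is; write other scalars (numbers, booleans, null) as JSON text
        attributes[name] = value if isinstance(value, str) else json.dumps(value)
    return attributes


if __name__ == "__main__":
    print(flat_json_to_attributes('{"A":"1", "B":"2"}'))
```

Rejecting nested values keeps the sketch honest about its assumption; a nested document would need a naming scheme (for example dotted keys) that the question does not call for.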
Labels: Apache NiFi
10-16-2017 03:39 PM
I had the same question a few months later, after forgetting what I did last time, and lo and behold, my Google search led me to my own question on HWX. So thanks again.
07-13-2017 02:52 PM
We had an issue with our NiFi instance: someone left something running that they shouldn't have, and by morning the server was a bit of a mess. While we sort through the errors and logs, I'd like to start the NiFi instance with all the processors in a stopped state. Is there a way to do this? I can't find anything through Google searches or by skimming the administrator docs.
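One setting that looks relevant is `nifi.flowcontroller.autoResumeState` in `nifi.properties`: it defaults to `true`, and setting it to `false` stops components from returning to their last running state when NiFi restarts. Editing the file by hand is enough; the Python sketch below only illustrates the edit, assuming a plain `key=value` properties file. `set_property` is a hypothetical helper, and the value would presumably be set back to `true` once the cleanup is done.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with `key` set to `value` (hypothetical helper).

    Replaces the first existing 'key=...' line and leaves every other line
    untouched, or appends the key if it is missing. Only simple key=value
    lines are handled; ':' separators and line continuations are not.
    """
    pattern = re.compile(rf"^{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(properties_text):
        # A function replacement avoids backslash escapes in `value` being interpreted
        return pattern.sub(lambda _match: line, properties_text, count=1)
    needs_newline = bool(properties_text) and not properties_text.endswith("\n")
    return f"{properties_text}{chr(10) if needs_newline else ''}{line}\n"


if __name__ == "__main__":
    text = (
        "nifi.flow.configuration.file=./conf/flow.xml.gz\n"
        "nifi.flowcontroller.autoResumeState=true\n"
    )
    print(set_property(text, "nifi.flowcontroller.autoResumeState", "false"))
```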
Labels: Apache NiFi
06-15-2017 04:40 PM
The payloads that we are merging are JSON. Unfortunately, we are stuck on NiFi 1.1 until we can upgrade, so a change there is not an immediate fix.
06-15-2017 02:50 PM
Our application currently runs load through a SplitContent processor, enriches the two split pieces in parallel, and then merges the results back together. At a glance there was no problem with the MergeContent processor, but when we actually started measuring how long it took to merge flowfiles, we saw some results under 100 ms and also a good chunk spread thinly from 100 to 2000 ms. A little research reminded us that flowfile content is kept on disk until it is merged, which means we have to deal with a large number of random disk reads, and that seems to explain the slowdown. Is there anything we can do to keep the reads toward the lower end? For example, is there a way to prefetch or cache the flowfile content? An alternative I've been considering: is there a way to merge just the attributes of two flowfiles and skip the disk read? The payloads are small enough to fit in an attribute.
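The attribute-only merge idea can be modeled as a correlation step: hold each split piece until all of its siblings arrive, then combine their attributes without reading any content. The sketch below assumes the pieces still carry the `fragment.identifier`, `fragment.index` and `fragment.count` attributes that SplitContent writes, and that the enriched data sits in a hypothetical `payload` attribute. It is plain Python modeling the bookkeeping, not a NiFi processor.

```python
from __future__ import annotations

from collections import defaultdict


class AttributeMerger:
    """Buffers split pieces by fragment.identifier and merges them once all arrive.

    Each piece is a dict of string attributes. The 'payload' attribute name is
    a hypothetical choice for where the small enriched payload is stored.
    """

    def __init__(self, payload_key: str = "payload", separator: str = "\n"):
        self.payload_key = payload_key
        self.separator = separator
        self._pending: dict[str, list[dict[str, str]]] = defaultdict(list)

    def offer(self, attributes: dict[str, str]) -> dict[str, str] | None:
        """Add one piece; return the merged attributes when its group is complete."""
        group_id = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        group = self._pending[group_id]
        group.append(attributes)
        if len(group) < expected:
            return None  # still waiting for sibling pieces
        del self._pending[group_id]
        # Restore the original split order before combining payloads
        group.sort(key=lambda piece: int(piece["fragment.index"]))
        return {
            "fragment.identifier": group_id,
            self.payload_key: self.separator.join(p[self.payload_key] for p in group),
        }
```

With JSON payloads, the default newline separator yields newline-delimited JSON; a different separator (or wrapping the pieces in an array) is a design choice left open here.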
Labels: Apache NiFi
06-14-2017 05:10 PM
Thank you, this was very helpful. For my specific scenario, it seems that using the difference in time should be adequate. I will keep your warnings about time in queue in mind; that was also why I didn't want to use lineage duration, since we have events sitting in front of a ControlRate processor.
06-14-2017 02:45 PM
When looking at the lineage of a flowfile in provenance, there is a Time field for each node in the lineage. It has this format:
Time 06/13/2017 14:18:47.678 EDT
When is this field set relative to the processor's task? I ask because I am trying to track the time each processor takes, and the event duration is not always set, so I am simply taking the timestamps of each processor and computing the difference between them. So far I've assumed the Time is recorded as the flowfile "enters" the processor, so the difference between node 2 and node 1 gives the time node 1 took to handle the flowfile, but I was wondering if someone can confirm this.
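The differencing approach described here can be sketched in Python, assuming the Time strings are copied from the lineage view in node order and all share one time zone (the zone abbreviation is dropped). Whether each difference is pure processing time depends on when the Time field is set, which is exactly the open question, and it may include time spent waiting in a queue. The function names are hypothetical.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%m/%d/%Y %H:%M:%S.%f"


def parse_lineage_time(value: str) -> datetime:
    """Parse a lineage Time value such as 'Time 06/13/2017 14:18:47.678 EDT'.

    An optional leading 'Time' label and the trailing zone abbreviation are
    ignored; all nodes are assumed to be reported in the same zone.
    """
    parts = value.split()
    if parts and parts[0] == "Time":
        parts = parts[1:]
    date_part, time_part = parts[0], parts[1]
    return datetime.strptime(f"{date_part} {time_part}", TIME_FORMAT)


def step_durations_ms(times: list[str]) -> list[float]:
    """Milliseconds between consecutive lineage nodes (node i+1 minus node i)."""
    parsed = [parse_lineage_time(t) for t in times]
    # Dividing two timedeltas gives an exact ratio of their microsecond counts
    return [(later - earlier) / timedelta(milliseconds=1)
            for earlier, later in zip(parsed, parsed[1:])]
```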
Labels: Apache NiFi
05-25-2017 03:24 PM
As mentioned in Matt's comment above, yes, the file left behind always has the latest timestamp.
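The behavior confirmed in this thread (the newest file is held back and then picked up on the next run) can be reproduced with a small model. This is a simplified Python model of the observed behavior, not ListHDFS's actual implementation: each run holds back files carrying the newest modification time unless that newest time is unchanged since the previous run. `ListingState` and `list_run` are hypothetical names, and modification times are plain integers for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ListingState:
    """Hypothetical state kept between runs (a model, not NiFi's stored state)."""
    emitted_up_to: int = -1          # newest modification time already emitted
    newest_seen: int | None = None   # newest modification time seen on the previous run


def list_run(files: dict[str, int], state: ListingState) -> list[str]:
    """One listing pass over {path: modification_time}; returns paths to emit."""
    candidates = {p: t for p, t in files.items() if t > state.emitted_up_to}
    if not candidates:
        return []
    newest = max(candidates.values())
    if newest == state.newest_seen:
        # The newest timestamp has not moved since the last run, so treat it as settled
        emit = candidates
    else:
        # Hold back files with the newest timestamp; they may still be being written
        emit = {p: t for p, t in candidates.items() if t < newest}
    state.newest_seen = newest
    if emit:
        state.emitted_up_to = max(emit.values())
    return sorted(emit, key=lambda p: (emit[p], p))
```

With five files and cleared state, the first call returns four paths and the second returns the held-back one, matching the pattern described in the ListHDFS question.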
05-25-2017 03:23 PM
Yes, the file that is left behind is the most recently generated one, and it gets picked up on the second run. My use case is getting a listing of all the files in an HDFS directory at a given moment. GetHDFS provides that, but with the inefficient overhead of bringing the actual files into NiFi. I was hoping to get just the list of files with ListHDFS. I'm thinking I might look into ExecuteStreamCommand to generate the list with hdfs dfs -ls and then parse that list.
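For the ExecuteStreamCommand idea, the remaining work is parsing the `hdfs dfs -ls` text. The Python sketch below assumes the common eight-column output (permissions, replication, owner, group, size, date, time, path) preceded by a "Found N items" header, and keeps only regular files. `parse_hdfs_ls` is a hypothetical helper, and the column layout is worth checking against the cluster's actual output.

```python
def parse_hdfs_ls(output: str) -> list[str]:
    """Extract file paths from `hdfs dfs -ls <dir>` output (hypothetical helper).

    Skips the 'Found N items' header and directory entries (permission string
    starting with 'd'). Splitting at most 7 times keeps paths containing spaces.
    """
    paths = []
    for line in output.splitlines():
        parts = line.rstrip().split(None, 7)
        if len(parts) != 8 or line.startswith("Found "):
            continue
        permissions, path = parts[0], parts[7]
        if permissions.startswith("d"):
            continue  # directory, not a file
        paths.append(path)
    return paths
```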
05-25-2017 02:26 PM
I am running a ListHDFS processor pointed at a directory on HDFS, timer-driven and scheduled to execute once per hour. After making sure the processor's state is clear, I run it and see that it creates a flowfile for all but one file in the directory: there are 5 files, and only 4 flowfiles are created. If I add more files, clear the state, and run again, the pattern repeats: one fewer flowfile is created every time, so one file is missed. It is not the same file that is missed on each run. Why is the processor missing one file each time? Is this by design? This is on HDF 2.1.0.1, Apache NiFi version 1.1.0.2.1.0.1-1.
Labels: Apache NiFi
02-06-2017 06:50 PM
Sorry I didn't respond sooner. I haven't been able to reproduce this yet either, and haven't seen it since. The scenario did have two HiveConnectionPools pointing to the same Hive instance, with different users/databases in different processor groups. The error itself came from the Hadoop side, but it was thrown because of the permissions of the user it was operating as, which would have been the result of some sort of interference between the controller services.
01-23-2017 03:11 PM
We've experienced an error in NiFi that I can only think is a bug. We were running a Hive query in a processor group that is read/write only for Group A, using a Hive controller service configured with Group A's specific Hive user ID. The processor failed due to insufficient permissions to access the query's content on HDFS as user=Group B's Hive controller service ID. That controller service is configured in Group B's processor group, which is accessible only by Group B. So we are seeing a conflict of configurations between two completely separate processor groups. Here is a snippet of the error from the logs:
org.apache.hadoop.security.AccessControlException: Permission denied: user=group_b_id, access=EXECUTE, inode="/user/ group_a_id /tech/poc /out_table": group_a_id:group_a_group:drwxr-x---
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
The point of contention is user=group_b_id. Nowhere in our processor group do we reference that user ID; it is only used in a completely different processor group. Disabling and then re-enabling the controller service temporarily fixed the issue, and we cannot recreate it. Have there been other documented cases of controller service configurations for similar components interfering with one another?
Tags: Data Ingestion & Streaming, NiFi, nifi-hive, Upgrade to HDP 2.5.3 : ConcurrentModificationException When Executing Insert Overwrite : Hive
Labels: Apache NiFi