Member since: 07-30-2019
Posts: 105
Kudos Received: 129
Solutions: 43
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 762 | 02-27-2018 01:55 PM |
| | 1240 | 02-27-2018 05:01 AM |
| | 3096 | 02-27-2018 04:43 AM |
| | 665 | 02-27-2018 04:18 AM |
| | 1906 | 02-27-2018 03:52 AM |
11-07-2019
06:17 AM
1 Kudo
It is possible this is related to https://issues.apache.org/jira/browse/NIFI-6846. That fix has been merged into the Apache NiFi master branch but has not yet been included in a release. If you're a Cloudera-supported user, please reach out to support about this.
02-17-2019
01:51 AM
This appears to have been addressed in https://issues.apache.org/jira/browse/NIFI-5795.
10-16-2018
08:31 PM
There are a lot of great blogs/docs on the NiFi record readers/writers and associated processors. For the Hive part specifically, we'll want someone more familiar with the required Hive metastore calls to give guidance.
10-16-2018
05:24 PM
Hello. NiFi's current handling of schema changes, relative to the schema of the downstream Hive table, is that it will not send data that is not reflected in the downstream schema. So if the upstream data changes by adding a simple column, NiFi will be fine and the flow to Hive should continue. If you want the new column reflected, you need to update the schema in an out-of-band process, either directly in Hive (see the sketch below) or by establishing a flow that automates pushing schema updates to Hive.

It would be a fine feature request to add the ability to optionally automate aligning the schema of the flowing data with the Hive table schema, so that as new columns arrive we can send them. It would be good to hear the thinking on this in general as it relates to changes such as type changes, columns being removed, how many new columns would be considered odd, etc. There are definitely some problematic aspects to this idea, but for the safe cases it could be helpful. This would be good to discuss with the Hive team, as it is specific to the NiFi/Hive integration. NiFi in general has always handled such cases easily, and with the record processors we can evolve with the schemas automatically, in a Schema Registry compliant manner, without you having to change code or configuration to leverage it.
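For the out-of-band route, a minimal sketch of the schema update from the command line might look like the following (the connect string, table name, and column name are all placeholders, not from your environment):

```bash
# Hypothetical out-of-band schema update so Hive reflects a new upstream column;
# until something like this runs, NiFi simply won't send the new column's values.
beeline -u "jdbc:hive2://hive-host:10000/default" \
  -e "ALTER TABLE events ADD COLUMNS (new_col STRING);"
```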
06-06-2018
09:06 PM
What happens when you pass that to FetchS3Object? My first thought here is that ListS3 should not be producing output flowfiles for anything other than retrievable objects/files; if it is, that is either a bug or a mode we should support so that the directories/buckets themselves aren't listed, only their content.
04-13-2018
07:27 PM
1 Kudo
Since the incoming data is JSON documents concatenated together with newlines, the first thing you can do is use SplitText to split on newlines. Each resulting flowfile is then a single JSON document, at which point you can use all kinds of fun NiFi processors on it (see the example below). The suggestion above to use MergeContent would, I think, head down the opposite path from what the question was asking, but perhaps I understood it incorrectly.
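To make the input concrete, here is some made-up sample data of the kind SplitText would break apart; the file name and JSON contents are hypothetical:

```bash
# Hypothetical sample of the newline-concatenated JSON input
cat <<'EOF' > concatenated.json
{"id": 1, "event": "login"}
{"id": 2, "event": "logout"}
EOF
# With SplitText's Line Split Count set to 1, each line above
# becomes its own flowfile containing one JSON document
```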
02-27-2018
02:00 PM
@Gaurang Shah Hello there. You will really want to take a look at HDF 3.1, which launched just before you sent this post. It includes the new Apache NiFi Registry, which makes this case very easy and moves well beyond wrestling with templates.
02-27-2018
01:57 PM
@Peter Kotula The reporting task API does not have access to cluster details; it is intended as a way for a NiFi node to publish information it knows about itself. To monitor the health/status of the cluster, the HTTP-based (REST) API is the intended mechanism. It would be interesting to hear why you're ruling that out.
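For example, a quick way to see what the REST API exposes about the cluster (host and port here are placeholders; a secured instance would also need credentials or certificates):

```bash
# Returns the cluster's node list, including each node's status and heartbeat info
curl -s "http://nifi-host:8080/nifi-api/controller/cluster"
```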
02-27-2018
01:55 PM
There are no real additional reasons. The key reason for ListenSyslog is that it understands syslog message framing on top of a raw TCP socket. Otherwise, it is not much different from ListenTCP or ListenUDP.
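As a quick illustration of that framing, a message in standard syslog format can be pushed at ListenSyslog with nc; the host and port are placeholders for whatever you configured on the processor:

```bash
# RFC 3164-style syslog message: <priority>timestamp host tag: body
echo '<34>Oct 11 22:14:15 myhost myapp: test message' | nc -w 1 nifi-host 514
```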
02-27-2018
01:54 PM
It is important that the script you're invoking will always terminate on its own. NiFi cannot reliably kill threads that it has handed out to components. The community has a feature in progress that will provide a way to work around these cases, but the real solution is to ensure that the invoked stream command always terminates properly.
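One defensive pattern, assuming the GNU coreutils timeout command is available on your system, is to wrap the real command in a script that guarantees termination; the path and durations below are placeholders:

```bash
#!/bin/bash
# Wrapper invoked by the processor: force the real command to terminate.
# SIGTERM after 60s; if it still hasn't exited 10s later, SIGKILL it.
exec timeout --kill-after=10s 60s /opt/scripts/my_stream_command.sh "$@"
```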
02-27-2018
01:50 PM
I just verified on the latest build that setting that property does not result in any validation errors. It is possible there was an issue that has since been resolved. You might want to try HDF 3.1.
02-27-2018
05:01 AM
@Hemantha kumara We have a Docker container available, as you note, and it works fine as a way to launch a single-node NiFi. We don't have any published Kubernetes-specific configurations at this time, but to your question of whether we plan to support that: yes, we do. We cannot commit in this forum to any kind of timeline, or even that we'll definitely do it, but it is safe to say it is a direction of high interest.
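For reference, launching that single-node container is as simple as the following, assuming the apache/nifi image and the default unsecured HTTP port:

```bash
# Run a single-node NiFi and expose the UI at http://localhost:8080/nifi
docker run -d --name nifi -p 8080:8080 apache/nifi
```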
02-27-2018
04:55 AM
If you tried to extend an existing processor, it is possible (even likely) your nar includes many things you didn't intend or shouldn't have in it, which can effectively pollute the bundling. If you want to extend another processor or component in NiFi, do so either by copying the code rather than using Maven to pull in the same libraries plus the component library and then doing normal Java extension, or by doing the extension within the original bundle itself. Does your nar have a listed dependency on the dbcp nar?
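Since a nar is just a zip archive, a quick way to check what actually got bundled is to list its contents; the nar path here is hypothetical:

```bash
# List the jars packed inside the nar; unexpected entries suggest polluted bundling
unzip -l target/my-custom-processors-nar-1.0.nar | grep '\.jar'
```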
02-27-2018
04:48 AM
I'm not familiar with any of those format extensions, but assuming you have Java libraries to operate on them, you can integrate them nicely into NiFi. Many of the standard processors can be used as great examples to get you going. Format/schema transformation is a really common use case for NiFi.
02-27-2018
04:43 AM
1 Kudo
Network issues can certainly be a factor. However, you might also want to ensure you use the precise Kafka client for the given Kafka broker version. Since you're on Kafka 0.11, you might want NiFi 1.5.0 or HDF 3.1.0, which supports that broker version directly via ConsumeKafka_0_11.
02-27-2018
04:40 AM
Hello @Henrik Olsen. You should not need to do additional file integrity checks beyond what the transport protocols do for you. However, on your question about List/Fetch and guarantees that the data is done being written: there are no guarantees. The most reliable model for resolving race conditions in file IO between the producer (the thing writing the file) and the consumer (the thing grabbing the file) is to use file naming techniques, such as prepending the file name with a '.' while writing and removing it when the write is complete. If you cannot establish such a model, then you can resort to more complicated techniques like fetching listed files only after some intentional/artificial delay.
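A minimal sketch of the producer-side naming technique; the directory, file name, and produce_data command are all placeholders:

```bash
# Write under a dot-prefixed name, then rename when complete.
# A List/Fetch flow configured to ignore hidden/dot files never sees a partial
# file, and the rename is atomic on the same filesystem.
name="data_$(date +%s).csv"
produce_data > "/landing/.${name}"     # produce_data stands in for your real writer
mv "/landing/.${name}" "/landing/${name}"
```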
02-27-2018
04:36 AM
1 Kudo
Hello @ANDREJ KUZMIN. You'll want to look at PutElasticSearchRecord, which allows for a nice micro-batching/higher-efficiency path.
02-27-2018
04:23 AM
1 Kudo
You'll want to review the heap dump further, but it sounds highly likely that FlowFile objects are being built up in large numbers (tens or hundreds of thousands and beyond) within the ProcessSession, which can quickly take up a lot of heap. The heap usage there is not the content of the flowfiles but their attributes, and that can still add up fast. Ensure you're committing the session frequently to move those along, so the session isn't tracking too many flowfiles at once.

Having said that, you should also consider using ConsumePulsarRecord instead of plain ConsumePulsar, for example. Going through the record model will dramatically outperform the alternative unless you do your own raw record framing, such as newline delimiting, where a single flowfile represents many records at once. If each flowfile is a single record, just be sure you're committing the session frequently.
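To confirm what is actually filling the heap, a dump taken while the flow is under load is the most direct evidence; replace <nifi-pid> with the actual NiFi process id:

```bash
# Capture only live objects in binary format, for analysis in a tool like Eclipse MAT
jmap -dump:live,format=b,file=nifi-heap.hprof <nifi-pid>
```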
02-27-2018
04:18 AM
3 Kudos
You can use a ListSFTP -> FetchSFTP -> PutFile flow in NiFi to grab the files from wherever you store the master copy of the configs. This will have NiFi keep itself up to date, and you can point your Hadoop resources at the location where the PutFile writes.
02-27-2018
04:10 AM
1 Kudo
@Adrian Oprea great video, and that really helps eliminate a lot of questions. The result of this part of the command, $(ps -ef | grep -v grep | grep kibana | wc -l), does not appear to match between executing it at a bash prompt and in the NiFi environment. You might want to run only that part of the command in your script to see what ends up in the attribute in NiFi. Also, make sure you're running in the shell you expect; you might want #!/bin/bash at the top of the script, for example.
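In other words, something like this hypothetical script would isolate just that piece for comparison:

```bash
#!/bin/bash
# Print only the process count, so we can compare what NiFi's environment
# sees against what an interactive bash prompt sees.
ps -ef | grep -v grep | grep kibana | wc -l
```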
02-27-2018
03:52 AM
1 Kudo
I believe you'll want to run the HDF build of NiFi, which has libraries tailored to work with HDP Hive.
02-26-2018
02:10 PM
2 Kudos
Hello there Chris. While not a direct answer to your question, the community made the NiFi 1.x release line available in August 2016. In the latest release on the 1.x line (1.5.0), the community introduced the Apache NiFi Registry. This provides a really powerful and well-integrated way to store versioned flows in a central registry, which you can use to get good SDLC behavior from dev, to staging, to prod, and which handles things like sensitive properties and process-group-level variables well. It also lets you have nested versioned groups, which is really useful for multi-tenant/team cases.
12-26-2017
03:43 PM
2 Kudos
Yes, this solves the original issue of this thread (promptForName). What is happening is that the JDK/JRE security code is allowing a search for other methods to obtain the principal after a failure has occurred, while a retry is being blocked, most likely due to insufficient time. We've spent a considerable amount of time debugging this condition.

The documentation explaining the system property's meaning/role is here: https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/single-signon.html. Specifically, read the 'Exceptions to the Model' section, where this property is described. Setting it ensures the JDK/JRE does not attempt any methods/mechanisms other than what we've said we want. In particular, it avoids the scenario where it tries to prompt the user to supply a name at the command prompt, which would obviously never work; worse yet, when that happens our thread is stuck until a restart. So, yes, add this system property and you should be in far better shape with regard to the prompt-for-name issue.
12-26-2017
02:55 PM
5 Kudos
Hello @Tarun Kumar. Set the system property 'javax.security.auth.useSubjectCredsOnly' to true. To configure it this way in NiFi, you can add this line, for example, to your nifi/conf/bootstrap.conf file:

java.arg.101=-Djavax.security.auth.useSubjectCredsOnly=true
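One way to add it, assuming java.arg.101 isn't already taken in your bootstrap.conf (the index just needs to be unique among the java.arg entries):

```bash
# Append the JVM argument; NiFi must be restarted for it to take effect
echo 'java.arg.101=-Djavax.security.auth.useSubjectCredsOnly=true' >> nifi/conf/bootstrap.conf
```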
06-21-2017
08:07 PM
1 Kudo
Any component, custom or not, which does not respond in a timely manner to lifecycle calls, such as when it is unscheduled, will do this. I've seen it quite a bit lately as well. We should consider listing the titles of the offending threads or something similar, as that would help spot the culprit pretty quickly.
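Until something like that exists, a thread dump is the quickest way to spot the stuck component; this assumes you're in the NiFi install directory, and the processor name is a placeholder:

```bash
# Write a thread dump to a file, then look for threads tied to the suspect processor
./bin/nifi.sh dump thread-dump.txt
grep -i 'MyCustomProcessor' thread-dump.txt
```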
06-21-2017
01:58 PM
3 Kudos
At present you can consume messages from a Kafka cluster that were encoded using the Confluent Schema Registry serializers. However, we would read them in raw byte form, and the data would not be very useful unless you can handle that elsewhere in the flow.

With the HDF 3.0 release we now provide support for schema registries in general, including a built-in simple schema registry in Apache NiFi and the ability to leverage the Hortonworks Schema Registry. Rather than pushing such logic into Kafka-specific serializers and deserializers, we have a more powerful and broadly applicable reader and writer mechanism that is both format and schema aware, without processors having to worry about anything other than the Record objects that the readers and writers deserialize and serialize. So, I said all that to say that a good logical next step for us is to consider adding support for the Confluent Schema Registry. There is a ticket for this work in the community, and hopefully it will be progressed soon.
04-04-2017
08:28 PM
Can you please share the details of what was being put? Perhaps explain/show what the content of the failed flow file was?
01-26-2017
04:08 PM
2 Kudos
The stack trace shows "connection reset by peer". There are some good explanations of what this tells us on the Internet, but the moral of the story is that the connection NiFi was writing to was closed, and NiFi was notified of that. It happened while NiFi was trying to write the response, which is an exceptional condition, so you get this stack trace. I think we'd need to understand the systems involved in this web request/response cycle to diagnose much further.
01-16-2017
03:54 PM
6 Kudos
Hello @Arsalan Siddiqi. These are some excellent questions and thoughts regarding provenance. Let me try to answer them in order.

ONE: The Apache NiFi community can definitely help you with questions on the specific timing of releases and what will be included. I do know there is work underway around Apache NiFi's provenance repository so that it can index even more event data per second than it does today. Exactly when this ends up in a release is subject to the normal community process of the contribution being reviewed and merged. That said, there is a lot of interest in higher provenance indexing rates, so I'd expect it in an upcoming release.

TWO: The current limitation we generally see is related to what I mention in ONE. That is, the provenance indexing rate becomes a bottleneck on the overall processing of data, because we apply backpressure to ensure the backlog of provenance indexing doesn't grow unbounded while more and more event data is processed. We are first going to make indexing faster. There are other techniques we could try later, such as indexing less data, which would make indexing far faster at the expense of slower queries; that tradeoff might make sense.

THREE: Integration with a system such as Apache Atlas has been shown to be a very compelling combination here. The provenance that NiFi generates plays nicely with the type that Atlas ingests. If more and more provenance-enabled systems report to Apache Atlas, it can become the central place to view such data and see what other systems are doing, and thus give the system-of-systems view that people really need. To truly prove lineage across systems, some cryptographically verifiable techniques would likely need to be employed.

FOUR: The provenance data at present is prone to manipulation. In Apache NiFi we have flagged future work to adopt privacy-by-design features, such as those that would help detect manipulated data, and we're also looking at solutions for keeping distributed copies of the data to help with loss of availability.

FIVE: It is designed for extension in parts. You can, for example, create your own implementation of a provenance repository, and you can create your own reporting tasks that harvest data from the provenance repository and send it to other systems as desired. At the moment it is not open for creating additional event types; we're intentionally keeping the vocabulary small and succinct.

There are so many things left we can do with this data, in Apache NiFi and beyond, to take full advantage of what it offers the flow manager, the systems architect, the security professional, etc. There is also some great inter- and intra-system timing data that can be gleaned from it. Systems like to brag about how fast they are... provenance is the truth teller. Hope that helps a bit. Joe