Member since: 04-11-2016
Posts: 471
Kudos Received: 325
Solutions: 118

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2075 | 03-09-2018 05:31 PM |
| | 2640 | 03-07-2018 09:45 AM |
| | 2535 | 03-07-2018 09:31 AM |
| | 4398 | 03-03-2018 01:37 PM |
| | 2468 | 10-17-2017 02:15 PM |
09-06-2017
11:45 AM
1 Kudo
Another option, without modifying your current workflow, is to configure your ConvertAvroToORC processor to use parallel threads. To do that, change the "Concurrent Tasks" parameter on the "Scheduling" tab of the processor configuration.
09-06-2017
11:42 AM
2 Kudos
Hi @Aaron Dunlap, Depending on the HDF version you are using, you could leverage the record-oriented processors to perform the CSV-to-Avro conversion in a much more efficient way. I assume you're then converting to ORC format to query the data with Hive. If that's the case, a common pattern is to let Hive do the conversion: what I usually do is send the data as Avro into a landing folder in HDFS, then use a PutHiveQL processor to execute a few queries (one to create a temporary external table on top of the Avro data using the corresponding Avro schema, one to insert-select the data from the temporary table into the final table, which is stored as ORC, and one to drop the temporary table; see the sketch below), and finally a DeleteHDFS processor to delete the data used by the temporary table (because the DROP TABLE statement does not delete the data of an external table). There is an ORC reader/writer on the roadmap that will replace all of that (you'll be able to convert directly from CSV to ORC using record-oriented processors), but that's not ready yet. Hope this helps.
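To illustrate the pattern, here is a minimal HiveQL sketch of those three statements (the table names, HDFS paths and schema URL are placeholders, adjust them to your environment):

```sql
-- 1. Temporary external table on top of the Avro landing folder
CREATE EXTERNAL TABLE tmp_mydata
STORED AS AVRO
LOCATION '/landing/mydata'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/mydata.avsc');

-- 2. Insert-select into the final ORC table (assumed to exist already);
--    Hive performs the Avro to ORC conversion here
INSERT INTO TABLE mydata_orc SELECT * FROM tmp_mydata;

-- 3. Drop the temporary table (being external, its files stay on HDFS
--    and are removed afterwards by the DeleteHDFS processor)
DROP TABLE tmp_mydata;
```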
06-22-2017
09:12 PM
3 Kudos
Hi @Raj B, I'd certainly recommend using multiple successive MergeContent processors instead of a single one. If your trigger is size (you want to end up with a 100 MB file), then I'd use a first MergeContent to merge the small files into 10 MB files, and then another one to merge those into a single 100 MB file. That's a typical approach with MergeContent and SplitText processors to avoid this kind of issue. Hope this helps.
06-22-2017
10:05 AM
1 Kudo
Hi @regie canada, The second message is probably due to the fact that the processor cannot be started. You should have more details about the "why" in the nifi-app.log file. I suspect the port is already in use on the host. I see you are talking about ListenTCP although your screenshots show ListenSyslog; are you sure you don't have multiple ListenX processors listening on the same port? 10k events per second should not be an issue at all (it depends on the size of the events, obviously, but I guess we are talking about logs, so you should be good). Hope this helps.
06-21-2017
08:04 PM
1 Kudo
If it's LDAP, then you should use SIMPLE and you can ignore the TLS properties.
06-20-2017
08:17 PM
The XML path must follow these requirements: http://commons.apache.org/proper/commons-configuration/userguide/howto_hierarchical.html
I think that's doable. I'm not sure it's the best approach if you have hundreds of input directories, though. If you have one input directory per output directory, is there a way to compute the destination directory from the path of the input directory? It could be easier to use expression language on the input directory to define the output one (see the example below).
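For example (purely illustrative, the directory names are assumptions), the Directory property of the PutFile/PutHDFS processor could be derived from the absolute.path attribute that ListFile/GetFile set on each flow file, along these lines:

```
${absolute.path:replace('/data/input', '/data/output')}
```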
06-20-2017
07:58 PM
Looks like your LDAP configuration is incorrect. Is it LDAPS or LDAP? It seems to be an error related to SSL/TLS parameters.
06-20-2017
03:39 PM
1 Kudo
First of all, you don't need to use both GetFile and FetchFile. GetFile is fine, but if you want to use FetchFile, it must be used in combination with ListFile (see the article about the List/Fetch pattern). Then you want to send the path in the flow file attributes, not in the content, and there is a slash missing at the beginning of your XPath expression. Now I realize that I misunderstood what you are trying to achieve: I didn't understand that you have two different files, one of them containing the destination path; I thought it was a single file. So what I suggested is not going to work. But just in case, here is a template with what I had in mind: xpath.xml
Now let's focus on your use case. 🙂 You want to use a Lookup controller service that points to your configuration file. Then you can reference that controller service in a LookupAttribute processor, which will extract the value from your configuration file and set it as an attribute of your flow file. The flow then becomes: ListFile, FetchFile, LookupAttribute, PutFile. Here is a template that should fulfill your requirements (just change the paths as needed). Don't forget that controller services are defined at the process group level. Also note, if I'm correct, that this template requires the latest version of NiFi to work. xmllookup.xml
06-20-2017
03:19 PM
2 Kudos
Hi MB, Yes, ZooKeeper is used by a lot of components (for high availability purposes), not only HBase. It is a mandatory and vital component.
06-20-2017
01:57 PM
1 Kudo
Hi @Pavan Challa, I'd recommend using the EvaluateXPath processor: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.EvaluateXPath/index.html You can use the following XPath expression: /config/path Extract the value into an attribute of your flow file, and you can then use it however you want in the following steps (see the example below). Hope this helps.
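For example, given an input document shaped like the sketch below (the value inside is just a placeholder), the /config/path expression selects the path element:

```xml
<config>
  <path>/data/output/target-directory</path>
</config>
```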