08-31-2016
09:03 PM
5 Kudos
@Saikrishna Tarapareddy, Apache NiFi 1.0 includes the JoltTransformJSON processor, which embeds the Jolt JSON-to-JSON transformation library. JoltTransformJSON can indeed be used to flatten JSON.
You can see an example of using the processor here: https://community.hortonworks.com/articles/44726/json-to-json-simplified-with-apache-nifi-and-jolt.html
And there's an example of a Jolt spec to flatten JSON here: http://jolt-demo.appspot.com/#bucketToPrefixSoup
HDF 2.0, forthcoming, includes Apache NiFi 1.0, so going forward this will likely be the right way to do it.
As an alternative, you could use a scripted processor to do something similar. See: http://funnifi.blogspot.com/2016/02/executescript-json-to-json-conversion.html
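To make "flatten" concrete, here's a minimal Python sketch of the kind of transformation a scripted processor could apply. This is not the Jolt spec from the demo link and it isn't NiFi-specific; the function name `flatten` and the dotted-key output format are my own assumptions for illustration.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts/lists into one dict keyed by dotted paths.

    Hypothetical helper: the dotted-key format is an assumption,
    not what Jolt or NiFi produce by default.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(child, path, sep))
    elif isinstance(value, list):
        # List elements are addressed by their position.
        for index, child in enumerate(value):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, path, sep))
    else:
        # Leaf value: record it under the accumulated path.
        flat[prefix] = value
    return flat
```

For example, `flatten({"a": {"b": 1}})` yields a single-level dict with the key `a.b`.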
08-30-2016
08:24 PM
@Pravin Battle, your cluster name, host name/IP, port, or authentication credentials are probably not correct, or maybe Elasticsearch isn't configured to listen on the right IP/port. I'd double-check that cluster.name, network.host, and transport.tcp.port in elasticsearch.yml are as expected. If you run netstat or telnet/nc to port 9500 on localhost, what do you get? Can you curl the REST API?
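If telnet/nc aren't handy on the box, a quick Python sketch like the following does roughly the same reachability check. The helper name `port_is_open` is just something I made up; it only tells you whether something accepts TCP connections on that port, not whether it is actually Elasticsearch.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper, roughly equivalent to `nc -z host port`.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False

# Example (port taken from the question): port_is_open("localhost", 9500)
```

If this returns False for the port you configured, the problem is on the Elasticsearch/network side rather than in NiFi.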
08-30-2016
07:34 PM
1 Kudo
@gkeys, in short, no. There's nothing out of the box for integrated mainframe offload/ingest. That said, there are several options that people have been pursuing for mainframe data movement using Apache NiFi (I've seen/talked to people doing all of these):
- Text flat file export with List/FetchSFTP and SplitText
- Binary flat file export with custom processors for record processing (e.g., using JRecord or 100% homegrown code)
- 3rd party mainframe offload/ETL tools like Syncsort
While dumping the data out in other formats or writing a custom processor works well, in scenarios with large numbers of record formats this doesn't scale in terms of processor or data flow development. That's when you should turn to products whose sole purpose is handling mainframe data.
08-30-2016
02:39 PM
2 Kudos
@Prakash Thangavel, installing NiFi on Windows is pretty straightforward, but there are a couple of limitations right now:
- There isn't an installer
- There isn't any Windows services integration for start/stop/restart
That said, it runs fine (and the Apache project does include Windows builds in its continuous integration). So, assuming you have a JDK/JRE installed, you can just download either the HDF or Apache NiFi .zip or .tar.gz, uncompress it to whatever location you like, and then run it by executing the batch script ./bin/run-nifi.bat.
I've seen a few issues pop up with installations on Windows, so there are some things to keep in mind:
- Make sure to use a 64-bit Java 1.7 or 1.8 JDK/JRE, especially if you're going to increase the heap size in ./conf/bootstrap.conf.
- Recent versions of Windows sweep through C:\Program Files periodically and set all files to read-only, so installing NiFi in C:\Program Files doesn't work unless you change the location of ./conf/flow.xml.gz, ./conf/archive, ./logs, ./state, ./work, and all of the data repositories (./*_repository). Basically, install it somewhere besides C:\Program Files.
06-13-2016
06:00 PM
There are two things you could do here:
- Use the AWS SDK to do a prefix listing, parallelize the result, and then probably do a mapPartitions, applying the following approach for multi-deletes: http://docs.aws.amazon.com/AmazonS3/latest/dev/DeletingMultipleObjectsUsingJava.html
- Use two buckets, one for the original files with a lifecycle policy that will apply the deletes, and another for the rolled up data: http://docs.aws.amazon.com/AmazonS3/latest/dev/delete-or-empty-bucket.html#delete-bucket-lifecycle
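For the first option, the part that usually needs a little care is batching, since an S3 multi-object delete request accepts at most 1000 keys. Here's a rough Python sketch of just that batching step; the helper names are hypothetical, and the AWS call itself is deliberately left out so the snippet stays self-contained.

```python
from itertools import islice

# S3 multi-object delete accepts at most 1000 keys per request.
MAX_KEYS_PER_REQUEST = 1000

def batched(keys, size=MAX_KEYS_PER_REQUEST):
    """Yield successive lists of at most `size` keys from any iterable."""
    iterator = iter(keys)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def delete_payloads(keys, size=MAX_KEYS_PER_REQUEST):
    """Yield one multi-delete request body per batch of keys.

    Hypothetical helper: the dict shape follows the `Delete` argument
    of boto3's `delete_objects`.
    """
    for batch in batched(keys, size):
        yield {"Objects": [{"Key": key} for key in batch], "Quiet": True}
```

With boto3, each payload could then be sent as `client.delete_objects(Bucket="my-bucket", Delete=payload)` (the bucket name is a placeholder). In Spark you'd do the same thing inside mapPartitions, so each partition creates its own client and issues its own batches.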
06-13-2016
05:47 PM
Typically no. You're actually limited in the number of buckets you can create, whereas the number of objects, and thus prefixes, is effectively unlimited. The situation where you want different buckets is where you want to specify different bucket policies; e.g., for data lifecycle (+/- versioning, automatic archive to Glacier), security, and environment (dev, test, prod). The design of prefixes/key names/directories should then be guided by your access patterns, with the same sorts of considerations you have for organizing data in HDFS. Listings over prefixes/recursive listings can be slow, so if you're going to do listings, you'll want enough hierarchy or structure in your key names that those result sets don't get huge. If you're only ever going to access specific keys, this is less of an issue.
06-13-2016
02:10 PM
Probably yes. A mostly unimportant detail here is that since S3 is really just a key-value store, the path delimiters are only meaningful to the higher-level APIs and tools; i.e., stuff that lets you do prefix listings. So the delimiter can be anything, but in practice people almost never change it from the default '/'.
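To illustrate why the delimiter only matters at listing time, here's a small pure-Python sketch that mimics how a prefix listing with a delimiter groups flat keys into "directories". It's an emulation for illustration only (the function name is made up and it doesn't talk to S3), but the grouping rule is the same idea as the common prefixes S3 returns.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Emulate a prefix listing over a flat set of keys.

    Returns (objects, common_prefixes): keys directly "under" the prefix,
    and the rolled-up prefixes ending at the next delimiter.
    """
    objects, common_prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            # No further delimiter: the key itself is a direct result.
            objects.append(key)
        else:
            # Roll everything past the next delimiter into one prefix.
            common_prefixes.add(prefix + rest[:cut + len(delimiter)])
    return sorted(objects), sorted(common_prefixes)
```

With a different delimiter, the very same keys simply stop looking like a directory tree, which is why the choice of '/' is a convention rather than a property of the stored data.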
06-13-2016
11:32 AM
1 Kudo
@Randy Gelhausen, in order to specify a path, you should be able to include it as part of the Object Key in the properties.
06-04-2016
10:08 PM
5 Kudos
I think there are 4 options you can consider:
- Use ExecuteScript with the current code on NiFi's classpath or in ExecuteScript's module directory. It doesn't matter whether it's a static method, but if it's not, the relevant class will get instantiated on each call to onTrigger, so if it's heavy, then, yeah, it'll be inefficient.
- Use InvokeScriptedProcessor. This is like ExecuteScript but basically lets you implement the full set of processor lifecycle methods, so you'll be able to reuse resources on successive calls to onTrigger.
- Create a custom processor (and/or controller service) which depends on your code. Again, you'll gain control over the entire processor lifecycle and be able to reuse local resources, and if you need to implement things like connection/resource pools, you can create them with controller services. The NiFi Developer Guide [1] has lots of information about this. Also, there's a Maven archetype [2] available.
- If you really don't want to embed your code in NiFi, you can have NiFi communicate with your process via REST, TCP, or its own site-to-site protocol. In fact, NiFi communicates with streaming frameworks such as Storm, Spark, and Flink using its site-to-site protocol client [3], so this pattern is pretty well-established. You can check out an example of this with the Spark receiver [4].
References:
[1] https://nifi.apache.org/developer-guide.html
[2] https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
[3] https://github.com/apache/nifi/blob/master/nifi-commons/nifi-site-to-site-client/src/main/java/org/apache/nifi/remote/client/SiteToSiteClient.java
[4] https://github.com/apache/nifi/blob/master/nifi-external/nifi-spark-receiver/src/main/java/org/apache/nifi/spark/NiFiReceiver.java
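The difference between the first two options comes down to where the expensive object gets created. Here's a toy Python sketch of that trade-off. None of these classes are NiFi APIs; `HeavyResource`, `per_call_trigger`, and `ReusingProcessor` are made-up names that only mirror the "instantiate per onTrigger" versus "create once in a lifecycle method, then reuse" patterns.

```python
class HeavyResource:
    """Stand-in for expensive-to-build code (hypothetical)."""
    instances = 0  # counts how many times the resource was constructed

    def __init__(self):
        HeavyResource.instances += 1

    def process(self, payload):
        return payload.upper()

def per_call_trigger(payload):
    """ExecuteScript-style: a new resource is built on every call."""
    return HeavyResource().process(payload)

class ReusingProcessor:
    """Lifecycle-style: build once when scheduled, reuse on each trigger."""

    def __init__(self):
        self.resource = None

    def on_scheduled(self):
        # Runs once before triggers start; the costly setup lives here.
        self.resource = HeavyResource()

    def on_trigger(self, payload):
        return self.resource.process(payload)
```

Calling `per_call_trigger` three times constructs three resources, while three `on_trigger` calls after a single `on_scheduled` construct only one. That's the efficiency gain you get from InvokeScriptedProcessor or a custom processor.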
05-08-2016
09:13 AM
4 Kudos
Raj, I think you can achieve what you're looking to do using an ExecuteScript processor after ExtractHL7Attributes to match against and rename the attributes in question. For example, the Ruby script attached below matches against the segment name and field index, and then uses the `element_names` mapping to rewrite the field index to its respective element name while keeping the segment name and segment index the same. That said, with as many segments and elements as are in HL7, having to include this mapping may be more cumbersome than you were hoping. You can of course externalize it to a file or DB, but since the HL7 element names are fixed, I don't think the overhead is worth it. I thought a little bit about whether you could use ReplaceTextWithMapping here too but you'll still have to rewrite the attribute names so you're mapping on a contiguous string -- we need to match against segment name and field index, but segment index is in between in the ExtractHL7Attributes output.
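For readers without the attachment, here's a sketch of the same renaming idea in Python. It is not the original Ruby script. It assumes attribute names of the form `SEGMENT.FIELD` or `SEGMENT_INDEX.FIELD` (optionally followed by component indexes), and the two-entry `ELEMENT_NAMES` mapping is only an illustrative fragment; a real one would need to cover every segment/field you care about.

```python
import re

# Illustrative fragment only: (segment name, field index) -> element name.
ELEMENT_NAMES = {
    ("PID", 5): "PatientName",
    ("PID", 7): "DateTimeOfBirth",
}

# Assumed attribute layout: SEG[_segmentIndex].fieldIndex[.components...]
ATTRIBUTE_PATTERN = re.compile(
    r"^(?P<segment>[A-Z0-9]{3})(?P<index>_\d+)?\.(?P<field>\d+)(?P<rest>\..*)?$"
)

def rename_attribute(name):
    """Swap the field index for its element name; keep everything else."""
    match = ATTRIBUTE_PATTERN.match(name)
    if not match:
        return name  # not an HL7-style attribute; leave untouched
    element = ELEMENT_NAMES.get((match["segment"], int(match["field"])))
    if element is None:
        return name  # no mapping known for this segment/field
    return f"{match['segment']}{match['index'] or ''}.{element}{match['rest'] or ''}"

def rename_attributes(attributes):
    """Apply rename_attribute to every key of an attribute dict."""
    return {rename_attribute(key): value for key, value in attributes.items()}
```

Because the segment index sits between the segment name and the field index, the pattern captures it separately and puts it back unchanged, which is exactly the part a plain contiguous string mapping can't do.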