Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3106 | 08-25-2017 03:09 PM
 | 1965 | 08-22-2017 06:52 PM
 | 3393 | 08-09-2017 01:10 PM
 | 8063 | 08-04-2017 02:34 PM
 | 8114 | 08-01-2017 11:35 AM
09-04-2016
07:09 PM
3 Kudos
Hi @gkeys I think NiFi and Sqoop are two different tools serving two different use cases and cannot be compared directly, at least not yet.

Sqoop is bundled with bulk-loading adapters developed by database vendors and/or Hortonworks together. The purpose of Sqoop is bulk loading of data to and from an RDBMS, and it uses fast connectors designed for bulk loading. Sqoop's performance is measured by the bulk-loading tool it is using; since these are specialized tools designed for batch jobs, Sqoop really shines in those use cases.

NiFi, on the other hand, is a system designed to move data within the organization, bring data in from outside sources, and facilitate data movement between data centers. The data NiFi moves is usually live data from applications, logs, devices, and other sources producing event data. Because NiFi is so rich in features, you can also use it to fetch data from many other sources, including databases and files. For reading data from databases, NiFi uses a JDBC adapter, which lets you move some number of records at a time from a database; the bottleneck is the JDBC adapter itself.

When we measure NiFi's performance, we are not including the performance of fetching data from the source. What we are measuring is how fast NiFi can move data across once it has it. That performance is documented here, and it is about 50 MB/s of read/write on a typical server. Can a JDBC source deliver data at this rate? Honestly, I doubt it, but this has nothing to do with NiFi. It is a function of the driver, the database, and many other variables, just like in any other JDBC program.
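To illustrate that last point, here is a minimal Groovy sketch of timing raw JDBC fetch throughput on its own, with no NiFi involved. The connection URL, credentials, driver class, and table name are placeholders, not anything from your setup:

```groovy
// Minimal sketch: measure raw JDBC fetch throughput, independent of NiFi.
// URL, credentials, driver class, and table name below are hypothetical;
// the driver JAR must be on the classpath.
import groovy.sql.Sql

def sql = Sql.newInstance(
        'jdbc:mysql://dbhost:3306/mydb',   // hypothetical connection URL
        'user', 'password',
        'com.mysql.jdbc.Driver')

long rows = 0
long start = System.currentTimeMillis()
sql.eachRow('SELECT * FROM my_table') { row ->
    rows++   // consume each row; per-row cost here is driver + database bound
}
long elapsed = System.currentTimeMillis() - start
println "Fetched ${rows} rows in ${elapsed} ms"
sql.close()
```

Whatever rate this prints is the ceiling for any JDBC-based ingest, NiFi included.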
09-02-2016
04:14 PM
@Ashwini Maurya A good starting point is here: http://hortonworks.com/developers/
08-21-2016
02:41 PM
One minor thing to remember about the answer: the maximum number of entries must be left blank.
08-23-2016
07:44 PM
The second option didn't work. I have installed the JDK and need to check whether that solves the issue.
08-20-2016
10:42 PM
@gkeys This is a great article, filled with helpful tips!
03-18-2019
11:43 AM
Syncsort Ironstream is the best option in this case.
10-12-2017
02:23 PM
Hi, I built a scenario like the one above, but the InvokeHTTP GET returns the JSON and target.url. How can I get InvokeHTTP to return only the JSON?
07-27-2016
05:22 AM
"ETL" which is embarassingly parallel (all processing logic can execute completely based purely on the contents of the incoming record itself) is in NiFi's sweet spot. ETL which requires lookups for billions of records, or which must perform "group by" operations fits better in traditional Hadoop solutions like Hive, Pig, or Spark.
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs.

Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need. It will download another copy of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually.

A more fragile alternative is to add a NAR's working directory to the Module Directory property of ExecuteScript. For example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like: <NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/ This directory doesn't exist until NiFi has started and extracted the contents of the corresponding NAR to its working-directory location.
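For the Grab route, a minimal Groovy sketch might look like the following. The hadoop-client version is an assumption; match it to your cluster:

```groovy
// Groovy Grab sketch: pull in the Hadoop client JARs (plus their
// transitive dependencies) via the Grapes cache instead of a NAR.
// The version is an assumption -- use the one matching your cluster.
@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '2.7.3')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

def conf = new Configuration()
def fs = FileSystem.get(conf)
// Example use: check whether a (hypothetical) HDFS path exists.
println fs.exists(new Path('/tmp/example'))
```

The first run is slow while Grapes downloads everything; later runs resolve from the local cache.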
07-18-2016
10:39 PM
1 Kudo
@gkeys What are the permissions on the file(s) you are trying to pick up with the GetFile processor, and on the directory the file(s) live in?

-rwxrwxrwx 1 nifi dataflow 24B Jul 18 18:20 testfile

and

drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata

With the above example permissions, I can reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new flowfile with the content of the file. If "Keep Source File" is set to false, GetFile yields because it does not have the necessary permissions to delete the file from the directory: the write bit is required on the source directory for the user who is trying to delete the file(s).

In my example, NiFi is running as user nifi, so it can read the files in the root-owned testdata directory because the directory's group ownership is dataflow (the same as my nifi user) and the directory has r-x group permissions. If I change the directory permissions to rwx, then my nifi user will also be able to delete testfile.
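A quick way to confirm both conditions is a small Groovy sketch, run as the same user NiFi runs as. The paths are the example ones from above:

```groovy
// Sketch: check the two permissions GetFile effectively needs when
// "Keep Source File" is false -- read on the file, write on its directory.
// Run this as the same OS user that runs NiFi; paths match the example above.
import java.nio.file.Files
import java.nio.file.Paths

def file = Paths.get('testdata/testfile')
println "can read file:       ${Files.isReadable(file)}"
println "can delete from dir: ${Files.isWritable(file.parent)}"
```

If the second line prints false, you will see exactly the yielding behavior described above.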
Thanks,
Matt