Member since: 06-20-2016
Posts: 488
Kudos Received: 433
Solutions: 118
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3106 | 08-25-2017 03:09 PM
 | 1965 | 08-22-2017 06:52 PM
 | 3393 | 08-09-2017 01:10 PM
 | 8063 | 08-04-2017 02:34 PM
 | 8114 | 08-01-2017 11:35 AM
09-04-2016
07:09 PM
3 Kudos
Hi @gkeys I think NiFi and Sqoop are two different tools serving two different use cases and cannot be compared directly, at least not yet.

Sqoop is bundled with bulk-loading adapters developed by database vendors and/or Hortonworks together. The purpose of Sqoop is bulk loading of data to and from an RDBMS, and it uses fast connectors designed for bulk loading. Sqoop's performance is measured by the bulk-loading tool it is using; since these are specialized tools designed for batch jobs, Sqoop really shines in those use cases.

NiFi, on the other hand, is a system designed to move data within the organization, bring data in from outside sources, and facilitate data movement between data centers. The data NiFi moves is usually live data from applications, logs, devices, and other sources producing event data. Because NiFi is so rich in features, you can also use it to fetch data from many other sources, including databases and files. For reading data from databases, NiFi uses a JDBC adapter, which lets you move some number of records at a time from a database; the bottleneck is the JDBC adapter itself.

When we measure NiFi's performance, we are not including the performance of fetching data from the source. What we are measuring is how fast NiFi can move data across once it has it. That performance is documented here, and it is about 50 MB/s of read/write on a typical server. Can a JDBC source deliver data at this rate? Honestly, I doubt it, but this has nothing to do with NiFi. It is a function of the driver, the database, and many other variables, just like in any other JDBC program.
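To illustrate that last point, here is a minimal Groovy sketch of timing raw JDBC fetch throughput on its own, with no NiFi involved. The connection URL, credentials, driver class, and table name are placeholders, not anything from your setup:

```groovy
// Minimal sketch: measure raw JDBC fetch throughput, independent of NiFi.
// URL, credentials, driver class, and table name below are hypothetical;
// the driver JAR must be on the classpath.
import groovy.sql.Sql

def sql = Sql.newInstance(
        'jdbc:mysql://dbhost:3306/mydb',   // hypothetical connection URL
        'user', 'password',
        'com.mysql.jdbc.Driver')

long rows = 0
long start = System.currentTimeMillis()
sql.eachRow('SELECT * FROM my_table') { row ->
    rows++   // consume each row; per-row cost here is driver + database bound
}
long elapsed = System.currentTimeMillis() - start
println "Fetched ${rows} rows in ${elapsed} ms"
sql.close()
```

Whatever rate this prints is the ceiling for any JDBC-based ingest, NiFi included.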
09-02-2016
04:14 PM
@Ashwini Maurya A good starting point is here: http://hortonworks.com/developers/
08-21-2016
02:41 PM
One minor thing to remember about the answer: the maximum number of entries must be left blank.
08-23-2016
07:44 PM
The second option didn't work. I have installed the JDK and need to check whether that solves the issue.
08-20-2016
10:42 PM
@gkeys This is a great article, filled with helpful tips!
03-18-2019
11:43 AM
Syncsort Ironstream is the best option in this case.
10-12-2017
02:23 PM
Hi, I built a scenario like the one above, but the InvokeHTTP GET returns the JSON and target.url. How can I get InvokeHTTP to return only the JSON?
07-27-2016
05:22 AM
"ETL" which is embarassingly parallel (all processing logic can execute completely based purely on the contents of the incoming record itself) is in NiFi's sweet spot. ETL which requires lookups for billions of records, or which must perform "group by" operations fits better in traditional Hadoop solutions like Hive, Pig, or Spark.
08-01-2018
07:48 PM
If you need a number of dependencies like Hadoop for a script, you may want to consider creating an actual processor/NAR; that way you can inherit the nifi-hadoop-libraries NAR from your NAR, which gives your code access to the Hadoop JARs.

Another alternative is to use Groovy Grab in your script to bring in the Hadoop dependencies you need. It will download another copy of them to the Grapes cache, but you won't have to worry about gathering all the transitive dependencies manually.

A more fragile alternative is to add a NAR's working directory to the Module Directory property of ExecuteScript. For example, the nifi-hadoop-libraries NAR's working directory for dependencies is something like: <NiFi location>/work/nar/extensions/nifi-hadoop-libraries-nar-<version>.nar-unpacked/META-INF/bundled-dependencies/ This directory doesn't exist until NiFi has started and extracted the contents of the corresponding NAR to its working-directory location.
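For the Grab route, a minimal Groovy sketch might look like the following. The hadoop-client version is an assumption; match it to your cluster:

```groovy
// Groovy Grab sketch: pull in the Hadoop client JARs (plus their
// transitive dependencies) via the Grapes cache instead of a NAR.
// The version is an assumption -- use the one matching your cluster.
@Grab(group = 'org.apache.hadoop', module = 'hadoop-client', version = '2.7.3')
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

def conf = new Configuration()
def fs = FileSystem.get(conf)
// Example use: check whether a (hypothetical) HDFS path exists.
println fs.exists(new Path('/tmp/example'))
```

The first run is slow while Grapes downloads everything; later runs resolve from the local cache.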
07-18-2016
10:39 PM
1 Kudo
@gkeys What are the permissions on the file(s) you are trying to pick up with the GetFile processor, and on the directory the file(s) live in?

-rwxrwxrwx 1 nifi dataflow 24B Jul 18 18:20 testfile

and

drwxr-xr-- 3 root dataflow 102B Jul 18 18:20 testdata

With the above example permissions, I can reproduce exactly what you are seeing. If "Keep Source File" is set to true, NiFi creates a new flowfile with the content of the file. If "Keep Source File" is set to false, GetFile yields because it does not have the necessary permissions to delete the file from the directory: the write bit is required on the source directory for the user who is trying to delete the file(s).

In my example, NiFi is running as user nifi, so it can read the files in the root-owned testdata directory because the directory's group ownership is dataflow (the same as my nifi user) and the directory has r-x group permissions. If I change the directory permissions to rwx, then my nifi user will also be able to delete testfile.
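A quick way to confirm both conditions is a small Groovy sketch, run as the same user NiFi runs as. The paths are the example ones from above:

```groovy
// Sketch: check the two permissions GetFile effectively needs when
// "Keep Source File" is false -- read on the file, write on its directory.
// Run this as the same OS user that runs NiFi; paths match the example above.
import java.nio.file.Files
import java.nio.file.Paths

def file = Paths.get('testdata/testfile')
println "can read file:       ${Files.isReadable(file)}"
println "can delete from dir: ${Files.isWritable(file.parent)}"
```

If the second line prints false, you will see exactly the yielding behavior described above.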
Thanks,
Matt