Member since: 01-11-2016
Posts: 355
Kudos Received: 230
Solutions: 74
My Accepted Solutions
Views | Posted |
---|---|
8279 | 06-19-2018 08:52 AM |
3211 | 06-13-2018 07:54 AM |
3657 | 06-02-2018 06:27 PM |
3958 | 05-01-2018 12:28 PM |
5496 | 04-24-2018 11:38 AM |
11-01-2017
11:09 AM
1 Kudo
Hi @Sherrine Green Thompson, Joe gave a great explanation of NiFi vs StreamSets here: https://stackoverflow.com/questions/36899612/difference-between-apache-nifi-and-streamsets I don't think NiFi and Talend play in the same space. NiFi is for flow management; Talend is ETL. Even though you can do transformations in NiFi, it's not the same thing as Talend. For instance, NiFi cannot join two tables, whereas Talend cannot deal with unstructured data. There are some common scenarios, but conceptually they answer different needs. I hope this helps clarify the positioning.
11-01-2017
10:57 AM
1 Kudo
@pranayreddy bommineni It's another story if you want all of this in a single generic processor. To do that, you need to implement your logic in Java and write your own custom processor: https://community.hortonworks.com/articles/4318/build-custom-nifi-processor.html
11-01-2017
10:18 AM
2 Kudos
Hi @balalaika What's your NiFi version? This is a known issue, resolved in NiFi 1.2: https://issues.apache.org/jira/browse/NIFI-3213
Even after this fix, there are several corner cases that the initial design of the List* processors tries to handle. For instance, a file that is still being written when List runs should not be listed. Also, files created a fraction of a second after the listing can be missed if the source system only supports timestamps with one-second granularity. So it's a trade-off between missing some files and delaying ingestion. The List processor keeps only the timestamp of the last file ingested, not the list of files (for scalability reasons). To deal with these situations, the last few files are not listed and are kept for the next run. The design has been improved in NiFi 1.4; check out https://issues.apache.org/jira/browse/NIFI-4069 and https://issues.apache.org/jira/browse/NIFI-3332
Use this information to investigate where your problem comes from. You can also use a CRON schedule that runs at 3:00 and 3:05 to pick up files missed the first time (assuming your data arrives every 24 hours). If your listing timestamps only have second precision, try the new processors in NiFi 1.4, which add a "Target System Timestamp Precision" property. Keep in mind that another corner case is not handled today: a file written at time T with timestamp T2, where T2 < T: https://issues.apache.org/jira/browse/NIFI-2383
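To make the trade-off concrete, here is a toy Python model of a lister that persists only the newest timestamp it has emitted and holds back files stamped in the current second. This is an illustrative assumption, not NiFi's actual List* code; the class name `TimestampListingState` and the one-second granularity are hypothetical choices for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimestampListingState:
    """Toy model of a List*-style lister (hypothetical, not NiFi's code).

    Only the newest emitted timestamp is persisted, never the file names,
    and timestamps have one-second granularity.
    """

    last_listed: int = -1  # newest timestamp already emitted (seconds)

    def list_files(self, files: dict[str, int], now: int) -> list[str]:
        # With only a timestamp in state, "already listed" means
        # "not newer than the stored timestamp".
        candidates = {n: ts for n, ts in files.items() if ts > self.last_listed}
        # Hold back files stamped in the current second: another file could
        # still appear with that same timestamp, and it would be skipped
        # forever once the state moved past it.
        ready = {n: ts for n, ts in candidates.items() if ts < now}
        if ready:
            self.last_listed = max(ready.values())
        return sorted(ready)


state = TimestampListingState()
files = {"a.csv": 100, "b.csv": 101}
print(state.list_files(files, now=101))  # ['a.csv'] (b.csv is held back)
files["c.csv"] = 101                     # appears later within the same second
print(state.list_files(files, now=105))  # ['b.csv', 'c.csv']
files["late.csv"] = 50                   # written late with an older timestamp
print(state.list_files(files, now=110))  # [] (missed, as in NIFI-2383)
```

The second call plays the role of the 3:05 run: whatever the first run held back is picked up then. In NiFi, such a schedule would be set with the CRON driven scheduling strategy, which uses Quartz syntax; for example `0 0,5 3 * * ?` fires at 03:00:00 and 03:05:00 every day.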
11-01-2017
09:51 AM
1 Kudo
What are your constraints? Can you describe in more detail what you want to do with the data? If you only need to get the data, a SELECT * will return it whatever the schema is. I am trying to understand your use case to find the best way to do it and what the limits are. Also, depending on the database, you can query it directly for the schema, e.g. "SHOW CREATE TABLE" with MySQL, but the result needs further processing to make the schema useful. Finally, ExtractAvroMetadata may help, since the SQL processors return data in Avro. But again, this requires querying the table once first.
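As a rough illustration of "further processing" on catalog metadata, the sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL: the `sql` column of `sqlite_master` plays the role of SHOW CREATE TABLE, and `PRAGMA table_info` supplies per-column details that are turned into an Avro-style record schema. The `SQL_TO_AVRO` type mapping and the function names are hypothetical; a real MySQL setup would need its own mapping.

```python
import sqlite3

# Hypothetical mapping from declared SQL types to Avro primitive types;
# unknown types fall back to "string".
SQL_TO_AVRO = {"INTEGER": "long", "REAL": "double", "TEXT": "string", "BLOB": "bytes"}


def table_ddl(conn: sqlite3.Connection, table: str) -> str:
    """Return the CREATE TABLE text, SQLite's rough equivalent of SHOW CREATE TABLE."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such table: {table}")
    return row[0]


def table_to_avro_schema(conn: sqlite3.Connection, table: str) -> dict:
    """Turn catalog metadata into an Avro-style record schema (a plain dict)."""
    table_ddl(conn, table)  # fail early if the table does not exist
    quoted = '"' + table.replace('"', '""') + '"'
    fields = []
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    for _cid, name, decl_type, notnull, _default, _pk in conn.execute(
        f"PRAGMA table_info({quoted})"
    ):
        avro_type = SQL_TO_AVRO.get(decl_type.upper(), "string")
        # Nullable columns become a union with "null", Avro's way to express NULLs.
        fields.append({"name": name, "type": avro_type if notnull else ["null", avro_type]})
    return {"type": "record", "name": table, "fields": fields}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER NOT NULL, name TEXT, balance REAL)")
print(table_ddl(conn, "customers"))
print(table_to_avro_schema(conn, "customers"))
```

Unlike ExtractAvroMetadata, this reads only the catalog, so no data rows are fetched.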
11-01-2017
09:31 AM
1 Kudo
@pranayreddy bommineni Maybe InferAvroSchema will be useful for you. You can query the table to get only one row, then use InferAvroSchema to get the schema. Would this be useful?
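The "fetch one row, infer the schema" idea can be sketched in plain Python as below. This is not how NiFi's InferAvroSchema is implemented; the helper names and the rules (every field nullable, NULL samples defaulting to `"string"`) are hypothetical choices that show why a single row gives only a best guess.

```python
import json
import sqlite3


def infer_avro_type(value) -> str:
    """Map one Python value to an Avro primitive type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    return "string"


def infer_record_schema(row: dict, name: str) -> dict:
    """Infer an Avro-style record schema from a single sample row."""
    fields = []
    for column, value in row.items():
        avro_type = infer_avro_type(value)
        if avro_type == "null":
            avro_type = "string"  # a NULL sample says nothing; guess string
        # One row cannot prove a column is never NULL, so every field is nullable.
        fields.append({"name": column, "type": ["null", avro_type]})
    return {"type": "record", "name": name, "fields": fields}


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be turned into dicts keyed by column name
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, note TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 9.5, NULL)")
sample = dict(conn.execute("SELECT * FROM orders LIMIT 1").fetchone())
print(json.dumps(infer_record_schema(sample, "orders"), indent=2))
```

The `note` column shows the limit of sampling: its type is only guessed because the one row has a NULL there.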
10-31-2017
10:19 PM
Hi @Aviram Voda Good news for you: Spark 2.2 is supported in HDP 2.6.3, announced today: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_release-notes/content/comp_versions.html
10-31-2017
09:56 PM
Hi @Charles Bradbury For your information, Spark 2.2 is supported in HDP 2.6.3, announced today: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.3/bk_release-notes/content/comp_versions.html
10-26-2017
09:09 AM
Hi @Andre Labbe That's good news. Following NIFI-3694, it looks like logback has been updated to 1.2.3 in NiFi 1.2.
10-25-2017
07:43 PM
@uri ben-ari You can use the Ambari API to delete the services' components from the host, then delete the host itself: https://cwiki.apache.org/confluence/display/AMBARI/Using+APIs+to+delete+a+service+or+all+host+components+on+a+host
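A minimal Python sketch of those REST calls, using only the standard library. It assumes Ambari's `/api/v1/clusters/{cluster}/hosts/{host}/host_components/{component}` and `/api/v1/clusters/{cluster}/hosts/{host}` endpoints with basic auth and the `X-Requested-By` header; the server URL, cluster, host, component names and credentials are placeholders. Components usually have to be stopped before Ambari will delete them; that step is not shown.

```python
import base64
import urllib.request


def ambari_delete_requests(base_url, cluster, host, components, user, password):
    """Build (without sending) DELETE calls: each host component, then the host."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "X-Requested-By": "ambari",  # Ambari expects this header on modifying calls
        "Authorization": f"Basic {token}",
    }
    host_url = f"{base_url}/api/v1/clusters/{cluster}/hosts/{host}"
    urls = [f"{host_url}/host_components/{component}" for component in components]
    urls.append(host_url)  # the host itself goes last, once it has no components
    return [urllib.request.Request(url, method="DELETE", headers=headers) for url in urls]


def send_all(requests):
    """Send the calls in order; urlopen raises HTTPError on a 4xx/5xx reply."""
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.get_method(), req.full_url, resp.status)


# Placeholder values: replace with your Ambari server, cluster, host and components.
plan = ambari_delete_requests(
    "http://ambari.example.com:8080", "mycluster", "node3.example.com",
    ["DATANODE", "NODEMANAGER"], "admin", "admin",
)
for req in plan:
    print(req.get_method(), req.full_url)  # dry run; call send_all(plan) to execute
```

Building the requests separately from sending them makes it easy to review the plan before anything is deleted.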
10-25-2017
07:37 PM
@dhieru singh For this, use GetFTP instead of FetchFTP.