Member since
06-08-2017
1049 Posts
518 Kudos Received
312 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11227 | 04-15-2020 05:01 PM |
| | 7131 | 10-15-2019 08:12 PM |
| | 3114 | 10-12-2019 08:29 PM |
| | 11499 | 09-21-2019 10:04 AM |
| | 4343 | 09-19-2019 07:11 AM |
11-19-2018
10:03 PM
@Wei Wu The ListS3 processor doesn't allow any upstream connections. If you want to fetch files from S3 without ListS3, you need to put the filenames on the flowfile as attributes and then use the FetchS3Object processor to pull the actual content from the S3 bucket. To get all the filenames in the bucket, use some kind of REST API call or a shell script to list them, extract each filename into a flowfile attribute, and feed that connection into FetchS3Object. Refer to this link to get more context on this kind of use case.
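As an illustration of the listing step (outside NiFi), one option is to parse `aws s3 ls` output with a small script; the listing lines below are hypothetical sample data, not real bucket contents:

```python
def parse_s3_listing(listing: str) -> list[str]:
    """Extract object keys (filenames) from `aws s3 ls` style output.

    Each line looks like: '2018-11-19 21:03:11   1024 path/to/file.csv'
    (date, time, size, key). Keys may contain spaces, so split on
    whitespace at most three times and keep the remainder as the key.
    """
    keys = []
    for line in listing.strip().splitlines():
        parts = line.split(None, 3)  # date, time, size, key
        if len(parts) == 4:
            keys.append(parts[3])
    return keys

sample = """\
2018-11-19 21:03:11       1024 data/orders.csv
2018-11-19 21:05:42       2048 data/customers.csv
"""
print(parse_s3_listing(sample))  # ['data/orders.csv', 'data/customers.csv']
```

Each extracted key would then be set as a flowfile attribute that FetchS3Object reads as its Object Key.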
11-19-2018
09:56 PM
@Justen Starting from NiFi 1.8.0 the filename is the same as the flowfile UUID, so you can search on the flowfile UUID to find a specific filename. If you are on a version prior to NiFi 1.8, you can search by filename with the following steps instead of clicking the "i" icon: click the search icon at the top right to open the Search Events box, enter the desired filename there, and click Search to see only the provenance events for that file.
11-19-2018
09:43 PM
@Henrik Olsen Based on the Number Of Records To Analyze property value, NiFi analyzes that many records to determine the type of each field, or, if the flowfile has fewer records, it is limited by the number of records in the flowfile. For example, if you set the property to 1 million and a flowfile actually contains 1 million records, then all of them are considered; otherwise the processor only analyzes what the flowfile holds. I think there are no true null values in those columns, which is why the InferAvroSchema processor does not add null as a default type for them (empty strings are not treated as null values for string-typed fields).
11-19-2018
09:27 PM
@Jacob Paul I believe your flowfiles have a source-date1 attribute with a value like 20181119112100. Change your UpdateAttribute property values to: source-date as ${source-date1:substring(0,8)} and source-time as ${source-date1:substring(8,14)} (the end index is exclusive, so 8,14 captures the full HHmmss portion). UpdateAttribute then adds these attributes to all outgoing flowfiles. In addition, you can perform the same kind of operation without extracting attributes by using the QueryRecord processor: configure/enable Record Reader/Writer controller services and use Apache Calcite's SUBSTRING function to create source-date and source-time columns in the flowfile.
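As a quick sanity check of those offsets (NiFi's substring uses Java-style start-inclusive, end-exclusive indices), plain string slicing gives the same result:

```python
ts = "20181119112100"  # yyyyMMddHHmmss, as in the example attribute

source_date = ts[0:8]   # equivalent to ${source-date1:substring(0,8)}
source_time = ts[8:14]  # equivalent to ${source-date1:substring(8,14)}

print(source_date)  # 20181119
print(source_time)  # 112100
```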
11-15-2018
10:14 PM
1 Kudo
@ravi kargam As a workaround, instead of the replaceRegex function use the replace function with the configs below.

UpdateRecord configs:
Replacement Value Strategy: Record Path Value
/employeeName: replace(/employeeName, '[', '(')

(or) if you are updating only one column value:
Replacement Value Strategy: Literal Value
/employeeName: ${field.value:replaceAll('\\[','(')}
11-15-2018
05:17 PM
@n c Once you whitelist the parameter in Ambari (typically by adding it to hive.security.authorization.sqlstd.confwhitelist.append), you will be able to set the parameter in the Hive CLI.
11-15-2018
03:12 AM
2 Kudos
@Henrik Olsen This exact case was introduced in the NiFi 1.6 version; the jira addressing this bug is NIFI-4883. Starting from NiFi 1.6 you can use one record writer for invalid records and a different record writer for valid records.
11-15-2018
03:00 AM
@Julio Gazeta I think this thread has the same issue: hitting maximum back pressure on the queue. The same fix described here: https://community.hortonworks.com/questions/227489/apache-nifi-distribution-trouble-in-cluster-spark.html will be applicable to this thread as well.
11-15-2018
02:03 AM
1 Kudo
@n c "So the month i.e. "10" is actually appearing as part of the table data. Is that correct?" Yes, that is correct: when we create a partitioned table, all partition columns appear at the end of the column list. Partitions boost query performance when we use the partition column in our WHERE clause.

Example: if you want to count the records in mth=10, then:
select count(*) from test_par_tbl where mth=10;
This query won't do a full table scan, because the predicate scans only the mth=10 partition and returns the result. When dealing with hundreds of millions of rows, partitioning is an optimization technique that boosts query performance by avoiding full table scans.

2. Even without the partition field in the WHERE clause you can still run the query below, but it will do a full table scan:
select count(*) from test_par_tbl where month(create_dt)=10;
Both queries give you the same result, but taking performance into consideration on big datasets, the first query runs more efficiently.

"Is it possible to partition the table as above and not have the partition column/value as part of the table data?" This is not possible: if the partition column is not part of the table data, Hive will do a full table scan over the entire dataset. If you still want to take the partition column out of the results, create a view on top of the partitioned table that excludes that column.
11-12-2018
11:52 PM
@Varun Yadav I don't think we can upload multiple templates at one time, but you can keep all the templates in one folder, read the filenames, and pass each filename (using a loop) to a curl API call that uploads the template to the NiFi canvas.