Member since
06-08-2017
1049
Posts
518
Kudos Received
312
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11124 | 04-15-2020 05:01 PM |
| | 7026 | 10-15-2019 08:12 PM |
| | 3068 | 10-12-2019 08:29 PM |
| | 11254 | 09-21-2019 10:04 AM |
| | 4190 | 09-19-2019 07:11 AM |
06-16-2018
09:37 PM
@Bal P For Avro data files the schema is embedded, and since you want to write all the fields out in CSV format, we don't need to set up the schema registry. If you are writing only specific columns (not all of them), or writing to other formats such as JSON, then a schema registry is required.
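A minimal sketch of that no-registry setup, assuming a ConvertRecord processor with NiFi's standard AvroReader and CSVRecordSetWriter controller services (the processor/service names are standard; the flow itself is hypothetical):

```
ConvertRecord
  Record Reader : AvroReader
      Schema Access Strategy = Use Embedded Avro Schema   # schema comes from the Avro file itself
  Record Writer : CSVRecordSetWriter
      Schema Access Strategy = Inherit Record Schema      # reuse the reader's schema, no registry needed
```

With these strategies the writer emits every field the reader saw, which is why no schema registry is involved.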
06-14-2018
01:00 PM
@rajat puchnanda Merging a group of flowfiles (or records) is possible with the MergeContent/MergeRecord processors. Example: if flowfile ff1 has records 1,2,3 and ff2 has 3,4,5, then using MergeContent/MergeRecord we can merge these flowfiles into one containing 1,2,3,3,4,5. Merge means combining the group of records/flowfiles (a union all); if you want to remove duplicates (i.e. 3 is a duplicate record) from the combined flowfile content, you can use the QueryRecord processor with the row_number window function to eliminate them. This scenario is possible in NiFi without using the LookupRecord processor.

But as you mentioned in one of the answers, Scenario 2:

InputFile 1

| deptid | firstname | lastname |
|---|---|---|
| 1 | Aman | Sharma |
| 2 | Raman | Verma |

InputFile 2

| deptid | salary | email |
|---|---|---|
| 1 | 20000 | abc@gmail.com |
| 2 | 30000 | bgf@gmail.com |

OutputFile (by merging file1 and file2):

| deptid | firstname | lastname | salary | email |
|---|---|---|---|---|
| 1 | Aman | Sharma | 20000 | abc@gmail.com |
| 2 | Raman | Verma | 30000 | bgf@gmail.com |

This is not possible with MergeContent/MergeRecord, but you can try the QueryRecord processor by implementing group-and-collect-as-set (or similar) SQL logic to transpose the data into your desired format. That query would be intensive if you run it over a large number of records.
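The deduplication step can be sketched as the SQL you would put in a QueryRecord dynamic property. QueryRecord uses Apache Calcite SQL and exposes the incoming flowfile content as the FLOWFILE table; the column names deptid and firstname here are hypothetical:

```sql
-- Hypothetical QueryRecord query (added as a dynamic property, e.g. "deduped").
-- Keeps only the first occurrence of each deptid from the merged flowfile.
SELECT deptid, firstname
FROM (
    SELECT deptid, firstname,
           ROW_NUMBER() OVER (PARTITION BY deptid ORDER BY deptid) AS rn
    FROM FLOWFILE
) AS t
WHERE rn = 1
```

Rows matching the query are routed to a relationship named after the dynamic property, so duplicates simply never reach the downstream flow.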
01-25-2019
10:44 PM
Which is the better choice, and why: repairing an existing table or recreating it?
06-11-2018
05:28 PM
Thanks, it works.
06-10-2018
01:04 PM
1 Kudo
@gal itzhak ListS3 -> FetchS3Object -> MergeContent (output as Avro) -> ConvertAvroToORC -> PutS3Object

The above approach is correct. The MergeContent processor won't support ORC as a merge format (its supported merge formats are Binary Concatenation, TAR, ZIP, FlowFile Stream v2/v3, FlowFile Tar v1, and Avro), so we still need to merge all the Avro files into one and then feed that into the ConvertAvroToORC processor. You can also use the MergeRecord processor, which reads incoming flowfiles, merges the records based on its configuration, and writes the merged flowfile using the configured Record Writer:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.MergeRecord/index.html

How do we merge small ORC files, then? Merging small ORC files still needs to be done through Hive/Spark.

1. Compacting small files using CONCATENATE: Since you are storing the ORC files in S3, if you have a Hive table on top of those S3 ORC files you can merge small ORC files together by issuing a CONCATENATE command on the table or partition. The files will be merged at the stripe level without reserialization:

hive> ALTER TABLE istari [PARTITION partition_spec] CONCATENATE;

Refer to this link for more details regarding CONCATENATE.

(or)

2. Compacting small files without using CONCATENATE:

Step 1: Let's assume your final ORC table has thousands of small ORC files. Create a temporary table by selecting from the final table:

hive> create table <db_name>.<temp_table_name> stored as orc as select * from <db_name>.<final_table>;

Step 2: Now that the temp table holds all the data from the final table, overwrite the final table by selecting from the temp table. You can use order by/sort by/distribute by clauses so the new files in the final table are evenly distributed:

hive> insert overwrite table <db_name>.<final_table> select * from <db_name>.<temp_table_name> order by/sort by <some column>;

In addition, you can set any relevant Hive session properties before overwriting the final table. With this approach, make sure no other applications write data into the final table until the overwrite job completes: because we are overwriting from the temp table, any data written to the final table in the meantime would be lost.

Refer to this, this and this links for more details regarding how to compact small files.

- If the answer helped to resolve your issue, click the Accept button below to accept it; that would be a great help to Community users looking for solutions to these kinds of issues.
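The two-step compaction above can be sketched end to end. The database name sales, table events, staging table events_tmp, the column event_date, and the property values are all hypothetical; the hive.merge.* session properties are standard Hive settings for controlling output file sizes:

```sql
-- Step 1: stage a compacted copy of the final table's data (hypothetical names).
CREATE TABLE sales.events_tmp STORED AS ORC AS
SELECT * FROM sales.events;

-- Optional session settings so the overwrite produces fewer, larger files
-- (values are illustrative, in bytes).
SET hive.merge.tezfiles=true;
SET hive.merge.smallfiles.avgsize=128000000;
SET hive.merge.size.per.task=256000000;

-- Step 2: overwrite the final table from the staged copy, distributing rows
-- so the resulting files are evenly sized.
INSERT OVERWRITE TABLE sales.events
SELECT * FROM sales.events_tmp
DISTRIBUTE BY event_date;

-- Clean up the staging table once the overwrite has succeeded.
DROP TABLE sales.events_tmp;
```

DISTRIBUTE BY is usually cheaper than ORDER BY here, since it only controls which reducer writes each row rather than performing a full global sort.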
06-09-2018
03:24 AM
Fantastic and detailed reply. I will try this out and reply if it works. Thanks a lot @Shu
06-12-2018
11:25 AM
@RAUI
Was the answer helpful in resolving your issue? Take a moment to log in and click the Accept button below to accept the answer; that would be a great help to Community users looking for solutions to these kinds of issues, and it will close this thread.
06-07-2018
11:25 AM
1 Kudo
@aman mittal
Great, good to know! Another way of doing it is to check the length of the fragment.index attribute and then use nested ifElse statements to determine whether to prepend 00 or 0, but that expression becomes complex, so using the Advanced property is the better approach. If the answer addressed your question, take a moment to click the Accept button below to accept it; that would be a great help to Community users looking for solutions to these kinds of issues.