Support Questions

rbalakrishnantc · ‎11-15-2016

I have a CSV file with 9 columns. How can I remove duplicates among columns 4 through 9?

What we tried:

1. Split columns 1-4 into one file

2. Split columns 4-9 into another file -> Deduplicate records

Next, I tried using 'ReplaceTextWithMapping' to merge the two files on the 4th column (common to both files), but I am not sure my approach is right.

Is there any other way to achieve this?

pvillard · ‎11-15-2016

Hi @bala krishnan ,

Not a solution, but just to let you know that with the next version of NiFi (coming soon) you will be able to use the ValidateCSV processor to achieve what you are looking for. In the meantime, I don't think splitting the file is going to help. You could try something custom with the ExecuteScript processor, but that is probably not ideal.
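For illustration, here is a minimal sketch of what such custom logic might look like. It is plain Python, not a tested ExecuteScript script: reading and writing the flow file content is not shown. The function name `dedupe_by_columns` and its parameters are hypothetical. It assumes "remove duplicates" means keeping the first row for each distinct combination of values in columns 4 through 9, and it treats any header row as ordinary data.

```python
import csv
import io


def dedupe_by_columns(csv_text, first_col=4, last_col=9):
    """Keep the first row for each distinct combination of the key columns.

    Columns are 1-based and inclusive, so the defaults cover columns 4-9.
    (Hypothetical helper; the name and parameters are illustrative only.)
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    seen = set()
    for row in reader:
        # Build the comparison key from the selected columns only.
        key = tuple(row[first_col - 1:last_col])
        if key in seen:
            continue  # same values in columns 4-9 already emitted
        seen.add(key)
        writer.writerow(row)  # the full 9-column row is preserved
    return out.getvalue()


sample = (
    "1,a,b,k,x,y,z,p,q\n"
    "2,c,d,k,x,y,z,p,q\n"
    "3,e,f,m,x,y,z,p,q\n"
)
print(dedupe_by_columns(sample))
```

Because the whole row is written out whenever its key is new, there is no need to split the file and merge it back on the 4th column. The trade-off is that the set of seen keys is held in memory, which may matter for very large files.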

Hope this helps.

Cloudera Community

ReplaceTextWithMapping processor. De-duplicate only specific columns in a file
