Actually, an easier way to ignore the duplicated column name and still process the columns correctly is to use a schema to describe your data.
For example, say you have the following CSV:
col_a,col_b,col_b
1,2,3
4,5,6
You can configure your CSVReader with the following:
![araujo_0-1645579635567.png araujo_0-1645579635567.png](https://community.cloudera.com/t5/image/serverpage/image-id/33658iD8CADFD2DF202E82/image-dimensions/659x211?v=v2)
![araujo_1-1645579682280.png araujo_1-1645579682280.png](https://community.cloudera.com/t5/image/serverpage/image-id/33659i32B2357B8B9E18DA/image-size/medium?v=v2&px=400)
And the data will be processed correctly:
![araujo_2-1645579732546.png araujo_2-1645579732546.png](https://community.cloudera.com/t5/image/serverpage/image-id/33660i3C00D2C4AB51384B/image-dimensions/181x122?v=v2)
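In case the screenshots don't load, here is a rough sketch of the same idea in plain Python, outside NiFi. It is not what CSVReader does internally. It only illustrates the principle: skip the header line and take the field names from a schema by position, so the duplicated `col_b` in the header never matters. The field names in `SCHEMA_FIELDS` (in particular `col_c`) and the helper `read_with_schema` are my own assumptions for illustration, and so is treating the values as integers.

```python
import csv
import io

# Hypothetical schema: field names are assumed for illustration and are
# applied by position, independent of what the CSV header says.
SCHEMA_FIELDS = ["col_a", "col_b", "col_c"]

# The sample CSV from above, with the duplicated "col_b" header.
CSV_TEXT = "col_a,col_b,col_b\n1,2,3\n4,5,6\n"


def read_with_schema(text, fields):
    """Parse CSV text, ignoring the header names and using `fields` instead."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line; its names are not used
    records = []
    for row in reader:
        if len(row) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(row)}")
        # Map each value to the schema field at the same position.
        # Values are assumed to be integers in this sketch.
        records.append(dict(zip(fields, (int(value) for value in row))))
    return records


records = read_with_schema(CSV_TEXT, SCHEMA_FIELDS)
for record in records:
    print(record)
```

Because the names come from the schema and not from the header, the third column gets its own field and no value is overwritten.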
HTH,
André