
How to process commas (",") through Spark if the source file is CSV (comma-delimited) and the data itself contains commas?

I need to process a CSV file through Spark and load it into Hive tables. However, my files contain commas inside the data itself, not as a separator but as content, in several places. This raises the following questions:

1) How will Spark identify that such a comma is not a separator and treat it as part of the data?

2) How can we process such data and load it into Hive, keeping the commas that are content and not separators?

Please share some techniques to achieve the above.

Re: How to process commas (",") through Spark if the source file is CSV (comma-delimited) and the data itself contains commas?

Hi @HDave,

If your text fields are enclosed in double quotation marks (or a similar quote character), it shouldn't be a problem: a comma inside a quoted field is read as content, not as a separator.
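To illustrate the quoting case, here is a minimal sketch using Python's standard `csv` module rather than Spark itself, so it runs without a cluster. The sample rows and column names are made up for the example. As far as I know, Spark's CSV reader follows the same convention through its `quote` option, which defaults to the double quote; the Spark call in the comment is untested here and the path is hypothetical.

```python
import csv
import io

# Hypothetical sample: the address field contains a comma,
# but it is wrapped in double quotes.
raw = (
    'id,address,city\n'
    '1,"12 Main St, Apt 4",Springfield\n'
    '2,"No comma here",Shelbyville\n'
)

# The parser treats everything between the quote characters as one field,
# so the embedded comma is kept as content.
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

for row in rows:
    print(row)

# Assumed PySpark equivalent (not executed here, path is hypothetical):
#   df = (spark.read
#              .option("header", "true")
#              .option("quote", '"')
#              .csv("/path/to/file.csv"))
```

Every row comes back with three fields, and the address keeps its comma.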

Otherwise: you can't use a delimiter that also appears inside the fields you want to separate!
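A small sketch of why the unquoted case breaks, again in plain Python with a made-up row: splitting on the comma yields more fields than the header has columns, and nothing in the line says which comma was the "real" separator.

```python
# Hypothetical header and an unquoted row whose address contains a comma.
header = "id,address,city"
line = "1,12 Main St, Apt 4,Springfield"

expected = len(header.split(","))
fields = line.split(",")

# The row has one field too many: the address was cut in two.
print(expected, len(fields))
print(fields)
```

Counting fields per line like this is at least a cheap way to find out how many rows in a file are affected.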

So to answer your questions:

1) Do you have a sample dataset? Without quoting, Spark has no way to tell a separator comma from a content comma. You could try some fancy regex, though I don't think it will work in most cases.

2) As mentioned above, wrap your text fields in double quotation marks. The best practice, though, is simply to use a delimiter that doesn't occur in your fields.
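If you control how the file is produced, picking a safe delimiter could look roughly like the sketch below. It uses only the Python standard library; `pick_delimiter`, the candidate list, and the sample rows are all hypothetical names and data for illustration. On the Spark side, the reader's `sep` option should then be set to the same character (shown only as an untested comment).

```python
import csv
import io

def pick_delimiter(rows, candidates=("|", "\t", ";")):
    """Return the first candidate that appears in no field, or None."""
    for cand in candidates:
        if not any(cand in field for row in rows for field in row):
            return cand
    return None

# Hypothetical data: one field contains a comma, another contains a pipe.
rows = [
    ["1", "12 Main St, Apt 4", "Springfield"],
    ["2", "a|b", "Shelbyville"],
]

delim = pick_delimiter(rows)
if delim is None:
    raise ValueError("no safe delimiter found; fall back to quoting")

# Write the rows with the chosen delimiter; QUOTE_MINIMAL still quotes
# any field that would otherwise be ambiguous.
out = io.StringIO()
writer = csv.writer(out, delimiter=delim, quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
writer.writerows(rows)
text = out.getvalue()
print(repr(text))

# Assumed PySpark read for such a file (not executed here, path is hypothetical):
#   df = spark.read.option("sep", "\t").csv("/path/to/file.tsv")
```

Here the pipe is rejected because it occurs in the data, so the tab is chosen and the commas pass through untouched.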

Best regards

Jan