Created 08-13-2017 12:01 PM
I need to process a CSV file with Spark and load it into Hive tables. However, my files contain commas in several places as part of the data itself, not as separators. This raises two questions:
1) How will Spark recognize that such a comma is not a separator and treat it as part of the data?
2) How can we process such data and load it into Hive, keeping the commas that are content rather than separators?
Please share some techniques to address these points.
Created 08-13-2017 06:45 PM
Hi @HDave,
If your text fields are enclosed in double quotation marks (or a similar quote character), it shouldn't be a problem.
Otherwise: you can't use a delimiter that also appears inside the fields you want to separate!
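To illustrate the quoting point, here is a minimal sketch using Python's standard `csv` module rather than Spark, so it runs without a cluster; the sample rows are made up. Spark's CSV reader follows the same convention: its `quote` option defaults to `"`, so a field such as `"Doe, John"` is read as one column. Note that Spark's default `escape` character is a backslash, so if quotes inside quoted fields are doubled (`""`), you may also need `.option("escape", "\"")`.

```python
import csv
import io

# Hypothetical sample data: in the first file the name field is quoted,
# in the second it is not.
quoted = 'id,name,city\n1,"Doe, John",Berlin\n2,Jane,Paris\n'
unquoted = 'id,name,city\n1,Doe, John,Berlin\n2,Jane,Paris\n'

def parse(text):
    """Parse CSV text; a comma inside a double-quoted field is kept as data."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))

quoted_rows = parse(quoted)
unquoted_rows = parse(unquoted)

print(quoted_rows[1])    # ['1', 'Doe, John', 'Berlin']  -> 3 fields, comma kept
print(unquoted_rows[1])  # ['1', 'Doe', ' John', 'Berlin']  -> 4 fields, row is broken
```

Without quotes, the parser has no way to tell the comma in `Doe, John` from a real separator, so the row simply gains an extra column.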
So to answer your questions:
1) Do you have a sample dataset? Maybe you can try some fancy regex tricks (though I don't think that will work in most cases).
2) As mentioned before, you should enclose text fields in double quotation marks. But the best practice would be to use a delimiter that doesn't appear in any of your fields.
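If the files can't be regenerated with quoting, the regex idea from point 1 can work only under a strong assumption about the layout. The sketch below assumes a hypothetical three-column layout (`id,name,city`) in which only the middle column may contain commas; because the first and last columns are anchored to be comma-free, a greedy middle group absorbs the extra commas. It then rewrites the recovered rows with `csv.writer`, which quotes any field containing the delimiter, so a quote-aware reader (Spark's CSV source, or a Hive table using `OpenCSVSerde`) can load the result.

```python
import csv
import io
import re

# Hypothetical layout: id,name,city where ONLY the middle column may contain commas.
# First and last groups exclude commas; the greedy middle group takes the rest.
ROW = re.compile(r"^([^,]*),(.*),([^,]*)$")

def split_row(line):
    """Split one raw line into [id, name, city], or raise if it doesn't fit the layout."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unexpected row: {line!r}")
    return list(m.groups())

def to_quoted_csv(rows):
    """Re-emit rows as CSV; csv.writer (QUOTE_MINIMAL) quotes fields containing commas."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

raw_lines = ["1,Doe, John,Berlin", "2,Jane,Paris"]
rows = [split_row(line) for line in raw_lines]
print(to_quoted_csv(rows), end="")
# 1,"Doe, John",Berlin
# 2,Jane,Paris
```

If two or more columns can contain commas, the split becomes ambiguous and no regex can recover it reliably, which is why quoting or a different delimiter is the safer fix.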
Best regards
Jan