Created 08-04-2016 04:50 AM
I am reading a CSV file into a Spark DataFrame. Some of the fields contain double quotes ("") and I want to escape them. Can anyone let me know how I can do this? Since double quotes are used in the parameter list of the option method, I don't know how to escape double quotes in the data.
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", -----
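(For reference, the double quote character itself can be written in a Scala string literal with a backslash, so the call might look like the sketch below; the file path is a placeholder, and whether "escape" or "quote" is the right option depends on how the quotes appear in the data.)

// Sketch only: "\"" is a Scala string containing a single double quote.
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", "\"")        // escape character set to a literal "
  .load("/path/to/file.csv")     // placeholder path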
Created 08-04-2016 04:31 PM
Assume you have a file /tmp/test.csv like
Col1|Col2|Col3|Col4
12|34|"56|78"|9A
"AB"|"CD"|EF|"GH:"|:"IJ"
If I load it with Spark I get
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", ":")
  .load("/tmp/test.csv")
df.show()

+----+----+-----+-------+
|Col1|Col2| Col3|   Col4|
+----+----+-----+-------+
|  12|  34|56|78|     9A|
|  AB|  CD|   EF|GH"|"IJ|
+----+----+-----+-------+
So the example contains a delimiter inside quotes as well as escaped quotes. I use ":" to escape quotes; you can use many other characters (but don't use e.g. "#").
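As a rough variant (a sketch only, with a hypothetical file /tmp/test_backslash.csv): if the quotes in your data were escaped with a backslash instead, e.g. a row like "AB"|"CD"|EF|"GH\"|\"IJ", you could set the escape option accordingly:

// Sketch: backslash as the escape character, written as "\\" in Scala.
val df2 = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("escape", "\\")
  .load("/tmp/test_backslash.csv")  // hypothetical file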
Is this something you want to achieve?
Created 08-05-2016 06:12 AM
@Bernhard Walter, my requirement is similar. File with data like
Col1|Col2|Col3|Col4
"AB"|"CD"|"DE"| "EF"
and I need output like the following after loading it into a DataFrame:
Col1|Col2|Col3|Col4
AB|CD|DE|EF
I don't see how your suggestion works. How will escaping with ":" escape double quotes?
Created 08-05-2016 07:44 AM
There is an issue with the space in front of "EF":
Let's use the following (you don't need the "escape" option here; it can be used, e.g., to keep quotes in the DataFrame if needed):
val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .load("/tmp/test.csv")
df.show()
With the space in front of "EF":
+----+----+----+-----+
|Col1|Col2|Col3| Col4|
+----+----+----+-----+
|  AB|  CD|  DE| "EF"|
+----+----+----+-----+
Without space in front of "EF":
+----+----+----+----+
|Col1|Col2|Col3|Col4|
+----+----+----+----+
|  AB|  CD|  DE|  EF|
+----+----+----+----+
Can you remove the space before loading the csv into Spark?
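If removing the space by hand isn't practical, one possible workaround (a sketch, not verified against your data; the output path is a placeholder) is to strip whitespace that follows the delimiter in a small pre-processing step and then load the cleaned copy. Depending on your spark-csv version, an ignoreLeadingWhiteSpace option may also be available for the same purpose.

// Sketch: remove spaces that directly follow the "|" delimiter
// (note: this also drops intentional leading spaces in unquoted fields),
// then parse the cleaned copy with spark-csv.
val cleaned = sc.textFile("/tmp/test.csv").map(_.replaceAll("\\| +", "|"))
cleaned.saveAsTextFile("/tmp/test_clean")   // placeholder output directory

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("header", "true")
  .option("delimiter", "|")
  .load("/tmp/test_clean")
df.show()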