Reading CSV File Spark - Issue with Backslash
Labels: Apache Spark
Created 02-09-2023 03:45 AM
I'm facing a weird issue and I'm not sure why Spark is behaving like this.
samplefile.txt:
COL1|COL2|COL3|COL4
"1st Data"|"2nd ""\\\\P"" data"|"3rd data"|"4th data"
This is my spark code to read data:
val df = spark.read.format("csv").option("header","true").option("inferSchema","true").option("delimiter","|").load("/samplefile.txt")
df.show(false)
Somehow it is combining two columns' data into one. Spark version: 2.4 (Scala).
Any idea why Spark is behaving like this?
Created 02-09-2023 05:53 AM
@ShobhitSingh You need to handle the escape with another option:
.option("escape", "\\")
You may need to experiment with the actual escape string ("\\") to suit your needs. Be sure to check the Spark docs for your specific version. For example:
https://spark.apache.org/docs/latest/sql-data-sources-csv.html
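Putting that together, a minimal sketch (the load path is a placeholder; adjust it to your file):
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "|")
  .option("escape", "\\")   // treat the backslash as the escape character
  .load("/samplefile.txt")  // placeholder path
df.show(false)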
Created 02-09-2023 06:29 AM
Hi Steven,
Even if my data is like this, it's still causing the issue:
"1st Data"|"2nd ""\P"" data"|"3rd data"|"4th data"
What is causing the issue? Any idea?
I know Spark's default escape character is the backslash, but why is it behaving like this?
Created 02-09-2023 06:31 AM
Click into that doc and check out the other escape option. I think you need to handle the quotes too.
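For instance, since the sample data doubles its quotes (""\P""), one option worth trying is to make the quote character its own escape, so backslashes in the data are taken literally (a sketch, untested against your exact file; the path is a placeholder):
val df = spark.read
  .format("csv")
  .option("header", "true")
  .option("delimiter", "|")
  .option("quote", "\"")
  .option("escape", "\"")  // "" inside a quoted field now reads as a literal quote
  .load("/samplefile.txt") // placeholder path
df.show(false)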
Created on 03-30-2023 04:25 AM - edited 03-30-2023 04:26 AM
You need to adjust the CSV file:
sample.csv
=========
COL1|COL2|COL3|COL4
1st Data|2nd|3rd data|4th data
1st Data|2nd \\P data|3rd data|4th data
"1st Data"|"2nd '\\P' data"|"3rd data"|"4th data"
"1st Data"|"2nd '\\\\P' data"|"3rd data"|"4th data"
Spark Code:
spark.read.format("csv").option("header","true").option("inferSchema","true").option("delimiter","|").load("/tmp/sample.csv").show(false)
Output:
+--------+--------------+----------+--------+
|COL1 |COL2 |COL3 |COL4 |
+--------+--------------+----------+--------+
|1st Data|2nd |3rd data |4th data|
|1st Data|2nd \\P data |3rd data |4th data|
|1st Data|2nd '\P' data |3rd data |4th data|
|1st Data|2nd '\\P' data|3rd data |4th data|
+--------+--------------+----------+--------+
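As a quick sanity check on any variant of the file, rows where the last column parses as null are usually the ones where two columns were merged, since a merge leaves too few values for the schema (a sketch, reusing the same read as above):
import org.apache.spark.sql.functions.col
val df = spark.read.format("csv").option("header","true").option("inferSchema","true").option("delimiter","|").load("/tmp/sample.csv")
// Rows where COL4 came back null likely had two columns collapsed into one during parsing
df.filter(col("COL4").isNull).show(false)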