About Alans

Alans · ‎12-23-2020

Thanks for your answer, but I would prefer not to change the data file, as the data fields may contain commas or line breaks. Is there a way to import the file directly? Thanks & Merry Christmas 🙂

Alans · ‎12-22-2020

Thanks for your reply, but it seems your script doesn't work. The dataset delimiter is shift-in (\x0f) and the line separator is shift-out (\x0e). In pandas, I can simply load the data into a DataFrame using this command:
df1 = pd.read_csv("/folder/file.gz", sep='\x0f', lineterminator='\x0e')
May I know how to do this in Spark?
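One possible way to get the same result on a Spark version without the lineSep option is to read the file through the Hadoop TextInputFormat with a custom record delimiter and split the fields afterwards. The following is a minimal sketch, not a confirmed solution from this thread: the function names load_delimited and split_record are hypothetical, it assumes the first record is the header, and it assumes the control character is accepted as the value of textinputformat.record.delimiter on this cluster.

```python
FIELD_SEP = "\x0f"   # field delimiter, as given in the post
RECORD_SEP = "\x0e"  # record (line) separator, as given in the post


def split_record(record):
    """Split one record into its fields on the custom field delimiter."""
    return record.split(FIELD_SEP)


def load_delimited(spark, path):
    """Sketch: build a DataFrame of string columns from a file that uses
    custom field and record separators. `spark` is an existing SparkSession."""
    rdd = spark.sparkContext.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        # Assumption: records end at RECORD_SEP instead of a newline.
        conf={"textinputformat.record.delimiter": RECORD_SEP},
    )
    # Each element is a (byte offset, record text) pair; keep non-empty text.
    records = rdd.map(lambda kv: kv[1]).filter(lambda r: r != "")
    header = records.first()
    columns = split_record(header)
    # Note: this also drops any data record identical to the header.
    rows = records.filter(lambda r: r != header).map(split_record)
    return spark.createDataFrame(rows, columns)
```

With an existing SparkSession named spark, the call would look like df1 = load_delimited(spark, "/folder/file.gz"). Because records are split only on \x0e and fields only on \x0f, commas and line breaks inside a field should be left untouched, and every column comes back as a string that can be cast later.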

Alans · ‎12-21-2020

Hi All, I'm new to Spark and I'm looking for a way to import a CSV with a custom line separator into a DataFrame. I'm using CDH 2.2.0.
Data: ID/x0fRegion/x0e1/x0fUS/x0e2/x0fRussia/x0e
Expected DataFrame:
ID  Region
1   US
2   Russia
I tried to use spark.read.csv with the lineSep argument, but it seems my Spark version doesn't support it. https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader Any suggestions? Thanks
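To make the intended layout concrete, here is a small plain-Python sketch of how the sample maps to the expected rows. It assumes that /x0f and /x0e in the sample stand for the control characters \x0f (field delimiter) and \x0e (record separator); the helper name parse_records is hypothetical.

```python
FIELD_SEP = "\x0f"   # assumed field delimiter
RECORD_SEP = "\x0e"  # assumed record separator


def parse_records(raw):
    """Split raw text into records, then split each record into fields."""
    records = [r for r in raw.split(RECORD_SEP) if r]  # skip the empty tail
    return [r.split(FIELD_SEP) for r in records]


# The sample from the post, written with the real control characters.
sample = "ID\x0fRegion\x0e1\x0fUS\x0e2\x0fRussia\x0e"

rows = parse_records(sample)
header, data = rows[0], rows[1:]
print(header)
print(data)
```

The first record gives the column names and the remaining records give the rows, which is the shape any Spark-side solution would need to reproduce.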

Last Visited	‎01-04-2021 09:20 PM

Member Since	‎12-21-2020 07:57 PM
Posts	3

Cloudera Community

Re: Line Separator in Spark


Line Separator in Spark