Member since: 12-21-2020 | Posts: 3 | Kudos Received: 0 | Solutions: 0
12-23-2020 06:27 PM
Thanks for your answer, but I'd prefer not to change the data file, because the data fields may contain commas or line breaks. Is there a way to import the file directly? Thanks & Merry Christmas 🙂
12-22-2020 07:57 PM
Thanks for your reply, but your script doesn't seem to work. The dataset's field delimiter is shift-in (\x0f) and its line separator is shift-out (\x0e). In pandas, I can simply load the data into a DataFrame with this command:
df1 = pd.read_csv("/folder/file.gz", sep='\x0f', lineterminator='\x0e')
How can I do the same in Spark?
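On Spark versions whose CSV reader does not accept `lineSep` (as far as I know, that option reached the CSV source only in Spark 3.0), one commonly used workaround is to read the file through Hadoop's `TextInputFormat` with a custom `textinputformat.record.delimiter`, then split each record on the field separator. Because records end only at `\x0e` and fields split only at `\x0f`, commas and newlines inside fields stay as ordinary data, so the file does not need to be rewritten. This is a minimal PySpark sketch, not a tested recipe: it assumes the first record is a header, every row has the same number of fields, and all columns can stay strings. The helper names `split_fields` and `read_custom_delimited` are hypothetical.

```python
# Hypothetical helpers; assumes \x0f separates fields and \x0e terminates records.
FIELD_SEP = "\x0f"   # SI (shift-in) control character
RECORD_SEP = "\x0e"  # SO (shift-out) control character


def split_fields(record):
    """Split one record into its fields on the field separator."""
    return record.split(FIELD_SEP)


def read_custom_delimited(spark, path):
    """Load an SI/SO-delimited text file (gzip is decompressed by Hadoop) into a DataFrame.

    Assumption: the first record holds the column names; all values stay strings.
    """
    sc = spark.sparkContext
    rdd = sc.newAPIHadoopFile(
        path,
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf={"textinputformat.record.delimiter": RECORD_SEP},
    )
    records = (
        rdd.map(lambda kv: kv[1])   # keep the Text value, drop the byte offset key
        .filter(lambda r: r != "")  # skip empty records, e.g. after a trailing separator
    )
    columns = split_fields(records.first())  # first record is assumed to be the header
    rows = (
        records.zipWithIndex()
        .filter(lambda pair: pair[1] > 0)  # drop the header by position, not by value
        .map(lambda pair: split_fields(pair[0]))
    )
    return rows.toDF(columns)
```

Usage would be along the lines of `df = read_custom_delimited(spark, "/folder/file.gz")`. On Spark 3.0 or later, the built-in reader should handle this directly with something like `spark.read.csv(path, sep='\x0f', lineSep='\x0e', header=True)`, since the CSV `lineSep` option there accepts a single character.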
12-21-2020 08:03 PM
Hi all,

I'm new to Spark and I'm looking for a way to import a CSV file with a custom line separator into a DataFrame. I'm using CDH 2.2.0.

Data:
ID\x0fRegion\x0e1\x0fUS\x0e2\x0fRussia\x0e

Expected DataFrame:
ID  Region
1   US
2   Russia

I tried to use spark.read.csv with the lineSep argument, but it seems my Spark version doesn't support it.
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader

Any suggestions? Thanks
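To make the expected result concrete, here is a plain-Python sketch (no Spark involved) that decodes the sample layout from the question, assuming `\x0f` sits between fields, `\x0e` ends every record including the last one, and the first record is the header. The function name `parse` is hypothetical.

```python
FIELD_SEP = "\x0f"   # SI: separates fields within a record
RECORD_SEP = "\x0e"  # SO: terminates each record


def parse(data):
    """Return (header, rows) from a string of SI-separated fields and SO-terminated records."""
    # Drop empty chunks, such as the one left after the trailing record separator.
    records = [r for r in data.split(RECORD_SEP) if r]
    header, *body = (r.split(FIELD_SEP) for r in records)
    return header, body


sample = "ID\x0fRegion\x0e1\x0fUS\x0e2\x0fRussia\x0e"
header, rows = parse(sample)
print(header)  # ['ID', 'Region']
print(rows)    # [['1', 'US'], ['2', 'Russia']]
```

The trailing `\x0e` is why empty chunks are filtered out: without that step, `split` would yield a final empty record.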
Labels: Apache Spark