How to read data with pipe delimiter and colon using PySpark
- Labels:
  - Apache Spark
Created on 12-17-2020 10:19 AM - edited 12-17-2020 10:20 AM
Can anyone help me with my request, please?
Input file record:
"ABC":"Mobile"|"XYZ":"Tablet"|"LKJ":"MAC"|"TIME":"US"
Needed output:
ABC    | XYZ    | LKJ | TIME
Mobile | Tablet | MAC | US
I am reading the file in Databricks with the pipe delimiter, and it gives me the right number of columns; how can I move forward from there? I used the pyspark.sql.functions.split method, but I am not getting the required output format. I have tried every scenario I could think of. Could you please help me?
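For reference, the reshaping being asked for can be sketched in plain Python (no Spark, just the split logic, with the function name `parse_record` being an illustrative assumption): split each record on `|`, then each token on `:`, and strip the surrounding quotes — the keys become the column headers and the values form the single data row.

```python
def parse_record(line):
    """Split a pipe-delimited record of "key":"value" pairs
    into (headers, values) lists."""
    # Each token looks like '"ABC":"Mobile"'; split it once on the colon.
    pairs = [token.split(":", 1) for token in line.strip().split("|")]
    headers = [key.strip('"') for key, _ in pairs]
    values = [value.strip('"') for _, value in pairs]
    return headers, values

record = '"ABC":"Mobile"|"XYZ":"Tablet"|"LKJ":"MAC"|"TIME":"US"'
headers, values = parse_record(record)
print(headers)  # ['ABC', 'XYZ', 'LKJ', 'TIME']
print(values)   # ['Mobile', 'Tablet', 'MAC', 'US']
```

The same two-level split is what `pyspark.sql.functions.split` would do column by column once the file has been read with `|` as the delimiter.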
Created 12-18-2020 08:02 AM
*Step 1: reading the file paths from a lookup file and adding Location, Country, and State columns for each record.*
for line in lines:
    SourceDf = sqlContext.read.format("csv").option("delimiter", "|").load(line)
    SourceDf = SourceDf.withColumn("Location", lit("us"))\
        .withColumn("Country", lit("Richmnd"))\
        .withColumn("State", lit("NY"))
*Step 2: looping over each column of the above DF and doing the split operation, but I am getting only two columns in KeyValueDF.*
for col_num in SourceDf.columns:
    InterDF = pyspark.sql.functions.split(SourceDf[col_num], ":")
    KeyValueDF = SourceDf.withColumn("Column_Name", InterDF.getItem(0))\
        .withColumn("Column_value", InterDF.getItem(1))
*In step 1 the data was split on the pipe, creating 60 columns. In step 2 I want to split the output of step 1 again on the colon.*
*Can anyone help me get the expected result, please?*
*File format:
ABC:"MobileData"|XYZ:"TabletData"|ZXC:"MacData"|MNB:"WindowData"
Result:
ABC        | XYZ        | ZXC     | MNB
MobileData | TabletData | MacData | WindowData*
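One thing to note about step 2: each pass through the loop calls `withColumn("Column_Name", ...)` and `withColumn("Column_value", ...)` on the same two column names, so every iteration overwrites the previous one and only the last column's split survives — that is why only two columns appear in KeyValueDF. The reshaping the loop is after can be sketched in plain Python (the helper name `pivot_row` is an illustrative assumption): split every cell once, collect piece 0 of each as the header row and piece 1 as the value row.

```python
def pivot_row(cells):
    """cells: one row of 'key:"value"' strings, one per split column.
    Returns (headers, values) — the shape the KeyValueDF loop is after."""
    headers, values = [], []
    for cell in cells:
        # partition splits on the first colon only, mirroring
        # getItem(0) / getItem(1) on a two-piece split result.
        key, _, value = cell.partition(":")
        headers.append(key.strip('"'))
        values.append(value.strip('"'))
    return headers, values

row = ['ABC:"MobileData"', 'XYZ:"TabletData"', 'ZXC:"MacData"', 'MNB:"WindowData"']
print(pivot_row(row))
# (['ABC', 'XYZ', 'ZXC', 'MNB'],
#  ['MobileData', 'TabletData', 'MacData', 'WindowData'])
```

In PySpark terms, the equivalent is to build all 60 split expressions in one `select` (one aliased `getItem(1)` per column) instead of overwriting two columns in a loop.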
