How to read data with pipe and colon delimiters using PySpark

New Contributor

Can anyone help me with my request, please?

Input File Records

"ABC":"Mobile"|"XYZ":"Tablet"|"LKJ":"MAC"|"TIME":"US"

I need the output like below:

ABC    | XYZ    | LKJ | TIME
Mobile | Tablet | MAC | US

I am reading the file in Databricks with a pipe delimiter, and that gives me a number of columns. How can I move forward from there?

I used the pyspark.sql.functions.split method, but I am not getting the required output format.

 

I have tried every scenario I can think of. Could you please help me?
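
For reference, here is a minimal sketch of one way to get that layout, assuming the records live at a hypothetical path /data/input.txt, every record has the same keys in the same order, and a SparkSession is available:

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, regexp_replace, col

spark = SparkSession.builder.getOrCreate()

# Read each line as a single string column named "value"
raw = spark.read.text("/data/input.txt")  # hypothetical path

# Drop the double quotes, then split each line on "|" into an array of key:value pairs
pairs = raw.select(split(regexp_replace("value", '"', ''), r'\|').alias("kv"))

# Take the keys from the first record and use them as the new column names
keys = [p.split(":")[0] for p in pairs.first()["kv"]]

# Keep only the value part of each pair, aliased with its key
result = pairs.select([
    split(col("kv").getItem(i), ":").getItem(1).alias(keys[i])
    for i in range(len(keys))
])
result.show()

With the sample record above, result.show() should print ABC, XYZ, LKJ, and TIME as columns with Mobile, Tablet, MAC, and US as the row values.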

1 REPLY

New Contributor

Step 1: Read each file path from the lookup file and add Location, Country, and State columns for each record.

from pyspark.sql.functions import lit

for line in lines:
    SourceDf = sqlContext.read.format("csv").option("delimiter", "|").load(line)
    # withColumn returns a new DataFrame, so assign the result back
    SourceDf = SourceDf.withColumn("Location", lit("US")) \
        .withColumn("Country", lit("Richmond")) \
        .withColumn("State", lit("NY"))
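
As a side note, DataFrameReader.load() also accepts a list of paths, so the lookup paths can be read into a single DataFrame instead of overwriting SourceDf on every loop iteration. A minimal sketch, assuming lines is a Python list of file paths:

from pyspark.sql.functions import lit

SourceDf = (sqlContext.read.format("csv")
            .option("delimiter", "|")
            .load(lines)  # load() accepts a list of paths
            .withColumn("Location", lit("US"))
            .withColumn("Country", lit("Richmond"))
            .withColumn("State", lit("NY")))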

Step 2: Loop over each column of the DataFrame above and split it, but I am only getting two columns in KeyValueDF.

from pyspark.sql.functions import split

for col_name in SourceDf.columns:
    # Split each cell on ":" into a key part and a value part
    InterDF = split(SourceDf[col_name], ":")
    KeyValueDF = SourceDf.withColumn("Column_Name", InterDF.getItem(0)) \
        .withColumn("Column_Value", InterDF.getItem(1))
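
The loop above only ever ends up with two extra columns because each iteration rebuilds KeyValueDF from SourceDf and reuses the same two column names, so only the last iteration survives. A minimal sketch that accumulates one key column and one value column per source column (the _name/_value suffixes are just illustrative):

from pyspark.sql.functions import split, col

KeyValueDF = SourceDf
for c in SourceDf.columns:
    parts = split(col(c), ":")
    KeyValueDF = (KeyValueDF
                  .withColumn(c + "_name", parts.getItem(0))
                  .withColumn(c + "_value", parts.getItem(1)))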


In step 1, the data is split on the pipe and 60 columns are created. In step 2, I want to split the output of step 1 again on the colon.

Can anyone please help me get the expected result?

 

File format:
ABC:"MobileData"|XYZ:"TabletData"|ZXC:"MacData"|MNB:"WindowData"

Expected result:
ABC        | XYZ        | ZXC     | MNB
MobileData | TabletData | MacData | WindowData
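
For this format, a minimal alternative sketch using the built-in SQL function str_to_map, assuming a SparkSession named spark, a hypothetical path /data/lookup.txt, and that the keys ABC, XYZ, ZXC, and MNB are known in advance:

from pyspark.sql.functions import expr, regexp_replace

raw = spark.read.text("/data/lookup.txt")  # hypothetical path
line_df = raw.select(regexp_replace("value", '"', '').alias("line"))

# str_to_map splits the line into a key/value map; '[|]' is a regex matching a literal pipe
mapped = line_df.select(expr("str_to_map(line, '[|]', ':')").alias("kv"))

result = mapped.select([mapped["kv"][k].alias(k) for k in ["ABC", "XYZ", "ZXC", "MNB"]])
result.show()

With the sample line above, this should print MobileData, TabletData, MacData, and WindowData under the ABC, XYZ, ZXC, and MNB columns.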