How to read data with pipe and colon delimiters using PySpark

New Contributor

Can anyone help me with my request, please?

Input File Records

"ABC":"Mobile"|"XYZ":"Tablet"|"LKJ":"MAC"|"TIME":"US"

I need the output like below:

ABC    | XYZ    | LKJ | TIME
Mobile | Tablet | MAC | US

I am reading the file in Databricks with a pipe delimiter, and that gives me a number of columns. How can I move forward from there?

I used the pyspark.sql.functions.split method, but I am not getting the required output format.

 

I have tried every scenario I can think of. Could you please help me?
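
For reference, here is a minimal sketch of one way to get that layout, assuming the records live at a hypothetical path /data/input.txt, every record has the same keys in the same order, and a SparkSession is available:

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, regexp_replace, col

spark = SparkSession.builder.getOrCreate()

# Read each line as a single string column named "value"
raw = spark.read.text("/data/input.txt")  # hypothetical path

# Drop the double quotes, then split each line on "|" into an array of key:value pairs
pairs = raw.select(split(regexp_replace("value", '"', ''), r'\|').alias("kv"))

# Take the keys from the first record and use them as the new column names
keys = [p.split(":")[0] for p in pairs.first()["kv"]]

# Keep only the value part of each pair, aliased with its key
result = pairs.select([
    split(col("kv").getItem(i), ":").getItem(1).alias(keys[i])
    for i in range(len(keys))
])
result.show()

With the sample record above, result.show() should print ABC, XYZ, LKJ, and TIME as columns with Mobile, Tablet, MAC, and US as the row values.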

1 REPLY

New Contributor

Step 1: Read each file path from the lookup file and add Location, Country, and State columns for each record.

from pyspark.sql.functions import lit

for line in lines:
    SourceDf = sqlContext.read.format("csv").option("delimiter", "|").load(line)
    # withColumn returns a new DataFrame, so assign the result back
    SourceDf = SourceDf.withColumn("Location", lit("US")) \
        .withColumn("Country", lit("Richmond")) \
        .withColumn("State", lit("NY"))
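
As a side note, DataFrameReader.load() also accepts a list of paths, so the lookup paths can be read into a single DataFrame instead of overwriting SourceDf on every loop iteration. A minimal sketch, assuming lines is a Python list of file paths:

from pyspark.sql.functions import lit

SourceDf = (sqlContext.read.format("csv")
            .option("delimiter", "|")
            .load(lines)  # load() accepts a list of paths
            .withColumn("Location", lit("US"))
            .withColumn("Country", lit("Richmond"))
            .withColumn("State", lit("NY")))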

Step 2: Loop over each column of the DataFrame above and split it, but I am only getting two columns in KeyValueDF.

from pyspark.sql.functions import split

for col_name in SourceDf.columns:
    # Split each cell on ":" into a key part and a value part
    InterDF = split(SourceDf[col_name], ":")
    KeyValueDF = SourceDf.withColumn("Column_Name", InterDF.getItem(0)) \
        .withColumn("Column_Value", InterDF.getItem(1))
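
The loop above only ever ends up with two extra columns because each iteration rebuilds KeyValueDF from SourceDf and reuses the same two column names, so only the last iteration survives. A minimal sketch that accumulates one key column and one value column per source column (the _name/_value suffixes are just illustrative):

from pyspark.sql.functions import split, col

KeyValueDF = SourceDf
for c in SourceDf.columns:
    parts = split(col(c), ":")
    KeyValueDF = (KeyValueDF
                  .withColumn(c + "_name", parts.getItem(0))
                  .withColumn(c + "_value", parts.getItem(1)))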


In step 1, the data is split on the pipe and 60 columns are created. In step 2, I want to split the output of step 1 again on the colon.

Can anyone please help me get the expected result?

 

File format:
ABC:"MobileData"|XYZ:"TabletData"|ZXC:"MacData"|MNB:"WindowData"

Expected result:
ABC        | XYZ        | ZXC     | MNB
MobileData | TabletData | MacData | WindowData
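
For this format, a minimal alternative sketch using the built-in SQL function str_to_map, assuming a SparkSession named spark, a hypothetical path /data/lookup.txt, and that the keys ABC, XYZ, ZXC, and MNB are known in advance:

from pyspark.sql.functions import expr, regexp_replace

raw = spark.read.text("/data/lookup.txt")  # hypothetical path
line_df = raw.select(regexp_replace("value", '"', '').alias("line"))

# str_to_map splits the line into a key/value map; '[|]' is a regex matching a literal pipe
mapped = line_df.select(expr("str_to_map(line, '[|]', ':')").alias("kv"))

result = mapped.select([mapped["kv"][k].alias(k) for k in ["ABC", "XYZ", "ZXC", "MNB"]])
result.show()

With the sample line above, this should print MobileData, TabletData, MacData, and WindowData under the ABC, XYZ, ZXC, and MNB columns.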