Iterating through nested fields in a Spark DataFrame

I have a DataFrame with the following schema:

scala> final_df.printSchema
root
 |-- mstr_prov_id: string (nullable = true)
 |-- prov_ctgry_cd: string (nullable = true)
 |-- prov_orgnl_efctv_dt: timestamp (nullable = true)
 |-- prov_trmntn_dt: timestamp (nullable = true)
 |-- prov_trmntn_rsn_cd: string (nullable = true)
 |-- npi_rqrd_ind: string (nullable = true)
 |-- prov_stts_aray_txt: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- PROV_STTS_KEY: string (nullable = true)
 |    |    |-- PROV_STTS_EFCTV_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_CD: string (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_RSN_CD: string (nullable = true)

I am running the following code to do basic cleansing, but it does not work inside "prov_stts_aray_txt": it does not descend into the array type and apply the desired transformation. I want to iterate through all fields in the DataFrame, both flat and nested, and apply the same basic transformation to each.

// Assumes final_df is declared as a var; dtypes lists top-level columns only.
for (dt <- final_df.dtypes) {
  final_df = final_df.withColumn(dt._1, when(upper(trim(col(dt._1))) === "NULL", lit(" ")).otherwise(col(dt._1)))
}

Please help. Note that this is just a sample DataFrame; the actual DataFrame holds multiple array-of-struct columns, each with a different number of fields, so the solution needs to be dynamic.

Thanks
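
One possible direction, offered as a sketch rather than a definitive answer: the loop above only visits top-level columns, because dtypes does not descend into arrays or structs. A recursive function over the schema can build the replacement expression for each column instead. The sketch below assumes Spark 3.0 or later (for the Scala `transform` function on array columns), assumes column names contain no dots, and only rewrites string leaves (timestamps, maps, and other types are passed through unchanged). The names `NullStringCleaner`, `clean`, and `cleanAll` are hypothetical and chosen for illustration.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical helper, not part of Spark.
object NullStringCleaner {

  // Recursively build an expression that rewrites every string leaf under `c`.
  def clean(c: Column, dt: DataType): Column = dt match {
    // String leaf: same rule as the original loop.
    case StringType =>
      when(upper(trim(c)) === "NULL", lit(" ")).otherwise(c)

    // Struct: rebuild it field by field, keeping a null struct as null.
    case st: StructType =>
      val fields = st.fields.map(f => clean(c.getField(f.name), f.dataType).as(f.name))
      when(c.isNotNull, struct(fields: _*))

    // Array: apply the same cleaning to each element (requires Spark 3.0+).
    case ArrayType(elementType, _) =>
      transform(c, x => clean(x, elementType))

    // Anything else (timestamps, numbers, maps, ...) is left untouched.
    case _ => c
  }

  // Apply `clean` to every top-level column, driven by the DataFrame's own schema.
  // Assumption: top-level column names contain no dots or other special characters.
  def cleanAll(df: DataFrame): DataFrame =
    df.select(df.schema.fields.map(f => clean(col(f.name), f.dataType).as(f.name)): _*)
}
```

With this in scope, something like `val cleaned = NullStringCleaner.cleanAll(final_df)` should handle any number of array-of-struct columns, since the traversal is driven by the schema rather than by hard-coded field names. The nullability flags of the resulting schema may differ from the input.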
