Iterating through nested fields in spark DF


I have a DataFrame with the following schema:

scala> final_df.printSchema
root
 |-- mstr_prov_id: string (nullable = true)
 |-- prov_ctgry_cd: string (nullable = true)
 |-- prov_orgnl_efctv_dt: timestamp (nullable = true)
 |-- prov_trmntn_dt: timestamp (nullable = true)
 |-- prov_trmntn_rsn_cd: string (nullable = true)
 |-- npi_rqrd_ind: string (nullable = true)
 |-- prov_stts_aray_txt: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- PROV_STTS_KEY: string (nullable = true)
 |    |    |-- PROV_STTS_EFCTV_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_CD: string (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_DT: timestamp (nullable = true)
 |    |    |-- PROV_STTS_TRMNTN_RSN_CD: string (nullable = true)

I am running the following code to do basic cleansing, but it does not work inside "prov_stts_aray_txt": it does not descend into the array type and apply the desired transformation. I want to iterate through all fields of the DataFrame, both flat and nested, and apply a basic transformation to each.

// final_df must be declared as a var for this reassignment to compile
for ((name, _) <- final_df.dtypes) {
  final_df = final_df.withColumn(name, when(upper(trim(col(name))) === "NULL", lit(" ")).otherwise(col(name)))
}

Please help. Note that this is just a sample DataFrame; the actual one holds multiple array-of-struct columns, each with a different number of fields, so the solution needs to be built dynamically.
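
One possible way to make the cleansing dynamic is to walk the schema recursively and rebuild each column from its data type. The sketch below is only an illustration under some assumptions: it relies on the higher-order function transform, which is available in the Scala API from Spark 3.0 onwards, and the names NestedCleanse, cleanse and cleanseAll are hypothetical helpers, not part of Spark. Map columns and other types are passed through untouched.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Hypothetical helper object, named here only for illustration.
object NestedCleanse {

  // Rebuild a column expression by recursing over its data type.
  def cleanse(c: Column, dt: DataType): Column = dt match {
    // Leaf string: same rule as in the loop from the question.
    case StringType =>
      when(upper(trim(c)) === "NULL", lit(" ")).otherwise(c)

    // Struct: cleanse every field and reassemble the struct.
    // A null struct stays null because `when` has no `otherwise` branch.
    case st: StructType =>
      val fields = st.fields.map { f =>
        cleanse(c.getField(f.name), f.dataType).as(f.name)
      }
      when(c.isNotNull, struct(fields: _*))

    // Array: apply the same logic to each element (assumes Spark 3.0+).
    case ArrayType(elementType, _) =>
      transform(c, (x: Column) => cleanse(x, elementType))

    // Timestamps, numbers, maps and so on are left unchanged.
    case _ => c
  }

  // Apply the cleansing to every top-level column in a single select.
  def cleanseAll(df: DataFrame): DataFrame = {
    val cols = df.schema.fields.map { f =>
      // Backticks guard against top-level names that contain dots.
      cleanse(col(s"`${f.name}`"), f.dataType).as(f.name)
    }
    df.select(cols: _*)
  }
}
```

With this in scope, something like val cleaned_df = NestedCleanse.cleanseAll(final_df) should return a DataFrame with the same column and field names, without needing a var or one withColumn call per column. Rebuilding structs this way may change the nullable flags reported by printSchema, so it is worth comparing the schemas before and after.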

Thanks
