I am trying to read data from Kafka and write it in Parquet format via Spark Streaming.
The problem is that the data coming from Kafka have a variable structure.
For example, app one has columns A, B, C, while app two has columns B, C, D. So the DataFrame I read from Kafka ends up with all four columns A, B, C, D. When I write the DataFrame to Parquet partitioned by app name,
the Parquet files for app one also contain column D, even though column D is entirely empty for that app and holds no actual data. How can I filter out the empty (all-null) columns when writing the DataFrame to Parquet?
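For reference, here is a simplified sketch of roughly what I am doing (the broker address, topic, column names, and paths are placeholders, not my real setup):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("kafka-to-parquet").getOrCreate()
import spark.implicits._

// Merged schema covering the columns of all apps (A, B, C, D) plus the app name.
val schema = new StructType()
  .add("app_name", StringType)
  .add("A", StringType)
  .add("B", StringType)
  .add("C", StringType)
  .add("D", StringType)

// Read the stream from Kafka.
val kafkaDF = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()

// Parse the Kafka value as JSON into the merged schema.
val parsedDF = kafkaDF
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", schema).as("data"))
  .select("data.*")

// Write partitioned by app name: every partition currently gets all four
// columns, even when a column is entirely null for that app.
parsedDF.writeStream
  .format("parquet")
  .option("path", "/output/events")
  .option("checkpointLocation", "/output/checkpoints")
  .partitionBy("app_name")
  .start()
```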
Thanks!