03-16-2022 01:42 AM
In general, different tasks call for different degrees of flexibility. I've used this approach on datasets with varying column counts, where the first n columns are always the same but the number of remaining columns ranges from 3 to 500. Packing those extra columns into an ArrayType column of a DataFrame lets you apply any of the usual Spark processing while keeping the variable-width data attached to each row. Later processing steps can explode the array into separate rows if necessary, or access the complete set directly.