03-16-2022 01:42 AM
In general, different tasks call for different degrees of flexibility. I've used this approach on datasets with varying column counts, where the first n columns are always the same but the number of remaining columns ranges from 3 to 500. Packing those extra columns into an ArrayType column of a DataFrame lets you apply any of the usual Spark processing while keeping the variable-width data attached to each row. Later processing steps can explode the array into separate rows if necessary, or access the complete set directly.