I read that Spark SQL has three complex data types: ArrayType, MapType, and StructType. When would you use these? I'm confused because I was taught that SQL tables should never, ever contain arrays/lists in a single cell value, so why does Spark SQL allow having arraytype?
Complex types are generally used to aggregate the characteristics of an object, for example:
current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING>
So now we have the 'current_address' attribute and its members grouped.
This is not only organizational, but also has an impact on the performance of the processes related to this table.
When you want to retrieve a data it can be done like this:
SELECT id, name, current_address.street_address.street_number, current_address.street_address.street_name, current_address.street_address.street_type, current_address.country, current_address.postal_code FROM struct_demo;
Despite the example they are giving, it refers to Apache Impala the concept is the same applied to Spark.
Hope this helps.