Uses of Complex Spark SQL Data Types

I read that Spark SQL has three complex data types: ArrayType, MapType, and StructType. When would you use these? I'm confused because I was taught that SQL tables should never, ever contain arrays/lists in a single cell value, so why does Spark SQL allow ArrayType?

Re: Uses of Complex Spark SQL Data Types

Hi,

Complex types are generally used to group the attributes of an object into a single column. For example, consider this type, based on https://impala.apache.org/docs/build/html/topics/impala_struct.html#struct:

current_address STRUCT <
    street_address: STRUCT <
        street_number: INT,
        street_name: STRING,
        street_type: STRING
    >,
    country: STRING,
    postal_code: STRING
>

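Now the 'current_address' attribute and all of its members are grouped in a single column.
This is not only an organizational convenience; it can also affect the performance of queries on this table, for example because the related fields travel with the row instead of being joined in from a separate address table.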

To retrieve the data, reference the nested fields with dot notation:

SELECT id, name,
current_address.street_address.street_number,
current_address.street_address.street_name,
current_address.street_address.street_type,
current_address.country,
current_address.postal_code
FROM struct_demo;

Although this example comes from the Apache Impala documentation, the same concept applies to Spark.

Hope this helps.
