How to create a Parquet file with complex data types like struct

I want to create a table in Impala with complex data types and insert data into it. To load this table, I need to generate Parquet files that contain complex data types. Any help is really appreciated. Thanks.

The table should look something like this:

DESCRIBE struct_demo;
+-------------------+--------------------------+
| name              | type                     |
+-------------------+--------------------------+
| id                | bigint                   |
| name              | string                   |
| employee_info     | struct<                  |
|                   |   employer:string,       |
|                   |   id:bigint,             |
|                   |   address:string         |
|                   | >                        |
| places_lived      | array<struct<            |
|                   |   street:string,         |
|                   |   city:string,           |
|                   |   country:string         |
|                   | >>                       |
| memorable_moments | map<string,struct<       |
|                   |   year:int,              |
|                   |   place:string,          |
|                   |   details:string         |
|                   | >>                       |
| current_address   | struct<                  |
|                   |   street_address:struct< |
|                   |     street_number:int,   |
|                   |     street_name:string,  |
|                   |     street_type:string   |
|                   |   >,                     |
|                   |   country:string,        |
|                   |   postal_code:string     |
|                   | >                        |
+-------------------+--------------------------+

 

As you can see, several columns are structs, arrays, and maps. How can we generate a Parquet file with this kind of data?

1 REPLY

Hi @Nisha2019,

This example appears to be a snippet from our documentation. Just above the DESCRIBE statement there, a sample CREATE TABLE statement generates this table schema; please see below.

As for ingesting data into these tables, Impala does not currently support creating data with complex type columns; the Loading Data Containing Complex Types section of the documentation describes this in more detail. Some more information can be found in the Complex type considerations chapter.

Hive does not support inserting values into a Parquet complex type column one by one either, but there are two workarounds:

  1. Create a temporary table holding the values, then transform it into Parquet complex types with Hive. For sample queries, please see our documentation: Constructing Parquet Files with Complex Columns Using Hive
  2. Use an INSERT INTO ... SELECT <values> query to insert records one by one. Reference queries can be found in the description of IMPALA-3938. Please note that this generates a separate file for each record, so the files occasionally need to be compacted.
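A minimal sketch of the second approach, assuming it is run in Hive (not Impala) against the struct_demo table from the documentation. All literal values are made up for illustration, and the exact queries in IMPALA-3938 may differ. named_struct, array, and map are Hive built-in constructor functions; the explicit BIGINT casts are there on the assumption that Hive will not widen an INT field inside a struct automatically.

```sql
-- Run in Hive, not Impala. Every literal value below is hypothetical.
-- Assumes a Hive version that allows SELECT without a FROM clause;
-- on older versions, select from any one-row table instead.
INSERT INTO TABLE struct_demo
SELECT
  CAST(1 AS BIGINT),                                   -- id
  'Alice',                                             -- name
  named_struct('employer', 'Acme',
               'id', CAST(100 AS BIGINT),
               'address', '1 Main St'),                -- employee_info
  array(named_struct('street', '1 Main St',
                     'city', 'Springfield',
                     'country', 'USA')),               -- places_lived (one element)
  map('graduation',
      named_struct('year', 2010,
                   'place', 'Springfield',
                   'details', 'Finished university')), -- memorable_moments (one entry)
  named_struct(
    'street_address', named_struct('street_number', 1,
                                   'street_name', 'Main',
                                   'street_type', 'St'),
    'country', 'USA',
    'postal_code', '12345');                           -- current_address
```

The field names passed to named_struct have to match the field names in the table definition. Each statement of this form writes its own small file, which is why occasional compaction is needed.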

The sample CREATE TABLE statement from the documentation:

CREATE TABLE struct_demo
(
  id BIGINT,
  name STRING,

-- A STRUCT as a top-level column. Demonstrates how the table ID column
-- and the ID field within the STRUCT can coexist without a name conflict.
  employee_info STRUCT < employer: STRING, id: BIGINT, address: STRING >,

-- A STRUCT as the element type of an ARRAY.
  places_lived ARRAY < STRUCT <street: STRING, city: STRING, country: STRING >>,

-- A STRUCT as the value portion of the key-value pairs in a MAP.
  memorable_moments MAP < STRING, STRUCT < year: INT, place: STRING, details: STRING >>,

-- A STRUCT where one of the fields is another STRUCT.
  current_address STRUCT < street_address: STRUCT <street_number: INT, street_name: STRING, street_type: STRING>, country: STRING, postal_code: STRING >
)
STORED AS PARQUET;
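
The first approach (a temporary table transformed by Hive) could look roughly like the sketch below. This is not the query from the documentation: the staging table struct_demo_staging, its column names, and the comma-delimited text layout are all assumptions made for illustration. To keep it short, each staging row produces exactly one array element and one map entry, and the same country column is reused for both places_lived and current_address.

```sql
-- Run in Hive. Hypothetical flat staging table: one row per person.
CREATE TABLE struct_demo_staging (
  id BIGINT,
  name STRING,
  employer STRING,
  employee_id BIGINT,
  employee_address STRING,
  street STRING,
  city STRING,
  country STRING,
  moment_name STRING,
  moment_year INT,
  moment_place STRING,
  moment_details STRING,
  street_number INT,
  street_name STRING,
  street_type STRING,
  postal_code STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Once the staging table holds data, build the nested columns
-- and let Hive write them out as Parquet.
INSERT INTO TABLE struct_demo
SELECT
  id,
  name,
  named_struct('employer', employer, 'id', employee_id, 'address', employee_address),
  array(named_struct('street', street, 'city', city, 'country', country)),
  map(moment_name,
      named_struct('year', moment_year, 'place', moment_place, 'details', moment_details)),
  named_struct(
    'street_address', named_struct('street_number', street_number,
                                   'street_name', street_name,
                                   'street_type', street_type),
    'country', country,
    'postal_code', postal_code)
FROM struct_demo_staging;
```

Because this inserts all staging rows in one statement, it avoids the one-file-per-record problem of single-row inserts. After Hive has written the data, Impala will most likely need a REFRESH struct_demo before the new files become visible to queries.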