Reply
New Contributor
Posts: 5
Registered: ‎08-07-2016

Loading parquet files in hive table returns all NULL

I have few files stored in HDFS in parquet format and I am trying to create a new external table in hive which is suppose to point to that data files.

So, I created a table in hive using -

CREATE EXTERNAL TABLE ORDERS_P (ORDERID INT, 
ORDER_DATE BIGINT,CUSTOMER_ID INT,STATUS STRING)STORED AS PARQUET
LOCATION 'hdfs:///user/cloudera/retail/parquet/orders';

Table is created but when i run query on table in hive as -

SELECT * FROM ORDERS_P LIMIT 10

it returns all NULL values as except ORDER_DATE Column -

NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL

I verified data does exists in those files using spark-shell. Not sure what I am doing wrong. Any help appreciated.

Cloudera Employee
Posts: 832
Registered: ‎03-23-2015

Re: Loading parquet files in hive table returns all NULL

Hi ,

 

I did a quick test and it worked for me. I created the parquet table within Hive, can you advise how your parquet data was generated?

 

Have you tried to use parquet-tools to see the content of the file to verify?

New Contributor
Posts: 5
Registered: ‎08-07-2016

Re: Loading parquet files in hive table returns all NULL

Hi ,

 

Parquet data was generated using sqoop import of orders table in retail_db using below snippet - 

 

 

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--as-parquetfile \
--target-dir /user/cloudera/retail/parquet/orders1 \
--m 4;

 

No, I havent tried using parquet-tools. I did verify data using simple scala script

val file = sqlContext.parquetFile("retail/parquet/orders")
file.printSchema

root
 |-- order_id: integer (nullable = true)
 |-- order_date: long (nullable = true)
 |-- order_customer_id: integer (nullable = true)
 |-- order_status: string (nullable = true)


file.show()

+--------+-------------+-----------------+---------------+
|order_id|   order_date|order_customer_id|   order_status|
+--------+-------------+-----------------+---------------+
|   51663|1402729200000|             5016|         CLOSED|
|   51664|1402729200000|            11404|     PROCESSING|
|   51665|1402729200000|             8645|       COMPLETE|
|   51666|1402729200000|             7192|PENDING_PAYMENT|
|   51667|1402729200000|            11178|     PROCESSING|
|   51668|1402729200000|              982|     PROCESSING|
|   51669|1402729200000|             7160|        PENDING|
|   51670|1402729200000|             9322|       COMPLETE|
|   51671|1402729200000|             9000|       COMPLETE|
|   51672|1402729200000|              117|         CLOSED|
|   51673|1402729200000|             7538|       COMPLETE|
|   51674|1402729200000|             8348|     PROCESSING|
|   51675|1402729200000|             6199|PENDING_PAYMENT|
|   51676|1402729200000|            11014|        ON_HOLD|
|   51677|1402729200000|             6321|       COMPLETE|
|   51678|1402729200000|             6020|         CLOSED|
|   51679|1402729200000|             8327|PENDING_PAYMENT|
|   51680|1402729200000|             5421|         CLOSED|
|   51681|1402729200000|             9367|PENDING_PAYMENT|
|   51682|1402729200000|            10903|     PROCESSING|
+--------+-------------+-----------------+---------------+
only showing top 20 rows

 

 

Cloudera Employee
Posts: 832
Registered: ‎03-23-2015

Re: Loading parquet files in hive table returns all NULL

Hi,

 

I have tested using your procedure, and data returned successfully:

 

+------------------------+--------------------------+---------------------------------+----------------------------+--+
| parquet_test.order_id  | parquet_test.order_date  | parquet_test.order_customer_id  | parquet_test.order_status  |
+------------------------+--------------------------+---------------------------------+----------------------------+--+
| 51663                  | 1402729200000            | 5016                            | CLOSED                     |
+------------------------+--------------------------+---------------------------------+----------------------------+--+

I tested using single row of data in MySQL and import using Sqoop command, and I used CDH5.7.0 for testing.


Can you please confirm the version of CDH you are using? (if you are on CDH, or hive and sqoop version otherwise).

New Contributor
Posts: 5
Registered: ‎08-07-2016

Re: Loading parquet files in hive table returns all NULL

Hi,

 

I am using cloudera quickstart virtual machine 5.8.0.

Hive version : Hive 1.1.0-cdh5.8.0

Sqoop verision : Sqoop 1.4.6-cdh5.8.0