Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Loading parquet files in hive table returns all NULL

Loading parquet files in hive table returns all NULL

New Contributor

I have few files stored in HDFS in parquet format and I am trying to create a new external table in hive which is suppose to point to that data files.

So, I created a table in hive using -

CREATE EXTERNAL TABLE ORDERS_P (ORDERID INT, 
ORDER_DATE BIGINT,CUSTOMER_ID INT,STATUS STRING)STORED AS PARQUET
LOCATION 'hdfs:///user/cloudera/retail/parquet/orders';

Table is created but when i run query on table in hive as -

SELECT * FROM ORDERS_P LIMIT 10

it returns all NULL values as except ORDER_DATE Column -

NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL
NULL    1402729200000   NULL    NULL

I verified data does exists in those files using spark-shell. Not sure what I am doing wrong. Any help appreciated.

4 REPLIES 4

Re: Loading parquet files in hive table returns all NULL

Guru

Hi ,

 

I did a quick test and it worked for me. I created the parquet table within Hive, can you advise how your parquet data was generated?

 

Have you tried to use parquet-tools to see the content of the file to verify?

Highlighted

Re: Loading parquet files in hive table returns all NULL

New Contributor

Hi ,

 

Parquet data was generated using sqoop import of orders table in retail_db using below snippet - 

 

 

sqoop import \
--connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
--username retail_dba \
--password cloudera \
--table orders \
--as-parquetfile \
--target-dir /user/cloudera/retail/parquet/orders1 \
--m 4;

 

No, I havent tried using parquet-tools. I did verify data using simple scala script

val file = sqlContext.parquetFile("retail/parquet/orders")
file.printSchema

root
 |-- order_id: integer (nullable = true)
 |-- order_date: long (nullable = true)
 |-- order_customer_id: integer (nullable = true)
 |-- order_status: string (nullable = true)


file.show()

+--------+-------------+-----------------+---------------+
|order_id|   order_date|order_customer_id|   order_status|
+--------+-------------+-----------------+---------------+
|   51663|1402729200000|             5016|         CLOSED|
|   51664|1402729200000|            11404|     PROCESSING|
|   51665|1402729200000|             8645|       COMPLETE|
|   51666|1402729200000|             7192|PENDING_PAYMENT|
|   51667|1402729200000|            11178|     PROCESSING|
|   51668|1402729200000|              982|     PROCESSING|
|   51669|1402729200000|             7160|        PENDING|
|   51670|1402729200000|             9322|       COMPLETE|
|   51671|1402729200000|             9000|       COMPLETE|
|   51672|1402729200000|              117|         CLOSED|
|   51673|1402729200000|             7538|       COMPLETE|
|   51674|1402729200000|             8348|     PROCESSING|
|   51675|1402729200000|             6199|PENDING_PAYMENT|
|   51676|1402729200000|            11014|        ON_HOLD|
|   51677|1402729200000|             6321|       COMPLETE|
|   51678|1402729200000|             6020|         CLOSED|
|   51679|1402729200000|             8327|PENDING_PAYMENT|
|   51680|1402729200000|             5421|         CLOSED|
|   51681|1402729200000|             9367|PENDING_PAYMENT|
|   51682|1402729200000|            10903|     PROCESSING|
+--------+-------------+-----------------+---------------+
only showing top 20 rows

 

 

Re: Loading parquet files in hive table returns all NULL

Guru

Hi,

 

I have tested using your procedure, and data returned successfully:

 

+------------------------+--------------------------+---------------------------------+----------------------------+--+
| parquet_test.order_id  | parquet_test.order_date  | parquet_test.order_customer_id  | parquet_test.order_status  |
+------------------------+--------------------------+---------------------------------+----------------------------+--+
| 51663                  | 1402729200000            | 5016                            | CLOSED                     |
+------------------------+--------------------------+---------------------------------+----------------------------+--+

I tested using single row of data in MySQL and import using Sqoop command, and I used CDH5.7.0 for testing.


Can you please confirm the version of CDH you are using? (if you are on CDH, or hive and sqoop version otherwise).

Re: Loading parquet files in hive table returns all NULL

New Contributor

Hi,

 

I am using cloudera quickstart virtual machine 5.8.0.

Hive version : Hive 1.1.0-cdh5.8.0

Sqoop verision : Sqoop 1.4.6-cdh5.8.0