Support Questions

Find answers, ask questions, and share your expertise

HIVE ORC table returns NULLs

avatar
Contributor

I am creating hive external table ORC (ORC file located on S3 , environment AWS EMR 5.9 Hive 2.3.0 )

Command

<code>CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC LOCATION 's3://bucket_name'

After running the query:

<code>Select * from Table1;

Result is:

<code>+-------------------------------------+---------------------------------------+
| Table1.id  | Table1.name  |
+-------------------------------------+---------------------------------------+
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
| NULL                                | NULL                                  |
+-------------------------------------+---------------------------------------+

Interesting that the number of returned records 10 and it is correct but all records are NULL. What is wrong, why query returns only NULLs? I am using EMR instances on AWS. Should I configure/check to support ORC format for hive?

I tested it with file location on s3 , hdfs. Query from hive , beeline. the behavior is the same:

select count (*) returns 10.

select * returns NULLs ...

ORC file attachment:

https://drive.google.com/file/d/0B3MYgurAigDMdm1ESkZYWm9Zdms/view

I know guys it is not a Hortonworks distribution, but I would really appreciate your help 🙂

Thanks

Oleg.

2 REPLIES 2

avatar
Contributor

Hi , I've made some research:

ORC file is generated by NIFI ConvertAvroToOrc ( NIFI 1.4 ).

I ran the same tests with EMR 5.7 which has hive 2.1 and hive successfully can query orc external table.

I checked and it looks like EMR 5.9 hive 2.3 is using upgraded version of ORC

Question:

What should be done to NIFI ConvertAvroToOrc knows to work with hive 2.3 properly?

I checked writer version of ORC:

ORC generates with NIFI "writerVersion": "HIVE_8732"

ORC generated with HIVE 2.3 "writerVersion": "ORC_135"

Should I create a separate topic, since it looks like the problem is related to NIFI ORC component?

Thanks

Oleg.

avatar
Contributor

@Oleg Ruchovets Hi Oleg, I have a similar situation. I am loading an orc formatted file into HDFS to which the Hive table's Location is pointing to. In my case not all my values are null, If there are 33 files in the hdfs location where I stored it as orc format, I see 33 rows with data in the Hive table but I also see another 10 rows where all null in the hive table. Have you found a solution for your issue yet.