How to Prevent Hive from Storing Data with "\x00" prepended to every character?

I am under the impression that Hive stores all of its data in a rather odd format.

 

 

If I issue a command like:

 

INSERT OVERWRITE DIRECTORY '${OUTPUT}'
SELECT device_id, device_type, sum(cost)
FROM ${INPUT_TABLE}
GROUP BY device_id, device_type
ORDER BY device_id, device_type;

 

Let's pretend it emits data to HDFS such as:

 

D01AA100
D01BB150
D01CC200

 

I can run hadoop fs -cat ... on the output and see the expected results.

 

However, if I open this file with a Python script, all the results look like:

 

\x00D\x000\x001\x00\x00A\x00A\x00\x001\x000\x000\x00
\x00D\x000\x001\x00\x00B\x00B\x00\x001\x005\x000\x00
\x00D\x000\x001\x00\x00C\x00C\x00\x002\x000\x000\x00
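
For anyone trying to reproduce this, here is a minimal diagnostic sketch, assuming the output file has been copied out of HDFS and read in binary mode. It only checks whether the NUL bytes fall on every other byte, which would suggest the data is UTF-16 (big-endian) text and not plain ASCII/UTF-8. It does not establish why Hive would write it that way. The helper names (nul_profile, looks_like_utf16_be_ascii) and the sample bytes are hypothetical, made up for illustration.

```python
def nul_profile(raw: bytes) -> dict:
    """Count NUL bytes at even and odd offsets of a byte string."""
    even = sum(1 for b in raw[0::2] if b == 0)
    odd = sum(1 for b in raw[1::2] if b == 0)
    return {"length": len(raw), "nul_even": even, "nul_odd": odd}


def looks_like_utf16_be_ascii(raw: bytes) -> bool:
    """Heuristic only: True if every even-offset byte is NUL and at least
    one odd-offset byte is not, as ASCII text encoded as UTF-16 BE would be."""
    if len(raw) == 0 or len(raw) % 2 != 0:
        return False
    return all(b == 0 for b in raw[0::2]) and any(b != 0 for b in raw[1::2])


# Hypothetical sample: one row of the expected output, encoded as UTF-16 BE.
# In practice, replace this with bytes read via open(path, "rb").read().
sample = "D01AA100".encode("utf-16-be")

print(repr(sample))
print(nul_profile(sample))
if looks_like_utf16_be_ascii(sample):
    # Decoding with the matching codec recovers the readable text.
    print(sample.decode("utf-16-be"))
```

If the heuristic returns True for the real file, decoding with "utf-16-be" in the reading script is one possible workaround; if it returns False, the NUL bytes are coming from somewhere else and the encoding guess does not apply.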

 

In addition, if I create an external HBase table in Hive and INSERT OVERWRITE TABLE into it, every single character is prepended with "\x00", whether I read the table with happybase in Python or scan it from the HBase shell.

 

How can I prevent Hive from doing this?

 

Why is it doing it?

 

Thank you!