New Contributor
Posts: 2
Registered: ‎03-30-2017

Sqoop + Hive table stored as parquet = datatype error


First of all, sorry for my English.
Guys, I've been having a problem importing data from my Oracle database into Hive using Sqoop.
When I import into a previously created table with decimal and timestamp datatypes, I get a conversion error.
If I don't pre-create the table and let Sqoop create it, it creates the columns as string and bigint.
I tried the --map-column-hive parameter, but with Parquet it turns out to be ignored. And that's where things get interesting!

 

With the SequenceFile format, --map-column-hive is respected.

Briefly, I need to import data from Oracle into Hive, with the table stored as Parquet, keeping the decimal and timestamp datatypes. Ideally, I'd like to pre-create my table with varchar, char, decimal, and timestamp columns.
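To illustrate, the kind of command I'm running looks roughly like this (the connection string, credentials, and table/column names below are placeholders, not my real ones):

```shell
# Placeholder connection details and names -- substitute your own.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MYSCHEMA.MYTABLE \
  --hive-import \
  --hive-table mytable \
  --as-parquetfile \
  --map-column-hive COL1=decimal,COL2=timestamp
```

With --as-parquetfile, the --map-column-hive mappings are ignored and the columns still come out as string/bigint.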

 

How can I proceed? Should I upgrade any of my components?

I use the QuickStart VM 5.8 today; I believe Hive is at version 1.1.

 

Here are some links to similar problems:

http://community.cloudera.com/t5/Data-Ingestion-Integration/SQOOP-IMPORT-map-column-hive-ignored/td-...

 

http://community.cloudera.com/t5/Data-Ingestion-Integration/Using-Parquet-in-sqoop-import-automatica...

 

 

Thank you,
Alexandre.

Cloudera Employee
Posts: 4
Registered: ‎04-03-2017

Re: Sqoop + Hive table stored as parquet = datatype error

Hi,

 

Since Parquet is a binary format, the Hive table gets created from the schema file (under the .metadata directory) of the target directory.

 

When you do a Sqoop Hive import with Parquet files, the table that Sqoop creates in Hive is based on the metadata that Sqoop fetched from the source database.

 

When you use --map-column-hive, Sqoop does not change the data type at the time the schema file is created; it only tries to change the datatype after all the metadata and data have been imported. Parquet table data types are constrained by that schema: whatever data type is written in the schema file is the type the table will be created with.

 

 

In this case, if your schema file has bigint, it doesn't matter that you pass string to --map-column-hive: Sqoop will create the column as bigint, because the Parquet schema file has bigint in it.
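You can see this for yourself by looking at the schema file Sqoop writes alongside the data. As a sketch (the exact path and file names are assumptions and can vary by CDH/Sqoop version, since the layout comes from the Kite SDK):

```shell
# Hypothetical dataset location -- the Parquet import writes an Avro
# schema under a hidden .metadata directory in the target directory.
hdfs dfs -ls /user/hive/warehouse/mytable/.metadata/schemas/
hdfs dfs -cat /user/hive/warehouse/mytable/.metadata/schemas/1.avsc
```

Whatever types appear in that Avro schema are the types the Hive table ends up with, regardless of --map-column-hive.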

 

So it is better to change the datatype itself while fetching the metadata and importing, by using --map-column-java.

 

NOTE: If it were a text file, --map-column-hive would have done its magic, because a text import is not dependent on any schema file.

 

Regards

Nitish Khanna

 

Cloudera Employee

Re: Sqoop + Hive table stored as parquet = datatype error

Please let me know if this doesn't work either.

New Contributor

Re: Sqoop + Hive table stored as parquet = datatype error

Hi, thanks for replying.
I didn't understand how to apply this solution.

Maybe I was not clear on my question. Is it possible to import data via Sqoop to a Hive table, stored as Parquet, and previously created with decimal and timestamp datatypes?

Cloudera Employee

Re: Sqoop + Hive table stored as parquet = datatype error

Hi,

 

Sqoop's Parquet type mapping is not generic, so when it doesn't understand a type, it converts it to string.

 

For Avro and Parquet imports, Sqoop relies on an Avro schema. Due to this implementation, DATE/TIMESTAMP values are imported as the Java type Long by default. You can override the type to import as a Java String by using the option --map-column-java <colname>=String.

 

For example, say I have a table in Oracle:

 

column1=decimal

column2=timestamp

 

Add this additional parameter to the sqoop command:

 

--map-column-java column1=Double,column2=String

 

This will map column1 from decimal to double. If you don't specify this, it will become a string, which is the default mapping.
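Putting the pieces together, a full import might look like this (the connection string, credentials, and table/column names are placeholders; adjust for your environment):

```shell
# Hypothetical connection details -- substitute your own.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MYSCHEMA.MYTABLE \
  --hive-import \
  --hive-table mytable \
  --as-parquetfile \
  --map-column-java COLUMN1=Double,COLUMN2=String
```

Because --map-column-java takes effect when the Avro schema is generated, the overridden types are what end up in the Parquet schema file, and therefore in the Hive table.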

 

 
