
Sqoop + Hive table stored as parquet = datatype error

New Contributor

First of all, sorry for my bad English.
Guys, I've been having a problem trying to import data from my Oracle database into Hive using Sqoop.
When importing the data into a previously created table with decimal and timestamp datatypes, I get a conversion error.
If I do not pre-create the table and let Sqoop create it instead, the columns end up as string and bigint.
I tried using the --map-column-hive parameter, but with Parquet it turns out to be useless. And that's where things get interesting!

 

With the SequenceFile format, --map-column-hive is respected.

In short, I need to import the data from Oracle into Hive, with the table stored as Parquet, while keeping the decimal and timestamp datatypes. Ideally, I would like to pre-create my table with varchar, char, decimal, and timestamp datatypes.
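
For reference, this is roughly what I am trying; the connection string, database, table, and column names below are only placeholders:

# Pre-create the Hive table as Parquet with the datatypes I want
hive -e "CREATE TABLE mydb.customers_pq (
  id         DECIMAL(10,0),
  name       VARCHAR(100),
  created_at TIMESTAMP
) STORED AS PARQUET"

# Import from Oracle into that table
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table CUSTOMERS \
  --hive-import \
  --hive-table mydb.customers_pq \
  --as-parquetfile \
  --map-column-hive id=DECIMAL,created_at=TIMESTAMP \
  -m 1

The --map-column-hive part is simply ignored once --as-parquetfile is used.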

 

How can I proceed? Should I upgrade any of my components?

I am currently using the QuickStart VM 5.8; I believe Hive is at version 1.1.

 

Here are some links to similar problems:

http://community.cloudera.com/t5/Data-Ingestion-Integration/SQOOP-IMPORT-map-column-hive-ignored/td-...

 

http://community.cloudera.com/t5/Data-Ingestion-Integration/Using-Parquet-in-sqoop-import-automatica...

 

 

Thank you,
Alexandre.

6 REPLIES

Expert Contributor

Hi,

 

Since Parquet is a binary format, the Hive table gets created based on the schema file (under the .metadata directory) of the target directory.

 

When you do a Sqoop Hive import with the Parquet format, the table that Sqoop creates in Hive is based on the metadata that Sqoop fetched from the source database at the time of the import.

 

When you use --map-column-hive, Sqoop does not change the data type at the time the schema file is created; it only tries to change the datatype after all the metadata and data have been imported. Parquet table data types are bound to that schema, so the table is created with whatever data types are written in the schema file.

 

 

In this case, if your schema file has bigint, it doesn't matter that you specify string with --map-column-hive; Sqoop will create the column as bigint because the Parquet schema file has bigint in it.

 

So it is better to change the datatype at the point where the metadata is fetched and the data is imported, by using --map-column-java.
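
For example (the column names below are only placeholders), adding something like this to the import means the schema file itself is written with the changed types:

  --as-parquetfile \
  --map-column-java AMOUNT=Double,CREATED_AT=String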

 

NOTE: If it were a text file, --map-column-hive would have done its magic, because a text-format table does not depend on any schema file.
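
For instance (again with placeholder column names), with a text-format import the following is honored and the Hive columns come out as decimal and timestamp:

  --as-textfile \
  --map-column-hive AMOUNT=DECIMAL,CREATED_AT=TIMESTAMP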

 

Regards

Nitish Khanna

 

Expert Contributor

Please let me know if this doesn't work either.

New Contributor

Hi, thanks for replying.
I did not understand how to apply this solution.

Maybe I was not clear in my question. Is it possible to import data via Sqoop into a Hive table, stored as Parquet, that was previously created with decimal and timestamp datatypes?

Expert Contributor

Hi,

 

Sqoop's Parquet mapping is not generic, so any type it does not understand is converted to string.

 

For Avro and Parquet imports, Sqoop relies on the Avro schema. Because of this implementation, DATE/TIMESTAMP values are imported as the Java type Long by default. You can override the type and import them as a Java String by using the option --map-column-java <colname>=String.

 

For example, say I have a table in Oracle with:

 

column1=decimal

column2=timestamp

 

Add the additional parameter to the Sqoop command:

 

--map-column-java column1=Double,column2=String

 

This will map column1 from decimal to double. If you don't give this option, the column is made a string, which is the default mapping.
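
Putting it together, the full import would look roughly like this (the connection string, database, and table names are placeholders; column1 and column2 are the columns above):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table MYTABLE \
  --hive-import \
  --hive-table default.mytable_pq \
  --as-parquetfile \
  --map-column-java column1=Double,column2=String \
  -m 1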

 

 

New Contributor

Hi,

 

I'm facing a similar problem.

I want to import data into Parquet files using Sqoop.

I want to let Sqoop create the table, so it's not bound to a pre-existing one.

The goal is to represent each column with the same datatype it has in my RDBMS.

--map-column-hive seems to work fine for text files but not for Parquet.

That's what I understand so far.

 

It was mentioned that --map-column-java works with Parquet files, but unfortunately it does not work with all datatypes,

e.g.

ERROR tool.ImportTool: Imported Failed: No ResultSet method for Java type TIMESTAMP

I want to import a TIMESTAMP as TIMESTAMP and a DECIMAL as DECIMAL, not as string or bigint.

I can't get this to work, because --map-column-java does not recognize Timestamp as a compatible datatype.
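
Roughly what I tried (the connection string, table, and column names are placeholders):

sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --table EVENTS \
  --hive-import \
  --hive-table default.events_pq \
  --as-parquetfile \
  --map-column-java EVENT_TS=TIMESTAMP \
  -m 1

and that is the command that fails with the error above.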

 

Do I understand correctly that there is currently no way to achieve this, except by changing the file format to text or SequenceFile?

Expert Contributor

Hi,

 

There are some internal Jiras that have been raised for Sqoop + Parquet + decimal, so that the import can work correctly without Sqoop changing the datatype.

 

But until that Jira is fixed, we have to change the data type.

 

Regards

Nitish