
SQOOP IMPORT --map-column-hive ignored

Contributor

Hello guys,

I am using Sqoop to transfer data from PostgreSQL to Hive, storing the data as Parquet.

I have a problem with the PostgreSQL type "timestamp without time zone", which is mapped to BIGINT and becomes unusable for querying.

I found the import option: --map-column-hive varname=type.

I have tried it with the types STRING and TIMESTAMP; both are ignored, and I do not get any message about it.

Versions used: Sqoop 1.4.6, PostgreSQL 9.4, CDH 5.8.

Any idea is welcome.
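
For reference, the import command looks roughly like this (host, database, table, and column names are placeholders):

    sqoop import \
      --connect jdbc:postgresql://dbhost:5432/mydb \
      --username myuser -P \
      --table events \
      --hive-import \
      --hive-table events \
      --as-parquetfile \
      --map-column-hive created_at=TIMESTAMP

The import completes without any warning, and the Hive column still comes out as BIGINT.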

20 Replies

Contributor

Nobody is facing this problem?

New Contributor

Did you find a solution for this problem?

I am having the same issue. I've scoured the internet, and I think that in my case --map-column-hive is not working in combination with --as-parquetfile, though I am not really sure why.

A workaround I am trying is to import the data as a plain file, create a table in Hive over the imported file, and cast the column to TIMESTAMP there.
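
Something along these lines, in case it helps (paths, delimiters, and the column list are made up for the example):

    # 1. Import as delimited text instead of Parquet
    sqoop import \
      --connect jdbc:postgresql://dbhost:5432/mydb \
      --username myuser -P \
      --table events \
      --target-dir /user/etl/events_text \
      --fields-terminated-by '\t'

    # 2. Put an external table over the text files, then cast while
    #    copying into the final Parquet table
    hive -e "
    CREATE EXTERNAL TABLE events_staging (id BIGINT, created_at STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/user/etl/events_text';

    CREATE TABLE events STORED AS PARQUET AS
    SELECT id, CAST(created_at AS TIMESTAMP) AS created_at
    FROM events_staging;"

The CAST(created_at AS TIMESTAMP) works here because the strings are in the yyyy-MM-dd HH:mm:ss format Hive expects.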

Explorer

Do we have any solution for this? I am facing the same issue.

Champion

Could you let us know the timestamp format - is it something like yyyy-mm-dd hh:mm:ss?

Explorer
Hi,
The format is yyyy-mm-dd hh:mm:ss in the RDBMS. When I try to sqoop it into Hive, it gets converted to a Unix timestamp, but I want it in a readable date format.
Thanks

Contributor

Hello,

in PostgreSQL, my datetime is stored as TIMESTAMP WITHOUT TIME ZONE.

I have tried other types as well; the option does not work when the import creates a Hive table in Parquet format.

Champion

I would consider trying to cast the BIGINT to TIMESTAMP.

Also, please refer to this document; I read it a long time back. I am quoting from the Cloudera manual:

If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns. The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table. The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds. Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type.

I guess there is an underlying problem with TIMESTAMP when you use Parquet files.

http://www.cloudera.com/documentation/archive/impala/2-x/2-1-x/topics/impala_parquet.html#parquet_da...
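
In a query, that divide-by-1000 looks something like this (the table and column names are hypothetical):

    # Impala interprets a BIGINT-to-TIMESTAMP cast as seconds since the
    # epoch, so divide the millisecond values written by Sqoop by 1000
    impala-shell -q "
    SELECT CAST(created_at DIV 1000 AS TIMESTAMP) AS created_at_ts
    FROM events_parquet;"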

Contributor

Hi csguna,

that is not the point: there is an option for changing/forcing the type, and it is being ignored when importing a table from PostgreSQL to Hive in Parquet format.

Explorer
I agree with Kerjo. Type casting is a further step, but I want to get the data straight away in date format. I tried --map-column-hive, but it is getting ignored.