Created on 09-21-2016 05:25 AM - edited 09-16-2022 03:40 AM
Hello guys,
I am using Sqoop to transfer data from PostgreSQL to Hive, using Parquet storage.
I have a problem with the PostgreSQL type "timestamp without time zone", which is mapped to BIGINT and becomes unusable for querying.
I found the import option: --map-column-hive varname=type.
I have tried with the types STRING and TIMESTAMP; both are ignored, and I do not get any message about it.
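For reference, the command looks roughly like this (connection details and the table/column names here are placeholders):

sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table events \
  --hive-import \
  --hive-table events \
  --as-parquetfile \
  --map-column-hive created_at=STRING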
Versions used: Sqoop 1.4.6, PostgreSQL 9.4, CDH 5.8.
Any idea is welcome.
Created 09-23-2016 12:58 AM
Is nobody else facing this problem?
Created 09-30-2016 12:26 PM
Did you find a solution for this problem?
I am having the same issue. I've scoured the internet, and I believe that in my case --map-column-hive is not working in combination with the --as-parquetfile option, though I am not sure why.
A workaround that I am trying is to import the data as a regular text file, create a Hive table over the imported file, and cast the column to TIMESTAMP there.
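As a rough sketch (connection details, paths, and the table/column names are all placeholders, and I am assuming the timestamps arrive as strings like 2016-10-13 04:43:00):

# 1) Import as plain text (no --as-parquetfile), so the timestamp
#    comes across as a string
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table events \
  --target-dir /user/hive/staging/events

-- 2) In Hive: an external table over the text files, then CTAS into
--    Parquet with an explicit cast
CREATE EXTERNAL TABLE events_staging (id BIGINT, created_at STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/staging/events';

CREATE TABLE events_parquet STORED AS PARQUET AS
SELECT id, CAST(created_at AS TIMESTAMP) AS created_at
FROM events_staging;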
Created 10-07-2016 04:20 AM
Do we have any solution for this? I am facing the same issue.
Created on 10-08-2016 05:02 AM - edited 10-08-2016 05:05 AM
Could you let us know the timestamp format? Is it something like yyyy-mm-dd hh:mm:ss?
Created 10-13-2016 04:43 AM
Hello,
In PostgreSQL, I have my datetime stored as TIMESTAMP WITHOUT TIME ZONE.
I have tried with other column types as well; the option is not applied when the import creates a Hive table in Parquet format.
Created 10-13-2016 06:37 AM
I would consider trying to cast the BIGINT to TIMESTAMP.
Also, please refer to this document, which I read a while back. I am quoting it from the Cloudera manual:
If you use Sqoop to convert RDBMS data to Parquet, be careful with interpreting any resulting values from DATE, DATETIME, or TIMESTAMP columns.
The underlying values are represented as the Parquet INT64 type, which is represented as BIGINT in the Impala table.
The Parquet values represent the time in milliseconds, while Impala interprets BIGINT as the time in seconds.
Therefore, if you have a BIGINT column in a Parquet table that was imported this way from Sqoop, divide the values by 1000 when interpreting as the TIMESTAMP type.
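For example, in Impala that division plus cast would look like this (created_at is a hypothetical BIGINT column holding milliseconds, in a hypothetical table my_parquet_table):

SELECT CAST(created_at / 1000 AS TIMESTAMP) AS created_ts
FROM my_parquet_table;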
I guess there is an underlying problem with TIMESTAMP when you use Parquet files.
Created 10-13-2016 06:42 AM
Hi csguna,
that is not the point: there is an option for changing/forcing the type, and it is being ignored when importing a table from PostgreSQL to Hive in Parquet format.
Created 10-13-2016 07:52 AM