Created on 09-21-2016 05:25 AM - edited 09-16-2022 03:40 AM
Hello guys,
I am using Sqoop to transfer data from PostgreSQL to Hive, storing the data as Parquet.
I have a problem with the PostgreSQL type "timestamp without time zone", which is mapped to BIGINT and becomes unusable for querying.
I found the import option --map-column-hive varname=type.
I have tried it with the types STRING and TIMESTAMP; both are ignored and I do not get any message about it.
Versions used: Sqoop 1.4.6, PostgreSQL 9.4, CDH 5.8.
Any idea is welcome.
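For illustration, a minimal sketch of the kind of import I am running (the JDBC URL, credentials, table, and column names below are only placeholders):

# Placeholder PostgreSQL source; event_ts stands in for the
# "timestamp without time zone" column. The --map-column-hive
# entry is the option that gets silently ignored.
sqoop import \
  --connect jdbc:postgresql://dbhost:5432/mydb \
  --username myuser -P \
  --table my_table \
  --map-column-hive event_ts=timestamp \
  --as-parquetfile \
  --hive-import \
  --hive-table my_table \
  --target-dir /user/hive/warehouse/my_table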
Created on 10-14-2016 05:31 AM - edited 10-14-2016 05:33 AM
@kerjo and @manpreet2 I would like to clarify something. Are you both facing the same issue or something similar?
If the issues are only similar it may be better for @manpreet2 to post the second issue into a new thread to avoid confusion. If they are the same issue, please disregard my question and carry on. 🙂
Created on 11-07-2016 08:39 PM - edited 11-07-2016 08:41 PM
@kerjo, I'm seeing the same issue.
Given the following sqoop invocation:
sqoop import \
  --map-column-hive LoggedTime=timestamp \
  --connect jdbc:mysql://madb-t:3306/Sardata \
  --username reader \
  --password 'xx' \
  --hive-table 'memory' \
  --table 'Memory' \
  --as-parquetfile \
  --target-dir '/user/sardata/mem' \
  --hive-overwrite \
  --hive-import
I find that the --map-column-hive option is totally ignored. (It doesn't matter if I set it to LoggedTime=string, LoggedTime=blah, or PurpleMonkeyDishwasher=sdfjsaldkjf. It is ignored.)
I noticed tonight that if I avoid Parquet and instead use --as-textfile, the --map-column-hive option works correctly. I do not know the reason for this behavior at the moment, and I still need to get the files into Parquet format for Impala to report against. Just wanted to share my observation.
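For comparison, a sketch of the text-file variant that does honor the mapping (same connection details as above; the target directory is just an example so the Parquet output is not overwritten):

# Same import, but as text files; with --as-textfile the
# --map-column-hive option is applied as expected.
sqoop import \
  --map-column-hive LoggedTime=timestamp \
  --connect jdbc:mysql://madb-t:3306/Sardata \
  --username reader \
  --password 'xx' \
  --hive-table 'memory' \
  --table 'Memory' \
  --as-textfile \
  --target-dir '/user/sardata/mem_text' \
  --hive-overwrite \
  --hive-import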
Created on 11-28-2017 06:13 AM - edited 11-28-2017 06:23 AM
Hi kerjo,
I was facing the same issue with --map-column-java and --map-column-hive while importing a date/timestamp column from Oracle RDBMS into a Hive Parquet table using Sqoop. My Hive column is of type string.
Versions: CDH 5.12.1, Parquet 1.5.0.
A partial workaround I discovered is to convert the Date column to a string using a free-form query in Sqoop.
Input column in Oracle (type DATE):
'16.11.2017 09:44:11'
--query
select t.trans_id, to_char(t.trans_date_sysdate) trans_date_char from test_table_1 t where $CONDITIONS
Result in Hive looks like this:
16-NOV-17
Unfortunately, this is not enough; I am still not able to import the HH:mm:ss part.
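For context, the full Sqoop invocation around that free-form query looks roughly like this (the JDBC URL, credentials, split column, and directories are placeholders):

# Placeholder Oracle connection; the free-form query is passed via --query,
# and $CONDITIONS is escaped so the shell does not expand it.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser -P \
  --query "select t.trans_id, to_char(t.trans_date_sysdate) trans_date_char from test_table_1 t where \$CONDITIONS" \
  --split-by trans_id \
  --as-parquetfile \
  --hive-import \
  --hive-table test_table_1 \
  --target-dir /user/hive/warehouse/test_table_1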
Created 11-28-2017 09:04 AM
Hello,
I haven't done anything on this topic yet. My idea is to fix the Sqoop code, but I need one or two days to do it...
Created 11-30-2017 04:45 AM
EDIT: I have successfully imported the Date column from Oracle into Hive by modifying my query:
select t.trans_id, to_char(t.trans_date_sysdate, 'YYYYMMDD_HH24MISS') trans_date_char from test_table_1 t where $CONDITIONS
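If the imported string needs to behave like a real timestamp again at query time, one option is to convert it on the fly with standard Hive built-ins (a sketch only; it assumes the Hive table is also named test_table_1 and the column is trans_date_char):

# Parse the 'yyyyMMdd_HHmmss' string produced by the query above
# back into a Hive timestamp at query time.
hive -e "select cast(from_unixtime(unix_timestamp(trans_date_char, 'yyyyMMdd_HHmmss')) as timestamp) as trans_date_ts from test_table_1 limit 10"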
Created 09-06-2018 07:37 PM
I'm also facing the same issue. It would be great if the Cloudera expert team pitched in to confirm whether this is a bug and shed some light on other possible workarounds.
These issues are show-stoppers; easy access to resolutions would be a win-win for everybody.
Created 04-26-2019 08:29 AM
To add a little detail here, I have a column selection in my source query from Oracle like this:
to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR
If I leave the `T` out of the ISO 8601 format (I have a space above), it comes through as a string into my Parquet file. However, if I change this column to the following (notice the "T" in the date/time format):
to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD"T"HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR,
then even with
--map-column-java ACTIVITY_TSTZ_CHAR=String
in my sqoop command, I get a timestamp error:
Error: java.io.IOException: SQLException in nextKeyValue
    at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
    at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.sql.SQLDataException: ORA-01821: date format not recognized