
SQOOP IMPORT --map-column-hive ignored

Contributor

Hello guys,

 

I am using Sqoop to transfer data from PostgreSQL to Hive using Parquet storage.

 

I have a problem with the PostgreSQL type "timestamp without time zone", which is mapped to BIGINT and becomes unusable for querying.

I found the import option: --map-column-hive varname=type.

I have tried both STRING and TIMESTAMP as the type; both are ignored, and I do not get any message about it.
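
For reference, my invocation looks roughly like this (the connection string, table and column names here are placeholders, not my real ones):

sqoop import \
--connect jdbc:postgresql://pghost:5432/mydb \
--username reader \
-P \
--table events \
--map-column-hive event_ts=STRING \
--hive-import \
--hive-table events \
--as-parquetfile \
--target-dir /user/hive/warehouse/events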

 

Versions used: Sqoop 1.4.6, PostgreSQL 9.4, CDH 5.8.

Any idea is welcome.

20 Replies

Champion
kerjo

I was thinking of a workaround: type casting on the Hive side. I understand that your --map-column-hive is being ignored; correct me if I am wrong.
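
Something like this might do as a stopgap, assuming the BIGINT column actually holds the epoch value in milliseconds (which is what Sqoop appears to write into Parquet); the table and column names are only placeholders:

CREATE VIEW my_table_v AS
SELECT
  id,
  -- from_unixtime() expects seconds, so drop the millisecond part first;
  -- note it also applies the local session time zone
  CAST(from_unixtime(CAST(event_ts / 1000 AS BIGINT)) AS TIMESTAMP) AS event_ts
FROM my_table;

Queries can then go against the view instead of the raw table.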

Community Manager

@kerjo and @manpreet2 I would like to clarify something. Are you both facing the same issue or something similar?

 

If the issues are only similar it may be better for @manpreet2 to post the second issue into a new thread to avoid confusion. If they are the same issue, please disregard my question and carry on. 🙂 


Cy Jervis, Manager, Community Program

New Contributor

@kerjo, I'm seeing the same issue.

 

  • CDH 5.8.2 (w/ sqoop 1.4.6)
  • MariaDB 5.5, or AWS Redshift, or..

Given the following sqoop invocation:

 

sqoop                                         \
import                                        \
--map-column-hive LoggedTime=timestamp        \
--connect jdbc:mysql://madb-t:3306/Sardata    \
--username reader                             \
--password 'xx'                               \
--hive-table 'memory'                         \
--table 'Memory'                              \
--as-parquetfile                              \
--target-dir '/user/sardata/mem'              \
--hive-overwrite                              \
--hive-import

 

I find that the --map-column-hive option is totally ignored. (It doesn't matter if I set it to LoggedTime=string, LoggedTime=blah, or PurpleMonkeyDishwasher=sdfjsaldkjf. It is ignored.)

 

I noticed tonight that if I avoid Parquet and instead use --as-textfile, the --map-column-hive option works correctly. I do not know the reason for this behavior at the moment, and I still need to get the files into Parquet format for Impala to report against, but I wanted to share the observation.
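
One stopgap I am considering (not tested yet; the _staging / _pq names below are just made up): import as text, where --map-column-hive is honored, then copy into a Parquet table from Hive so Impala can still report against it:

sqoop                                         \
import                                        \
--map-column-hive LoggedTime=timestamp        \
--connect jdbc:mysql://madb-t:3306/Sardata    \
--username reader                             \
--password 'xx'                               \
--hive-table 'memory_staging'                 \
--table 'Memory'                              \
--as-textfile                                 \
--target-dir '/user/sardata/mem_staging'      \
--hive-overwrite                              \
--hive-import

Then, from Hive:

CREATE TABLE memory_pq STORED AS PARQUET AS SELECT * FROM memory_staging;

After that, Impala needs an INVALIDATE METADATA (or REFRESH) before it sees the new table.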

--
Cloudera CDH 5.8.2 / Centos 7

Contributor
Hi @eriks,

I am busy this week but I think I can try to build a fix next week.
Johann

Hi @kerjo,

Did you fix the --map-column-hive ignored issue?
Could you please share your ideas? It would be very helpful.

Regards,
Ganeshbabu R

Explorer

Hi kerjo,

 

I was facing the same issue with --map-column-java and --map-column-hive while importing a date/timestamp column from Oracle to a Hive Parquet table using Sqoop. My Hive column is of type string.

Versions: CDH 5.12.1, Parquet 1.5.0.

 

A partial workaround I discovered is to convert the Date column to a string using a free-form query in Sqoop.

 

Input column in Oracle (type DATE):

 

'16.11.2017 09:44:11'

 

--query

 

select t.trans_id, to_char(t.trans_date_sysdate) trans_date_char from test_table_1 t where $CONDITIONS

Result in Hive looks like this:

 

16-NOV-17

Unfortunately, this is not enough; I am still not able to import the HH:mm:ss (time) part.

Contributor

Hello,

 

I haven't done anything on this topic. My idea is to fix the Sqoop code itself, but I need one or two days to do it...

Explorer

EDIT: I have successfully imported the Date column from Oracle into Hive by modifying my query:

 

select t.trans_id, to_char(t.trans_date_sysdate,'YYYYMMDD_HH24MiSS') trans_date_char from test_table_1 t where $CONDITIONS
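
For completeness, this is roughly how I wire the free-form query into the Sqoop import (the connection string, split column and target directory are placeholders):

sqoop import \
--connect jdbc:oracle:thin:@//orahost:1521/ORCL \
--username reader \
-P \
--query "select t.trans_id, to_char(t.trans_date_sysdate,'YYYYMMDD_HH24MiSS') trans_date_char from test_table_1 t where \$CONDITIONS" \
--split-by t.trans_id \
--hive-import \
--hive-table test_table_1 \
--as-parquetfile \
--target-dir /user/test_table_1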

 

Explorer

I'm also facing the same issue. It would be great if the Cloudera expert team pitched in to confirm whether this is a bug or not, and shed some light on other possible workarounds.

These issues are show-stoppers; easy access to resolutions would be a win-win for everybody.

 

New Contributor

To add a little detail here, I have a column selection in my source query from Oracle like this:

to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR

 

If I leave the `T` out of the ISO 8601 format (I have a space above), it comes through as a string into my Parquet files; however, if I change this column to the following (notice the "T" in the date/time format):

to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD"T"HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR,

 

then even with 

--map-column-java ACTIVITY_TSTZ_CHAR=String

in my sqoop command, I get a timestamp error: 

 

Error: java.io.IOException: SQLException in nextKeyValue
	at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.sql.SQLDataException: ORA-01821: date format not recognized