
SQOOP IMPORT --map-column-hive ignored

Contributor

Hello guys,

 

I am using Sqoop to transfer data from PostgreSQL to Hive using Parquet storage.

 

I have a problem with the PostgreSQL type "timestamp without time zone", which is mapped to BIGINT and becomes unusable for querying.

I found the import option: --map-column-hive varname=type.

I have tried both STRING and TIMESTAMP as the type; both are ignored, and I do not get any message about it.
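
For reference, my invocation looks roughly like this (the connection string, table and column names here are placeholders, not my real ones):

sqoop import \
--connect jdbc:postgresql://pghost:5432/mydb \
--username reader \
-P \
--table events \
--map-column-hive event_ts=STRING \
--hive-import \
--hive-table events \
--as-parquetfile \
--target-dir /user/hive/warehouse/events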

 

Versions used: Sqoop 1.4.6, PostgreSQL 9.4, CDH 5.8.

Any idea is welcome.

20 Replies

Champion
kerjo

I was thinking of a workaround: type casting on the Hive side. I understand that your --map-column-hive is being ignored; correct me if I am wrong.
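
Something like this might do as a stopgap, assuming the BIGINT column actually holds the epoch value in milliseconds (which is what Sqoop appears to write into Parquet); the table and column names are only placeholders:

CREATE VIEW my_table_v AS
SELECT
  id,
  -- from_unixtime() expects seconds, so drop the millisecond part first;
  -- note it also applies the local session time zone
  CAST(from_unixtime(CAST(event_ts / 1000 AS BIGINT)) AS TIMESTAMP) AS event_ts
FROM my_table;

Queries can then go against the view instead of the raw table.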

Community Manager

@kerjo and @manpreet2 I would like to clarify something. Are you both facing the same issue or something similar?

 

If the issues are only similar it may be better for @manpreet2 to post the second issue into a new thread to avoid confusion. If they are the same issue, please disregard my question and carry on. 🙂 


Cy Jervis, Manager, Community Program

New Contributor

@kerjo, I'm seeing the same issue.

 

  • CDH 5.8.2 (w/ sqoop 1.4.6)
  • MariaDB 5.5, or AWS Redshift, or..

Given the following sqoop invocation:

 

sqoop                                         \
import                                        \
--map-column-hive LoggedTime=timestamp        \
--connect jdbc:mysql://madb-t:3306/Sardata    \
--username reader                             \
--password 'xx'                               \
--hive-table 'memory'                         \
--table 'Memory'                              \
--as-parquetfile                              \
--target-dir '/user/sardata/mem'              \
--hive-overwrite                              \
--hive-import

 

I find that the --map-column-hive option is totally ignored. (It doesn't matter if I set it to LoggedTime=string, LoggedTime=blah, or PurpleMonkeyDishwasher=sdfjsaldkjf. It is ignored.)

 

I noticed tonight that if I avoid Parquet and instead use --as-textfile, the --map-column-hive option works correctly. I do not know the reason for this behavior at the moment, and I still need to get the files into Parquet format for Impala to report against, but I wanted to share the observation.
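
One stopgap I am considering (not tested yet; the _staging / _pq names below are just made up): import as text, where --map-column-hive is honored, then copy into a Parquet table from Hive so Impala can still report against it:

sqoop                                         \
import                                        \
--map-column-hive LoggedTime=timestamp        \
--connect jdbc:mysql://madb-t:3306/Sardata    \
--username reader                             \
--password 'xx'                               \
--hive-table 'memory_staging'                 \
--table 'Memory'                              \
--as-textfile                                 \
--target-dir '/user/sardata/mem_staging'      \
--hive-overwrite                              \
--hive-import

Then, from Hive:

CREATE TABLE memory_pq STORED AS PARQUET AS SELECT * FROM memory_staging;

After that, Impala needs an INVALIDATE METADATA (or REFRESH) before it sees the new table.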

--
Cloudera CDH 5.8.2 / Centos 7

Contributor
Hi @eriks,

I am busy this week but I think I can try to build a fix next week.
Johann

Hi @kerjo,

Did you fix the --map-column-hive ignored issue?
Could you please share your ideas? It would be very helpful.

Regards,
Ganeshbabu R

Explorer

Hi kerjo,

 

I was facing the same issue with --map-column-java and --map-column-hive while importing a date/timestamp column from Oracle to a Hive Parquet table using Sqoop. My Hive column is of type string.

Versions: CDH 5.12.1, Parquet 1.5.0.

 

A partial workaround I discovered is to convert the Date column to a string using a free-form query in Sqoop.

 

Input column in Oracle (type DATE):

 

'16.11.2017 09:44:11'

 

--query

 

select t.trans_id, to_char(t.trans_date_sysdate) trans_date_char from test_table_1 t where $CONDITIONS

Result in Hive looks like this:

 

16-NOV-17

Unfortunately, this is not enough; I am still not able to import the HH:mm:ss (time) part.

Contributor

Hello,

 

I haven't done anything on this topic. My idea is to fix the Sqoop code itself, but I need one or two days to do it...

Explorer

EDIT: I have successfully imported the Date column from Oracle into Hive by modifying my query:

 

select t.trans_id, to_char(t.trans_date_sysdate,'YYYYMMDD_HH24MiSS') trans_date_char from test_table_1 t where $CONDITIONS
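
For completeness, this is roughly how I wire the free-form query into the Sqoop import (the connection string, split column and target directory are placeholders):

sqoop import \
--connect jdbc:oracle:thin:@//orahost:1521/ORCL \
--username reader \
-P \
--query "select t.trans_id, to_char(t.trans_date_sysdate,'YYYYMMDD_HH24MiSS') trans_date_char from test_table_1 t where \$CONDITIONS" \
--split-by t.trans_id \
--hive-import \
--hive-table test_table_1 \
--as-parquetfile \
--target-dir /user/test_table_1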

 

Explorer

I'm also facing the same issue. It would be great if the Cloudera expert team pitched in to confirm whether this is a bug or not, and shed some light on other possible workarounds.

These issues are show-stoppers; easy access to resolutions would be a win-win for everybody.

 

New Contributor

To add a little detail here, I have a column selection in my source query from Oracle like this:

to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR

 

If I leave the `T` out of the ISO 8601 format (I have a space above), it comes through as a string into my Parquet files; however, if I change this column to the following (notice the "T" in the date/time format):

to_char(FROM_TZ(ma.ACTIVITY_TS, DECODE(TRIM(ma.ACTIVITY_TZ), 'Z', '00:00', ma.ACTIVITY_TZ)),
'YYYY-MM-DD"T"HH24:MI:SS.FF6tzh:tzm') AS ACTIVITY_TSTZ_CHAR,

 

then even with 

--map-column-java ACTIVITY_TSTZ_CHAR=String

in my sqoop command, I get a timestamp error: 

 

Error: java.io.IOException: SQLException in nextKeyValue
	at org.apache.sqoop.mapreduce.db.DBRecordReader.nextKeyValue(DBRecordReader.java:277)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:565)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:796)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
Caused by: java.sql.SQLDataException: ORA-01821: date format not recognized