I'm new to Hadoop and trying to move many tables from an existing SQL database to a Hadoop cluster (and into Hive). I'm using Sqoop because it should automatically set up all the tables for me, so I don't have to know about the hundreds of columns involved and all their details. For whatever reason, Sqoop is automatically changing the datetime columns into varchar columns.
Searching online, it looks like people have encountered this kind of error before, at least with Oracle databases, but the only fixes I've found are either specific to problems with the Oracle JDBC driver or rely on listing out every relevant column's details, which defeats my purpose for using Sqoop. It seems like this is something I should be able to fix in the initial Sqoop import with some kind of setting.
Thanks in advance for any help!
Can you try the --map-column-hive option? It overrides the default mapping from SQL type to Hive type for the configured columns.
Refer to the documentation here https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/using_sqoop_to_m...
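For example, a minimal invocation might look like this. The connection string, table, and column names here are placeholders, not from your database:

```shell
# Hypothetical example: force two datetime columns to Hive timestamp
# for a single table. Replace the placeholders with your own values.
sqoop import \
  --connect 'jdbc:sqlserver://yourhost;database=yourdb' \
  --table yourtable \
  --hive-import \
  --map-column-hive 'created_at=timestamp,updated_at=timestamp'
```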
Hi! Sorry for the slow response. The --map-column-hive option would still require me to list at least the name of every column I want to keep as a datetime; I'm really looking for some way of keeping ALL of the datetime columns as datetime columns. In case I do need to use --map-column-hive, do you know whether I can still use it with import-all-tables, with columns specified as tablename.columnname?
**FIXED** Edit/update: Trying just one table now, I can get the first column I list stored as the right type, but the rest still end up as strings after Sqoop tells me it can't find one of the other columns (and thus doesn't actually move any data; the table is just created):
sqoop import --connect 'jdbc:sqlserver://connectionstring' --table sourcetable --hcatalog-table destinationtable --create-hcatalog-table --hcatalog-storage-stanza 'stored as orc' --map-column-hive 'datedata=date, effectivestartdate=date, effectiveenddate=date, startdate=date, enddate=date' -- --schema main
Error: Column effectivestartdate not found while mapping database columns to hcatalog columns
Fix: remove the spaces between items in the --map-column-hive parameter, e.g. datedata=date,startdate=date
For now I'll use a script to generate the --map-column-hive parameter for each table and run the import command on each table individually, but the question still stands: is there a way to change the default type mapping?
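A minimal sketch of what such a generator script could look like, assuming you can dump each table's column names and types (e.g. by querying INFORMATION_SCHEMA.COLUMNS on the source server) as whitespace-separated pairs on stdin. The function name and input format are my own inventions:

```shell
# Hypothetical helper: build a --map-column-hive value from "column type"
# pairs read on stdin. Emits only the date/datetime columns, joined with
# commas and no spaces (spaces break the parameter, as noted above).
build_map_column_hive() {
  awk '
    $2 == "date"     { pairs = pairs sep $1 "=date";      sep = "," }
    $2 == "datetime" { pairs = pairs sep $1 "=timestamp"; sep = "," }
    END { print pairs }
  '
}
```

Usage: `printf 'datedata date\nname varchar\nstartdate datetime\n' | build_map_column_hive` prints `datedata=date,startdate=timestamp`, ready to paste into the import command.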
I also have trouble importing with Sqoop from a Sybase ASE 15.7 database with HCatalog table creation. Sqoop/HCat maps date and datetime to string.
It seems to identify the column types correctly in hcat.SqoopHCatUtilities:
18/07/26 13:09:26 INFO hcat.SqoopHCatUtilities: Database column name - info map : ... date : [Type : 91,Precision : 10,Scale : 0] datetime : [Type : 93,Precision : 23,Scale : 3] ...
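The numeric codes in that log line are java.sql.Types constants, so Sqoop is reading the correct source types before mapping them to string. A quick lookup (a sketch; the function name is my own) for the codes in question:

```shell
# Translate the java.sql.Types codes seen in the Sqoop log
# into their constant names.
jdbc_type_name() {
  case "$1" in
    91) echo DATE ;;       # java.sql.Types.DATE
    92) echo TIME ;;       # java.sql.Types.TIME
    93) echo TIMESTAMP ;;  # java.sql.Types.TIMESTAMP
    *)  echo UNKNOWN ;;
  esac
}
```

So Type 91 is the date column and Type 93 is the datetime column from the log above.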
But it maps them to string when creating the table:
create table `henrik_o`.`datatypes` ( ... `date` string, `datetime` string ...
I too would like to avoid manually overriding the mapping with --map-column-hive. Has anyone made this work automatically, so Hive tables get created with date and datetime types when those types are seen in the source DB?