04-20-2017 01:52 AM
I'm trying to load data from Oracle to Hive as parquet. Every time i load a table with date/timestamp column to hive, it automatically converts these columns to BIGINT. Is is possible to load timestamp/date formats to hive using sqoop and as a parquet file?
Already tried creating the table first in hive then using impala to LOAD DATA INPATH the parquet file.
Still failed with errors
"file XX has an incompatible Parquet schema for column XX column: TIMESTAMP"
04-20-2017 06:56 AM
Can you share your script? so that you may get desirable answer
Have you tried this option?
--query 'select col1, date_function(col2), etc from table_name where $CONDITIONS'
1. date_function is not an actual function name. You can use the applicable hive function
2. Your query will end with "where $CONDITIONS" by default and this is syntax
04-20-2017 07:30 PM
Already tried using the --query. cast the column as timestamp but still converted TIMESTAMP to BIGINT in the HIVE table.
Here's my query:
sqoop import \
--connect "jdbc:oracle:thin:@192.168.1.17:1521:XE" \
--username "xxxxx" \
--password-file /user/cloudera/credentials/ora.password \
--target-dir /user/cloudera/oracle \
--compression-codec snappy \
--query "select EMPLOYEE_ID, START_DATE, END_DATE, JOB_ID, DEPARTMENT_ID, cast(TSTAMP as TIMESTAMP) from SQOOPDB.JOB_HISTORY where \$CONDITIONS" \
--hive-table job_history9 \
--hive-database oracle \