02-03-2017 01:26 AM
Dear Cloudera Community,
Is there any syntax to replace my query below for an Impala database instead of Hive? I intend to ingest a table into Impala from Oracle.
sqoop import --hive-import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.1.3)(port=1521))(connect_data=(service_name=cdb)))" --table EMPLOYEE1 --username system --password XXXX --m 1
02-18-2017 02:56 AM
Thanks for the advice, and in fact I am already doing this, but I have one use case where I am supposed to import one table from Oracle 11g into either Hive or Impala and then add one field called DIMdatekey to the resulting FACT table.
Is it supported to add one column, i.e. DIMdatekey, to Hive's fact table and populate that field from the DateDimension table which is already in Hive?
Also, I am wary of using Hive because of the limitations below, and would like to use only Impala with Sqoop.
Limitations of Hive:
1--> Not all ANSI SQL standard queries are supported by HiveQL (Hive query language)
2--> Subqueries are not supported
3--> No support for update and delete operations
4--> Doesn't support row-level insert, update, or delete, though you can generate new tables from queries or output query results to files
5--> Cannot change a column's data type after the table is created
I look forward to someone advising me on an OLAP use case with Hive/Impala in tandem with Sqoop.
02-19-2017 11:37 PM
I don't fully understand your goal, but Sqoop just transfers the data from Oracle to HDFS and updates the metadata. As mentioned above, Impala is absolutely fine with this; just run invalidate metadata or refresh on the table that was imported.
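For example, after the Sqoop import finishes, the Impala-side step might look like this (the table name EMPLOYEE1 is taken from the Sqoop command above; the database name `default` is an assumption):

```sql
-- Make Impala pick up the new table's metadata after the import
INVALIDATE METADATA default.employee1;

-- Or, if the table already existed and only gained new data files:
REFRESH default.employee1;
```

INVALIDATE METADATA is the heavier operation (full metadata reload); REFRESH is enough when only the data files changed.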
Regarding your case, if you need to extend the table by adding one column, I would do it by transferring the imported table into a managed table (via Hive or Impala) and deriving the additional column in that simple insert into ... select statement. For this you don't need row-level insert, update, or delete. And if you need to change or remove some records, you can still overwrite the partition. This approach is common to Hive and Impala (unless you are using Kudu or Hive's new transactions).
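A minimal sketch of that insert ... select approach, using the table names from the posts above (the join columns hire_date, datekey, and full_date are assumptions and depend on your actual schemas):

```sql
-- Build a managed fact table in one statement, deriving DIMdatekey
-- by joining the imported table against the date dimension.
-- hire_date / datekey / full_date are hypothetical column names.
CREATE TABLE fact_employee STORED AS PARQUET AS
SELECT e.*,
       d.datekey AS dimdatekey
FROM employee1 e
JOIN datedimension d
  ON to_date(e.hire_date) = d.full_date;
```

If the fact table already exists, the same SELECT can feed an INSERT INTO fact_employee ... or INSERT OVERWRITE to rebuild a partition.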
But the basics remain the same: HDFS is an append-only system, so Hive does nothing else; it just keeps the list of changes (inserts, deletes, updates) in a separate file and then, from time to time, rewrites the tables in the background.
Subqueries are fully supported as far as I know, and column data types can easily be changed via the alter table replace columns command.
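For illustration, changing a column type in Hive could look like this (the table and column names here are hypothetical):

```sql
-- Change a single column's type in place
ALTER TABLE employee1 CHANGE salary salary DECIMAL(10,2);

-- Or replace the whole column list at once
ALTER TABLE employee1 REPLACE COLUMNS (id INT, name STRING, salary DECIMAL(10,2));
```

Note that these statements change only the table metadata; the underlying files are not rewritten, so the new types must be compatible with the stored data.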