
How to Sqoop with Impala


Dear Cloudera Community,

 

Is there a way to adapt my command below so that it targets Impala instead of Hive? I intend to ingest a table into Impala from Oracle.

 

sqoop import --hive-import --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=192.168.1.3)(port=1521))(connect_data=(service_name=cdb)))" --table EMPLOYEE1 --username system --password XXXX --m 1

 

Regards

Anis

1 ACCEPTED SOLUTION


I don't fully understand your goal, but Sqoop just transfers the data from Oracle to HDFS and updates the metadata. As mentioned above, Impala is absolutely fine with this; just run INVALIDATE METADATA or REFRESH on the table that was imported.
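In impala-shell that is simply (a minimal sketch, assuming the EMPLOYEE1 table from the question):

INVALIDATE METADATA employee1;   -- after the initial Sqoop import
REFRESH employee1;               -- after appending new data files to it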

 

Regarding your case, if you need to extend the table with one additional column, I would do it by transferring the imported table into a managed table (via Hive or Impala) and deriving the additional column in that simple INSERT INTO ... SELECT statement; see the sketch below. For this you don't need row-level insert, update, or delete. And if you need to change or remove some records, you can still overwrite the partition. This approach works the same in Hive and Impala (unless you are using Kudu or Hive's new transactions).
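Something like this sketch, here written as a one-off CREATE TABLE ... AS SELECT (the table, column, and join-key names are assumptions, since I don't know your schema):

CREATE TABLE employee_fact STORED AS PARQUET AS
SELECT e.*, d.dimdatekey
FROM employee1 e
JOIN datedimension d ON to_date(e.hire_date) = d.full_date;

The same SELECT also works as the body of an INSERT INTO employee_fact ... statement if the target table already exists.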

 

But the basics remain the same: HDFS is an append-only file system, so Hive does nothing more than keep the list of changes (inserts, deletes, updates) in separate files and then rewrite the tables in the background from time to time.

 

Subqueries are fully supported as far as I know, and data types can easily be changed via the ALTER TABLE ... REPLACE COLUMNS command.
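A sketch of the latter (the column list here is hypothetical; REPLACE COLUMNS redefines the entire column list in one statement and only changes the table metadata, not the data files, so the new types must still match the stored data):

ALTER TABLE employee_fact REPLACE COLUMNS (empno BIGINT, ename STRING, hire_date STRING, dimdatekey INT);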

 

 


4 REPLIES

Explorer

Hi Anis,

Since Impala can use Hive's metastore, I think it is fine to use that command to import the data from the RDBMS into Hive and then use Impala to query it.
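A quick end-to-end check could look like this (a sketch; I'm assuming the table lands in the default database): run the Sqoop command from the question as-is, then

impala-shell -q "INVALIDATE METADATA default.employee1"
impala-shell -q "SELECT COUNT(*) FROM default.employee1"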


Thanks for the advice; in fact, this is exactly what I am doing. But I have one use case where I am supposed to import one table from Oracle 11g into either Hive or Impala and then add one field called DIMdatekey to the resulting FACT table.

 

Is it supported to add one column, i.e. DIMdatekey, to Hive's fact table and populate that field from the DateDimension table that already exists in Hive?

 

Also, knowing the limitations below, I am wary of using Hive and would like to use only Impala with Sqoop.

 

Limitations of Hive:

1. Not all ANSI SQL standard queries are supported by HiveQL (Hive Query Language).
2. Subqueries are not supported.
3. There is no support for update and delete operations.
4. It doesn't support row-level insert, update, or delete, but you can generate new tables from queries or output query results to files.
5. You cannot change a column's data type after the table is created.

 

I look forward to someone advising me on this OLAP use case with Hive/Impala in tandem with Sqoop.

 

Regards

Anis

 

 


Contributor

Dear @AnisurRehman 

With Sqoop you can only import the data from the RDBMS into HDFS. Then, if you want to work with this table through impala-shell, you just need to run the following command from a machine where Impala is installed.

impala-shell -d db_name -q "INVALIDATE METADATA tablename";

 

You have to use INVALIDATE METADATA because the table is new to the Impala daemons' metadata.

Then, if you append new data files to the existing tablename table, you only need to do a refresh; the command is

impala-shell -d db_name -q "REFRESH tablename";

REFRESH is enough because you do not need to reload the whole metadata for the table, only the block locations of the new data files.

 

After that you can query the table through impala-shell or the Impala query editor.