Best way to import a database into impala

Explorer

Hi, 

I have a database and I want to find the best way to import it into Impala. Please help me.

3 REPLIES

Re: Best way to import a database into impala

Cloudera Employee

Impala shares metadata and data with Hive. You can use Sqoop to import the tables from your database into Hive. Don't forget to run INVALIDATE METADATA in Impala after the ingestion is done. Otherwise, the imported tables will not be visible in Impala.
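To make the two steps above concrete, here is a minimal sketch in Python that only builds the command lines (it does not run them). The JDBC URL, user name, password file path, database, table, and host names are hypothetical placeholders, and the exact set of Sqoop options you need may differ on your cluster, so treat this as a starting point rather than a definitive recipe.

```python
import shlex


def sqoop_hive_import_cmd(jdbc_url, username, password_file, table, hive_db):
    """Build (but do not run) a Sqoop command that imports one table into Hive.

    All argument values are supplied by the caller; nothing here is
    specific to any real cluster.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the DB password
        "--table", table,                  # source table to import
        "--hive-import",                   # load the data into Hive
        "--hive-database", hive_db,
        "--hive-table", table,
    ]


def impala_refresh_cmd(impalad_host, hive_db, table):
    """Build the impala-shell call that makes the new table visible in Impala."""
    query = f"INVALIDATE METADATA {hive_db}.{table}"
    return ["impala-shell", "-i", impalad_host, "-q", query]


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    import_cmd = sqoop_hive_import_cmd(
        "jdbc:mysql://dbhost/sales", "etl", "/user/etl/.db_password",
        "orders", "sales",
    )
    refresh_cmd = impala_refresh_cmd("impalad-host", "sales", "orders")
    print(shlex.join(import_cmd))
    print(shlex.join(refresh_cmd))
```

You would run the Sqoop command once per table (or use Sqoop's import-all-tables tool), then issue the INVALIDATE METADATA statement after the import has finished.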


Re: Best way to import a database into impala

Explorer

@robbiez 

Thanks for your answer, but I would also like to ask something.

If I have a large amount of data to import and something goes wrong partway through, will the import start again from zero, or can I do something to resume from the point where the process crashed?

Re: Best way to import a database into impala

Cloudera Employee

If a Sqoop job fails or crashes in the middle of importing a table, the table is left partially imported. When you run the job again, it starts from zero, so you need to clear the partially imported data first.

Alternatively, if you know which rows have not been imported yet, you can use a WHERE clause when you restart the job to import only the remaining rows.
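The two restart options above can be sketched as follows, again only building the command line in Python. This assumes you already have a base Sqoop import command as a list of arguments. The filter `id > 1000` and the table name are hypothetical, and whether `--delete-target-dir` and `--hive-overwrite` are the right way to clear partial data depends on how your import is set up, so please verify against your Sqoop version before relying on it.

```python
def sqoop_restart_cmd(base_cmd, where=None):
    """Return a new Sqoop argument list for re-running a failed import.

    where=None    -> restart from zero: ask Sqoop to remove the leftover
                     HDFS target directory and overwrite the Hive table.
    where="<sql>" -> import only the rows matching the filter, for the
                     case where you know which rows are still missing.
    """
    cmd = list(base_cmd)  # copy so the caller's list is not modified
    if where is None:
        cmd += ["--delete-target-dir", "--hive-overwrite"]
    else:
        cmd += ["--where", where]
    return cmd


if __name__ == "__main__":
    # Hypothetical base command, for illustration only.
    base = ["sqoop", "import", "--table", "orders", "--hive-import"]
    print(sqoop_restart_cmd(base))               # full re-import
    print(sqoop_restart_cmd(base, "id > 1000"))  # remaining rows only
```

The filtered restart is only safe if the source table has a column, such as an increasing key, that reliably separates imported rows from missing ones.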
