
Exercise 1 error

New Contributor

Hi, I got the following error when running:

sqoop import-all-tables -m 12 \
    --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import

INFO hive.metastore: Connected to metastore.

15/07/24 14:49:16 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset: default.categories
org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset: default.categories
    at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:51)
    at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77)


I suspect the problem was triggered by my earlier run of the following command:

sqoop import-all-tables -m 12 \
    --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse


I have deleted /user/hive/warehouse, but it didn't help. 

Any hints on how to solve this? Thanks!


Hadoop321

1 ACCEPTED SOLUTION

Guru

/user/hive/warehouse stores the data files, but the metadata (information about the structure and location of the data files) is managed by Hive. Connect to either Impala or Hive (you'll find instructions for doing so later in Tutorial Exercise 1 or in Tutorial Exercise 2, depending on which version of the tutorial you're using). Once connected, run 'show tables;' and you'll see a list of the tables Hive has metadata for. For each of these tables (assuming there isn't other data, unrelated to the tutorial, already stored there), run 'drop table <table_name>;'. Once 'show tables;' no longer lists any of the retail_db tables, the Sqoop job should be able to succeed.
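
For example, here is a minimal sketch of that cleanup from the VM shell. The table names below are an assumption based on the standard retail_db tutorial schema; verify them against your own 'show tables;' output first.

# List the tables Hive currently has metadata for.
hive -e 'USE default; SHOW TABLES;'

# Drop each tutorial table so Sqoop can recreate it cleanly.
# (Table names assumed from the retail_db tutorial schema.)
for t in categories customers departments order_items orders products; do
    hive -e "DROP TABLE IF EXISTS default.${t};"
done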


3 REPLIES


Guru

According to the Sqoop documentation, the --hive-overwrite flag should also let you do this without manually dropping the tables first, but I haven't tested that myself.
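
For example (untested, as noted above; this is just the original import command with --hive-overwrite added):

sqoop import-all-tables -m 12 \
    --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import \
    --hive-overwrite

Note that --hive-overwrite is one of Sqoop's Hive arguments, so it only takes effect together with --hive-import.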

New Contributor

It works perfectly. Thanks a lot for the tip!