Created on 07-24-2015 06:11 AM - edited 09-16-2022 02:35 AM
Hi, I got the following error when running:
sqoop import-all-tables -m 12 --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-parquetfile --warehouse-dir=/user/hive/warehouse --hive-import
INFO hive.metastore: Connected to metastore.
15/07/24 14:49:16 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset: default.categories
org.kitesdk.data.DatasetExistsException: Metadata already exists for dataset: default.categories
at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:51)
at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77)
I guess the problem might be triggered by my previous running of the following command:
sqoop import-all-tables -m 12 --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-avrodatafile --warehouse-dir=/user/hive/warehouse
I have deleted /user/hive/warehouse, but it didn't help.
Any hints on how to solve this? Thanks!
Hadoop321
Created 07-24-2015 06:32 AM
/user/hive/warehouse stores the data files, but the metadata (information about the structure and location of the data files) is managed by Hive. Connect to either Impala or Hive (you'll find instructions for doing so later in Tutorial Exercise 1 or Tutorial Exercise 2, depending on which version of the tutorial you're using). Once connected, run 'show tables;' and you'll see a list of the tables it has metadata for. For each of those tables (assuming there isn't other data, unrelated to the tutorial, already stored there), run 'drop table <table_name>;'. Once none of the tables from retail_db are shown by 'show tables;', the Sqoop job should be able to succeed.
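For example, from a shell on the quickstart VM that could look something like the following (the table names assume the standard retail_db schema from the tutorial; adjust the list to whatever 'show tables;' actually reports):

# list the tables Hive has metadata for
hive -e 'show tables;'
# drop each imported table (names assume the standard tutorial schema)
hive -e 'drop table categories; drop table customers; drop table departments; drop table order_items; drop table orders; drop table products;'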
Created 07-24-2015 06:35 AM
According to the Sqoop documentation, the --hive-overwrite option should also let you do this without manually dropping the tables first, but I haven't tested that myself.
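That would just mean adding the flag to the original command, something like this (untested, as noted above):

sqoop import-all-tables -m 12 --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-parquetfile --warehouse-dir=/user/hive/warehouse --hive-import --hive-overwrite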
Created 07-24-2015 11:29 AM
It works perfectly. Thanks a lot for the tip!