Created on 09-09-2015 09:34 PM - edited 09-16-2022 02:40 AM
Hi,
I'm very new to Hadoop as well as to distributed computing. I was trying to do the first exercise in the Cloudera Live demo and got stuck at the part where we have to import the data using Sqoop. Firstly, I see that the cloud cluster (CDH 5.2.0) only offers Sqoop 2, and secondly, I don't see any terminal in which to run the given command. Sorry if my question is very basic. Can someone please help me with where exactly to enter this Sqoop command? Thanks!
Created 09-13-2015 09:01 PM
Got it! Thanks.
But now I'm getting the error below while running the following sqoop import command.
[root@g5157-cldramaster-01 ~]# sqoop import-all-tables \
> -m 3 \
> --connect jdbc:mysql://208.113.123.213:3306/retail_db \
> --username=retail_dba \
> --password=cloudera \
> --compression-codec=snappy \
> --as-parquetfile \
> --warehouse-dir=/user/hive/warehouse \
> --hive-import
Warning: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/13 20:47:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
15/09/13 20:47:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/13 20:47:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/09/13 20:47:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/09/13 20:47:04 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
15/09/13 20:47:04 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
15/09/13 20:47:04 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
15/09/13 20:47:04 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
15/09/13 20:47:04 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
15/09/13 20:47:04 WARN tool.BaseSqoopTool: case that you will detect any issues.
15/09/13 20:47:04 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/13 20:47:05 INFO tool.CodeGenTool: Beginning code generation
15/09/13 20:47:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/c621727433ffc0137ec3c8b84f7bd461/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/09/13 20:47:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/c621727433ffc0137ec3c8b84f7bd461/categories.jar
15/09/13 20:47:06 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/09/13 20:47:06 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/09/13 20:47:06 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/09/13 20:47:06 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/09/13 20:47:06 INFO mapreduce.ImportJobBase: Beginning import of categories
15/09/13 20:47:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/09/13 20:47:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:07 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/09/13 20:47:07 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=null
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=null
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.create(Datasets.java:189)
at org.kitesdk.data.Datasets.create(Datasets.java:240)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:102)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:105)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Can you please help me with this? Thanks!
Created 09-14-2015 07:40 PM
I did exactly what you mentioned and used the Cloudera website tutorial instead of the GoGrid one. The Sqoop import worked fine with the Avro data format. Thanks a lot!
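In case it helps anyone following the same tutorial, the working command was roughly the one below (same connection details as in my earlier post, with --as-avrodatafile in place of --as-parquetfile; as far as I know Sqoop 1 rejects --hive-import combined with Avro output, so the Hive tables get created afterwards from the generated .avsc schemas, which is what the tutorial walks through):

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://208.113.123.213:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse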
Created 05-21-2016 01:39 AM
Hi, I am having the same problem in the QuickStart VM 5.7, but the QuickStart VM 5.5 works fine with the same command.
I suspect this is a bug in the QuickStart VM 5.7.
Please provide a workaround for the QuickStart VM 5.7.
Created 05-21-2016 10:27 AM
Can you post the output from when you first ran the Sqoop job? It's very unlikely you're hitting the same root problem. If you try it again, be sure to add the --hive-overwrite option, since some of the tables and metadata will have already been created.
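For example, something along these lines (re-using the connection details from earlier in this thread, so adjust them to your VM; --hive-overwrite tells Sqoop to overwrite the Hive tables that an earlier, partially completed run already created):

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://208.113.123.213:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import \
    --hive-overwrite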
Created 05-21-2016 12:02 PM
Sorry, my mistake.
The error I got is different: it was caused by loading the examples and then running exercise 1, and was triggered by a directory that already existed in HDFS.
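In case anyone else runs into the same thing, removing the leftover directory before re-running the exercise should clear it; something like the following (the path below is only an example, use whichever directory the error message names):

# remove the leftover import directory from the earlier example/exercise run
hdfs dfs -rm -r /user/hive/warehouse/categories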