Importing Sqoop data - Cloudera Live exercise 1

New Contributor

Hi,

I'm very new to Hadoop as well as to distributed computing. I was trying to do the first exercise in the Cloudera Live demo and got stuck at the part where we have to import the data using Sqoop. Firstly, I see that the CDH 5.2.0 cloud cluster only offers Sqoop 2, and secondly, I don't see any terminal in which to run the given command. Sorry if my question is very basic. Can someone please tell me where exactly to enter this Sqoop command? Thanks!

7 REPLIES

Guru
Your cluster has both Sqoop 1 and Sqoop 2, and both are managed by
Cloudera Manager (although Sqoop 2 is a service rather than a command-line
tool, and that service is not started by default - the Sqoop 1 command-line
tool and its configuration are deployed on all the machines in the
cluster). I think when you say you see only one, you're referring to the
Sqoop 2 app in Hue? If so, then yes, the Sqoop app in Hue only supports
Sqoop 2.

To run the Sqoop command in the tutorial, you must be logged in to the
manager node via SSH. You can use PuTTY (on Windows) or the ssh command
(OpenSSH, on everything else). A page or two before the Sqoop step in the
tutorial, there should be directions for getting the SSH credentials to
your GoGrid cluster.
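
For example, logging in and checking the Sqoop 1 client might look roughly
like the following (the address below is just a placeholder - the actual
hostname/IP and root password come from the GoGrid credentials mentioned
above):

# on your own machine, open an SSH session to the manager node
ssh root@<manager-node-address>

# once logged in, confirm the Sqoop 1 command-line client is on the PATH
sqoop version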

Explorer

Got it! Thanks.

But now I'm getting the error below when running the following Sqoop import command:

[root@g5157-cldramaster-01 ~]# sqoop import-all-tables \
> -m 3 \
> --connect jdbc:mysql://208.113.123.213:3306/retail_db \
> --username=retail_dba \
> --password=cloudera \
> --compression-codec=snappy \
> --as-parquetfile \
> --warehouse-dir=/user/hive/warehouse \
> --hive-import
Warning: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/09/13 20:47:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
15/09/13 20:47:04 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/09/13 20:47:04 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
15/09/13 20:47:04 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
15/09/13 20:47:04 WARN tool.BaseSqoopTool: It seems that you're doing hive import directly into default
15/09/13 20:47:04 WARN tool.BaseSqoopTool: hive warehouse directory which is not supported. Sqoop is
15/09/13 20:47:04 WARN tool.BaseSqoopTool: firstly importing data into separate directory and then
15/09/13 20:47:04 WARN tool.BaseSqoopTool: inserting data into hive. Please consider removing
15/09/13 20:47:04 WARN tool.BaseSqoopTool: --target-dir or --warehouse-dir into /user/hive/warehouse in
15/09/13 20:47:04 WARN tool.BaseSqoopTool: case that you will detect any issues.
15/09/13 20:47:04 INFO manager.SqlManager: Using default fetchSize of 1000
15/09/13 20:47:05 INFO tool.CodeGenTool: Beginning code generation
15/09/13 20:47:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:05 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/c621727433ffc0137ec3c8b84f7bd461/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/09/13 20:47:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/c621727433ffc0137ec3c8b84f7bd461/categories.jar
15/09/13 20:47:06 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/09/13 20:47:06 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/09/13 20:47:06 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/09/13 20:47:06 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/09/13 20:47:06 INFO mapreduce.ImportJobBase: Beginning import of categories
15/09/13 20:47:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/09/13 20:47:07 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/09/13 20:47:07 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
15/09/13 20:47:07 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=null
org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=null
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
at org.kitesdk.data.Datasets.create(Datasets.java:189)
at org.kitesdk.data.Datasets.create(Datasets.java:240)
at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:665)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:102)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportAllTablesTool.run(ImportAllTablesTool.java:105)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

 

Can you please help me with this? Thanks!

Guru (accepted solution)
There's a known issue in CDH 5.2.0 (that's since been fixed) that prevents
Sqoop from importing Parquet datasets. I think you might be using a version
of the tutorial that's not intended for your specific cluster (which would
also explain why you missed the instructions for logging in on GoGrid,
specifically). Are you using the tutorial hosted on Cloudera's website,
perhaps? It's intended for a generic environment using a more recent
version - I'd recommend using the one that's customized for the version of
CDH on your cluster and running in GoGrid's environment.

You should be able to find a link to the "Guidance Page" in your welcome
email, and the tutorial is one of the resources linked to from that page.
If you pick up where you are now in that version of the tutorial, you
shouldn't have issues like this.

Explorer

I did exactly what you mentioned and used the Cloudera website tutorial instead of the GoGrid one. The Sqoop import worked fine with the Avro data format. Thanks a lot!
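
For anyone else hitting this, the Avro variant of the import looks roughly
like the command below (a sketch, not copied from the tutorial; note that
--hive-import is dropped, since this Sqoop version does not allow combining
it with --as-avrodatafile, so the Hive tables are created afterwards from
the Avro schemas):

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://208.113.123.213:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse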

New Contributor

Hi, I am having the same problem in QuickStart VM 5.7, but QuickStart VM 5.5 works just fine with the same command.

I suspect this is a bug tied to QuickStart VM 5.7.

Please provide a workaround for QuickStart VM 5.7.

Rising Star

Can you post the output from when you first ran the Sqoop job? It's very unlikely you're hitting the same root problem. If you try it again, be sure you add the --hive-overwrite option since some of the tables & metadata will have already been created.
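
In other words, the re-run would look roughly like the original command
with one extra flag (a sketch based on the command posted earlier in this
thread):

sqoop import-all-tables \
    -m 3 \
    --connect jdbc:mysql://208.113.123.213:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-parquetfile \
    --warehouse-dir=/user/hive/warehouse \
    --hive-import \
    --hive-overwrite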

New Contributor

Sorry, the mistake was on my side.

The error I got is different: it was caused by loading the example data first and then running exercise 1, and it was triggered by a directory that already existed in HDFS.
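
If anyone else runs into that, checking for and removing the leftover
import directories before re-running is usually enough (the table directory
below is just an example path; --hive-overwrite, as suggested above, takes
care of the Hive side):

# list the table directories that already exist under the warehouse
hadoop fs -ls /user/hive/warehouse

# remove a leftover directory for a table about to be re-imported (e.g. categories)
hadoop fs -rm -r /user/hive/warehouse/categories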