Lesson 1 of Cloudera Live Tutorial

Explorer

Hello,

 

I am trying to run the initial statement that sets up the environment:

 

sqoop import-all-tables \
    -m 12 \
    --connect jdbc:mysql://216.121.84.2:3306/retail_db \
    --username=retail_dba \
    --password=cloudera \
    --compression-codec=snappy \
    --as-avrodatafile \
    --warehouse-dir=/user/hive/warehouse

 

After running that statement, there are a couple of 'ls' commands to confirm things worked. When I ran those two commands I wasn't getting any output. After hunting around on the forum, I tried replacing the IP address in the --connect string with the server name. That seemed to help somewhat, as I now see the list of folders.

 

However, I noticed an error stating: FileAlreadyExistsException: Output directory /user/hive/warehouse/categories already exists. I assume this is why the categories folder appears to be empty when I try to inspect its contents.

 

Any ideas? Is there some sort of cleanup script I need to run if I'm running the sqoop import-all-tables command a second time?

 

Thanks in advance for your help.

 

Sincerely,

 

 

Tom

9 REPLIES

Re: Lesson 1 of Cloudera Live Tutorial

Master Collaborator
If the Sqoop job fails for any reason and you want to rerun it, you would need to make sure the directories it creates under /user/hive/warehouse don't exist. If you haven't created any other Hive tables, you can easily just clear out the warehouse directory as follows:

sudo -u hdfs hadoop fs -rm -r /user/hive/warehouse/*

And then the Sqoop job should work just fine...

Re: Lesson 1 of Cloudera Live Tutorial

Explorer

I solved this one on my own but thought I'd post the solution. Apparently Hadoop (or at least Sqoop) will not overwrite existing output directories. Consequently, the directories used by the Sqoop command had to be deleted first. Once I did this, it seemed to work. Here are the commands that worked for me:

 

 

hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/categories;
hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/customers;
hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/products;
hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/departments;
hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/order_items;
hdfs dfs -rm -r hdfs://g2316-cldramaster-01:8020/user/hive/warehouse/orders;

 

Sorry if this all seems obvious, but I'm assuming people reading this are as new to this as I am.
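
If you find yourself rerunning the import often, the same cleanup can be written as a small loop. This is a minimal sketch assuming the six table names from the retail_db tutorial database and that the default HDFS filesystem is configured (so plain paths work without the hdfs://host:8020 prefix):

# Hypothetical cleanup loop; table names are assumed from the retail_db tutorial schema
for table in categories customers departments products order_items orders; do
    # -skipTrash deletes immediately instead of moving the directory to the HDFS trash
    hdfs dfs -rm -r -skipTrash /user/hive/warehouse/$table
done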

Re: Lesson 1 of Cloudera Live Tutorial

Explorer

Thanks. That is a lot easier than what I suggested.

Re: Lesson 1 of Cloudera Live Tutorial

New Contributor

I too can't get past the very first step of the getting-started tutorial, and I'm totally mystified by the messages. Any help is greatly appreciated.

 

When I run the sqoop task to import tables, I get errors. And when I run the verification step, the data files don't exist in HDFS.

 

[cloudera@quickstart ~]$ sqoop import-all-tables -m 1 --connect jdbc:mysql://quickstart.cloudera:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-avrodatafile --warehouse-dir=/user/hive/warehouse
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/05/01 08:07:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
15/05/01 08:07:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/05/01 08:07:45 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/05/01 08:07:46 INFO tool.CodeGenTool: Beginning code generation
15/05/01 08:07:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/05/01 08:07:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/05/01 08:07:46 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
error: error reading /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar; /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar (Permission denied)
Note: /tmp/sqoop-cloudera/compile/a91be5e61774406eabbd6c052a475fb7/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/05/01 08:07:48 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/a91be5e61774406eabbd6c052a475fb7/categories.jar
15/05/01 08:07:48 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/05/01 08:07:48 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/05/01 08:07:48 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/05/01 08:07:48 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/05/01 08:07:48 INFO mapreduce.ImportJobBase: Beginning import of categories
15/05/01 08:07:48 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/05/01 08:07:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/05/01 08:07:49 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/a91be5e61774406eabbd6c052a475fb7/sqoop_import_categories.avsc
15/05/01 08:07:49 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/05/01 08:07:50 INFO client.RMProxy: Connecting to ResourceManager at quickstart.cloudera/127.0.0.1:8032
15/05/01 08:07:51 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/cloudera/.staging/job_1430492756628_0002
15/05/01 08:07:51 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.io.FileNotFoundException: /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar (Permission denied)
15/05/01 08:07:51 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: java.io.FileNotFoundException: /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar (Permission denied)

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
ls: `/user/hive/warehouse': No such file or directory
[cloudera@quickstart ~]$

Re: Lesson 1 of Cloudera Live Tutorial

Master Collaborator

Sqoop is failing because it doesn't have permission to read the file "/usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar", which has been put on its classpath. I don't believe that file is in the VM originally - is it possible you added it? If you adjust the permissions so Sqoop can read the file you shouldn't have a problem (or just remove the file if you don't need Sqoop to use that JDBC driver):

 

sudo chmod 644 /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar
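
To double-check that the fix took effect, or to take the removal route instead, something like the following should work; the jar path is the one from the error message above:

# Inspect the current permissions on the driver jar
ls -l /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar

# Alternatively, move the jar out of Sqoop's lib directory if the PostgreSQL driver isn't needed
sudo mv /usr/lib/sqoop/lib/postgresql-9.4-1201.jdbc4.jar /tmp/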

Re: Lesson 1 of Cloudera Live Tutorial

New Contributor

I have the same problem. I tried everything that you suggested and it doesn't work; I don't know what's happening. :(

 

 

[cloudera@quickstart ~]$ sqoop import-all-tables \
>     -m 1 \
>     --connect jdbc:mysql://quickstart:3306/retail_db \
>     --username=retail_dba \
>     --password=cloudera \
>     --compression-codec=snappy \
>     --as-avrodatafile \
>     --warehouse-dir=/user/hive/warehouse
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/10/22 14:39:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/10/22 14:39:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/22 14:39:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/10/22 14:39:58 INFO tool.CodeGenTool: Beginning code generation
15/10/22 14:39:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/10/22 14:39:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/10/22 14:39:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/60b0fc9ea6067e30ab996b33463f0e10/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/22 14:40:05 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/60b0fc9ea6067e30ab996b33463f0e10/categories.jar
15/10/22 14:40:05 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/10/22 14:40:05 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/10/22 14:40:05 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/10/22 14:40:05 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/10/22 14:40:05 INFO mapreduce.ImportJobBase: Beginning import of categories
15/10/22 14:40:05 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/10/22 14:40:06 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/22 14:40:11 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/10/22 14:40:12 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/60b0fc9ea6067e30ab996b33463f0e10/sqoop_import_categories.avsc
15/10/22 14:40:12 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/22 14:40:12 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/22 14:40:15 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/cloudera/.staging/job_1445508642837_0007
15/10/22 14:40:15 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/cloudera/.staging/job_1445508642837_0007. Name node is in safe mode.
The reported blocks 381 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 383.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4053)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4011)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3995)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:824)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:306)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:590)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

15/10/22 14:40:15 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot delete /tmp/hadoop-yarn/staging/cloudera/.staging/job_1445508642837_0007. Name node is in safe mode.
The reported blocks 381 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 383.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4053)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4011)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3995)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:824)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:306)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:590)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/categories/
ls: `/user/hive/warehouse/categories/': No such file or directory

Re: Lesson 1 of Cloudera Live Tutorial

Master Collaborator

Your log indicates that the HDFS DataNode is not running. It should be running on startup. You could try 'sudo service hadoop-hdfs-datanode restart' and then try again, and consult the logs in /var/log/hadoop-hdfs for error messages if the problem continues.
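A minimal set of checks that may help here, assuming the quickstart VM's default service names; the safe-mode status relates directly to the "Name node is in safe mode" message in the log above:

# Is the NameNode still in safe mode?
hdfs dfsadmin -safemode get

# How many DataNodes does the NameNode see, and how many blocks are reported?
sudo -u hdfs hdfs dfsadmin -report

# Restart the DataNode, wait a moment, then recheck safe mode
sudo service hadoop-hdfs-datanode restart
hdfs dfsadmin -safemode get

# Last resort on a throwaway VM: force the NameNode out of safe mode
sudo -u hdfs hdfs dfsadmin -safemode leave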

Re: Lesson 1 of Cloudera Live Tutorial

New Contributor

I have the same problem, and this suggestion did not solve it.

Re: Lesson 1 of Cloudera Live Tutorial

Master Collaborator
If you're getting identical error messages (i.e. 0 DataNodes online) and you've tried restarting the hadoop-hdfs-datanode service, can you look in the DataNode logs in /var/log/hadoop-hdfs for any error messages?

If you're getting other error messages in Sqoop, please post them.
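
If it helps, one way to pull recent errors out of the DataNode log (the exact file name may differ slightly between CDH releases):

# Tail the DataNode log (file name pattern is assumed)
sudo tail -n 100 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log

# Or filter for errors and fatals only
sudo grep -hE 'ERROR|FATAL' /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log | tail -n 20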