Tutorial Exercise 1 - Avro Data Files Are Not Created

New Contributor

I'm sure this is a repeat, but I've searched the forum and either haven't found an answer to my problem, or I have no clue what the proposed solutions are suggesting.

 

I'm completely new to big data, Hadoop, and Cloudera. I followed the first tutorial to the letter, copied and pasted all commands, and yet I'm clearly not getting the expected results. For example, the tutorial states that

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse

"will show a folder for each of the tables."

 

Yet, when I copy and paste the command, all that is returned is: "[cloudera@quickstart ~]$"

 

I've noticed that the sample screenshots in the tutorial show "rgardner - root@cloudera1:~ - ssh - 105x25", and I've read in the forum that the commands in the tutorial only work from the home directory. However, all the commands in the tutorial show the prompt "[cloudera@quickstart ~]", so I don't understand why they wouldn't work from that directory. Furthermore, I wouldn't know how to get to the home directory if that were necessary.
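For reference, the "~" in the "[cloudera@quickstart ~]" prompt already indicates the home directory. A minimal way to confirm or return to it, assuming a standard bash shell:

cd ~     # change to the home directory (typically /home/cloudera on the QuickStart VM)
pwd      # print the current working directory to confirm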

 

Here is the terminal output, including my command. The only difference between mine and the tutorial's is that mine is all on one line - a recommendation from the forum.

 

[cloudera@quickstart ~]$ sqoop import-all-tables -m 1 --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-avrodatafile --warehouse-dir=/user/hive/warehouse
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/11/24 05:58:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/11/24 05:58:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/11/24 05:58:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/11/24 05:58:49 INFO tool.CodeGenTool: Beginning code generation
15/11/24 05:58:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:58:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:58:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/11/24 05:58:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/categories.jar
15/11/24 05:58:57 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/11/24 05:58:57 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/11/24 05:58:57 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/11/24 05:58:57 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/11/24 05:58:57 INFO mapreduce.ImportJobBase: Beginning import of categories
15/11/24 05:58:57 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/11/24 05:58:58 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/11/24 05:59:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:59:02 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/sqoop_import_categories.avsc
15/11/24 05:59:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/11/24 05:59:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/24 05:59:04 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/staging/cloudera/.staging. Name node is in safe mode.
The reported blocks 379 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 381.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4302)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4277)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:852)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

15/11/24 05:59:04 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/staging/cloudera/.staging. Name node is in safe mode.
The reported blocks 379 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 381.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4302)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4277)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:852)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/categories
ls: `/user/hive/warehouse/categories': No such file or directory

I would think the issue is in the first two returned lines: "Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation." But no one mentions this as an issue in any of the forum posts. If I knew how to set $ACCUMULO_HOME to the root of my Accumulo installation, I would give that a shot.
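For reference, the Accumulo warning is cosmetic when Accumulo isn't being used, but it can be silenced by pointing $ACCUMULO_HOME at an existing directory. A sketch, assuming a bash shell and a hypothetical directory path:

sudo mkdir -p /var/lib/accumulo          # hypothetical location; any existing directory works
export ACCUMULO_HOME=/var/lib/accumulo   # silences the Sqoop warning; Accumulo itself is not required for this tutorial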

 

I apologize for the repeat, but any help would be greatly appreciated.

 

Thank you.

1 ACCEPTED SOLUTION

New Contributor

Apparently, the NameNode was in safe mode. The following command got everything working:

 

sudo -u hdfs hdfs dfsadmin -safemode leave

 

Unfortunately, it seems that every time I restart the VM the NameNode goes back into safe mode, and I have to enter the command again. Not sure why that's the case.
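For reference, a couple of related commands from the standard HDFS CLI make this easier to diagnose:

sudo -u hdfs hdfs dfsadmin -safemode get    # report whether the NameNode is currently in safe mode
sudo -u hdfs hdfs dfsadmin -safemode wait   # block until the NameNode leaves safe mode on its own

The NameNode enters safe mode on every startup and normally leaves it automatically once enough block reports have arrived; forcing "leave" simply skips that wait.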


5 REPLIES

Rising Star

You are in the home directory, and the $ACCUMULO_HOME warning can be ignored - it only matters if you want to use Accumulo.

 

The real problem is this:

 

The number of live datanodes 1 has reached the minimum number 0

 

It may be as simple as running 'sudo service hadoop-hdfs-datanode restart', and then 'sudo service hadoop-hdfs-datanode status' to check that it's still up after a few seconds. If you continue to have problems, check the logs in /var/log/hadoop-hdfs to see any possible errors. If all else fails, a reboot will restart all the services in the correct order and usually correct any little issues like this.
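A minimal sketch of that sequence, assuming the QuickStart VM's service scripts:

sudo service hadoop-hdfs-datanode restart    # restart the DataNode
sleep 10                                     # give it a few seconds to settle
sudo service hadoop-hdfs-datanode status     # confirm it is still running
ls -lt /var/log/hadoop-hdfs | head           # if it isn't, start with the most recently written logs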

New Contributor

Thank you for replying, Sean. Unfortunately, that didn't work. I tried restarting the datanode(?), checked the status, and it said the Hadoop datanode was running.

 

I tried running the commands from the tutorial and got the same results as before. I checked the status of the datanode again and was told it was still running.

 

I tried shutting everything down and starting over - that did not work.

 

I checked the /var/log/hadoop-hdfs folder. There are 24 files in it, 9 of which were created today. None of them are obvious error logs. Checking each reveals various errors:

 

 "ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM" appears repeatedly in two of those files.

 

" ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint" appears repeatedly in another.

 

Thank you again for your help.

 

New Contributor

Apparently, the NameNode was in safe mode. The following command got everything working:

 

sudo -u hdfs hdfs dfsadmin -safemode leave

 

Unfortunately, it seems that every time I restart the VM the NameNode goes back into safe mode, and I have to enter the command again. Not sure why that's the case.
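For reference, the recurring safe mode usually just means the NameNode is waiting for block reports after each restart. If it never reaches the threshold on its own, the missing blocks mentioned in the error can be inspected with the standard fsck tool:

sudo -u hdfs hdfs fsck / | tail -30                 # summary includes missing and corrupt block counts
sudo -u hdfs hdfs fsck / -list-corruptfileblocks    # list files whose blocks are missing or corrupt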

New Contributor

Thank you. This did the trick.

New Contributor
Apparently a restart is needed, and then you execute the exercise step again, but it works. Thanks!