Tutorial Exercise 1 - Avro Data Files are Not Created
Created on 11-24-2015 06:27 AM - edited 09-16-2022 02:50 AM
I'm sure this is a repeat, but I've searched the forum and either haven't found an answer to my problem, or I have no clue what the proposed solutions are suggesting.
I'm completely new to big data, Hadoop, and Cloudera. I followed the first tutorial to the letter and copied and pasted all the commands, yet I'm clearly not getting the expected results. For example, the tutorial states:
"[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
will show a folder for each of the tables."
Yet, when I copy and paste the command, all that is returned is the prompt: "[cloudera@quickstart ~]$"
I've noticed that the sample screens in the tutorial show "rgardner - root@cloudera1:~ - ssh - 105x25", and I've read in the forum that the commands given in the tutorial only work from the home directory. But all the commands in the tutorial show "[cloudera@quickstart ~]", so I don't understand why they wouldn't work from that directory. Furthermore, I wouldn't know how to get to the home directory if that were necessary.
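(Based on the prompt, I'm guessing that "~" already means the home directory and that something like the following would get back to it, but I'm not sure:)
cd ~      # or: cd /home/cloudera - my guess at the cloudera user's home directory
pwd       # should print /home/cloudera if that guess is right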
Here is the terminal output, including my command. The only difference between mine and the tutorial is that mine is all on one line - a recommendation from the forum.
[cloudera@quickstart ~]$ sqoop import-all-tables -m 1 --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --compression-codec=snappy --as-avrodatafile --warehouse-dir=/user/hive/warehouse
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/11/24 05:58:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.4.2
15/11/24 05:58:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/11/24 05:58:48 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/11/24 05:58:49 INFO tool.CodeGenTool: Beginning code generation
15/11/24 05:58:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:58:49 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:58:49 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/11/24 05:58:57 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/categories.jar
15/11/24 05:58:57 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/11/24 05:58:57 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/11/24 05:58:57 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/11/24 05:58:57 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/11/24 05:58:57 INFO mapreduce.ImportJobBase: Beginning import of categories
15/11/24 05:58:57 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
15/11/24 05:58:58 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/11/24 05:59:01 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/11/24 05:59:02 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/c490e5a7cb4bc3d3cc154027c260f157/sqoop_import_categories.avsc
15/11/24 05:59:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/11/24 05:59:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/11/24 05:59:04 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/staging/cloudera/.staging. Name node is in safe mode. The reported blocks 379 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 381. The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4302)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4277)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:852)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
15/11/24 05:59:04 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hadoop-yarn/staging/cloudera/.staging. Name node is in safe mode. The reported blocks 379 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 381. The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1413)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4302)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4277)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:852)
    at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:321)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:601)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/categories
ls: `/user/hive/warehouse/categories': No such file or directory
I would think the issue is in the first two returned lines: "Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation." But no one mentions this as an issue in any of the forum posts. If I knew how to set $ACCUMULO_HOME to the root of my Accumulo installation, I would give that a shot.
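(From what I can gather, setting it would just be a shell export pointing at wherever Accumulo actually lives. The path below is purely a guess on my part, since the warning suggests I don't have an Accumulo installation at all:)
export ACCUMULO_HOME=/path/to/accumulo   # hypothetical path - presumably only meaningful if Accumulo is actually installed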
I apologize for the repeat, but any help would be greatly appreciated.
Thank you.
Created 11-24-2015 06:43 AM
You are in the home directory, and the $ACCUMULO_HOME warning can be ignored - it only matters if you want to use Accumulo.
The real problem is this:
The number of live datanodes 1 has reached the minimum number 0
It may be as simple as running 'sudo service hadoop-hdfs-datanode restart', and then 'sudo service hadoop-hdfs-datanode status' to check that it's still up after a few seconds. If you continue to have problems, check the logs in /var/log/hadoop-hdfs to see any possible errors. If all else fails, a reboot will restart all the services in the correct order and usually correct any little issues like this.
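Something along these lines (the exact log file names on your VM may differ, so treat the tail argument as an example):
sudo service hadoop-hdfs-datanode restart
sleep 10
sudo service hadoop-hdfs-datanode status
# look for recent errors in the HDFS logs
ls -lt /var/log/hadoop-hdfs/ | head
tail -n 50 /var/log/hadoop-hdfs/hadoop-hdfs-datanode-*.log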
Created 11-28-2015 09:48 AM
Thank you for replying, Sean. Unfortunately, that didn't work. I tried restarting the datanode(?), checked the status, and it said the Hadoop datanode was running.
I tried running the code from the tutorial and got the same results as before. I checked the status of the datanode again and was told it was still running.
I tried shutting everything down and starting over - that did not work.
I checked the /var/log/hadoop-hdfs folder. There are 24 files in it, 9 of which were created today. None of them are obvious error logs. Checking each reveals various errors:
"ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM" appears repeatedly in two of those files.
"ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint" appears repeatedly in another.
Thank you again for your help.
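P.S. In case it matters, I pulled those error lines out by searching across all the log files at once rather than opening each one - roughly like this (the pattern is just what I happened to use):
grep -i "ERROR" /var/log/hadoop-hdfs/*.log | less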
Created on 12-03-2015 08:23 PM - edited 12-03-2015 08:24 PM
Apparently, the NameNode was in safe mode. The following command got everything working:
sudo -u hdfs hdfs dfsadmin -safemode leave
Unfortunately, it seems that every time I restart the VM the NameNode goes back into safe mode and I have to run that command again. Not sure why that's the case.
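(For anyone who hits the same thing: hdfs dfsadmin also has "get" and "wait" options for safe mode, so you can check the state, or just wait for it to clear on its own, instead of forcing it out - I've only actually needed "leave" here:)
sudo -u hdfs hdfs dfsadmin -safemode get    # reports "Safe mode is ON" or "Safe mode is OFF"
sudo -u hdfs hdfs dfsadmin -safemode wait   # blocks until the NameNode leaves safe mode by itself
sudo -u hdfs hdfs dfsadmin -safemode leave  # forces it out immediately, as above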
Created 01-03-2016 08:07 PM
Thank you. This did the trick.