03-20-2018 01:39 AM
Apologies if my lingo is a bit off, this is actually my first post.
My Cloudera QuickStart VM has become very unstable lately, especially when I try to run Hive and/or impala-shell.
I run the simplest query, in this case "DESC <table name>", and it pretty much crashes.
When I try to execute other queries and it gets past this stage, I receive the error "There are 0 datanode(s) running and no node(s) are excluded in this operation". Yes, I have restarted the datanode on each of these occasions, and that hasn't helped. I've literally had to shut down the VM and restart it (you know this can be a pain).
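For anyone hitting the same error: the datanode state can be checked from a terminal inside the VM before resorting to a full reboot. A minimal sketch, assuming the CDH init-service names used by the QuickStart VM (adjust if your image differs):

```shell
# Check whether the DataNode service is actually up (assumed service name)
sudo service hadoop-hdfs-datanode status

# Restart it if it is down
sudo service hadoop-hdfs-datanode restart

# Ask the NameNode how many live datanodes it sees;
# "Live datanodes (0)" would match the error message above
sudo -u hdfs hdfs dfsadmin -report | head -n 20
```

If the datanode keeps dying right after a restart, its log (typically under /var/log/hadoop-hdfs/) usually says why.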
Question: Is there a quick patch to fix this? If not, is there an entirely new VM with improved performance in this regard?
Thanks and please forgive the lingo. I'm new here.
03-20-2018 05:44 AM
Welcome to the community @BlaQBobby. :)
To help others answer your question, could you provide a little more detail?
Cy Jervis, Community Manager
03-20-2018 06:42 AM
Thanks for reaching out, @cjervis.
I'll try as best I can to provide required details.
Running the Cloudera VM on Windows 10, 64-bit (host OS); the VM itself runs 64-bit Linux as the guest OS (hope this makes sense).
Using Cloudera VM 5.4.2
Allocated 2536 MB of memory and 4 processors to the VM.
Previously, all I've done is run the out-of-the-box HUE examples as indicated in the tutorial included with the Cloudera VM. Recently I've gotten a bit more adventurous: created a few (NOT COMPLICATED) databases and tried querying them from Impala and Hive, all from the CLI. That's when I started having these issues.
Negative. Not running Cloudera Manager.
03-21-2018 05:50 AM
Thank you for the information.
The first thing I see is that you are using an old QuickStart VM. Is that for testing purposes against a similar production cluster, or for another reason?
The second thing that catches my eye is the amount of RAM allocated to the VM. As outlined in our community article on how to set up the QuickStart VM, the minimum RAM required to run CDH 5 is 4 GB. Depending on the host system's available RAM, I would look into allocating more to the VM. Keep in mind that you need to leave some RAM for the host system to run Windows and such; so if your system only has 4 GB of RAM, you are not going to be able to allocate more to the VM.
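If the VM runs under VirtualBox, the allocation can be changed from the host while the VM is powered off. A sketch, in which the VM name "Cloudera-QuickStart-VM" is an assumption (check the list output for the real name):

```shell
# List registered VMs to find the exact name (the name below is an assumption)
VBoxManage list vms

# With the VM powered off, raise its RAM to 4 GB and keep 2 vCPUs
VBoxManage modifyvm "Cloudera-QuickStart-VM" --memory 4096 --cpus 2
```

Other hypervisors (VMware Player, KVM) expose the same settings through their own UIs.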
My suggestions would be to at a minimum, allocate more RAM to the VM if possible. I also suggest starting fresh with the latest QuickStart VM unless you need to stay with 5.4.2 for some reason.
Cy Jervis, Community Manager
03-23-2018 04:31 AM
I have 8 GB of RAM available. I did as you suggested and downloaded the latest version of the Cloudera VM.
Allocated 4096MB to it.
And I've tried running a few sqoop jobs.
All of them failed. Here's the error report I'm getting:
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/03/23 03:02:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
18/03/23 03:02:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/03/23 03:02:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/03/23 03:02:48 INFO tool.CodeGenTool: Beginning code generation
18/03/23 03:02:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products_replica` AS t LIMIT 1
18/03/23 03:02:52 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `products_replica` AS t LIMIT 1
18/03/23 03:02:53 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/83f36a488de14e592950ed57ba3d2a91/products_replica.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/03/23 03:03:24 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/83f36a488de14e592950ed57ba3d2a91/products_replica.jar
18/03/23 03:03:31 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/03/23 03:03:31 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/03/23 03:03:31 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/03/23 03:03:31 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/03/23 03:03:37 INFO mapreduce.ImportJobBase: Beginning import of products_replica
18/03/23 03:03:37 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
18/03/23 03:04:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/03/23 03:04:52 WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: select min(product_id),max(product_id) from products_replica where product_id>1111; splits may not partition data.
18/03/23 03:06:26 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/03/23 03:06:56 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/03/23 03:09:36 WARN hdfs.DFSClient: Slow waitForAckedSeqno took 33145ms (threshold=30000ms). File being written: /tmp/hadoop-yarn/staging/cloudera/.staging/job_1521798877433_0001/libjars/kite-data-hive.jar, block: BP-1067413441-127.0.0.1-1508775264580:blk_1073743036_2219, Write pipeline datanodes: null
18/03/23 03:10:58 INFO db.DBInputFormat: Using read commited transaction isolation
18/03/23 03:10:58 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: select min(product_id),max(product_id) from products_replica where product_id>1111
18/03/23 03:11:00 INFO db.IntegerSplitter: Split size: 46; Num splits: 5 from: 1112 to: 1345
18/03/23 03:11:04 INFO mapreduce.JobSubmitter: number of splits:5
18/03/23 03:11:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1521798877433_0001
18/03/23 03:11:27 INFO impl.YarnClientImpl: Submitted application application_1521798877433_0001
18/03/23 03:11:37 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1521798877433_0001/
18/03/23 03:11:37 INFO mapreduce.Job: Running job: job_1521798877433_0001
18/03/23 03:27:12 INFO mapreduce.Job: Job job_1521798877433_0001 running in uber mode : false
18/03/23 03:27:13 INFO mapreduce.Job: map 0% reduce 0%
18/03/23 03:27:15 INFO mapreduce.Job: Job job_1521798877433_0001 failed with state FAILED due to: Application application_1521798877433_0001 failed 2 times due to ApplicationMaster for attempt appattempt_1521798877433_0001_000002 timed out. Failing the application.
18/03/23 03:27:23 INFO mapreduce.Job: Counters: 0
18/03/23 03:27:27 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
18/03/23 03:27:28 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 1,256.0951 seconds (0 bytes/sec)
18/03/23 03:27:31 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
18/03/23 03:27:31 INFO mapreduce.ImportJobBase: Retrieved 0 records.
18/03/23 03:27:31 ERROR tool.ImportTool: Import failed: Import job failed!
Looks like the task is timing out for some reason.
This has happened consistently. No sqoop jobs have run completely.
I need help. Thanks a lot.
For reference's sake, here's the sqoop job I ran most recently:
[cloudera@quickstart ~]$ sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba \
  --password cloudera \
  --table products_replica \
  -m 5 \
  --null-string "NA" \
  --null-non-string -1000 \
  --where 'product_id > 1111' \
  --as-textfile \
  --target-dir /user/cloudera/problem5/products-text-part2 \
  --boundary-query 'select min(product_id),max(product_id) from products_replica where product_id>1111' \
  --outdir ~/problem5
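The ApplicationMaster timeout in the log is typical of a memory-starved single-node VM rather than of the Sqoop options themselves. One low-risk experiment (a sketch, not a guaranteed fix) is to rerun the same import with a single mapper, which avoids split computation and reduces YARN's memory footprint. The "-m1" target-dir suffix is my addition to avoid colliding with the earlier output, and -P (prompt for password) replaces the plaintext password the log warned about:

```shell
# Same import as above, but with one mapper (-m 1); with a single mapper
# Sqoop needs no --boundary-query or splits at all
sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba \
  -P \
  --table products_replica \
  -m 1 \
  --null-string "NA" \
  --null-non-string -1000 \
  --where 'product_id > 1111' \
  --as-textfile \
  --target-dir /user/cloudera/problem5/products-text-part2-m1 \
  --outdir ~/problem5
```

If the single-mapper run succeeds, the failure was resource pressure, not the query.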
03-24-2018 06:55 AM
Hmm. I'm pretty good at spotting QuickStart VM setup issues but now you are getting beyond my level of knowledge as Community Manager. I'll see if I can find someone to take a look.
Cy Jervis, Community Manager
03-24-2018 07:24 PM - edited 03-24-2018 07:53 PM
Could you please click on the URL from the log stack trace
("The url to track the job: ...") and
share the logs?
Also, could you remove the "where" clause from the boundary query and try again:
--boundary-query 'select min(product_id),max(product_id) from products_replica'
03-30-2018 02:56 AM
Thanks for your help @cjervis
The VM remains quite unstable but sometimes manages to perform properly.
It's a bit hit and miss, actually.
Hopefully someone can spot what's wrong. Appreciate it.