Member since: 03-12-2015
Posts: 5
Kudos Received: 0
Solutions: 0
03-17-2015 09:50 AM
Sean, your procedure for stopping MR2/YARN and starting MR1 solved the problem. I am not sure if you are familiar with R, but my purpose is to set up R and Hadoop together, which I did using the blog I sent the link for. The MapReduce jobs run now, and an output file is created as a result of a very simple 3-line R test code. But when I try to access that file, I get an "output file does not exist" error, given below. Any comments that could help me proceed would be very much appreciated. Thanks. -ER

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/0
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1093)
    at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
    at org.apache.hadoop.streaming.DumpTypedBytes.run(DumpTypedBytes.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:41)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/128432
    (same stack trace as above)

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/422
    (same stack trace as above)

Exception in thread "main" java.io.FileNotFoundException: File does not exist: hdfs://localhost:8020/user/cloudera/122
    (same stack trace as above)
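For context: the paths in those errors (/user/cloudera/0, /user/cloudera/128432, and so on) look like numeric values being passed to Hadoop Streaming's typed-bytes dump instead of a real output path, which usually means the R side is handing the wrong argument when reading results back. A quick way to see what the job actually wrote, assuming the default /user/cloudera home directory on the QuickStart VM (the output directory name below is a hypothetical placeholder, not from the original post):

    # list everything under the HDFS home directory to see what the job created
    hadoop fs -ls /user/cloudera

    # inspect a suspected output directory; rmr2 writes sequence files by default,
    # so -ls confirming the part files exist is more useful than -cat here
    hadoop fs -ls /user/cloudera/rmr-output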
03-13-2015 10:31 AM
Sean, thank you for your response. I am running these commands on the VM, just using the terminal on my VM's desktop. I am reading the text that says "log in to the master node of your cluster using SSH" in the web browser that opens automatically when the VM starts, at http://quickstart.cloudera/#/tutorial/ingest_structured_data. I am using Oracle VM VirtualBox Manager 4.3.20.

I don't think I made any configuration changes before running Sqoop; I just opened cloudera-quickstart-vm-5.3.0-0-virtualbox-disk1.vmdk in my VirtualBox. I made some changes to use R and Hadoop together using the blog at http://blogr-cs.blogspot.com/2012/12/integration-of-r-rstudio-and-hadoop-in.html, but I think those are irrelevant. I do not have Sqoop on my host machine. I'd really appreciate it if you could suggest some solutions I can understand and implement. Thank you. ER
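A note on the SSH confusion here: the QuickStart VM is a single-node cluster, so the desktop terminal is already a shell on the "master node" and the tutorial's SSH step can be satisfied locally. If one did want to reach the guest from the host machine, a minimal sketch, assuming the default QuickStart credentials and a reachable guest IP (the address below is a hypothetical placeholder):

    # from the host: SSH into the guest as the default user (password: cloudera)
    ssh cloudera@192.168.56.101   # replace with your VM's actual IP address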
03-13-2015 09:49 AM
Morgan, thanks for your reply. I can connect to MySQL - I don't think that is the problem. The problem is that I can't connect to something, and MapReduce jobs cannot be performed. The very first tutorial on Cloudera reads: "You should first log in to the Master Node of your cluster using SSH - you can get the credentials using the instructions on Your Cloudera Cluster." I don't know how to do this. I'm just using the Cloudera QuickStart VM via VirtualBox. I start the VM, open the terminal, and enter:

    sqoop import-all-tables \
        -m 1 \
        --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
        --username=retail_dba \
        --password=cloudera \
        --compression-codec=snappy \
        --as-avrodatafile \
        --warehouse-dir=/user/hive/warehouse

and I get a connection refused error. The same error happens if I try to use R and Hadoop together. It seems like I can't connect to a server or something? Do I really have to log in to a server after starting my VM? I'm just trying to learn and all of this is new to me. Thanks for your help.

Here is the complete output I get after I run the above command. I am on Mac OS X 10.7.5, using the Cloudera QuickStart VM via VirtualBox.

    Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    15/03/13 09:44:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
    15/03/13 09:44:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/03/13 09:44:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    15/03/13 09:44:56 INFO tool.CodeGenTool: Beginning code generation
    15/03/13 09:44:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
    15/03/13 09:44:56 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
    15/03/13 09:44:56 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
    Note: /tmp/sqoop-cloudera/compile/47d81c933f89fd992607ae4a35707074/categories.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    15/03/13 09:44:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/47d81c933f89fd992607ae4a35707074/categories.jar
    15/03/13 09:44:59 WARN manager.MySQLManager: It looks like you are importing from mysql.
    15/03/13 09:44:59 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    15/03/13 09:44:59 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    15/03/13 09:44:59 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    15/03/13 09:44:59 INFO mapreduce.ImportJobBase: Beginning import of categories
    15/03/13 09:45:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
    15/03/13 09:45:00 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-cloudera/compile/47d81c933f89fd992607ae4a35707074/sqoop_import_categories.avsc
    15/03/13 09:45:02 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8021. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    (the same retry message repeats for attempts 1 through 8)
    15/03/13 09:45:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8021. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    15/03/13 09:45:11 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:java.net.ConnectException: Call From quickstart.cloudera/127.0.0.1 to localhost:8021 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    15/03/13 09:45:11 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: java.net.ConnectException: Call From quickstart.cloudera/127.0.0.1 to localhost:8021 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

thanks, ER
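Worth noting about that log: port 8021 is the MR1 JobTracker port, so the client is configured for MR1 while nothing is listening there. A minimal sketch for checking which daemons are actually up on the VM, assuming the CDH 5.3 init-script service names (verify the exact names with ls /etc/init.d):

    # which Hadoop-related JVMs are running?
    sudo jps

    # is anything listening on the JobTracker port?
    sudo netstat -tlnp | grep 8021

    # status of the MR1 JobTracker service
    sudo service hadoop-0.20-mapreduce-jobtracker status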
03-12-2015 08:06 PM
Hi, I start my QuickStart VM and enter the following in the terminal:

    sqoop import-all-tables \
        -m 1 \
        --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
        --username=retail_dba \
        --password=cloudera \
        --compression-codec=snappy \
        --as-avrodatafile \
        --warehouse-dir=/user/hive/warehouse

but get the following error:

    failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    Streaming Command Failed!
    Error in mr(map = map, reduce = reduce, combine = combine, in.folder = if (is.list(input)) { :
        hadoop streaming failed with error code 5

Any help is appreciated. I am just a beginner and I don't know anything yet. Thanks.
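The fix that ultimately resolved this thread was switching the VM from MR2/YARN to MR1, so the client's expected JobTracker at localhost:8021 actually exists. A sketch of that switch, assuming the CDH 5.3 init-script service names (confirm them on your VM before running):

    # stop the MR2/YARN daemons
    sudo service hadoop-yarn-resourcemanager stop
    sudo service hadoop-yarn-nodemanager stop
    sudo service hadoop-mapreduce-historyserver stop

    # start the MR1 daemons (the JobTracker listens on port 8021)
    sudo service hadoop-0.20-mapreduce-jobtracker start
    sudo service hadoop-0.20-mapreduce-tasktracker start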