Created on 04-08-2016 07:44 AM - edited 09-16-2022 03:12 AM
I have Cloudera CDH Quickstart 5.1 running in a VM in my network.
From my local machine I am accessing this VM via spark-shell in yarn-client mode.
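The session is launched along these lines (a sketch; the HADOOP_CONF_DIR path is a placeholder for wherever the client configuration files mentioned below were extracted):

export HADOOP_CONF_DIR=/path/to/client-config
spark-shell --master yarn-client

When I run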
sqlContext.sql("show tables").show()
the list of my tables is shown, as expected. But when I run
sqlContext.sql("select * from test").show()
I get the following error:
sqlContext.sql("select * from test").show() 16/04/08 11:38:03 INFO parse.ParseDriver: Parsing command: select * from test 16/04/08 11:38:04 INFO parse.ParseDriver: Parse Completed 16/04/08 11:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(439200) called with curMem=0, maxMem=556038881 16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 428.9 KB, free 529.9 MB) 16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(42646) called with curMem=439200, maxMem=556038881 16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 41.6 KB, free 529.8 MB) 16/04/08 11:38:04 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.1.36:43673 (size: 41.6 KB, free: 530.2 MB) 16/04/08 11:38:04 INFO spark.SparkContext: Created broadcast 0 from show at <console>:20 java.lang.IllegalArgumentException: java.net.UnknownHostException: quickstart.cloudera at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377) at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240) at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579) at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524) at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368) at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296) at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256) at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228) at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304) at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237) at scala.Option.getOrElse(Option.scala:120) at org.apache.spark.rdd.RDD.partitions(RDD.scala:237) at 
org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:193) at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207) at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386) at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904) at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385) at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1315) at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1378) at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178) at org.apache.spark.sql.DataFrame.show(DataFrame.scala:402) at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363) at org.apache.spark.sql.DataFrame.show(DataFrame.scala:371) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31) at $iwC$$iwC$$iwC.<init>(<console>:33) at $iwC$$iwC.<init>(<console>:35) at $iwC.<init>(<console>:37) at <init>(<console>:39) at .<init>(<console>:43) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.net.UnknownHostException: quickstart.cloudera ... 94 more
From inside the VM, I can run SELECTs without any problem, so I suspect I am missing some client-side configuration. I got my configuration files from Cloudera Manager by accessing Hive > Actions > Download Client Configuration, extracted all the files into HADOOP_CONF_DIR, and replaced quickstart.cloudera with the actual VM IP using sed.
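For the record, the substitution was along these lines (the file set is whatever XML files came in the client configuration archive; adjust the IP to your VM's):

sed -i 's/quickstart.cloudera/10.10.10.123/g' $HADOOP_CONF_DIR/*.xml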
Any thoughts guys?
Created 04-08-2016 12:47 PM
In the end, my local machine wasn't able to resolve the VM by the hostname quickstart.cloudera. The client configuration refers to the NameNode by that hostname, and the Spark driver runs on my machine, so my machine has to be able to resolve it. I added it to my /etc/hosts:
10.10.10.123 quickstart.cloudera
10.10.10.123 being the VM's IP address. That solved the problem.
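To confirm the entry is picked up, a lookup such as:

getent hosts quickstart.cloudera

should now print 10.10.10.123 quickstart.cloudera (getent is one option on Linux; ping quickstart.cloudera works just as well).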