
Can "show tables" but can't "SELECT FROM" Hive tables in spark-shell yarn-client


I have the Cloudera CDH 5.1 QuickStart VM running in my network.


From my local machine I am accessing this VM via spark-shell in yarn-client mode.
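For reference, the launch looks roughly like this (the config path is illustrative; it should be wherever the client configs were extracted):

export HADOOP_CONF_DIR=/etc/hadoop/conf   # illustrative path to the downloaded client configs
spark-shell --master yarn-client          # Spark 1.x master URL for YARN client mode

When I run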


sqlContext.sql("show tables").show()

a list of my tables is shown, as expected. But when I run


sqlContext.sql("select * from test").show()

I get the following error:


sqlContext.sql("select * from test").show()
16/04/08 11:38:03 INFO parse.ParseDriver: Parsing command: select * from test
16/04/08 11:38:04 INFO parse.ParseDriver: Parse Completed
16/04/08 11:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(439200) called with curMem=0, maxMem=556038881
16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 428.9 KB, free 529.9 MB)
16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(42646) called with curMem=439200, maxMem=556038881
16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 41.6 KB, free 529.8 MB)
16/04/08 11:38:04 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.1.36:43673 (size: 41.6 KB, free: 530.2 MB)
16/04/08 11:38:04 INFO spark.SparkContext: Created broadcast 0 from show at <console>:20
java.lang.IllegalArgumentException: java.net.UnknownHostException: quickstart.cloudera
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:193)
	at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
	at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386)
	at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
	at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
	at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1315)
	at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1378)
	at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:402)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:371)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
	at $iwC$$iwC$$iwC.<init>(<console>:33)
	at $iwC$$iwC.<init>(<console>:35)
	at $iwC.<init>(<console>:37)
	at <init>(<console>:39)
	at .<init>(<console>:43)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: quickstart.cloudera
	... 94 more

From inside the VM, I can run SELECTs without any problem, so I suspect I am missing some client-side configuration. I got my configuration files from Cloudera Manager by going to Hive > Actions > Download Client Configuration, extracted all the files into HADOOP_CONF_DIR, and replaced quickstart.cloudera with the actual VM IP using sed.
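The replacement was along these lines (illustrative; the downloaded archive contains XML files such as core-site.xml and hive-site.xml):

cd $HADOOP_CONF_DIR
# Swap the VM hostname for its IP in every client config file
# (10.10.10.123 stands in for the actual VM IP)
sed -i 's/quickstart.cloudera/10.10.10.123/g' *.xml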


Any thoughts, guys?

1 ACCEPTED SOLUTION


In the end, my local machine wasn't able to resolve the VM by the name quickstart.cloudera. That also explains the symptom: "show tables" only talks to the Hive metastore, but a SELECT has to read the table's files from HDFS, and the HDFS locations stored in the metastore still point at quickstart.cloudera (which is why rewriting the client configs with sed didn't help). So I added the name to my /etc/hosts:


10.10.10.123 quickstart.cloudera

with 10.10.10.123 being the VM's IP address (note that /etc/hosts expects the IP first, then the hostname). That solved the problem.
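You can sanity-check name resolution from the same spark-shell session (just an illustrative check, not required for the fix):

// Should now print the VM's IP instead of throwing UnknownHostException
java.net.InetAddress.getByName("quickstart.cloudera").getHostAddress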

