
Can "show tables" but don't "SELECT FROM" Hive tables is spark-shell yarn-client


I have Cloudera CDH Quickstart 5.1 running in a VM in my network.

From my local machine I am accessing this VM via spark-shell in yarn-client mode.
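
The launch looks roughly like this (a sketch, not my exact invocation; the HADOOP_CONF_DIR path is a placeholder pointing at the client configs described further down, and on Spark 1.x the yarn-client master string is passed directly):

# hypothetical location of the client configs downloaded from Cloudera Manager
export HADOOP_CONF_DIR=/path/to/vm-client-conf
spark-shell --master yarn-client

When I run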

sqlContext.sql("show tables").show()

a list of my tables is shown, as expected. But when I run

sqlContext.sql("select * from test").show()

I get the following error:

sqlContext.sql("select * from test").show()
16/04/08 11:38:03 INFO parse.ParseDriver: Parsing command: select * from test
16/04/08 11:38:04 INFO parse.ParseDriver: Parse Completed
16/04/08 11:38:04 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(439200) called with curMem=0, maxMem=556038881
16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 428.9 KB, free 529.9 MB)
16/04/08 11:38:04 INFO storage.MemoryStore: ensureFreeSpace(42646) called with curMem=439200, maxMem=556038881
16/04/08 11:38:04 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 41.6 KB, free 529.8 MB)
16/04/08 11:38:04 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.1.1.36:43673 (size: 41.6 KB, free: 530.2 MB)
16/04/08 11:38:04 INFO spark.SparkContext: Created broadcast 0 from show at <console>:20
java.lang.IllegalArgumentException: java.net.UnknownHostException: quickstart.cloudera
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:193)
	at org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:207)
	at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386)
	at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1386)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
	at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1904)
	at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1385)
	at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1315)
	at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1378)
	at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:178)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:402)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
	at org.apache.spark.sql.DataFrame.show(DataFrame.scala:371)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
	at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
	at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
	at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
	at $iwC$$iwC$$iwC.<init>(<console>:33)
	at $iwC$$iwC.<init>(<console>:35)
	at $iwC.<init>(<console>:37)
	at <init>(<console>:39)
	at .<init>(<console>:43)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
	at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
	at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
	at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
	at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
	at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
	at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
	at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
	at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
	at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
	at org.apache.spark.repl.Main$.main(Main.scala:31)
	at org.apache.spark.repl.Main.main(Main.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.UnknownHostException: quickstart.cloudera
	... 94 more

From inside the VM, I can run SELECTs without any problem, so I suspect I am missing some configuration on my local machine. I got my configuration files from Cloudera Manager, accessing Hive > Actions > Download Client Configuration, extracted all the files into HADOOP_CONF_DIR, and replaced quickstart.cloudera with the actual VM IP using sed.
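
The substitution step was along these lines (a sketch; the IP is the one mentioned in the solution below, and the glob over $HADOOP_CONF_DIR is an assumption about where the extracted files live):

# GNU sed: in-place replace of the hostname across the extracted client configs
sed -i 's/quickstart\.cloudera/10.10.10.123/g' $HADOOP_CONF_DIR/*.xml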

Any thoughts guys?

1 ACCEPTED SOLUTION


In the end, my local machine wasn't able to resolve the VM by the name quickstart.cloudera: "show tables" only needs the Hive metastore, but a SELECT has to read the table's files from HDFS, and the HDFS client looks up the NameNode by that hostname, which is why it fails with the UnknownHostException. So I added the name to my /etc/hosts:

10.10.10.123 quickstart.cloudera

10.10.10.123 being the VM's IP address (note that in /etc/hosts the IP comes first, followed by the hostname). That solved the problem.
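
A quick way to sanity-check the mapping from the local machine before retrying the query (on Linux, getent consults /etc/hosts through roughly the same resolver path the JVM uses):

getent hosts quickstart.cloudera

Once that resolves, the select * from test above runs without the UnknownHostException.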
