Created on 05-28-2014 04:15 AM - edited 09-16-2022 01:59 AM
Hi there,
I have installed a Hadoop cluster together with Spark using the Cloudera free manager; I used all default values.
Spark is working: I can start spark-shell, and on the web UI I see the master and all the workers, but every time I try to run anything in spark-shell I get:
java.lang.IllegalArgumentException: java.net.UnknownHostException: user
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:355)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:14)
at $iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC.<init>(<console>:21)
at $iwC.<init>(<console>:23)
at <init>(<console>:25)
at .<init>(<console>:29)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:600)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:607)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:610)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:935)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:883)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:981)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.net.UnknownHostException: user
... 69 more
Does anybody have a clue what the issue is here?
I did find a few posts talking about the localhost name and HADOOP_CONF_DIR defined on the workers, but nothing helped.
Help is appreciated.
Regards,
DaliborJ
Created 05-28-2014 04:57 AM
The proximate problem is that some host name is configured as "user", which doesn't sound like a host name. I would first look through all the host-related settings in your Hadoop conf to see where "user" appears and identify if any of these should be something else.
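For example, one quick check from spark-shell (a sketch; it assumes sc is the shell's SparkContext and a CDH5-style configuration):
```
// Print the default filesystem Spark resolved from the Hadoop config.
// It should contain a real NameNode hostname, not "user".
println(sc.hadoopConfiguration.get("fs.defaultFS"))
```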
Created 05-28-2014 05:01 AM
Hi srowen,
Thanks for the reply.
I'm running 5.0.0-1.cdh5.0.0.p0.47 on two clusters (prod and dev), both installed with everything default.
I have no idea where this setting might be. Have you seen any other question like this?
Created 06-05-2014 07:36 AM
So the issue was:
val file = spark.textFile("hdfs://...")
where, after hdfs://, the full host name has to follow; in my case that is the host where the Spark master is.
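For illustration, a minimal sketch of the working form. Strictly speaking, the host after hdfs:// is the NameNode (the value of fs.defaultFS), which in this setup apparently runs on the same machine as the Spark master; the host and file path below are placeholders:
```
// Fully qualified HDFS URI: scheme + NameNode host + absolute path.
// Replace host and path with your own; any action (here count) forces
// evaluation and will surface a bad hostname immediately.
val file = sc.textFile("hdfs://namenode.example.com/user/dalibor/words.txt")
file.count()
```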
Created 08-03-2016 05:32 PM
Can you describe the fix in more detail? I hit the same issue.
Created 09-07-2016 02:37 AM
I guess that they were using a path like
hdfs://user
or
//user
The Spark interpreter treats this as an HDFS path, but the server was not included in it: a fully qualified hdfs:// path must include the hostname of the NameNode that serves the file, i.e.
hdfs://hostname/user
rather than
hdfs://user
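To see why the error names "user" specifically, here is a small sketch (plain java.net.URI, runnable in spark-shell) of how the authority of such a URI is parsed:
```
import java.net.URI

// In "hdfs://user" the text after "//" is parsed as the authority (host),
// so Hadoop tries to resolve a machine literally named "user".
println(new URI("hdfs://user").getHost)          // prints: user
println(new URI("hdfs://hostname/user").getHost) // prints: hostname
println(new URI("hdfs://hostname/user").getPath) // prints: /user
```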
Created 09-21-2017 04:34 PM
If you are running on a node with an HDFS gateway, or in one of our training VMs, or in other situations where the host that provides access to HDFS is 'localhost', then you use three forward slashes, as in 'hdfs:///user/': the empty authority falls back to the default filesystem configured in fs.defaultFS. It's a common mistake to miss the third slash.
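A minimal sketch of that three-slash form (the path is a placeholder):
```
// With an empty authority ("hdfs:///..."), Hadoop falls back to the
// default filesystem from fs.defaultFS, so no hostname is spelled out.
val rdd = sc.textFile("hdfs:///user/dalibor/words.txt")
```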
Created 01-18-2022 01:20 AM
In my case, I observed that the Spark job was working fine on some hosts but hitting the above exception on a couple of worker hosts.
I found that the issue was the spark-submit version on those hosts: spark-submit --version reported version 2.4.7.7.1.7.0-551 on the working hosts and version 3.1.2 on the non-working hosts.
I created a symbolic link to the correct spark-submit binary and the issue was resolved.
```
[root@host bin]# cd /usr/local/bin
[root@host bin]# ln -s /etc/alternatives/spark-submit spark-submit
```
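After creating the link, spark-submit --version on the affected hosts should report the expected 2.4.7.7.1.7.0-551 build; if it still shows 3.1.2, another spark-submit earlier on the PATH is likely shadowing the link.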