spark authentication ldap

Solved
Expert Contributor

I'm new to Spark and would appreciate it if I could get some answers.

How can I enable Spark authentication against LDAP in a non-kerberized environment?

It looks like Spark is connecting to the metastore directly... how can we force it to connect to HiveServer2?

Is there any way to suppress all the info messages it prints when we start/exit spark-sql?

Every session I start uses 4040+1, and the HTTP and SparkDriver services use random ports. Is there any way we can force Spark to use a subset of ports instead of random ports?

1 ACCEPTED SOLUTION

Accepted Solutions

Re: spark authentication ldap

@Raja Sekhar Chintalapati

  • There is no Spark authentication against LDAP in a non-kerberized environment. If a Spark job reads from HDFS and the user running the job does not have sufficient HDFS permissions, Spark will fail to read the data.
  • Spark HiveContext does not connect to HiveServer2. It connects to the Hive metastore once you provide the Hive configuration (hive-site.xml) to Spark; otherwise it creates its own metastore in its working directory.
  • I don't know of a way to suppress the info output in spark-sql.
  • The Spark web UI typically runs on the node with the Driver, on port 4040. You can define ports for the Driver, File Server, Executor, UI, etc. See doc here
  • See also setting Spark Configuration here: https://spark.apache.org/docs/1.1.0/configuration.html

  • See also for YARN Mode: http://spark.apache.org/docs/latest/security.html
  • Example

    SPARK_MASTER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    
    SPARK_WORKER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

    Programmatic Example

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    
    // Pin the driver, file server, broadcast, REPL class server,
    // block manager, and executor ports instead of letting Spark
    // pick random ones.
    val conf = new SparkConf()
      .setMaster(master)
      .setAppName("namexxx")
      .set("spark.driver.port", "7001")
      .set("spark.fileserver.port", "7002")
      .set("spark.broadcast.port", "7003")
      .set("spark.replClassServer.port", "7004")
      .set("spark.blockManager.port", "7005")
      .set("spark.executor.port", "7006")
    
    val sc = new SparkContext(conf)
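    If you prefer not to hard-code the ports in the application, the same properties can be passed at submit time with --conf flags. This is a sketch only: my-app.jar and com.example.MyApp are placeholders for your own application, not part of the original post.

    ```shell
    # Same fixed-port settings as above, passed at submit time.
    # com.example.MyApp and my-app.jar are placeholders.
    spark-submit \
      --class com.example.MyApp \
      --conf spark.driver.port=7001 \
      --conf spark.fileserver.port=7002 \
      --conf spark.broadcast.port=7003 \
      --conf spark.replClassServer.port=7004 \
      --conf spark.blockManager.port=7005 \
      --conf spark.executor.port=7006 \
      --conf spark.ui.port=4040 \
      my-app.jar
    ```

    Note that several of these properties (spark.fileserver.port, spark.broadcast.port, spark.replClassServer.port, spark.executor.port) apply to Spark 1.x; in Spark 2.x and later they were removed, and the remaining configurable ports are mainly spark.driver.port, spark.blockManager.port, and spark.ui.port.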


