
spark authentication ldap


I'm new to Spark and would appreciate some answers.

How can I enable Spark authentication against LDAP in a non-Kerberized environment?

It looks like Spark is connecting to the Metastore directly. How can we force it to connect to HiveServer2?

Is there any way to suppress all the info it prints when we start/exit spark-sql?

Every session I start uses port 4040 + 1 for the UI, and the HTTP server and sparkDriver use random ports. Is there any way to force Spark to use a subset of ports instead of random ports?

1 ACCEPTED SOLUTION


@Raja Sekhar Chintalapati

  • There is no Spark authentication against LDAP in a non-Kerberized environment. If a Spark job reads from HDFS and the user running the job does not have sufficient HDFS permissions, Spark will fail to read the data.
  • Spark HiveContext does not connect to HiveServer2. It connects to the Hive metastore once you provide the Hive configuration (hive-site.xml) to Spark; otherwise it creates its own metastore in its working directory.
  • I don't know of a way to suppress the info in spark-sql.
  • The Spark application UI is typically served from the node where the driver runs, on port 4040. You can define ports for the driver, file server, executor, UI, etc. See doc here
  • See also setting Spark configuration here: https://spark.apache.org/docs/1.1.0/configuration.html
  • See also for YARN mode: http://spark.apache.org/docs/latest/security.html
  • Example:

    SPARK_MASTER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"

    SPARK_WORKER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
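
    Since the master and worker options above carry the same port list twice, one way to keep them in sync is to generate the list from a single base port. This is only a sketch: the helper name spark_port_opts and the PORT_OPTS variable are made up for illustration, and the property names are the Spark 1.x ones used in the example above, so check them against the configuration page for your version.

```shell
# Hypothetical helper (not part of Spark): derive the six -D port options
# from one base port, in the same order as the example above.
spark_port_opts() {
  base=$1
  opts=""
  i=0
  for prop in driver fileserver broadcast replClassServer blockManager executor; do
    opts="$opts -Dspark.$prop.port=$((base + i))"
    i=$((i + 1))
  done
  # Strip the leading space left by the first iteration.
  echo "${opts# }"
}

# Build the shared option string once, then reuse it for both daemons.
PORT_OPTS="$(spark_port_opts 7001) -Dspark.ui.port=4040"
PORT_OPTS="$PORT_OPTS -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
SPARK_MASTER_OPTS="$PORT_OPTS"
SPARK_WORKER_OPTS="$PORT_OPTS"
```

    With a base port of 7001 this yields ports 7001 through 7006, matching the hand-written values, and changing the range means editing one number.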

    Programmatic example:

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    // `master` must hold your master URL; it was left undefined in the original post
    val conf = new SparkConf()
      .setMaster(master)
      .setAppName("namexxx")
      // Pin each service to a fixed port instead of a random one
      .set("spark.driver.port", "7001")
      .set("spark.fileserver.port", "7002")
      .set("spark.broadcast.port", "7003")
      .set("spark.replClassServer.port", "7004")
      .set("spark.blockManager.port", "7005")
      .set("spark.executor.port", "7006")

    val sc = new SparkContext(conf)
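
    On the question of the info printed when spark-sql starts and exits: one approach that may help is lowering the log4j root level. The sketch below assumes a Spark 1.x style layout in which conf/log4j.properties.template contains the line "log4j.rootCategory=INFO, console"; newer releases use a different logging configuration file, so treat this as a starting point. The function name quiet_spark_logging is hypothetical.

```shell
# Hypothetical helper (not part of Spark): create log4j.properties from the
# shipped template, lowering the root logger from INFO to WARN.
# Assumes the template contains a line starting with "log4j.rootCategory=INFO".
quiet_spark_logging() {
  conf_dir=$1
  sed 's/^log4j\.rootCategory=INFO/log4j.rootCategory=WARN/' \
    "$conf_dir/log4j.properties.template" > "$conf_dir/log4j.properties"
}
```

    It would be called as quiet_spark_logging "$SPARK_HOME/conf" before starting spark-sql. Other logger lines in the template are copied unchanged.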
