Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

spark authentication ldap

Super Collaborator

I'm new to Spark and would appreciate some answers.

How can I enable Spark authentication against LDAP in a non-Kerberized environment?

It looks like Spark connects to the metastore directly. How can we force it to connect to HiveServer2 instead?

Is there any way to suppress all the info messages printed when we start/exit spark-sql?

Every session I start uses the next free port above 4040 for the UI, plus random ports for HTTP and the Spark driver. Is there any way to force Spark to use a subset of ports instead of random ones?

1 ACCEPTED SOLUTION


@Raja Sekhar Chintalapati

  • There is no Spark authentication against LDAP in a non-Kerberized environment. If a Spark job reads from HDFS and the user running the job does not have sufficient HDFS permissions, Spark will fail to read the data.
  • Spark's HiveContext does not connect to HiveServer2. It connects to the Hive metastore once you provide the Hive configuration (hive-site.xml) to Spark; otherwise it creates its own metastore in its working directory.
  • I don't know of a way to suppress the info output in spark-sql.
  • The Spark application UI typically runs on the node hosting the Driver, on port 4040. You can define ports for the Driver, file server, executors, UI, etc. in the Spark configuration.
  • See also the Spark configuration reference: https://spark.apache.org/docs/1.1.0/configuration.html

  • For YARN mode, see also: http://spark.apache.org/docs/latest/security.html
  • Example

    SPARK_MASTER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
    
    SPARK_WORKER_OPTS="-Dspark.driver.port=7001 -Dspark.fileserver.port=7002 
     -Dspark.broadcast.port=7003 -Dspark.replClassServer.port=7004 
     -Dspark.blockManager.port=7005 -Dspark.executor.port=7006 
     -Dspark.ui.port=4040 -Dspark.broadcast.factory=org.apache.spark.broadcast.HttpBroadcastFactory"
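    The same properties can also be supplied per application at submit time rather than via environment variables. A minimal sketch, using the same illustrative port values as above; the application class and jar name are hypothetical placeholders:

    ```shell
    # Pin Spark 1.x networking ports per application via --conf.
    # Port values are illustrative; pick any free range on your cluster.
    # com.example.MyApp and myapp.jar are placeholders for your own application.
    spark-submit \
      --conf spark.driver.port=7001 \
      --conf spark.fileserver.port=7002 \
      --conf spark.broadcast.port=7003 \
      --conf spark.replClassServer.port=7004 \
      --conf spark.blockManager.port=7005 \
      --conf spark.executor.port=7006 \
      --conf spark.ui.port=4040 \
      --class com.example.MyApp myapp.jar
    ```

    This keeps the cluster-wide defaults untouched while still letting the firewall rules assume a fixed port range for each submitted job.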

    Programmatic Example

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    
    val conf = new SparkConf()
      .setMaster(master)
      .setAppName("namexxx")
      .set("spark.driver.port", "7001")
      .set("spark.fileserver.port", "7002")
      .set("spark.broadcast.port", "7003")
      .set("spark.replClassServer.port", "7004")
      .set("spark.blockManager.port", "7005")
      .set("spark.executor.port", "7006")
    
    val sc = new SparkContext(conf)
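    On the earlier question of suppressing spark-sql's startup output, one common approach (not covered above, so treat it as a suggestion rather than a confirmed fix) is to lower the console log level in Spark's conf/log4j.properties:

    ```
    # conf/log4j.properties -- copy conf/log4j.properties.template first.
    # Lowering the root level from INFO to WARN silences most startup/exit chatter.
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    ```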

