Created 05-26-2016 03:03 PM
How do I set Hive parameters in a Spark SQL context? For example, I have a Hive table that I want to query from Spark SQL, and I want to set the following parameter so that all directories are read recursively:
mapred.input.dir.recursive=true
How do I set this in the Spark context?
Created 05-26-2016 03:07 PM
Try setting it on the SparkContext as below. This works for file loads, and I believe it should work for Hive table loads as well:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
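The line above is Scala. If you are working from PySpark, one way to reach the same Hadoop configuration is through the underlying Java context; this is a sketch that relies on the internal _jsc handle rather than a documented public API:
# PySpark equivalent of the Scala call above (uses the internal _jsc handle)
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")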
Created 05-26-2016 03:10 PM
Can you please try this?
sqlContext.setConf("mapred.input.dir.recursive","true")
OR
sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive","true")
Created 05-26-2016 04:07 PM
@Sunile Manjee - Below are some sections from working PySpark code. Notice how I set SparkConf with specific settings, and later in the code I execute Hive statements. In those Hive statements you could do:
sql = "set mapred.input.dir.recursive=true"
sqlContext.sql(sql)
Here is my SparkConf:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("ucs_data_profiling")
        .set("spark.executor.instances", "50")
        .set("spark.executor.cores", 4)
        .set("spark.driver.memory", "2g")
        .set("spark.executor.memory", "6g")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.io.compression.codec", "snappy")
        .set("spark.shuffle.compress", "true"))
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
## the rest of code parses files and converts to SchemaRDD
## lines of code etc........
## lines of code etc........
## here I set some Hive properties before I load my data into a Hive table
## I have more HiveQL statements; I just show one here to demonstrate that this will work
sql = """
set hive.exec.dynamic.partition.mode=nonstrict
"""
sqlContext.sql(sql)
Created 12-04-2017 09:25 AM
I'm still facing the issue. Can anyone help?
Created 01-03-2018 08:06 AM
I am also facing the same issue.
Created 07-05-2018 12:01 AM
https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
Use the spark.hadoop.* prefix on the Spark config key; properties with that prefix are passed through to Spark's Hadoop configuration.
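For example, in Spark 2.x the property could be set when building the session; this is a sketch, and the app name and table name are only for illustration:
from pyspark.sql import SparkSession

# "spark.hadoop." + <hadoop property> is forwarded to the Hadoop Configuration
spark = (SparkSession.builder
         .appName("recursive_read_example")
         .enableHiveSupport()
         .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
         .getOrCreate())

df = spark.sql("SELECT * FROM my_db.my_partitioned_table")  # hypothetical table
The same prefix works on the command line, e.g. spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true.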