Archives of Support Questions (Read Only)

This board is archived and read-only for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Set hive parameter in sparksql?

Master Guru

How do I set Hive parameters in a Spark SQL context? For example, I have a Hive table that I want to query from Spark SQL, and I want to set the following parameter:

mapred.input.dir.recursive=true

so that all directories are read recursively. How do I set this in the Spark context?

1 ACCEPTED SOLUTION


@Sunile Manjee - Below are some sections from working PySpark code. Notice how I set up SparkConf with specific settings, and then later in my code I execute Hive statements. In those Hive statements you could do:

sql = "set mapred.input.dir.recursive=true"
sqlContext.sql(sql)

Here is my SparkConf:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("ucs_data_profiling")
        .set("spark.executor.instances", "50")
        .set("spark.executor.cores", "4")
        .set("spark.driver.memory", "2g")
        .set("spark.executor.memory", "6g")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.io.compression.codec", "snappy")
        .set("spark.shuffle.compress", "true"))

sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

## the rest of the code parses files and converts them to SchemaRDDs
## lines of code etc........
## lines of code etc........

## here I set some Hive properties before I load my data into a Hive table
## I have more HiveQL statements; I just show one here to demonstrate that this works
sql = """
set hive.exec.dynamic.partition.mode=nonstrict
"""
sqlContext.sql(sql)


6 REPLIES

Guru

Try setting it on the SparkContext as below. This works for file loads, and I believe it should work for Hive table loads as well:

sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
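The line above is Scala. If you are in PySpark, a hedged equivalent goes through the underlying Java SparkContext; note that _jsc is an internal handle and may change between versions:

# Hedged PySpark sketch: _jsc is an internal accessor for the Java SparkContext.
sc._jsc.hadoopConfiguration().set(
    "mapreduce.input.fileinputformat.input.dir.recursive", "true")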

Super Guru
@Sunile Manjee

Can you please try this?

sqlContext.setConf("mapred.input.dir.recursive","true")

OR

sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive","true")


New Member

I'm still facing the issue. Can anyone help?

New Member

I am also facing the same issue.
