Created 05-26-2016 03:03 PM
How do I set Hive parameters in a Spark SQL context? For example, I have a Hive table that I want to query from Spark SQL, and I want to set the following parameter so that all directories are read recursively:
mapred.input.dir.recursive=true
How do I set this in the Spark context?
Created 05-26-2016 03:07 PM
Try setting it on the SparkContext as below. This works for file loads, and I believe it should work for Hive table loads as well:
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
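The line above is Scala. If you are working from PySpark, one way to reach the same Hadoop configuration is through the underlying Java context; this is a sketch that relies on the internal _jsc handle rather than a documented public API:
# PySpark equivalent of the Scala call above (uses the internal _jsc handle)
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")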
Created 05-26-2016 03:10 PM
Can you please try this?
sqlContext.setConf("mapred.input.dir.recursive","true")
OR
sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive","true")
Created 05-26-2016 04:07 PM
@Sunile Manjee - Below are some sections from working PySpark code. Notice how I set SparkConf with specific settings, and later in the code I execute Hive statements. In those Hive statements you could do:
sql = "set mapred.input.dir.recursive=true"
sqlContext.sql(sql)
Here is my SparkConf:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("ucs_data_profiling")
        .set("spark.executor.instances", "50")
        .set("spark.executor.cores", 4)
        .set("spark.driver.memory", "2g")
        .set("spark.executor.memory", "6g")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.io.compression.codec", "snappy")
        .set("spark.shuffle.compress", "true"))
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
## the rest of code parses files and converts to SchemaRDD
## lines of code etc........
## lines of code etc........
## here I set some Hive properties before I load my data into a Hive table
## I have more HiveQL statements; I just show one here to demonstrate that this will work
sql = """
set hive.exec.dynamic.partition.mode=nonstrict
"""
sqlContext.sql(sql)
Created 12-04-2017 09:25 AM
I'm still facing the issue. Can anyone help?
Created 01-03-2018 08:06 AM
I am also facing the same issue.
Created 07-05-2018 12:01 AM
https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
Use the spark.hadoop.* prefix on the Spark config key; properties with that prefix are passed through to Spark's Hadoop configuration.
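For example, in Spark 2.x the property could be set when building the session; this is a sketch, and the app name and table name are only for illustration:
from pyspark.sql import SparkSession

# "spark.hadoop." + <hadoop property> is forwarded to the Hadoop Configuration
spark = (SparkSession.builder
         .appName("recursive_read_example")
         .enableHiveSupport()
         .config("spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive", "true")
         .getOrCreate())

df = spark.sql("SELECT * FROM my_db.my_partitioned_table")  # hypothetical table
The same prefix works on the command line, e.g. spark-submit --conf spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true.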