
Set Hive parameter in SparkSQL?

Master Guru

How do I set Hive parameters in a SparkSQL context? For example, I have a Hive table that I want to query from SparkSQL, and I want to set the following parameter:

mapred.input.dir.recursive=true

so that all directories are read recursively. How do I set this in the Spark context?

1 ACCEPTED SOLUTION


@Sunile Manjee - Below are some sections from working PySpark code. Notice how I set SparkConf with specific settings, and later in the code I execute Hive statements. In those Hive statements you could do:

sql = "set mapred.input.dir.recursive=true"

sqlContext.sql(sql)

Here is my SparkConf:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("ucs_data_profiling")
        .set("spark.executor.instances", "50")
        .set("spark.executor.cores", "4")
        .set("spark.driver.memory", "2g")
        .set("spark.executor.memory", "6g")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.io.compression.codec", "snappy")
        .set("spark.shuffle.compress", "true"))

sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

## the rest of the code parses files and converts to SchemaRDD
## lines of code etc........
## lines of code etc........

## here I set some Hive properties before I load my data into a Hive table
## I have more HiveQL statements; I just show one here to demonstrate that this works
sql = """
set hive.exec.dynamic.partition.mode=nonstrict
"""
sqlContext.sql(sql)


6 REPLIES

Guru

Try setting it on the SparkContext as below. This works for file loads, and I believe it should work for Hive table loads as well:

sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
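That line is Scala; in PySpark the same Hadoop configuration can be reached through the SparkContext's underlying Java context. A minimal sketch, using the internal _jsc handle and assuming sc is an existing SparkContext:

## set the recursive-read flag on the Hadoop configuration from PySpark
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")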

Super Guru
@Sunile Manjee

Can you please try this?

sqlContext.setConf("mapred.input.dir.recursive","true")

OR

sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive","true")


New Contributor

I'm still facing the issue. Can anyone help?

Contributor

I am also facing the same issue.
