
Set hive parameter in sparksql?

Solved

Super Guru

How do I set Hive parameters in a Spark SQL context? For example, I have a Hive table that I want to query from Spark SQL, and I want to set the following parameter:

mapred.input.dir.recursive=true

This makes the input read all directories recursively. How do I set it in the Spark context?

1 ACCEPTED SOLUTION

Re: Set hive parameter in sparksql?

@Sunile Manjee - Below are some sections from working PySpark code. Notice how I set SparkConf with specific settings, and how later in the code I execute Hive statements. In those Hive statements you could do:

sql = "set mapred.input.dir.recursive=true"

sqlContext.sql(sql)

Here is my SparkConf:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = (SparkConf()
        .setAppName("ucs_data_profiling")
        .set("spark.executor.instances", "50")
        .set("spark.executor.cores", 4)
        .set("spark.driver.memory", "2g")
        .set("spark.executor.memory", "6g")
        .set("spark.dynamicAllocation.enabled", "false")
        .set("spark.shuffle.service.enabled", "true")
        .set("spark.io.compression.codec", "snappy")
        .set("spark.shuffle.compress", "true"))

sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

## the rest of the code parses files and converts to a SchemaRDD
## lines of code etc........

## here I set some Hive properties before I load my data into a Hive table;
## I have more HiveQL statements, I just show one here to demonstrate that this works
sql = """
set hive.exec.dynamic.partition.mode=nonstrict
"""
sqlContext.sql(sql)


6 REPLIES

Re: Set hive parameter in sparksql?

Guru

Try setting it on the SparkContext as below. This works for file loads, and I believe it should work for Hive table loads as well:

sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive","true")
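In PySpark, a minimal sketch of the same idea goes through the underlying Java SparkContext via sc._jsc (the app name here is hypothetical):

from pyspark import SparkContext

sc = SparkContext(appName="recursive_read_demo")  ## hypothetical app name
## set the Hadoop configuration used for input splits on the underlying Java SparkContext
sc._jsc.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true")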


Re: Set hive parameter in sparksql?

@Sunile Manjee

Can you please try this?

sqlContext.setConf("mapred.input.dir.recursive","true")

OR

sqlContext.setConf("mapreduce.input.fileinputformat.input.dir.recursive","true")

Re: Set hive parameter in sparksql?

New Contributor

I'm still facing the issue. Can anyone help?


Re: Set hive parameter in sparksql?

I am also facing the same issue.
