Spark can exclude columns in a Hive table from pyspark, but not from spark-submit

I have a Python script named "script.py". When I run it from the pyspark shell, it works fine:

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import HiveContext

spark = SparkSession.builder.getOrCreate()
hive_context = HiveContext(spark)

hive_context.setConf("hive.support.quoted.identifiers", "none")

summary = spark.sql("""select `(datex)?+.+`, to_date(datex) datex
from default.data_hive
where datex = '2020-03-25'""")

 

But when I run the script with spark-submit --master yarn script.py, it fails with this error:

Traceback (most recent call last):
File "/home/script/Script/profile/profile_export.py", line 24, in <module>
where datex = '2020-03-25'""")
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 778, in sql
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
pyspark.sql.utils.AnalysisException: u"cannot resolve '`(datex)?+.+`' given input columns:

 

What am I doing wrong? Please help
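
One possible workaround, offered as a sketch rather than a confirmed fix, is to avoid the quoted regex column syntax and build the select list explicitly. The helper name `select_list_excluding` and the column names `id` and `name` are hypothetical. In the real script the column list could come from `spark.table("default.data_hive").columns`. Separately, Spark 2.3 and later have a `spark.sql.parser.quotedRegexColumnNames` setting that may be worth checking, since `hive.support.quoted.identifiers` is a Hive setting and may not affect how Spark parses the query.

```python
def select_list_excluding(columns, excluded):
    """Return a backtick-quoted, comma-separated select list
    that leaves out the excluded column names (case-insensitive)."""
    excluded_lower = {name.lower() for name in excluded}
    kept = [c for c in columns if c.lower() not in excluded_lower]
    return ", ".join("`{}`".format(c) for c in kept)


# Hypothetical column names, used only for illustration.
# Assumption: in the real script these would be read from the table,
# e.g. spark.table("default.data_hive").columns
columns = ["id", "name", "datex"]

select_list = select_list_excluding(columns, ["datex"])

# Same query shape as in the question, without the regex column spec.
query = (
    "select {cols}, to_date(datex) datex\n"
    "from default.data_hive\n"
    "where datex = '2020-03-25'"
).format(cols=select_list)

print(query)
```

The resulting string can be passed to spark.sql(query). Because the column names are spelled out, the query does not depend on any regex-column setting being picked up by the session.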
