Spark sql taking very long time in query parsing

I am running a Hive query in the PROD environment on Spark, using HiveContext/SparkSession like this: sparksession.sql("query")

It takes approximately 10-15 minutes to parse the query before the query actually runs. This looks abnormal to me, because the same query on the UAT environment takes less than 1 minute to parse.


This really looks very abnormal. When I watch hive.log on UAT while executing that Spark SQL, the log does not move.

But when I watch hive.log on PROD, the log is moving and I see entries like these:


gettable tablename

get partitions

initialize called  using direct sql underlying db oracle....

I see these kinds of statements in the log for all the tables involved in the query.


Now this is really strange. It appears to be loading metadata from the Hive metastore for use in query parsing, so why is that not happening in UAT?


Please help, it's a very serious issue.



Re: Spark sql taking very long time in query parsing

I am quite sure that nobody can answer this based on this description alone.
The usual difference between environments is data volume. You did not mention whether the two environments are the same in terms of sizing and tables. I would guess that your query in the UAT environment is working with far fewer partitions than in production. But that is just a guess.

For real troubleshooting you have to examine the Spark driver's log, the executors' logs, and the Hive metastore's log. Once you have done that, you can post them here and it may become clearer why it takes so long.
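
Alongside the logs, it may help to measure where the time goes on the driver. The following is only a minimal sketch, not a definitive method. In Spark, spark.sql(...) parses and analyzes the query eagerly (which is where table metadata is looked up), while optimization and execution are deferred until the plan or an action is requested. The helper names timed and profile_phases, and the spark and query variables in the commented usage, are assumptions made for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def profile_phases(phases):
    """Run each (label, callable) pair in order and return {label: seconds}."""
    results = {}
    for label, fn in phases:
        with timed(label, results):
            fn()
    return results


# Hypothetical usage with a live SparkSession `spark` and a query string `query`:
#   holder = {}
#   timings = profile_phases([
#       ("parse_and_analyze", lambda: holder.update(df=spark.sql(query))),
#       ("optimize_and_plan", lambda: holder["df"].explain()),
#       ("execute", lambda: holder["df"].count()),
#   ])
#   print(timings)
```

Running the same three phases in both environments should show whether the extra minutes are spent before any job is submitted, which would point at the driver and the metastore rather than at the executors.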

Re: Spark sql taking very long time in query parsing



UAT has the same number of tables and columns as PROD, but UAT has more data compared to PROD. The problem is in query parsing and optimization. Once parsing is done there is no problem in execution, so I did not talk about the driver or executors.


I executed the same query in both UAT and PROD and monitored the logs. In UAT parsing was done within a minute, and in PROD it took 20 minutes. I was also watching hive.log on UAT and PROD. The UAT log was not moving, but the PROD hive.log was. PROD was fetching table metadata from the Hive metastore, but UAT was not.
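
Since the two environments behave differently for the same query, one thing that could be checked is whether their effective Spark settings differ. This is a sketch under assumptions, not a diagnosis: diff_settings is a hypothetical helper, and the sample values are invented purely to show the output shape. In PySpark the real inputs could come from spark.sparkContext.getConf().getAll(), which returns a list of (key, value) pairs.

```python
def diff_settings(uat, prod):
    """Return {key: (uat_value, prod_value)} for every key whose value differs.

    Both arguments may be dicts or lists of (key, value) pairs.
    A key missing on one side is reported with None for that side.
    """
    uat, prod = dict(uat), dict(prod)
    return {
        key: (uat.get(key), prod.get(key))
        for key in sorted(uat.keys() | prod.keys())
        if uat.get(key) != prod.get(key)
    }


# Invented sample values, for illustration only.
uat_conf = [("spark.sql.hive.metastorePartitionPruning", "true"),
            ("spark.master", "yarn")]
prod_conf = [("spark.sql.hive.metastorePartitionPruning", "false"),
             ("spark.master", "yarn")]

for key, (uat_value, prod_value) in diff_settings(uat_conf, prod_conf).items():
    print(f"{key}: UAT={uat_value} PROD={prod_value}")
```

Only keys that differ are listed, so an empty result would suggest looking elsewhere, for example at the metastore side or at the number of partitions per table.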
