Member since: 08-29-2018
Posts: 91
Kudos Received: 3
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1642 | 11-14-2019 02:54 AM |
| | 5571 | 11-05-2019 07:51 PM |
02-10-2021
09:41 AM
It looks like you are running the Spark shell on a Windows machine, possibly your local laptop. Do you reference the hostname "dclvmsbigdmd01" anywhere in your code? If not, where does 172.30.294.196 (hive.metastore.uris) come from? Does this IP resolve to the name dclvmsbigdmd01? Can you check whether the host/domain is reachable from your local machine?
07-14-2020
10:56 AM
Hey, are there any parameters used in the spark-shell command? Usually, this kind of delay can happen for many reasons, from connection time to resource availability; however, we cannot confirm anything from the driver logs alone. To narrow this down, could you share the YARN log for this application using the command "yarn logs -applicationId application_1594337770867_0003"? That will give us more clarity on what happens during the delay. Thanks!
07-14-2020
06:11 AM
Hello, AFAIK the Stanford CoreNLP wrapper for Apache Spark should not be a bottleneck in terms of parallel processing; Spark takes care of running it in parallel over multiple documents. Regardless of the number of documents, the number of API requests to the CoreNLP server remains the same.
07-14-2020
04:02 AM
Hey, could you share the exact output trace that you receive? If the issue appears in the Web UI, could you also share a screenshot of what you see?
05-08-2020
10:55 PM
Okay, let me know if changing HiveContext to SparkContext makes any difference; it could give us a lead towards a resolution.
05-08-2020
02:20 AM
Hi @clvi, try adding --appOwner <username> to the yarn logs command. However, I think the application's state has been erased from the RM state store, probably due to an RM state store restore.
04-12-2020
06:36 AM
Hey @hicha, which version of Spark are you using? And what output do you receive when using `SparkSession` instead of `HiveContext`?
03-23-2020
06:50 AM
Hi, based on the documentation [1], we notice that HDP 2.6.5 already ships with Apache Spark 2.3.2. We recommend upgrading the HDP stack so that you get the appropriate version of Spark that comes with the stack, instead of installing Spark manually. [1] https://docs.cloudera.com/HDPDocuments/HDPforCloud/HDPforCloud-2.6.5/hdp-release-notes/content/hdp_comp_versions.html
03-23-2020
05:29 AM
Hello Rishab, can you please describe the exact error you are facing?
11-14-2019
03:06 AM
Hey @gnish, thanks for asking. I haven't tried it before, but I hope you have come across the conversion utility from Zeppelin notes to Jupyter notebooks [1]. However, I notice that the JIRA ZEPPELIN-2616 [2], which covers documentation for this feature, seems yet to be resolved. [1] https://github.com/rdblue/jupyter-zeppelin [2] https://issues.apache.org/jira/browse/ZEPPELIN-2616
11-14-2019
02:54 AM
Hey @avengers, just thought this could add some more value to the question here. Spark SQL uses a Hive metastore to manage the metadata of persistent relational entities (e.g. databases, tables, columns, partitions), keeping it in a relational database for fast access [1]. Also, I don't think the metastore would crash if it is used alongside Hive on Spark. [1] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hive-metastore.html
11-05-2019
08:21 PM
Hi @wret_1311, thanks for your response, and I appreciate you confirming the solution. I'm glad it helped you 🙂
11-05-2019
08:13 PM
Hey, can you give us a little more context on what the API is? A quick search for "MergeRecords" points to NiFi. From my current understanding, I hope the URL [1] will be helpful for you. Thanks! [1] https://stackoverflow.com/a/55124212/4340139
11-05-2019
07:51 PM
Hi @wert_1311, there is an option to stop just a single NodeManager (NM) and clean the usercache there, so no application should fail outright. However, keep in mind that even stopping a single NodeManager affects currently running jobs: the containers running on that NM will be stopped and restarted on another NM, so those jobs will run longer than expected. Hope this helps.
11-01-2019
02:50 AM
Hi @wert_1311, thanks for asking. Currently, yarn.nodemanager.localizer.cache.target-size-mb and yarn.nodemanager.localizer.cache.cleanup.interval-ms trigger the deletion service only for non-running containers. For containers that are running and spilling data to {yarn.nodemanager.local-dirs}/usercache/<user>/appcache/<app_id>, the deletion service does not come into action; as a result, the filesystem fills up, nodes are marked unhealthy, and applications get stuck. I suggest you refer to a community article [1] which discusses something similar. The upstream JIRA YARN-4540 [2] has this documented and is yet to be resolved. The general recommendation is to make that filesystem big enough and, if it still fills up, debug the job that writes too much data into it.

It is also OK to delete the usercache directory. Use the following steps:

1. Stop the YARN service.
2. Log in to all nodes and delete the contents of the usercache directories. For example (the quotes make the glob expand on the remote node, not locally):
   for i in `cat list_of_nodes_in_cluster`; do ssh $i 'rm -rf /data?/yarn/nm/usercache/*' ; done
3. Verify that the usercache directories on all nodes are empty.
4. Start the YARN service.

Please let us know if this is helpful.

[1] https://community.cloudera.com/t5/Support-Questions/yarn-usercache-folder-became-with-huge-size/td-p/178648
[2] https://issues.apache.org/jira/browse/YARN-4540
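To preview the cleanup loop above before running it for real, a dry-run sketch can be useful. The node names below are purely illustrative; in a real cluster the file would list your actual NodeManager hosts, one per line.

```shell
# Build an illustrative node list (replace with your real NM hosts).
printf 'node1.example.com\nnode2.example.com\n' > list_of_nodes_in_cluster

while read -r node; do
  # "echo" makes this a preview only; remove it to actually run the cleanup.
  # Single quotes keep the glob from expanding on the local machine.
  echo ssh "$node" 'rm -rf /data?/yarn/nm/usercache/*'
done < list_of_nodes_in_cluster
```

Each printed line is the exact command that would be executed; once the output looks right, drop the `echo`.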
10-31-2019
10:56 AM
Hey, I think deleting container logs may be a good option to save space. However, if you would like to grab the YARN logs to analyse old jobs, you will need those container logs. Such analysis is usually needed when a job fails, so if you are sure those jobs will never need to be dug into again for historic insights, feel free to clear them.
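If the goal is to reclaim space automatically rather than delete logs by hand, YARN's log-aggregation retention setting can expire old aggregated logs on its own. The property name below is the standard one from yarn-default.xml; the 7-day value is only an example, and log aggregation (yarn.log-aggregation-enable) must already be on for it to apply:

```
<!-- yarn-site.xml: automatically delete aggregated logs older than 7 days
     (604800 seconds; the default of -1 keeps logs forever) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```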
10-31-2019
10:51 AM
Hey, can you please share the following?
1. Resource Manager (RM) logs
2. Scheduler.xml
3. A screenshot of the Dynamic Resource Pool configuration
4. A screenshot of the queue representation in the RM Web UI
I hope the above artefacts will be sufficient for an initial analysis of the issue.
10-24-2019
09:00 AM
Hey, can you share the configured value of "hadoop_authorized_users"? Is it left at the default value, or has it been modified?
10-22-2019
03:33 AM
Hey @axk, thanks for letting us know. I'm glad it was helpful 🙂
10-21-2019
11:33 AM
Hey, thanks for responding and confirming. I understand that the output is a float and not an integer. I will update here when I find anything more helpful. Much appreciated.
10-21-2019
11:27 AM
1 Kudo
Hey, can you review whether you have configured the HBase service dependency in the Hive service [1]? I have come across scenarios where, if this dependency is not configured, an error like [2] can occur. [1] https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_hive_hbase.html [2] org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
10-21-2019
11:15 AM
Hi Nicks, I understand that you would like an alert when a user executes a query that returns more than a pre-defined number of rows; is my understanding correct? Since the limit is pre-defined, I usually append a LIMIT clause to the query when one is not already present. I would also like to point out that Hue returns a partial result and loads more data as you scroll down. However, when executing a query from Hue, I do not see an alerting mechanism based on the number of rows returned at the moment. I will update here once I come across something more helpful.
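As a sketch of the "append a LIMIT when one is missing" habit described above: the function name and the row cap are made up for illustration, and the check is a deliberately naive case-insensitive substring match (it would also trigger on, say, a column named limit_value), not a real SQL parser.

```shell
# Hypothetical helper: append "LIMIT <n>" to a query that lacks one.
add_limit() {
  query=$1
  max_rows=$2
  case "$query" in
    *[Ll][Ii][Mm][Ii][Tt]*)                 # query already mentions LIMIT
      printf '%s\n' "$query" ;;
    *)                                       # no LIMIT found: cap the result
      printf '%s LIMIT %s\n' "$query" "$max_rows" ;;
  esac
}

add_limit 'SELECT * FROM web_logs' 1000
# -> SELECT * FROM web_logs LIMIT 1000
add_limit 'SELECT * FROM web_logs LIMIT 10' 1000
# -> SELECT * FROM web_logs LIMIT 10  (left unchanged)
```

A real guard would parse the statement properly, but for interactive use a wrapper like this is often enough to keep runaway result sets in check.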
10-15-2019
07:48 AM
1 Kudo
Hey, I just came across this link [1], which describes the NiFi configuration needed to run Apache NiFi behind an AWS load balancer. Hope it is useful. [1] https://everymansravings.wordpress.com/2018/07/27/apache-nifi-behind-an-aws-load-balancer-w-minifi/
10-15-2019
07:39 AM
Hi, instead of using `ConvertJsonToAvro` and `ConvertAvroToJson` directly, have you tried the ConvertRecord processor, with JsonTreeReader set in the Record Reader property and AvroRecordSetWriter in the Record Writer property?
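A sketch of that wiring (the processor and controller-service names are the standard NiFi ones; everything else, such as the schema access strategy each service uses, is left to your flow):

```
ConvertRecord processor
  Record Reader  -> JsonTreeReader         (controller service, parses incoming JSON)
  Record Writer  -> AvroRecordSetWriter    (controller service, writes Avro records)
```

The record-oriented route keeps one processor for any format pair, rather than one converter processor per direction.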
08-26-2019
06:20 AM
I have faced the same error, and these steps worked for me.
08-19-2019
08:45 AM
Hey, which version of Cloudbreak are you using? I just came across the "Using a proxy" section of this PDF [1] and thought it could help you proceed further. Let me know if it helps. [1] https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/configure/cb_configure.pdf
08-19-2019
08:29 AM
Hey Sankar, can you tell me whether this user had permissions before and you have now reinstated access for the test user, or whether this is the first time you are granting him access? Thanks, Thina
12-04-2018
07:43 AM
Hi, after clearing all the caches and force-quitting Chrome, the following command works fine with Chrome on Mac:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --auth-server-whitelist="*.example.com"

The difference between this command and the one in the documentation is that this one uses "*.example.com", whereas the documentation uses "hostname/domain". Hope that helps. Regards, Thina