Member since: 08-29-2018
Posts: 91
Kudos Received: 3
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1642 | 11-14-2019 02:54 AM |
| | 5571 | 11-05-2019 07:51 PM |
02-10-2021
09:41 AM
It looks like you are running the Spark shell on a Windows machine, possibly your local laptop. Do you reference the hostname "dclvmsbigdmd01" anywhere in your code? If not, where does 172.30.294.196 (hive.metastore.uris) come from? Does this IP resolve to the name dclvmsbigdmd01? Can you check whether the host/domain is reachable from your local machine?
07-14-2020
10:56 AM
Hey, are there any parameters used in the spark-shell command? Usually, this kind of delay can happen for many reasons, from connection time to resource availability; however, we cannot confirm anything from the driver logs alone. To narrow this down, could you share the YARN log for this application using the command "yarn logs -applicationId application_1594337770867_0003"? That will give us more clarity on what happens during the delay. Thanks!
07-14-2020
06:11 AM
Hello, AFAIK the Stanford CoreNLP wrapper for Apache Spark should not be a bottleneck in terms of parallel processing; Spark takes care of running it in parallel over multiple documents. Regardless of the number of documents, the number of API requests to the CoreNLP server remains the same.
07-14-2020
04:02 AM
Hey, could you share the exact output trace that you receive? If the issue appears in the Web UI, could you also share a screenshot of what you see?
05-08-2020
10:55 PM
Okay, let me know if changing HiveContext to SparkContext makes any difference; it could give us a lead towards a resolution.
05-08-2020
02:20 AM
Hi @clvi, try adding --appOwner <username> to the yarn logs command. However, I think the application's state has been erased from the RM state store, probably due to an RM state store restore.
04-12-2020
06:36 AM
Hey @hicha, which version of Spark are you using? And what output do you receive when using `SparkSession` instead of `HiveContext`?
03-23-2020
06:50 AM
Hi, based on the documentation [1], we notice that HDP 2.6.5 already ships with Apache Spark 2.3.2. We recommend upgrading the HDP stack so that you get the appropriate version of Spark that comes with the stack, instead of installing Spark manually. [1] https://docs.cloudera.com/HDPDocuments/HDPforCloud/HDPforCloud-2.6.5/hdp-release-notes/content/hdp_comp_versions.html
03-23-2020
05:29 AM
Hello Rishab, can you please describe the exact error you are facing?
11-14-2019
03:06 AM
Hey @gnish, thanks for asking. I haven't tried it before, but I hope you have come across the conversion utility from Zeppelin notes to Jupyter notebooks [1]. However, I notice that the JIRA ZEPPELIN-2616 [2], which covers documentation for this feature, seems yet to be resolved. [1] https://github.com/rdblue/jupyter-zeppelin [2] https://issues.apache.org/jira/browse/ZEPPELIN-2616
11-14-2019
02:54 AM
Hey @avengers, just thought this could add some more value to the question here. Spark SQL uses a Hive metastore to manage the metadata of persistent relational entities (e.g. databases, tables, columns, partitions), keeping it in a relational database for fast access [1]. Also, I don't think the metastore would crash if it is used alongside Hive on Spark. [1] https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-hive-metastore.html
11-05-2019
08:21 PM
Hi @wret_1311, thanks for your response, and I appreciate you confirming the solution. I'm glad it helped you 🙂
11-05-2019
08:13 PM
Hey, can you give us a little more context on what the API is? A quick search for "MergeRecords" points to NiFi. From my current understanding, I hope the URL [1] will be helpful for you. Thanks! [1] https://stackoverflow.com/a/55124212/4340139
11-05-2019
07:51 PM
Hi @wert_1311, there is an option to stop just a single NodeManager (NM) and clean the usercache there, so no application should fail outright. However, keep in mind that even stopping a single NodeManager affects currently running jobs: the containers running on that NM will be stopped and restarted on another NM, so those jobs will run longer than expected. Hope this helps.
11-01-2019
02:50 AM
Hi @wert_1311, thanks for asking. Currently, yarn.nodemanager.localizer.cache.target-size-mb and yarn.nodemanager.localizer.cache.cleanup.interval-ms trigger the deletion service only for non-running containers. For containers that are running and spilling data to {yarn.nodemanager.local-dirs}/usercache/<user>/appcache/<app_id>, the deletion service does not come into action; as a result, the filesystem fills up, nodes are marked unhealthy, and applications get stuck. I suggest you refer to a community article [1] which discusses something similar. The upstream JIRA YARN-4540 [2] has this documented and is yet to be resolved. The general recommendation is to make that filesystem big enough and, if it still fills up, debug the job that writes too much data into it.

It is also OK to delete the usercache directory. Use the following steps:

1. Stop the YARN service.
2. Log in to all nodes and delete the contents of the usercache directories. For example (the quotes make the glob expand on the remote node, not locally):
   for i in `cat list_of_nodes_in_cluster`; do ssh $i 'rm -rf /data?/yarn/nm/usercache/*' ; done
3. Verify that the usercache directories on all nodes are empty.
4. Start the YARN service.

Please let us know if this is helpful.

[1] https://community.cloudera.com/t5/Support-Questions/yarn-usercache-folder-became-with-huge-size/td-p/178648
[2] https://issues.apache.org/jira/browse/YARN-4540
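To preview the cleanup loop above before running it for real, a dry-run sketch can be useful. The node names below are purely illustrative; in a real cluster the file would list your actual NodeManager hosts, one per line.

```shell
# Build an illustrative node list (replace with your real NM hosts).
printf 'node1.example.com\nnode2.example.com\n' > list_of_nodes_in_cluster

while read -r node; do
  # "echo" makes this a preview only; remove it to actually run the cleanup.
  # Single quotes keep the glob from expanding on the local machine.
  echo ssh "$node" 'rm -rf /data?/yarn/nm/usercache/*'
done < list_of_nodes_in_cluster
```

Each printed line is the exact command that would be executed; once the output looks right, drop the `echo`.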
10-31-2019
10:56 AM
Hey, I think deleting container logs may be a good option to save space. However, if you would like to grab the YARN logs to analyse old jobs, you will need those container logs. Such analysis is usually needed when a job fails, so if you are sure those jobs will never need to be dug into again for historic insights, feel free to clear them.
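If the goal is to reclaim space automatically rather than delete logs by hand, YARN's log-aggregation retention setting can expire old aggregated logs on its own. The property name below is the standard one from yarn-default.xml; the 7-day value is only an example, and log aggregation (yarn.log-aggregation-enable) must already be on for it to apply:

```
<!-- yarn-site.xml: automatically delete aggregated logs older than 7 days
     (604800 seconds; the default of -1 keeps logs forever) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```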
10-31-2019
10:51 AM
Hey, can you please share the following?
1. Resource Manager (RM) logs
2. Scheduler.xml
3. A screenshot of the Dynamic Resource Pool configuration
4. A screenshot of the queue representation in the RM Web UI
I hope the above artefacts will be sufficient for an initial analysis of the issue.
10-24-2019
09:00 AM
Hey, can you share the configured value of "hadoop_authorized_users"? Is it left at the default value, or has it been modified?
10-22-2019
03:33 AM
Hey @axk, thanks for letting us know. I'm glad it was helpful 🙂
10-21-2019
11:33 AM
Hey, thanks for responding and confirming. I understand that the output is a float and not an integer. I will update here when I find anything more helpful. Much appreciated.
10-21-2019
11:27 AM
1 Kudo
Hey, can you review whether you have configured the HBase service dependency in the Hive service [1]? I have come across scenarios where, if this dependency is not configured, an error like [2] can occur. [1] https://docs.cloudera.com/documentation/enterprise/5-16-x/topics/cdh_ig_hive_hbase.html [2] org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the locations
10-21-2019
11:15 AM
Hi Nicks, I understand that you would like an alert when a user executes a query that returns more than a pre-defined number of rows; is my understanding correct? Since the limit is pre-defined, I usually append a LIMIT clause to the query when one is not already present. I would also like to point out that Hue returns a partial result and loads more data as you scroll down. However, when executing a query from Hue, I do not see an alerting mechanism based on the number of rows returned at the moment. I will update here once I come across something more helpful.
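As a sketch of the "append a LIMIT when one is missing" habit described above: the function name and the row cap are made up for illustration, and the check is a deliberately naive case-insensitive substring match (it would also trigger on, say, a column named limit_value), not a real SQL parser.

```shell
# Hypothetical helper: append "LIMIT <n>" to a query that lacks one.
add_limit() {
  query=$1
  max_rows=$2
  case "$query" in
    *[Ll][Ii][Mm][Ii][Tt]*)                 # query already mentions LIMIT
      printf '%s\n' "$query" ;;
    *)                                       # no LIMIT found: cap the result
      printf '%s LIMIT %s\n' "$query" "$max_rows" ;;
  esac
}

add_limit 'SELECT * FROM web_logs' 1000
# -> SELECT * FROM web_logs LIMIT 1000
add_limit 'SELECT * FROM web_logs LIMIT 10' 1000
# -> SELECT * FROM web_logs LIMIT 10  (left unchanged)
```

A real guard would parse the statement properly, but for interactive use a wrapper like this is often enough to keep runaway result sets in check.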
10-15-2019
07:48 AM
1 Kudo
Hey, I just came across this link [1], which describes the NiFi configuration needed to run Apache NiFi behind an AWS load balancer. Hope it is useful. [1] https://everymansravings.wordpress.com/2018/07/27/apache-nifi-behind-an-aws-load-balancer-w-minifi/
10-15-2019
07:39 AM
Hi, instead of using `ConvertJsonToAvro` and `ConvertAvroToJson` directly, have you tried the ConvertRecord processor, with JsonTreeReader set in the Record Reader property and AvroRecordSetWriter in the Record Writer property?
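A sketch of that wiring (the processor and controller-service names are the standard NiFi ones; everything else, such as the schema access strategy each service uses, is left to your flow):

```
ConvertRecord processor
  Record Reader  -> JsonTreeReader         (controller service, parses incoming JSON)
  Record Writer  -> AvroRecordSetWriter    (controller service, writes Avro records)
```

The record-oriented route keeps one processor for any format pair, rather than one converter processor per direction.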
08-26-2019
06:20 AM
I have faced the same error, and these steps worked for me.
08-19-2019
08:45 AM
Hey, which version of Cloudbreak are you using? I just came across the "Using a proxy" section of this PDF [1] and thought it could help you proceed further. Let me know if it helps. [1] https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/configure/cb_configure.pdf
08-19-2019
08:29 AM
Hey Sankar, can you tell me whether this user had permissions before and you have now reinstated access for the test user, or whether this is the first time you are granting him access? Thanks, Thina
12-04-2018
07:43 AM
Hi, after clearing all the caches and force-quitting Chrome, the following command works fine with Chrome on Mac:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --auth-server-whitelist="*.example.com"

The difference between this command and the one in the documentation is that this one uses "*.example.com", whereas the documentation uses "hostname/domain". Hope that helps. Regards, Thina