Member since: 09-22-2016
Posts: 33
Kudos Received: 3
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4888 | 04-19-2017 12:19 PM
 | 1036 | 02-22-2017 05:37 PM
 | 6185 | 02-21-2017 02:25 PM
02-06-2023 05:52 AM
Hello ozzielabrat, I am also facing the same issue. What is the solution for it? Thanks
02-07-2019 07:59 PM
Hi, we are running a Spark Streaming job on a cluster managed by CM 6. After the Spark Streaming job has run for 4-5 days, the Spark UI for that particular job no longer opens. My nohup driver output file logs lines like this: servlet.ServletHandler: Error for /streaming/ java.lang.OutOfMemoryError: Java heap space. These lines are logged many times in a continuous series, but the job itself keeps running fine. It is just that I am not able to open the UI by clicking the Application Master link when I open the job from the YARN Running Applications UI.
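Not from the original post, but one commonly suggested mitigation for driver-side heap pressure in long-running streaming jobs (an assumption here, not a confirmed fix for this case) is to cap how much UI history the driver retains, since job, stage, and batch records accumulate on the heap over days of uptime. These are standard Spark properties; the values below are illustrative, not recommendations:

```
# spark-defaults.conf (or pass each as --conf to spark-submit)
# Values are illustrative; defaults shown in comments.
spark.ui.retainedJobs                500     # default 1000
spark.ui.retainedStages              500     # default 1000
spark.ui.retainedTasks               10000   # default 100000
spark.streaming.ui.retainedBatches   200     # default 1000
```

Lowering these trades UI history depth for a smaller, more stable driver heap.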
04-19-2017 12:19 PM
You need to run this command as the kafka user.
04-14-2017 10:38 AM
It's true that you can aggregate logs to HDFS while the job is still running; however, the minimum log-upload interval (yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds) you can set is 3600 seconds, which is 1 hour. The design is meant to protect the NameNode from being spammed. You may have to use an external service to do the log aggregation: either write your own or find another tool. Below is the proof from yarn-default.xml in the hadoop-common source code (cdh5-2.6.0_5.7.1):

```xml
<property>
  <description>Defines how often NMs wake up to upload log files.
  The default value is -1. By default, the logs will be uploaded when
  the application is finished. By setting this configure, logs can
  be uploaded periodically when the application is running.
  The minimum rolling-interval-seconds can be set is 3600.
  </description>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>-1</value>
</property>
```
02-22-2017 05:37 PM
You can achieve it by setting an appropriate value for yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds in yarn-site.xml. Then YARN will aggregate the logs for running jobs too. https://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml Suri
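As a sketch of what that setting looks like in yarn-site.xml (the 3600-second value is chosen for illustration; it is the minimum rolling interval the property accepts):

```xml
<!-- Upload aggregated logs for running applications every hour.
     3600 seconds is the minimum allowed rolling interval. -->
<property>
  <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
  <value>3600</value>
</property>
```

NodeManagers must be restarted for the change to take effect.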
10-05-2016 12:49 AM
Cloudera offers Backup and Disaster Recovery (BDR) features as part of its enterprise offering that can do HDFS replication to other clusters, Hive metadata and data replication to other clusters, and also HBase snapshot backups to S3. This is documented in detail at https://www.cloudera.com/documentation/enterprise/latest/topics/cm_bdr_about.html. Outside of this, you can try to use DistCp for HDFS replication, but for Hive replication you will need to manually propagate DDL-associated metadata.
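As a sketch of the DistCp route (the NameNode hostnames and paths below are hypothetical placeholders, not from the post):

```shell
# Copy a warehouse directory from the source cluster to a backup cluster.
# -update skips files whose size/checksum already match on the target;
# -p preserves permissions, ownership, and timestamps.
hadoop distcp -update -p \
  hdfs://source-nn:8020/user/hive/warehouse \
  hdfs://backup-nn:8020/user/hive/warehouse
```

DistCp runs as a MapReduce job, so schedule it (e.g. via cron or Oozie) to approximate the periodic replication that BDR provides out of the box.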