Member since: 11-12-2018
Posts: 203
Kudos Received: 177
Solutions: 32
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 675 | 04-26-2024 02:20 AM |
| | 894 | 04-18-2024 12:35 PM |
| | 3652 | 08-05-2022 10:44 PM |
| | 3367 | 07-30-2022 04:37 PM |
| | 7382 | 07-29-2022 07:50 PM |
02-09-2021
04:57 AM
Hi @joyabrata I think you are looking in the Data Lake tab, which is a different one. Go to the Summary tab, scroll down to the FreeIPA section, click Actions, and select Get FreeIPA Certificate from the drop-down menu. Hope this helps.
02-08-2021
10:37 PM
Hi @joyabrata To obtain the FreeIPA certificate of your environment:
1. From the CDP Home Page, navigate to Management Console > Environments.
2. Locate and select your environment from the list of available environments.
3. Go to the Summary tab, scroll down to the FreeIPA section, and click Actions.
4. Select Get FreeIPA Certificate from the drop-down menu.
The FreeIPA certificate downloads; then follow this document.
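If the goal is to trust that certificate from a client JVM afterwards (a common follow-up step, not covered in the post above), a minimal sketch of importing the downloaded file into a Java truststore could look like the following; the alias, file paths, and truststore password are placeholders:
keytool -importcert -alias freeipa-ca \
  -file /path/to/freeipa-cert.pem \
  -keystore /path/to/truststore.jks -storepass changeit -noprompt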
02-03-2021
10:09 PM
2 Kudos
Hi @ryu, a Cloudera Manager trigger is what you need. You can create it here: CM -> YARN -> Status -> Create Trigger -> Edit manually.
Examples:
1) Alert if there are more than 50 applications in the pending state:
Expression: IF (select total_apps_pending_across_yarn_pools WHERE entityName=$SERVICENAME and LAST(total_apps_pending_across_yarn_pools) > 50) DO health:concerning
Metric Evaluation Window: 10 minutes
2) Alert if more than 5 applications are failing:
Expression: IF (select total_apps_failed_rate_across_yarn_pools WHERE entityName=$SERVICENAME and LAST(total_apps_failed_rate_across_yarn_pools) > 5) DO health:concerning
Here is the documentation about CM triggers: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_dg_triggers.html
Here is the documentation about CM reports: https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_dg_reports.html
02-03-2021
09:39 PM
Hi @adrijand Thanks for your detailed explanation here. Yes, indeed, we need all versions to be the same to avoid ClassNotFoundException errors caused by JAR conflicts. We encourage you to explore these options and provide feedback on your experiences.
01-23-2021
06:41 AM
Hi @adrijand Yes, it seems there is a JAR conflict somewhere. You are trying to load Hive 1.1.0 classes before the ones included with Spark, and as such, they may fail to reference a Hive configuration that did not exist in 1.1.0, as shown below:
java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:195)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:286)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
However, your description mentions that you are using CDH 5.15, while your log snippets show Apache Spark (spark-2.3.0-bin-without-hadoop) and Apache Hive (apache-hive-1.1.0-bin), which are not the pre-built package versions that ship with the CDH stack. Are you building with different versions of Hive that you would like to connect to from a remote Airflow Docker container?
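One way to keep the client and metastore versions aligned (a sketch based on Spark's standard configuration properties, not something from the original thread) is to tell Spark explicitly which Hive metastore client version and jars to use; the jar path and application name below are illustrative only:
spark-submit \
  --conf spark.sql.hive.metastore.version=1.1.0 \
  --conf spark.sql.hive.metastore.jars=/path/to/hive-1.1.0/lib/* \
  your_app.py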
12-22-2020
10:31 PM
Hi @murali2425 @vchhipa It seems to be a dependency issue while building your custom NiFi processor; the org.apache.nifi:nifi-standard-services-api-nar dependency needs to be added to the pom.xml of the nifi-*-nar module. Reference:
<dependency>
    <groupId>org.apache.nifi</groupId>
    <artifactId>nifi-standard-services-api-nar</artifactId>
    <version>1.11.3</version>
    <type>nar</type>
</dependency>
Please modify your pom.xml, rebuild, and see whether that fixes the issue. Please accept the answer you found most useful.
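After updating the pom.xml, a typical rebuild of the bundle (the directory name is just a placeholder for your own project) would be:
cd your-custom-processor-bundle
mvn clean install
Then copy the freshly built NAR into NiFi's lib directory and restart NiFi.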
12-22-2020
10:15 PM
Hi @TimmehG, Spark has a configurable metrics system based on the Dropwizard Metrics Library. This allows users to report Spark metrics to a variety of sinks, including HTTP, JMX, and CSV files. The metrics are generated by sources embedded in the Spark codebase; they provide instrumentation for specific activities and Spark components. The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property. Instead of using the configuration file, a set of configuration parameters with the prefix spark.metrics.conf. can be used.
I agree with you: running Spark applications continuously and reliably is a challenging task, and a good performance monitoring system is needed. Several external tools can be used to help profile the performance of Spark jobs:
- Cluster-wide monitoring tools, such as Ganglia, can provide insight into overall cluster utilization and resource bottlenecks. For instance, a Ganglia dashboard can quickly reveal whether a particular workload is disk-bound, network-bound, or CPU-bound.
- OS profiling tools such as dstat, iostat, and iotop can provide fine-grained profiling on individual nodes.
- JVM utilities such as jstack for providing stack traces, jmap for creating heap dumps, jstat for reporting time-series statistics, and jconsole for visually exploring various JVM properties are useful for those comfortable with JVM internals.
For more insights you can refer to the links below:
https://spark.apache.org/docs/latest/monitoring.html
https://blog.cloudera.com/demystifying-spark-jobs-to-optimize-for-cost-and-performance/
https://www.infoq.com/articles/spark-application-monitoring-influxdb-grafana/
https://db-blog.web.cern.ch/blog/luca-canali/2017-03-measuring-apache-spark-workload-metrics-performance-troubleshooting
Please accept the answer you found most useful.
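As a rough illustration of the configuration-parameter route mentioned above (a sketch, with the sink choice, polling interval, output directory, and application name chosen purely as examples), enabling the built-in CSV sink for a single job could look like:
spark-submit \
  --conf spark.metrics.conf.*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink \
  --conf spark.metrics.conf.*.sink.csv.period=10 \
  --conf spark.metrics.conf.*.sink.csv.unit=seconds \
  --conf spark.metrics.conf.*.sink.csv.directory=/tmp/spark-metrics \
  your_app.py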
07-20-2020
12:27 AM
1 Kudo
Can you verify whether you followed all the steps listed in the documentation? https://docs.cloudera.com/runtime/7.1.1/ozone-storing-data/topics/ozone-setting-up-ozonefs.html
06-06-2020
09:38 PM
Hi @Ettery Can you try to add those properties in nifi.properties? The Docker configuration has been updated to allow proxy whitelisting from the run command, and the host header protection is only enforced on "secured" NiFi instances. This should make it much easier for users to quickly deploy sandbox environments like the one you are building in this case.
You can also try passing -e NIFI_WEB_HTTP_HOST=<host> in the docker run command:
docker run --name nifi -p 9090:9090 -d -e NIFI_WEB_HTTP_PORT='9090' -e NIFI_WEB_HTTP_HOST=<host> apache/nifi:latest
There is also example configuration and documentation on GitHub for running NiFi behind a reverse proxy that you may be interested in. For more detail, refer to stackoverflow1 and stackoverflow2.
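For reference, a minimal sketch of the nifi.properties entries involved (the host and port values are placeholders, and nifi.web.proxy.host is only consulted on secured instances):
nifi.web.http.host=<host>
nifi.web.http.port=9090
nifi.web.proxy.host=<proxy-host>:<proxy-port>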
06-06-2020
09:15 PM
Glad to hear that you have finally found the root cause of this issue. Thanks for sharing @Heri