About jagadeesan

jagadeesan · ‎07-05-2022

Hi @Jessica_cisco, this looks like a conflict between dependency versions/classes. Please locate the duplicate dependencies, remove them, and then do a clean rebuild.
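As a rough illustration of locating duplicate dependencies, here is a minimal Python sketch that groups jar file names by artifact and reports any artifact present in more than one version. The helper name `find_version_conflicts`, the file-name pattern, and the sample jar names are all assumptions made for this example; they are not taken from the original thread. Gradle's own `dependencies` task remains the more reliable source for the resolved dependency tree.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <artifact>-<version>.jar, where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_names):
    """Return {artifact: [versions]} for artifacts seen with more than one version."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR_PATTERN.match(name)
        if match:
            versions[match["artifact"]].add(match["version"])
    return {
        artifact: sorted(found)
        for artifact, found in versions.items()
        if len(found) > 1
    }


# Hypothetical jar names, for illustration only.
jars = ["guava-14.0.1.jar", "guava-28.1-jre.jar", "commons-lang3-3.9.jar"]
conflicts = find_version_conflicts(jars)
print(conflicts)
```

In practice the list of names could come from `os.listdir()` on the deployment's lib directory.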

jagadeesan · ‎07-04-2022

Hi @mamoune, you can ingest multiple concurrent data source types into the Cloudera CDP platform, but make sure an inbound connection is properly configured from the source to the destination CDP cluster. There are various components/connectors for both moving and transforming data from source systems. Ingesting, storing, and processing new data sources typically requires a considerable amount of planning, which is one of the challenges of data pipeline integration.

For example, Cloudera Morphlines is an open-source framework that reduces the time and skills required to build or change Search indexing applications. A morphline is a rich configuration file that simplifies defining an ETL transformation chain. Use these chains to consume any kind of data from any data source, process the data, and load the results into Cloudera Search. Because they execute in a small, embeddable Java runtime system, morphlines can be used for near real-time applications as well as batch processing applications.

jagadeesan · ‎07-04-2022

Hi @Jessica_cisco, can you try Guava version 14.0.1 from the group com.google.guava?

    compile group: 'com.google.guava', name: 'guava', version: '14.0.1'

Can you add the above dependency to your build.gradle and try again?

jagadeesan · ‎07-04-2022

Hi @naymar, to grant Livy the ability to impersonate the originating user, add the following properties to <HADOOP_HOME>/etc/hadoop/core-site.xml:

    <property>
      <name>hadoop.proxyuser.livy.groups</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.livy.hosts</name>
      <value>*</value>
    </property>

Ref:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/configuration-properties/topics/cm_props_cdh710_coreconfiguration.html
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.3/bk_command-line-installation/content/grant_livy_impersonate.html
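To double-check that both proxy-user properties are actually present after editing the file, a small Python sketch like the one below may help. It parses a core-site.xml document and collects the `hadoop.proxyuser.livy.*` entries. The function name `proxyuser_settings` and the inline sample document are assumptions for illustration; on a real cluster you would read the file from your own configuration directory instead.

```python
import xml.etree.ElementTree as ET

# Sample core-site.xml content, inlined so the sketch is self-contained.
CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
  </property>
</configuration>"""


def proxyuser_settings(xml_text, user="livy"):
    """Collect hadoop.proxyuser.<user>.* properties as {suffix: value}."""
    root = ET.fromstring(xml_text)
    prefix = f"hadoop.proxyuser.{user}."
    settings = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if name.startswith(prefix):
            settings[name[len(prefix):]] = prop.findtext("value", default="")
    return settings


settings = proxyuser_settings(CORE_SITE)
# Both "groups" and "hosts" are expected for impersonation to be allowed.
missing = sorted({"groups", "hosts"} - set(settings))
print(settings, missing)
```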

jagadeesan · ‎06-30-2022

Hi @suri789, these two are different values, so I don't see any duplicates here:

so plainfield
s plainfiled

Also, in the output I don't see any duplicate values; every row is distinct.

+----------------+
|           value|
+----------------+
|    s plaindield|
|    n plainfield|
|  west home land|
|         newyork|
|   so plainfield|
|north plainfield|
+----------------+

Please note: "n plainfield" and "north plainfield", or "s plainfield" and "so plainfield", are different values, because we didn't write any custom logic such as 'n' means 'north' or 's' means 'so'.
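To make the point about custom logic concrete, here is a minimal plain-Python sketch (not Spark code) using the six values from the output above. The `ABBREVIATIONS` mapping and the `normalize` helper are hypothetical; they stand in for whatever expansion rules you would have to define yourself before a distinct operation could treat the abbreviated and full forms as the same value.

```python
# Hypothetical expansion rules; nothing like this exists unless you write it.
ABBREVIATIONS = {"n": "north", "s": "so"}


def normalize(value):
    """Expand a leading abbreviation such as 'n' or 's' using ABBREVIATIONS."""
    words = value.split()
    if words and words[0] in ABBREVIATIONS:
        words[0] = ABBREVIATIONS[words[0]]
    return " ".join(words)


values = [
    "s plaindield",
    "n plainfield",
    "west home land",
    "newyork",
    "so plainfield",
    "north plainfield",
]

raw_distinct = set(values)                            # plain distinct: exact string match
normalized_distinct = {normalize(v) for v in values}  # distinct after custom expansion

print(len(raw_distinct), len(normalized_distinct))
```

Without the mapping all six values stay distinct. With it, "n plainfield" and "north plainfield" collapse into one value, while "s plaindield" still differs from "so plainfield" because of its spelling.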

jagadeesan · ‎06-28-2022

Hi @ajaybabum, yes, you can run Spark in local mode against a Kerberized cluster. As a quick test, can you open spark-shell directly, read the CSV file from the HDFS location, and show its contents? That will tell us whether the issue is in the cluster/Spark configuration or in your application code.

>> Will it possible in local mode without run kinit command before spark-submit.
-- If you pass the --keytab and --principal options to spark-submit, you don't need to run the kinit command before spark-submit.

Thanks
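As a sketch of what such an invocation could look like, the Python snippet below only assembles the spark-submit argument list and prints it; it does not launch anything. The principal, keytab path, and application file are placeholders assumed for this example, and `build_spark_submit` is a hypothetical helper, so substitute your own values.

```python
import shlex


def build_spark_submit(principal, keytab, app, master="local[*]"):
    """Assemble a spark-submit argument list that passes Kerberos credentials."""
    return [
        "spark-submit",
        "--master", master,
        "--principal", principal,  # Kerberos principal to log in as
        "--keytab", keytab,        # keytab file holding that principal's key
        app,
    ]


# Placeholder values, for illustration only.
cmd = build_spark_submit(
    principal="user@EXAMPLE.COM",
    keytab="/path/to/user.keytab",
    app="read_csv.py",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` once the placeholders are replaced.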

jagadeesan · ‎06-28-2022

Hi @NaniSK, could you please reach out to the Cloudera Certification Team at certification@cloudera.com with any feedback and/or concerns about your certificate and license? Thanks.

jagadeesan · ‎06-28-2022

Hi @dfdf, I tried this in my cluster with both Spark2 and Spark3, on the same versions you used, and I was able to get the results without any issues.

Spark2: 2.4.7.7.1.7.1000-141
Spark3: 3.2.1.3.2.7171000.1-1

Are you still seeing this issue? Could you please share the steps to reproduce it, so that I can try them in my cluster? Thanks

jagadeesan · ‎06-27-2022

Hi @sss123, this seems to be a bug. Please refer to https://issues.cloudera.org/browse/LIVY-3. Kindly note that Spark Notebook is not currently supported. Also please review the discussion in https://github.com/cloudera/hue/issues/254

jagadeesan · ‎06-27-2022

Hi @ds_explorer, this seems to happen because the edit log is too big for the NameNode to read completely within the default/configured timeout.

2022-06-25 08:32:24,872 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 554705629. Expected transaction ID was 60366342312
Recent opcode offsets: 554704754 554705115 554705361 554705629
.....
Caused by: java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogOp$LengthPrefixedReader.decodeOpFrame(FSEditLogOp.java:4488)

To fix this, can you add the parameter and value below (if it is already set, please increase the value)?

HDFS > Configuration > JournalNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml

hadoop.http.idle_timeout.ms=180000

Then restart the required services.

Member Since	‎11-12-2018 10:00 AM
Last Visited	‎04-29-2026 08:47 AM
Posts	218
Kudos received	179

Cloudera Community

Re: Migrating workloads from Spark 2 to Spark 3

Re: Looking for a supported version of Spark 3 for...

Re: Spark 3 Parcel Compatibility with CDP Private ...

Re: Apache Storm support in Cloudera

Re: Complete example for using spark MLlib for twi...

Re: Guava jar giving runtime error while deploymen...

Re: 2 concurrent data sources types for the same c...

Re: Guava jar giving runtime error while deploymen...

Re: {msg":"User 'UserX' not allowed to impersonate...

Re: How to remove the space and dots and convert i...

Re: Spark submit in local mode against a kerberiz...

Re: Cloudera Certification URL is not working

Re: spark3 for cdp error

Re: The Spark session could not be created in the ...

Re: Both the namenodes are down