Member since: 04-05-2016
Posts: 36
Kudos Received: 8
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1130 | 07-30-2019 11:52 PM |
| | 3105 | 06-07-2019 01:01 AM |
| | 8152 | 04-14-2017 08:31 PM |
| | 4082 | 08-03-2016 12:52 AM |
| | 1962 | 06-22-2016 02:10 AM |
07-30-2019
11:52 PM
1 Kudo
Since the "list" command gets the apps from the ResourceManager and doesn't set any explicit filters or limits on the request (except those provided with it), it technically returns all the applications the RM currently holds. That number is controlled by the "yarn.resourcemanager.max-completed-applications" config. Hope that clarifies.
06-07-2019
01:01 AM
1 Kudo
Since your intent seems to be to capture the driver logs in a separate file while executing the app in cluster mode, make sure that the '/some/path/to/edgeNode/' directory is present on all of the NodeManager hosts, because in cluster mode the driver runs inside the YARN application master. If you can't ensure that, follow the general practice of pointing the log file to a pre-existing path, e.g. "/var/log/SparkDriver.log".
05-14-2019
02:42 AM
Please check whether numpy is actually installed on all of the NodeManager hosts. If not, install it using the below command (for Python 2.x):
pip install numpy
If it is already installed, let us know the following:
1) Can you execute the same command outside of Hue, i.e. using spark2-submit? Please mention the full command here.
2) What Spark command do you use in Hue?
08-20-2018
12:23 AM
Thanks for reporting the 404 for that parcel URL, and apologies for the inconvenience caused. However, I can see that the fix for the mentioned JIRA (SPARK-22306) is present in the below CDS releases:
SPARK2-2.3.0-CLOUDERA1
SPARK2-2.3.0-CLOUDERA2
SPARK2-2.3.0-CLOUDERA3
So feel free to use the below link to download the CDS 2.3 Release 3 parcels in the meantime:
http://archive.cloudera.com/spark2/parcels/2.3.0.cloudera3/
03-06-2018
10:25 PM
I believe you can achieve this by following the below sequence:
1) spark.sql("SET spark.sql.shuffle.partitions=12")
2) Execute operations on the small table
3) spark.sql("SET spark.sql.shuffle.partitions=500")
4) Execute operations on the larger table
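A rough sketch of what that sequence looks like in practice; the table names and the SparkSession value `spark` are placeholders, not from the original question:

// keep shuffle parallelism low while the small table is processed
spark.sql("SET spark.sql.shuffle.partitions=12")
val smallAgg = spark.sql("SELECT key, count(*) AS cnt FROM small_table GROUP BY key")  // hypothetical table
smallAgg.write.saveAsTable("small_table_agg")  // this action runs with 12 shuffle partitions

// raise it again before the heavier query
spark.sql("SET spark.sql.shuffle.partitions=500")
val bigAgg = spark.sql("SELECT key, count(*) AS cnt FROM big_table GROUP BY key")  // hypothetical table
bigAgg.write.saveAsTable("big_table_agg")  // this action runs with 500 shuffle partitions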
02-19-2018
11:08 PM
It would help if you could attach the full stack trace of the error you are seeing. As a side note, make sure that you have added a Hive gateway role to the host from which you are submitting the Spark app.
04-14-2017
08:31 PM
It is the below line which sets the data type of both fields to StringType:

val schema =
  StructType(
    schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

You can define your own custom schema as follows (the types live in org.apache.spark.sql.types):

import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}

val customSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))

You can add additional fields to the above schema definition as well. Then use this customSchema while creating the DataFrame as follows:

val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)

For details, please see this page.
03-06-2017
10:30 PM
By Spark 2.1, do you mean Cloudera Spark 2.0 Release 1 or Apache Spark 2.1? Regarding Cloudera Spark 2.0 Release 1 or Release 2, note that the minimum required CDH version is CDH 5.7.x, but you are on CDH 5.5.4.
08-30-2016
10:04 PM
The valuable information is at the very bottom:
NameError: name 'master' is not defined
Please make sure you have defined the variable "master" in your code. Alternatively, if you are specifying the master via spark-submit, you should not set it in code.
08-03-2016
08:08 AM
You don't need to export a JAR for unit testing. You can do:
new SparkConf().setMaster("local[2]")
and run the program as a usual Java application in the IDE. Also make sure that you have all the dependent libraries on the classpath.
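For completeness, a minimal sketch of such a local run; the object name and the toy job are placeholders:

import org.apache.spark.{SparkConf, SparkContext}

object LocalSparkCheck {  // hypothetical driver object
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("LocalSparkCheck")
      .setMaster("local[2]")  // run Spark inside the IDE with 2 threads
    val sc = new SparkContext(conf)

    val total = sc.parallelize(1 to 10).map(_ * 2).sum()
    println(s"sum = $total")

    sc.stop()
  }
}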
08-03-2016
12:52 AM
1 Kudo
You are getting this exception because sc.textFile reads a text file from HDFS, a local file system (available on all nodes), or any Hadoop-supported file system URI. Since you said that you want to get the data from a URL and save it to HDFS, you should do:

val data = scala.io.Source.fromURL("http://10.3.9.34:9900/messages").mkString
val list = data.split("\n").filter(_ != "")
val rdds = sc.parallelize(list)
rdds.saveAsTextFile(outputDirectory)
- Tags:
- spark streaming
06-22-2016
02:10 AM
1 Kudo
The attached log indicates that the application is accepted by the cluster manager (YARN) but is unable to execute due to a resource crunch. Please make sure there are enough resources available in your cluster when submitting the job. Check the following and configure them based on your hosts:
yarn.nodemanager.resource.memory-mb
yarn.nodemanager.resource.cpu-vcores
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-vcores
yarn.scheduler.minimum-allocation-vcores
06-21-2016
11:35 PM
This looks strange. Your console output listed the below lines:
com.databricks#spark-avro_2.10 added as a dependency
org.apache.avro#avro-mapred added as a dependency
Can you try once with:
--packages com.databricks:spark-avro_2.10:1.0.0,org.apache.avro:avro-mapred:1.6.3
I suspect some version compatibility issue between avro-mapred and spark-avro.
06-15-2016
04:55 AM
Try starting spark-shell with the following packages:
--packages com.databricks:spark-avro_2.10:2.0.1,org.apache.avro:avro-mapred:1.7.7
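Once the shell is up, a quick way to confirm the package is picked up; the file path below is just an example:

import com.databricks.spark.avro._  // provided by the spark-avro package

val df = sqlContext.read.avro("/path/to/episodes.avro")  // example path
// equivalently: sqlContext.read.format("com.databricks.spark.avro").load("/path/to/episodes.avro")
df.printSchema()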
06-10-2016
03:46 AM
Which Python version are you using? You may want to refer to: http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html
05-30-2016
03:48 AM
1 Kudo
See the Environment tab of the Job History UI and locate "spark.local.dir". Yes, that is the expected behaviour, as the JAR is required by the executors.
05-30-2016
01:05 AM
This looks weird. Can you confirm that http://192.168.88.28:55310/jars/phoenix-1.2.0-client.jar is still not present? Spark keeps all JARs specified by the --jars option in the job's temp directory on each executor node [1]. There must be some OS setting that led to the deletion of the existing phoenix JAR from temp; when the Spark context cannot find it at its usual location, it tries to download it from the given location. However, this should not happen while the temp directory is actively accessed by the job or process. You can try bundling that JAR with your Spark application JAR and then referring to it in spark-submit. I suspect you will again need 20-odd days to test this workaround 🙂
05-11-2016
03:47 AM
1 Kudo
You are mixing up the arguments of the createPollingStream method. Pass 198.168.1.31 as the sink address, as below, and it should work:
FlumeUtils.createPollingStream(ssc, "198.168.1.31", 8020)
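For reference, a minimal sketch of the pull-based (polling) setup around that call, assuming the Flume Spark sink is listening on 198.168.1.31:8020:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

val conf = new SparkConf().setAppName("FlumePollingExample")
val ssc = new StreamingContext(conf, Seconds(10))

// poll the Flume Spark sink for batches of events
val flumeStream = FlumeUtils.createPollingStream(ssc, "198.168.1.31", 8020)
flumeStream.map(e => new String(e.event.getBody.array())).print()

ssc.start()
ssc.awaitTermination()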
05-11-2016
01:52 AM
Add the below dependency as well:
groupId = org.apache.spark
artifactId = spark-streaming-flume_2.10
version = 1.6.1
See here for the pull-based configuration.
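If you build with sbt instead of Maven, the equivalent line (assuming the same Scala 2.10 / Spark 1.6.1 versions as above) would be:

// build.sbt
libraryDependencies += "org.apache.spark" % "spark-streaming-flume_2.10" % "1.6.1"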
04-20-2016
03:29 AM
1 Kudo
CM supports a single version for Spark on YARN and a single version for the Standalone installation (a single version is the common requirement). To support multiple versions of Spark, you need to install the additional version manually on a single node and copy the YARN and Hive config files into its conf directory. When you use that version's spark-submit, it distributes the Spark core binaries to the YARN nodes that execute your code, so you don't need to install Spark on each YARN node.
04-19-2016
04:31 AM
Yes, YARN provides this flexibility. Here you can find the detailed answer. In CDH there is a "Spark" service, which is meant for YARN, and a "Spark Standalone" service, which runs its daemons standalone on the specified nodes. YARN will do the work for you if you want to test multiple versions simultaneously: keep your multiple versions on a gateway host and launch the Spark applications from there.
04-07-2016
04:26 AM
1 Kudo
That's because no new files arrive in the directory after the streaming application starts; textFileStream only picks up files created after the stream is started. You can try "cp" to drop files into the directory after starting the streaming application.
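A minimal sketch of that behaviour; the monitored directory is just an example path:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
val lines = ssc.textFileStream("/user/test/streamIn")  // example directory
lines.print()

ssc.start()
// only files copied into /user/test/streamIn AFTER this point are picked up
ssc.awaitTermination()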
04-07-2016
01:57 AM
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.util.ThreadUtils$.runInNewThread$default$2()Z
Compare your code with the below line:
.setMaster("local[2]")
BTW, which version of Spark Streaming are you using?
04-06-2016
05:10 AM
You need to assign a number of threads to Spark when running the master locally; the most obvious choice is 2: one to receive the data and one to process it. So the correct code should be:
.setMaster("local[2]")
If your file is not too big, change to:
val ssc = new StreamingContext(sc, Seconds(1))
You have stopped the streaming but forgot to start it:

file.foreachRDD(t => {
  val test = t.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey((x, y) => x + y)
  test.saveAsTextFile("/root/file/file1")
})
ssc.start()
ssc.awaitTermination()

For now, don't use ssc.stop().
04-06-2016
04:54 AM
It seems there is some glitch in your code. It would be much easier if you could post your code.
04-06-2016
03:38 AM
From your code:

val textFile = sc.textFileStream("/root/file/test")
textFile.foreachRDD(t => {
  val test = t.map(x => (x.split(" ")(0) + ";" + x.split(" ")(1), 1)).reduceByKey((x, y) => x + y)
  test.saveAsTextFile("/root/file/file1")
})

Mind the t.map(), not file.map().
04-05-2016
11:51 PM
You have a handy method bundled with Spark, "foreachRDD":

val file = ssc.textFileStream("/root/file/test")
file.foreachRDD(t => {
  val test = t.map(...)  // do the map stuff here
  test.saveAsTextFile("/root/file/file1")
})
sc.stop()