Member since
04-05-2016
39
Posts
8
Kudos Received
9
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4424 | 07-30-2019 11:52 PM |
| | 6133 | 06-07-2019 01:01 AM |
| | 9935 | 04-14-2017 08:31 PM |
| | 6770 | 08-03-2016 12:52 AM |
| | 3694 | 06-22-2016 02:10 AM |
10-15-2025
10:25 PM
This story outlines the steps taken to configure a multi-GPU host in the datacenter for hosting and serving open-source models. The primary objective is to leverage two L40s GPUs to provide robust LLM inference capabilities locally.
The deployment of large language models (LLMs) for local inference represents a crucial step in enhancing our on-premises AI capabilities. Given the increasing scale and complexity of contemporary LLMs, a single GPU frequently proves insufficient to accommodate the entire model within its Video RAM (VRAM). This necessitates the utilization of multiple GPUs to distribute the model's parameters and facilitate efficient inference. Our configuration, employing two L40s GPUs, is specifically designed to address this challenge, ensuring our ability to host robust LLMs that demand substantial memory and processing power for optimal performance.
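To put that memory pressure in concrete terms, a rough back-of-the-envelope calculation (illustrative arithmetic only; it counts model weights and ignores KV cache, activations, and runtime overhead) shows why a 70B-parameter model cannot fit on a single 48 GB L40s at 16-bit precision but becomes feasible across two GPUs once quantized to 4 bits:

params = 70e9  # parameter count of a Llama-3.1-70B-class model

def weight_gb(bytes_per_param):
    # Weight memory only; KV cache and framework overhead are excluded.
    return params * bytes_per_param / 1024**3

print(f"FP16 weights: ~{weight_gb(2):.0f} GB")    # ~130 GB, far beyond a single 48 GB L40s
print(f"INT4 weights: ~{weight_gb(0.5):.0f} GB")  # ~33 GB, fits across 2 x 48 GB with headroom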
Prerequisites:
We conducted this exercise on a host system with the following hardware and operating system specifications:
Number of GPUs: 2
GPU Model: NVIDIA L40s
Total VRAM: 96 GB
CPU: Intel(R) Xeon(R) Gold 6438Y+
RAM: 512 GB
Operating System: Rocky Linux 8.10
The host requires preparation through the installation of necessary tools prior to running the LLMs.
Installing Podman-Based Docker on the Host
This section outlines the installation process for Podman-based Docker on the host machine. This will provide the necessary containerization environment for deploying and managing our models, ensuring isolation and consistent execution across different environments.
To install Podman-based Docker, execute the following commands:
sudo dnf update -y
sudo dnf install -y podman podman-docker
Verify the installation:
podman --version
docker --version
Run the test container:
podman run hello-world
Installing Torch and Other Required Libraries
Once the containerization environment is set up, the next step involves installing the core libraries essential for model deployment and inference. These libraries are crucial for efficient LLM operations and GPU utilization.
The following commands will install Torch and other necessary libraries:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Note that the Torch build must match the installed CUDA version; here the cu121 wheels target CUDA 12.1.
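As a quick sanity check (a minimal sketch, assuming the PyTorch wheels above installed cleanly), the following Python snippet confirms that PyTorch was built with CUDA support and can see both GPUs:

import torch

# Verify the CUDA build and that both L40s GPUs are visible to PyTorch.
print("CUDA available:", torch.cuda.is_available())
print("CUDA build    :", torch.version.cuda)
print("GPU count     :", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}:", torch.cuda.get_device_name(i))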
Installing NVIDIA’s Toolkit for GPU Acceleration
To fully utilize the capabilities of the L40s GPUs, NVIDIA’s toolkit must be installed. This toolkit provides the drivers, libraries, and utilities required for GPU acceleration, allowing the system to harness the full potential of the installed hardware.
To install NVIDIA’s toolkit for GPU acceleration, use the following commands.
Set up the NVIDIA Container Toolkit Repository:
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
Install the Toolkit:
sudo dnf install -y nvidia-container-toolkit
sudo dnf clean all
sudo dnf module install -y nvidia-driver:latest-dkms
sudo dnf install -y cuda-toolkit
Configure Docker and Restart:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify the Installation:
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
Running vLLM Image to Host Appropriate Models
With the current configuration providing approximately 92GB of combined VRAM, the system is well-equipped to support the deployment of demanding Large Language Models (LLMs).
This substantial memory capacity allows for two primary deployment strategies: hosting robust, quantized LLMs, such as the hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 model, which use quantization to reduce their memory footprint while maintaining performance; or deploying high-quality models with a lower parameter count, such as the openai/gpt-oss-20b model, which can use the generous VRAM for optimal inference speed and efficiency.
This flexibility enables the system to serve a range of LLM requirements effectively. Let's deploy each of these models one at a time.
hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 Model
The final step is to run the vLLM image to host the 4-bit quantized Llama 3.1 70B model. This will enable local inference on our dual L40s GPU setup, leveraging the combined VRAM and processing power to handle this robust LLM.
The command to run the vLLM image and host the model is as follows:
docker run -d --gpus all -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 \
  --tensor-parallel-size 2 \
  --max-model-len 8192 \
  --max-num-batched-tokens 16384 \
  --gpu-memory-utilization 0.92 \
  --enforce-eager \
  --swap-space 16
After the model has been loaded onto both available GPUs, the nvidia-smi command will show output like the following, with the actual memory usage:
nvidia-smi output after deploying the Meta-Llama-3.1-70B-Instruct-AWQ-INT4 model
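Before sending real traffic, a simple way to confirm the container has finished loading the weights is to query the server's model list (a minimal sketch using the Python requests library; the host and port match the docker run command above):

import requests

# The vLLM OpenAI-compatible server exposes /v1/models; once loading
# completes it should list the served model ID.
resp = requests.get("http://localhost:8000/v1/models", timeout=10)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])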
openai/gpt-oss-20b model
With sufficient VRAM (~92 GB) in place, we can also serve OpenAI's recently open-sourced 20B model, gpt-oss-20b, from this setup. We use the command below to deploy this model on the dual-GPU setup:
docker run --gpus all -p 8000:8000 --ipc=host \
  vllm/vllm-openai:latest \
  --model openai/gpt-oss-20b \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 16384 \
  --max-num-seqs 256 \
  --host 0.0.0.0
Below is the nvidia-smi output:
nvidia-smi output after deploying gpt-oss-20B model
nvidia-smi shows a slightly higher memory footprint for the gpt-oss-20b model due to the increased max-model-len parameter, which is set to 16384 compared to 8192 for the Meta-Llama-3.1-70B-Instruct-AWQ-INT4 model. This larger token limit requires more VRAM to accommodate longer sequences during inference.
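Much of that extra footprint is the KV cache that vLLM reserves for the configured context length; per-sequence KV memory grows linearly with max-model-len. The sketch below illustrates the relationship (the layer count, KV head count, head dimension, and data type are placeholder assumptions for illustration, not the actual gpt-oss-20b architecture):

# Illustrative KV-cache sizing; these dimensions are placeholder assumptions,
# not the real gpt-oss-20b configuration.
num_layers, num_kv_heads, head_dim, bytes_per_elem = 24, 8, 128, 2

def kv_cache_gb(seq_len):
    # Keys and values (factor of 2) per layer, per KV head, per token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem / 1024**3

for seq_len in (8192, 16384):
    print(f"max-model-len={seq_len}: ~{kv_cache_gb(seq_len):.2f} GB per full-length sequence")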
Testing the Deployment
The curl command below can be used to execute a test completion request against the newly deployed server hosting the LLM:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "system", "content": "You are a helpful and concise assistant."},
      {"role": "user", "content": "What is the difference between a GPU and a CPU?"}
    ],
    "max_tokens": 500,
    "temperature": 0.7
  }'
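The same request can also be issued from Python with the openai client library by pointing its base URL at the local vLLM server (a minimal sketch; the API key is a dummy value because the server above was started without authentication):

from openai import OpenAI

# vLLM exposes an OpenAI-compatible API, so the standard client works
# once base_url points at the local server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "You are a helpful and concise assistant."},
        {"role": "user", "content": "What is the difference between a GPU and a CPU?"},
    ],
    max_tokens=500,
    temperature=0.7,
)
print(response.choices[0].message.content)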
API Documentation
The available APIs can be accessed at http://localhost:8000/docs
Serving the Open Source LLMs On Multi-GPU Host Setup was originally published in Engineering@Cloudera on Medium, where people are continuing the conversation by highlighting and responding to this story.
02-29-2024
10:24 PM
How about submitting a CDE job from CML to a private cloud base cluster?
07-31-2020
05:54 AM
In a CDH 6.3.2 cluster I have an Anaconda parcel distributed and activated, which of course has the numpy module installed. However, the Spark nodes seem to ignore the CDH configuration and keep using the system-wide Python from /usr/bin/python. Nevertheless, I have installed numpy in the system-wide Python across all cluster nodes, yet I still get "ImportError: No module named numpy". I would appreciate any further advice on how to solve this problem. I am not sure how to implement the solution referred to in https://stackoverflow.com/questions/46857090/adding-pyspark-python-path-in-oozie.
02-05-2020
10:12 PM
While running this program I got the following errors; can anyone help me with this?
[cloudera@quickstart ~]$ spark-shell --master yarn-client
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/flume-ng/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/parquet/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/avro/avro-tools-1.7.6-cdh5.13.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Welcome to Spark version 1.6.0
Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
20/02/05 22:03:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/05 22:03:42 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Spark context available as sc (master = yarn-client, app id = application_1580968178673_0001).
SQL context available as sqlContext.
07-30-2019
11:52 PM
1 Kudo
Since the "list" command gets the apps from the ResourceManager and doesn't set any explicit filters or limits (beyond those provided with it) on the request, it technically returns all the applications currently known to the RM. That number is controlled by the "yarn.resourcemanager.max-completed-applications" config. Hope that clarifies.
06-07-2019
01:01 AM
1 Kudo
As your intent seems to be capturing the driver logs in a separate file while executing the app in cluster mode, make sure that the '/some/path/to/edgeNode/' directory is present on all of the NodeManager hosts, since in cluster mode the driver runs inside the YARN application's ApplicationMaster. If you can't guarantee that, follow the general practice of pointing the log file to a pre-existing path, e.g. "/var/log/SparkDriver.log".
09-24-2018
11:19 PM
Hello Experts,
We are upgrading our Cloudera Hive from 1.3 to 2.0. Could you please let us know if there are any known issues related to this? I did a search in Tableau and the Cloudera Community, but I didn't find any issues.
Thanks in Advance!!!
Regards,
Muthu Venkatesh
07-28-2017
12:20 PM
How do I query Hive tables from Spark 2.0? Could you share the steps?
05-01-2017
11:11 PM
Hi, I am facing the issue mentioned below. Please help me solve it.
17/05/02 11:07:13 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
java.io.IOException: Failed to delete: C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
  at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
  at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
  at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
  at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
  at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
  at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
  at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
04-14-2017
08:31 PM
It is the line below which is setting the data type for both fields to StringType:
val schema = StructType(
  schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))
You can define your own custom schema as follows:
val customSchema = StructType(Array(
  StructField("name", StringType, true),
  StructField("age", IntegerType, true)))
You can add additional fields as well in the above schema definition. Then you can use this customSchema while creating the DataFrame as follows:
val peopleDataFrame = sqlContext.createDataFrame(rowRDD, customSchema)
Also, for details, please see this page.