Member since: 11-16-2015
Posts: 195
Kudos Received: 36
Solutions: 16
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2056 | 10-23-2019 08:44 PM |
| | 2148 | 09-18-2019 09:48 AM |
| | 8201 | 09-18-2019 09:37 AM |
| | 1876 | 07-16-2019 10:58 AM |
| | 2685 | 04-05-2019 12:06 AM |
04-10-2018
11:08 PM
1 Kudo
Sorry, this is a bug described in SPARK-22876, which suggests that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed. The JIRA is still being worked on, but looking at the comments, I don't foresee a fix anytime soon.
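For reference, a minimal sketch of how the setting is usually passed at submit time (the jar name and the attempt/interval values here are just placeholders):
$ spark-submit \
    --master yarn --deploy-mode cluster \
    --conf spark.yarn.maxAppAttempts=4 \
    --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
    my_app.jar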
03-12-2018
06:02 AM
Is it possible to write to a location directly instead of an HDFS path?
12-27-2017
01:53 AM
1 Kudo
Spark 2.x comes bundled with its own Scala (version 2.11). You do NOT need to install Scala 2.11 separately or upgrade your existing Scala 2.10 version; the Spark 2 installation takes care of the Scala version for you. Once you install Spark 2 (just be sure to review the prerequisites and known issues), you can find the Scala 2.11 libraries under /opt/cloudera/parcels/SPARK2/lib/spark2/jars:
# ls -l /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala*
-rw-r--r-- 1 root root 15487351 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-compiler-2.11.8.jar
-rw-r--r-- 1 root root 5744974 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-library-2.11.8.jar
-rw-r--r-- 1 root root 423753 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-parser-combinators_2.11-1.0.4.jar
-rw-r--r-- 1 root root 4573750 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-reflect-2.11.8.jar
-rw-r--r-- 1 root root 648678 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-xml_2.11-1.0.2.jar
-rw-r--r-- 1 root root 802818 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scalap-2.11.8.jar
Spark 1.6 and Spark 2.x can coexist because they ship as separate parcels and are invoked through separate commands, as shown below. For example, to run an application with Spark 2 you use spark2-shell, spark2-submit, or pyspark2. Likewise, to run an application with the CDH-bundled Spark 1.6, you use spark-shell, spark-submit, or pyspark.
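If you want to double-check which Scala version a given shell is built against, one quick way (the exact version string depends on your parcel) is:
$ spark2-shell
scala> util.Properties.versionString
res0: String = version 2.11.8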
11-25-2017
10:40 PM
Sure. One way I can think of achieving this is by creating a UDF that generates a random value and calling that UDF inside withColumn together with coalesce. See below:
scala> df1.show()
+----+--------+----+
| id| name| age|
+----+--------+----+
|1201| satish|39 |
|1202| krishna|null| <<
|1203| amith|47 |
|1204| javed|null| <<
|1205| prudvi|null| <<
+----+--------+----+
scala> import org.apache.spark.sql.functions.{udf, coalesce}
scala> val arr = udf(() => scala.util.Random.nextInt(10).toString())
scala> val df2 = df1.withColumn("age", coalesce(df1("age"), arr()))
scala> df2.show()
+----+--------+---+
| id| name|age|
+----+--------+---+
|1201| satish| 39|
|1202| krishna| 2 | <<
|1203| amith| 47|
|1204| javed| 9 | <<
|1205| prudvi| 7 | <<
+----+--------+---+
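Note that the UDF above returns a String, so the filled-in ages come back as strings. As an alternative sketch (assuming the same df1, and that you want the column to stay numeric), the built-in rand() function can replace the UDF; df3 here is just an illustrative name:
scala> import org.apache.spark.sql.functions.{coalesce, rand}
scala> val df3 = df1.withColumn("age", coalesce(df1("age"), (rand() * 10).cast("int")))
scala> df3.show()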
10-19-2017
07:25 AM
Also, is there a way to confirm the CSD file is properly deployed? And I don't see Scala 2.11 libraries under /opt/cloudera/parcels/CDH/jars, only Scala 2.10 libraries. I heard that both Scala 2.10 and 2.11 are installed with CDH 5.7 and later. Shouldn't Scala 2.11 be available, and could this also be the cause of the Spark2 service not appearing? I did all the steps as mentioned and they all completed successfully; the Spark2 parcel is activated now. Regards, Hitesh
07-27-2017
09:28 AM
If I understand it correctly - you are able to get past the earlier messages complaining about "Yarn application has already ended!", and now when you run pyspark2 you get a shell prompt, but running a simple command to convert a list of strings to upper case results in containers getting killed with Exit status: 1.
$ pyspark2
Using Python version 3.6.1 (default, Jul 27 2017 11:07:01)
sparkSession available as 'spark'.
>>> strings=['old']
>>> s2=sc.parallelize(strings)
>>> s3=s2.map(lambda x:x.upper())
>>> s3.collect()
[Stage 0:> (0 + 0) / 2]17/07/27 14:52:18 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000002 on host: Slave3. Exit status: 1.
To review why the application failed, we need to look at the container logs. Container logs are available from the command line by running the yarn logs command:
# yarn logs -applicationId <application ID> | less
OR via Cloudera Manager > Yarn > WebUI > Resource Manager WebUI > application_1501131033996_0005 > check for Logs at the bottom > stderr and stdout
07-26-2017
05:53 AM
Sorry, Spark 2.x is only available in the form of parcels. We do not currently ship packages (RPM, etc.) for it.
07-20-2017
07:06 AM
Thank you so much for the response. All this time I was not using Cloudera Express and I am just getting used to it. Now the spark-shell starts with no problem. I really appreciate your help. Thank you, Pavan
03-02-2017
07:23 AM
2 Kudos
A bit late to reply, but if the cluster is secure, try pointing the HBase configuration to the Spark driver and executor classpaths explicitly using 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath'. Also make sure that the host from where you are running the spark command has the gateway role added. Example:
$ pyspark --jars /opt/cloudera/parcels/CDH/jars/spark-examples-1.6.0-cdh5.7.3-hadoop2.6.0-cdh5.7.3.jar,/opt/cloudera/parcels/CDH/jars/hbase-examples-1.2.0-cdh5.7.3.jar --conf "spark.executor.extraClassPath=/etc/hbase/conf/" --conf "spark.driver.extraClassPath=/etc/hbase/conf/"
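The same two settings work with spark-submit as well; a rough sketch, where the class and jar names are placeholders:
$ spark-submit --class com.example.HBaseApp --conf "spark.executor.extraClassPath=/etc/hbase/conf/" --conf "spark.driver.extraClassPath=/etc/hbase/conf/" my-hbase-app.jar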
04-21-2016
12:01 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you get that many reducers. (Of course, pre-splitting needs a good idea of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.) Example, when the table is pre-split into 6 regions:
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}
# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1
...
....
Job Counters
Launched map tasks=1
Launched reduce tasks=6 <<<