Member since 11-16-2015
195 Posts
36 Kudos Received
16 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2094 | 10-23-2019 08:44 PM |
| | 2174 | 09-18-2019 09:48 AM |
| | 8386 | 09-18-2019 09:37 AM |
| | 1896 | 07-16-2019 10:58 AM |
| | 2723 | 04-05-2019 12:06 AM |
04-10-2018
11:08 PM
1 Kudo
Sorry, this is a bug described in SPARK-22876, which suggests that the current logic of spark.yarn.am.attemptFailuresValidityInterval is flawed. The JIRA is still being worked on, but judging by the comments, I don't foresee a fix anytime soon.
03-12-2018
06:02 AM
Is it possible to write to a location directly instead of to an HDFS path?
12-27-2017
01:53 AM
1 Kudo
Spark 2.x comes bundled with its own Scala (version 2.11). You do NOT need to install Scala 2.11 separately or upgrade your existing Scala 2.10 installation; the Spark 2 installation takes care of the Scala version for you. Once you install Spark 2 (be sure to review the prerequisites and known issues first), you can find the Scala 2.11 libraries under /opt/cloudera/parcels/SPARK2/lib/spark2/jars:
# ls -l /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala*
-rw-r--r-- 1 root root 15487351 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-compiler-2.11.8.jar
-rw-r--r-- 1 root root 5744974 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-library-2.11.8.jar
-rw-r--r-- 1 root root 423753 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-parser-combinators_2.11-1.0.4.jar
-rw-r--r-- 1 root root 4573750 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-reflect-2.11.8.jar
-rw-r--r-- 1 root root 648678 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scala-xml_2.11-1.0.2.jar
-rw-r--r-- 1 root root 802818 Jul 12 19:16 /opt/cloudera/parcels/SPARK2/lib/spark2/jars/scalap-2.11.8.jar
Spark 1.6 and Spark 2.x can coexist because they ship in separate parcels and are invoked through separate commands. For example, to run an application with Spark 2, use spark2-shell, spark2-submit, or pyspark2. Likewise, to run an application with Spark 1.6 (bundled with CDH), use spark-shell, spark-submit, or pyspark.
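If you want to confirm the bundled Scala version from a script rather than by eye, the version can be read off the scala-library jar name. The following is only a minimal sketch: the jar names are copied from the listing above, and the helper name `bundled_scala_version` is hypothetical, not part of any Cloudera or Spark tooling.

```python
import re

# Jar names copied from the `ls -l` listing above.
JAR_NAMES = [
    "scala-compiler-2.11.8.jar",
    "scala-library-2.11.8.jar",
    "scala-parser-combinators_2.11-1.0.4.jar",
    "scala-reflect-2.11.8.jar",
    "scala-xml_2.11-1.0.2.jar",
    "scalap-2.11.8.jar",
]


def bundled_scala_version(jar_names):
    """Return the version embedded in scala-library-<version>.jar, or None.

    Hypothetical helper for illustration only.
    """
    for name in jar_names:
        match = re.fullmatch(r"scala-library-(\d+\.\d+\.\d+)\.jar", name)
        if match:
            return match.group(1)
    return None


print(bundled_scala_version(JAR_NAMES))
```

On a real host you would feed it the file names found in the Spark 2 jars directory instead of the hard-coded list.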
11-25-2017
10:40 PM
Sure. One way to achieve this is to create a UDF based on scala.util.Random and call it inside withColumn via coalesce, so that only the null ages are replaced. See below:
scala> df1.show()
+----+--------+----+
| id| name| age|
+----+--------+----+
|1201| satish|39 |
|1202| krishna|null| <<
|1203| amith|47 |
|1204| javed|null| <<
|1205| prudvi|null| <<
+----+--------+----+
scala> val arr = udf(() => scala.util.Random.nextInt(10).toString())
scala> val df2 = df1.withColumn("age", coalesce(df1("age"), arr()))
scala> df2.show()
+----+--------+---+
| id| name|age|
+----+--------+---+
|1201| satish| 39|
|1202| krishna| 2 | <<
|1203| amith| 47|
|1204| javed| 9 | <<
|1205| prudvi| 7 | <<
+----+--------+---+
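To make the coalesce logic explicit, here is the same idea sketched in plain Python, without Spark: keep the existing age when there is one, otherwise draw a random digit from 0 to 9. The rows are taken from the table above; the function name `fill_missing_age` and the fixed seed are assumptions made for illustration, so the digits it draws will not match the Spark output shown.

```python
import random

# Rows from df1 above; None stands in for the null ages.
rows = [
    {"id": 1201, "name": "satish", "age": "39"},
    {"id": 1202, "name": "krishna", "age": None},
    {"id": 1203, "name": "amith", "age": "47"},
    {"id": 1204, "name": "javed", "age": None},
    {"id": 1205, "name": "prudvi", "age": None},
]


def fill_missing_age(row, rng):
    """Mirror coalesce(age, randomDigit): keep age if set, else draw 0-9."""
    age = row["age"]
    if age is None:
        # Same range as scala.util.Random.nextInt(10), returned as a string.
        age = str(rng.randrange(10))
    return {**row, "age": age}


rng = random.Random(42)  # seed chosen only to make reruns repeatable (assumption)
filled = [fill_missing_age(row, rng) for row in rows]

for row in filled:
    print(row)
```

As in the Spark version, rows that already have an age are left untouched and only the nulls receive a random value.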
10-19-2017
07:25 AM
Also, is there a way to confirm that the CSD file is properly deployed? I don't see the Scala 2.11 libraries under /opt/cloudera/parcels/CDH/jars, only the Scala 2.10 libraries. I heard that both Scala 2.10 and 2.11 are installed with CDH 5.7 and later. Shouldn't Scala 2.11 be available? Could this also be the reason the Spark2 service is not appearing? I followed all the steps as described and they all completed successfully; the Spark2 parcel is now activated. Regards, Hitesh
07-27-2017
09:28 AM
If I understand correctly, you are now past the earlier messages complaining about "Yarn application has already ended!", and pyspark2 gives you a shell prompt. However, running a simple job that converts a list of strings to upper case results in containers getting killed with exit status 1.
$ pyspark2
Using Python version 3.6.1 (default, Jul 27 2017 11:07:01)
sparkSession available as 'spark'.
>>> strings=['old']
>>> s2=sc.parallelize(strings)
>>> s3=s2.map(lambda x:x.upper())
>>> s3.collect()
[Stage 0:> (0 + 0) / 2]17/07/27 14:52:18 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1501131033996_0005_01_000002 on host: Slave3. Exit status: 1.
To find out why the application failed, we need to look at the container logs. They are available from the command line by running the yarn logs command:
# yarn logs -applicationId <application ID> | less
OR from Cloudera Manager > Yarn > WebUI > Resource Manager WebUI > application_1501131033996_0005 > Check for Logs at the bottom > stderr and stdout
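The application ID that yarn logs expects can be derived from the failed container ID in the warning, since the container ID embeds the cluster timestamp and application sequence number. Below is a rough sketch of that mapping; the function names are hypothetical, and the optional `e<epoch>_` prefix is handled on the assumption that newer YARN releases may emit it.

```python
import re

# container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
_CONTAINER_ID = re.compile(
    r"^container_(?:e\d+_)?(?P<cluster_ts>\d+)_(?P<app_seq>\d+)_\d+_\d+$"
)


def application_id_from_container(container_id):
    """Derive application_<clusterTimestamp>_<appSeq> from a container ID."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError("not a YARN container ID: %r" % container_id)
    return "application_{cluster_ts}_{app_seq}".format(**match.groupdict())


def yarn_logs_command(container_id):
    """Build the argv for `yarn logs -applicationId <application ID>`."""
    return ["yarn", "logs", "-applicationId",
            application_id_from_container(container_id)]


# Container ID taken from the warning above.
print(" ".join(yarn_logs_command("container_1501131033996_0005_01_000002")))
```

The resulting argument list can be passed to subprocess.run on a host that has the yarn client installed.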
07-26-2017
05:53 AM
Sorry, Spark 2.x is only available in the form of parcels. We do not currently ship packages (RPM etc.) for it.
07-20-2017
07:06 AM
Thank you so much for the response. Until now I had not been using Cloudera Express, and I am still getting used to it. spark-shell now starts with no problem. I really appreciate your help. Thank you, Pavan
03-02-2017
07:23 AM
2 Kudos
Bit late to reply, but if the cluster is secure, try adding the HBase configuration directory to the Spark driver and executor classpaths explicitly using 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath'. Also make sure that the host from which you run the spark command has the gateway role added. Example:
$ pyspark --jars /opt/cloudera/parcels/CDH/jars/spark-examples-1.6.0-cdh5.7.3-hadoop2.6.0-cdh5.7.3.jar,/opt/cloudera/parcels/CDH/jars/hbase-examples-1.2.0-cdh5.7.3.jar --conf "spark.executor.extraClassPath=/etc/hbase/conf/" --conf "spark.driver.extraClassPath=/etc/hbase/conf/"
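If you launch the shell from a wrapper script, the same command line can be assembled programmatically so that both classpath settings always point at the same directory. This is just a sketch: the helper name `pyspark_hbase_command` is hypothetical, and the jar paths and /etc/hbase/conf/ are the ones from the example above, which may differ on your cluster.

```python
import shlex


def pyspark_hbase_command(jars, hbase_conf_dir="/etc/hbase/conf/"):
    """Build a pyspark argv that puts the HBase conf dir on both classpaths."""
    return [
        "pyspark",
        "--jars", ",".join(jars),  # --jars takes a comma-separated list
        "--conf", "spark.executor.extraClassPath=" + hbase_conf_dir,
        "--conf", "spark.driver.extraClassPath=" + hbase_conf_dir,
    ]


# Jar paths taken from the example above.
jars = [
    "/opt/cloudera/parcels/CDH/jars/spark-examples-1.6.0-cdh5.7.3-hadoop2.6.0-cdh5.7.3.jar",
    "/opt/cloudera/parcels/CDH/jars/hbase-examples-1.2.0-cdh5.7.3.jar",
]

cmd = pyspark_hbase_command(jars)
print(" ".join(shlex.quote(arg) for arg in cmd))
```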
04-21-2016
12:01 AM
As Harsh suggested, for a new (empty) table without any splits defined, the number of reducers will always be 1. If you pre-split the table before the import, you get as many reducers as there are regions. (Of course, pre-splitting requires a good understanding of how your row keys are designed and is a broad topic in itself. See HBase: The Definitive Guide > Chapter 11 > Optimizing Splits and Compactions > Presplitting Regions.) Example: when the table is pre-split into 6 regions (5 split keys):
hbase(main):002:0> create 'hly_temp2', {NAME => 't', VERSIONS => 1}, {SPLITS => ['USW000138290206', 'USW000149290623', 'USW000231870807', 'USW000242331116', 'USW000937411119']}
# hadoop jar /usr/lib/hbase/hbase-server.jar importtsv -Dimporttsv.bulk.output=/user/hac/output/2-4 -Dimporttsv.columns=HBASE_ROW_KEY,t:v01 hly_temp2 /user/hac/input/2-1
... ....
Job Counters
Launched map tasks=1
Launched reduce tasks=6 <<<
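The relationship between split keys, regions, and reducers can be checked with a few lines of Python: N split keys yield N + 1 regions, and the bulk-load job launches one reducer per region. The sketch below uses the split keys from the create statement above; the function names are hypothetical, and plain string comparison is assumed to match HBase's byte ordering, which holds for ASCII keys like these.

```python
from bisect import bisect_right

# Split keys from the `create 'hly_temp2'` statement above (already sorted).
SPLIT_KEYS = [
    "USW000138290206",
    "USW000149290623",
    "USW000231870807",
    "USW000242331116",
    "USW000937411119",
]


def region_count(split_keys):
    """N split keys divide the key space into N + 1 regions."""
    return len(split_keys) + 1


def region_index(row_key, split_keys):
    """Index of the region holding row_key (start key inclusive, end exclusive)."""
    return bisect_right(split_keys, row_key)


print(region_count(SPLIT_KEYS))                      # matches Launched reduce tasks
print(region_index("USW000138290206", SPLIT_KEYS))   # a split key starts its region
```

If most row keys land in the same region index, the split points do not match the key distribution and one reducer will end up doing most of the work.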