About srowen

Fawze · 02-07-2017

How can I tell whether the parcel files should have a sha or a sha1 signature file? When will Spark 2 be part of the CDH4 parcels, and in which version? I currently have 1.6. When I add 2.0, will the Spark History Server also be 2.0? If I use the 2.0 parcels, which one will be active on the cluster, 1.6 or 2.0?
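On the checksum question, one practical check is to look at the digest itself. Assuming the parcel's hash file (whether it is named .sha or .sha1) holds a hex digest, optionally followed by a file name, the digest's length identifies the algorithm: 40 hex characters is SHA-1 and 64 is SHA-256. The sketch below relies on that assumed file layout; the helper names are hypothetical and not part of Cloudera Manager.

```python
import hashlib
from pathlib import Path

# Hex digest length -> hashlib algorithm name.
DIGEST_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}


def read_expected_digest(hash_file):
    """Return the first whitespace-separated token of the hash file, lowercased.
    Assumes the file holds a hex digest, optionally followed by a file name."""
    return Path(hash_file).read_text().split()[0].lower()


def file_digest(path, algorithm, chunk_size=1 << 20):
    """Hash a (possibly multi-GB) parcel in chunks instead of loading it whole."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_parcel(parcel_path, hash_file):
    """Infer the algorithm from the digest length, then compare digests.
    Returns (algorithm_name, digests_match)."""
    expected = read_expected_digest(hash_file)
    algorithm = DIGEST_LENGTHS.get(len(expected))
    if algorithm is None:
        raise ValueError(f"unrecognised digest length: {len(expected)}")
    return algorithm, file_digest(parcel_path, algorithm) == expected
```

If the result reports sha1, the file holds a SHA-1 digest regardless of its extension.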

mbigelow · 01-16-2017

@justin3113 To run jobs across all nodes, a user must exist on each node, e.g. justin3113. Each user also needs an HDFS home directory under /user that the user can read and write, so that the job can write temporary data to HDFS from whichever node it is running on. The error says the job is trying to create that user directory, but only the hdfs user has permission to do so. Opening up access gets around it, but that is not advisable. Instead, for each user, create the directory as the hdfs user and then hand it over to that user: su - hdfs -c "hdfs dfs -mkdir /user/justin3113", followed by su - hdfs -c "hdfs dfs -chown justin3113 /user/justin3113".
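When there are many users, the per-user steps can be scripted. This is a minimal sketch, assuming the machine running it has the hdfs client on its PATH and sudo rights to act as the hdfs superuser (sudo -u hdfs is used here in place of su - hdfs; both run the command as hdfs). The helper names are hypothetical, and by default the script only prints the commands instead of running them.

```python
import shlex
import subprocess


def home_dir_commands(user, superuser="hdfs"):
    """Build argv lists that create /user/<user> in HDFS as the superuser
    and then give ownership of it to <user>."""
    home = f"/user/{user}"
    as_superuser = ["sudo", "-u", superuser]
    return [
        as_superuser + ["hdfs", "dfs", "-mkdir", "-p", home],
        as_superuser + ["hdfs", "dfs", "-chown", user, home],
    ]


def create_home_dirs(users, dry_run=True):
    """Print the commands (dry_run=True) or run them, stopping on the first failure."""
    for user in users:
        for argv in home_dir_commands(user):
            if dry_run:
                print(shlex.join(argv))
            else:
                subprocess.run(argv, check=True)
```

Calling create_home_dirs(["justin3113"]) prints the two commands (sudo -u hdfs hdfs dfs -mkdir -p /user/justin3113, then the matching -chown); pass dry_run=False once the output looks right.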

jack0188 · 12-29-2016

I'm using sbt. Should I use spark-submit every time I need to run a project? sbt run covers my needs for now, since I'm running in local mode.

srowen · 12-16-2016

It won't be terribly different (like any maintenance release, it generally contains a small number of fixes), but yes, in general you will want to update. You will also need the GA version if you want production support.

Senzopt · 11-17-2016

Hi, I am following the steps at this link to install RHadoop on Cloudera: https://ashokharnal.wordpress.com/2013/08/25/installing-r-rhadoop-and-rstudio-over-cloudera-hadoop-ecosystem/#comment-2441 Will it work with Cloudera 1.6? Thanks.

zhuangmz · 11-16-2016

<repository>
  <id>Cloudera Repository</id>
  <url>https://repository.cloudera.com/content/repositories/releases/</url>
</repository>
<repository>
  <id>Cloudera Beta Repository</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
I'm using these links 🙂

nbutt · 11-09-2016

Hi, were you able to resolve this problem?
ip-10-0-0-5.ec2.internal, executor 1): java.lang.AbstractMethodError
	at org.apache.spark.Logging$class.log(Logging.scala:50)
	at org.apache.spark.streaming.twitter.TwitterReceiver.log(TwitterInputDStream.scala:60)
	at org.apache.spark.Logging$class.logInfo(Logging.scala:58)
	at org.apache.spark.streaming.twitter.TwitterReceiver.logInfo(TwitterInputDStream.scala:60)
	at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:96)
	at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.s

srowen · 09-26-2016

The essential point here is that you want to avoid a shuffle, and you can avoid one if both RDDs are partitioned the same way, because then all values for a given key already sit in the same-numbered partition of each RDD. join is implemented with cogroup, so yes, both can accomplish this, as long as both RDDs have the same partitioner. This won't hold, however, if you first flatMap one of the RDDs, because Spark can't know whether flatMap preserved the partitioning.
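The co-partitioning argument can be illustrated without a cluster. The following is a toy, single-process model in plain Python, not Spark's API: a "pair RDD" is a list of partitions plus the partitioner that placed its keys, modelled by the partition count alone (a hash partitioner over a fixed hash function is determined by its partition count). PartitionedPairs and its methods are hypothetical stand-ins for the Spark operations they mimic.

```python
from collections import defaultdict


def hash_partition(key, num_partitions):
    """Stand-in for a hash partitioner: map a key to a partition index."""
    return hash(key) % num_partitions


def local_join(left, right):
    """Inner-join two lists of (key, value) pairs that live in one partition."""
    index = defaultdict(list)
    for k, w in right:
        index[k].append(w)
    return [(k, (v, w)) for k, v in left for w in index.get(k, ())]


class PartitionedPairs:
    """Partitions of (key, value) pairs plus the partitioner that placed them,
    or None when the placement of keys is unknown."""

    def __init__(self, partitions, partitioner):
        self.partitions = partitions
        self.partitioner = partitioner

    @classmethod
    def partition_by(cls, pairs, num_partitions):
        parts = [[] for _ in range(num_partitions)]
        for k, v in pairs:
            parts[hash_partition(k, num_partitions)].append((k, v))
        return cls(parts, num_partitions)

    def map_values(self, f):
        # Keys are untouched, so the existing partitioning still holds.
        parts = [[(k, f(v)) for k, v in p] for p in self.partitions]
        return PartitionedPairs(parts, self.partitioner)

    def flat_map(self, f):
        # f may emit arbitrary keys, so the partitioner is forgotten.
        parts = [[kv for item in p for kv in f(item)] for p in self.partitions]
        return PartitionedPairs(parts, None)

    def join(self, other):
        """Return (joined pairs, whether a shuffle was needed)."""
        if self.partitioner is not None and self.partitioner == other.partitioner:
            # Co-partitioned: partition i on both sides holds the same keys,
            # so each pair of partitions is joined locally, with no data movement.
            out = []
            for left, right in zip(self.partitions, other.partitions):
                out.extend(local_join(left, right))
            return out, False
        # Otherwise every record must first be re-partitioned: a shuffle.
        n = len(self.partitions)
        a = PartitionedPairs.partition_by([kv for p in self.partitions for kv in p], n)
        b = PartitionedPairs.partition_by([kv for p in other.partitions for kv in p], n)
        joined, _ = a.join(b)
        return joined, True
```

In real Spark the corresponding step is to call partitionBy with the same partitioner on both RDDs before joining. mapValues keeps the partitioner while map and flatMap drop it, which is why the flatMap case in the answer above forces a shuffle.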

cimox · 09-09-2016

Could anyone please give a detailed explanation?

RMG · 09-03-2016

Yes, exactly, that's what I mean. I would like to copy the result file to my local machine.
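For getting job output onto the local machine, the HDFS shell provides hdfs dfs -get, which copies files as they are, and hdfs dfs -getmerge, which concatenates the files of a directory into one local file. As a minimal sketch of what the merge step amounts to, the following assumes the output directory has already been copied locally and that its data files follow the part-NNNNN naming. merge_part_files is a hypothetical helper, not part of any Hadoop API.

```python
import shutil
from pathlib import Path


def merge_part_files(output_dir, dest_file):
    """Concatenate the part-* files of a locally copied job output directory
    into dest_file, in part-number order, and return how many were merged.
    Marker files such as _SUCCESS are skipped because they don't match part-*."""
    # Zero-padded names (part-00000, part-00001, ...) sort into output order.
    parts = sorted(Path(output_dir).glob("part-*"))
    with open(dest_file, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # streams, so large parts are fine
    return len(parts)
```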

Last Visited	02-13-2018 12:34 PM

Member Since	08-11-2014 09:17 AM
Posts	481
Kudos received	87

Cloudera Community

Re: Own code editor in CDSW?

Re: error using Pandas within PySpark transformati...

Re: Does CDSW need to be part of the cluster?

Re: Local Data combined with HDFS

Re: Where can I find Oryx 1.x releases (or GitHub)

Re: Spark 2

Re: Permission Error while running spark-shell

Re: What dependencies to submit Spark jobs program...

Re: Spark 2 - official and beta

Re: how to install RHadoop on CDH5.3

Re: Maven Repository for Spark2.0 beta?

Re: Lost executor error

Re: join two grouped by key RDD's

Re: Oryx max-age params

Re: Copy the contents of "output/ part-00000" in a...