About ofermend

ofermend · 12-11-2015

R is an RDD, so r1 is also an RDD. That means you are calling parallelize() on an RDD, which you should not do. parallelize() is meant for a local Python object, such as a list.
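A minimal sketch of the distinction, assuming `sc` is the SparkContext that the pyspark shell creates for you. The helper name `double_values` and the doubling step are hypothetical and only for illustration. parallelize() is applied once, to the local list; anything derived from the resulting RDD is transformed or collected, not parallelized again.

```python
def double_values(sc, values):
    """Distribute a local list, transform it, and bring the result back.

    `sc` is assumed to be a pyspark SparkContext (e.g. the one the pyspark
    shell provides). `double_values` is a hypothetical helper.
    """
    # parallelize() takes a local Python collection, such as a list.
    r = sc.parallelize(values)

    # r1 is produced by a transformation, so it is already an RDD.
    # Calling sc.parallelize(r1) here would be the mistake described above;
    # keep transforming it, or collect() it back to a local list instead.
    r1 = r.map(lambda x: x * 2)
    return r1.collect()
```

In the pyspark shell, `double_values(sc, [1, 2, 3])` should return `[2, 4, 6]`. If you need the contents of an RDD as a local object, collect() returns a plain Python list.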

ofermend · 12-10-2015

A few clarifying questions about rawTrainData:
- How is this RDD generated?
- Is it cached?
- How many partitions does it have?

Also, what is the variable "valores"?
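A sketch of how those three questions could be answered from the pyspark shell, assuming rawTrainData is an existing pyspark RDD in your session. It uses only the RDD attribute `is_cached` and the methods `getNumPartitions()` and `toDebugString()`; the helper name `describe_rdd` is hypothetical.

```python
def describe_rdd(rdd):
    """Report caching, partition count, and lineage for a pyspark RDD.

    `describe_rdd` is a hypothetical helper; `rdd` is assumed to be a
    pyspark RDD such as rawTrainData.
    """
    # toDebugString() describes how the RDD was generated (its lineage).
    # PySpark returns it as bytes (or None), so normalise it to str.
    lineage = rdd.toDebugString()
    if isinstance(lineage, bytes):
        lineage = lineage.decode("utf-8")
    return {
        "is_cached": rdd.is_cached,              # True after cache()/persist()
        "num_partitions": rdd.getNumPartitions(),
        "lineage": lineage,
    }
```

For example, `print(describe_rdd(rawTrainData)["lineage"])` prints the chain of transformations that produced the RDD.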

ofermend · 12-08-2015

Can you please post the full code and the error log?

ofermend · 11-11-2015

The pyspark shell is just a Python shell, so calling dir() with no arguments lists every name in the current scope, including your variables. It also lists imported modules and a bunch of other names you may not be looking for.
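A small sketch for narrowing that list down, assuming you pass it the shell's namespace via globals(). The helper `user_variables` is hypothetical (not part of pyspark), and its filtering rules (skip underscore names, modules, and callables) are just one reasonable heuristic.

```python
import types


def user_variables(namespace):
    """Return the names in `namespace` that look like user-defined data.

    Drops what dir() also shows: private/dunder names, imported modules,
    and callables (functions, classes). Hypothetical helper.
    """
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_")                  # __builtins__, _ etc.
        and not isinstance(value, types.ModuleType)  # e.g. os, sys
        and not callable(value)                      # functions, classes
    )
```

In the shell, `user_variables(globals())` gives a shorter list than dir(). Objects the pyspark shell creates for you, such as sc, will still appear, since they are neither modules nor callables.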

ofermend · 10-29-2015

Sorry, I don't have anything ready, but it sounds like a good idea to put one together. What criteria should we compare them by?

ofermend · 10-23-2015

This looks like a conflict between the Spark and Phoenix jars, no? Searching for the details in the stack trace, it seems related to Jackson. I'm not familiar with Phoenix: does it use its own version of Jackson?

ofermend · 10-22-2015

Can you please share the code and full error log?

Last Visited	04-14-2016 04:38 PM

Member Since	09-25-2015 05:18 PM
Posts	9
Kudos received	7

Cloudera Community

Re: Getting Error while executing this command

Re: Getting Error while executing this command

Re: Best way to select distinct values from multip...

Re: Run RDD operations on SQL Dataframe in 1.3.1

Re: Do the Spark REPLs have a way to list current ...

Re: Zeppelin vs. IPython notebook(Jupyter)

Re: Spark to Phoenix

Re: Spark to Phoenix