About BrianWhite

prakharpanwaria · ‎05-11-2017

Ran into the same problem; resolved it by enabling 'Hive Service' in Spark2.

hubbarja · ‎10-12-2016

Great, I'm glad the UDF worked. As for the NumPy issue, I'm not familiar enough with using NumPy within Spark to give any insights, but the workaround seems trivial enough. If you are looking for a more elegant solution, you may want to create a new thread and include the error. You may also want to take a look at Spark's MLlib statistics functions [1], though they operate across rows instead of within a single column.

[1] http://spark.apache.org/docs/latest/mllib-statistics.html

Last Visited	‎02-17-2017 10:37 AM

Member Since	‎10-04-2016 07:02 AM
Posts	3
Kudos received	3

Cloudera Community

Re: Spark 2 beta load or save Hive managed table

Re: PySpark: How to add column to dataframe with c...