question Re: Rowwise manipulation of a DataFrame in PySpark. in Support Questions

https://community.cloudera.com/t5/Support-Questions/Rowwise-manipulation-of-a-DataFrame-in-PySpark/m-p/226334#M188194

<A rel="user" href="https://community.cloudera.com/users/29248/lukasmueller0289.html" nodeid="29248">@Lukas Müller</A> This should work for you:

<PRE>from pyspark.sql.types import *
from pyspark.sql.functions import udf, struct

# Create your UDF object (which wraps your Python function called "my_udf")
udf_object = udf(my_udf, ArrayType(StringType()))

# Apply the UDF to your DataFrame (called "df"), passing all columns of each row as one struct
new_df = df.withColumn("new_column", udf_object(struct([df[x] for x in df.columns])))</PRE>

If you want typed fields rather than plain strings, replace "ArrayType(StringType())" with a schema that matches what your function returns, such as:

<PRE>schema = ArrayType(StructType([
    StructField("mychar", StringType(), False),
    StructField("myint", IntegerType(), False)
]))</PRE>

Hope this helps!

dzaratsian, Tue, 22 Aug 2017 20:47:40 GMT
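The reply never shows what "my_udf" itself looks like, so here is a minimal sketch of one possibility. The function bodies, the names `my_struct_udf` and `add_columns`, and the output column names are all assumptions made up for illustration. The only thing relied on is that the struct arrives in the UDF as a Row, which can be iterated like a tuple. The Spark imports sit inside `add_columns` so the two plain-Python functions can be tried without a Spark installation.

```python
from collections import namedtuple


def my_udf(row):
    """Hypothetical row function: turn every column value into a string.

    Returns a list of strings (nulls stay None), which fits
    ArrayType(StringType()).
    """
    return [None if value is None else str(value) for value in row]


def my_struct_udf(row):
    """Hypothetical variant for the ArrayType(StructType([...])) schema.

    Returns (mychar, myint) tuples: each value as a string plus the length
    of that string. Nulls are skipped because both fields are non-nullable.
    """
    return [(str(value), len(str(value))) for value in row if value is not None]


def add_columns(df):
    """Attach both results to a Spark DataFrame (requires pyspark)."""
    # Imported here so the functions above can be tested without Spark.
    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import (ArrayType, IntegerType, StringType,
                                   StructField, StructType)

    schema = ArrayType(StructType([
        StructField("mychar", StringType(), False),
        StructField("myint", IntegerType(), False),
    ]))
    # One struct holding every column, so the UDF sees the whole row.
    whole_row = struct([df[x] for x in df.columns])
    return (df
            .withColumn("as_strings", udf(my_udf, ArrayType(StringType()))(whole_row))
            .withColumn("as_structs", udf(my_struct_udf, schema)(whole_row)))


# Quick local check with a stand-in for a Spark Row (a Row is also a tuple).
FakeRow = namedtuple("FakeRow", ["name", "age"])
print(my_udf(FakeRow("alice", 30)))         # ['alice', '30']
print(my_struct_udf(FakeRow("alice", 30)))  # [('alice', 5), ('30', 2)]
```

With a real DataFrame you would call `add_columns(df)`. Inside the UDF, fields can also be read by name (for example `row["name"]`) if the logic depends on specific columns.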