<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pandas_udf with a tuple? (pyspark) in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Pandas-udf-with-a-tuple-pyspark/m-p/190143#M152232</link>
    <description>&lt;P&gt;It looks like you are using a scalar pandas_udf, which doesn't currently support returning structs (such as tuples). I believe the return type you want is an array of strings, which is supported, so this should work. Try this:&lt;/P&gt;&lt;PRE&gt;import pandas as pd
from pyspark.sql.functions import pandas_udf

@pandas_udf("array&amp;lt;string&amp;gt;")
def stringClassifier(x, y, z):
    # Return a pandas Series of lists of strings, the same length as the input, for example:
    s = pd.Series([[u"a", u"b"]] * len(x))
    return s&lt;/PRE&gt;&lt;P&gt;If you are using Python 2, make sure your strings are unicode; otherwise they might be interpreted as bytes. Hope that helps!&lt;/P&gt;</description>
    <pubDate>Fri, 13 Jul 2018 01:13:47 GMT</pubDate>
    <dc:creator>o912451</dc:creator>
    <dc:date>2018-07-13T01:13:47Z</dc:date>
  </channel>
</rss>

