<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How do you write an RDD as a tab-delimited file in pyspark? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126347#M34572</link>
    <description>&lt;P&gt;Is your RDD an RDD of strings?&lt;/P&gt;&lt;P&gt;On the second part of the question: if you are using &lt;A target="_blank" href="https://github.com/databricks/spark-csv"&gt;spark-csv&lt;/A&gt;, the package supports saving a simple (non-nested) DataFrame. There is an option to specify the delimiter, which is a comma by default but can be changed, e.g.:&lt;/P&gt;&lt;PRE&gt;.save('filename.csv', 'com.databricks.spark.csv', delimiter="DELIM")&lt;/PRE&gt;</description>
    <pubDate>Thu, 14 Jul 2016 03:58:21 GMT</pubDate>
    <dc:creator>arunak</dc:creator>
    <dc:date>2016-07-14T03:58:21Z</dc:date>
    <item>
      <title>How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126346#M34571</link>
      <description>&lt;P&gt;I have an RDD I'd like to write as a tab-delimited file. I also want to write a DataFrame as tab-delimited. How do I do this?&lt;/P&gt;</description>
      <pubDate>Mon, 19 Sep 2022 18:45:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126346#M34571</guid>
      <dc:creator>don_jernigan</dc:creator>
      <dc:date>2022-09-19T18:45:28Z</dc:date>
    </item>
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126347#M34572</link>
      <description>&lt;P&gt;Is your RDD an RDD of strings?&lt;/P&gt;&lt;P&gt;On the second part of the question: if you are using &lt;A target="_blank" href="https://github.com/databricks/spark-csv"&gt;spark-csv&lt;/A&gt;, the package supports saving a simple (non-nested) DataFrame. There is an option to specify the delimiter, which is a comma by default but can be changed, e.g.:&lt;/P&gt;&lt;PRE&gt;.save('filename.csv', 'com.databricks.spark.csv', delimiter="DELIM")&lt;/PRE&gt;</description>
      <pubDate>Thu, 14 Jul 2016 03:58:21 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126347#M34572</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-07-14T03:58:21Z</dc:date>
    </item>
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126348#M34573</link>
      <description>&lt;P style="margin-left: 40px;"&gt; &lt;A rel="user" href="https://community.cloudera.com/users/3076/bmathew.html" nodeid="3076"&gt;@Binu Mathew&lt;/A&gt; do you have any thoughts?&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2016 10:24:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126348#M34573</guid>
      <dc:creator>sunile_manjee</dc:creator>
      <dc:date>2016-07-14T10:24:58Z</dc:date>
    </item>
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126349#M34574</link>
      <description>&lt;P&gt;Could you provide more details on the RDD that you would like to save as tab-delimited? On the question about storing DataFrames as a tab-delimited file, below is what I have in Scala using the &lt;A target="_blank" href="https://github.com/databricks/spark-csv"&gt;spark-csv&lt;/A&gt; package:&lt;/P&gt;&lt;PRE&gt;df.write.format("com.databricks.spark.csv").option("delimiter", "\t").save("output path")&lt;/PRE&gt;&lt;P&gt;&lt;STRONG&gt;EDIT&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;With the RDD of tuples, as you mentioned, you could either join the tuple fields with "\t" or use mkString, if you prefer not to use an additional library. On your RDD of tuples you could do something like:&lt;/P&gt;&lt;PRE&gt;.map { x =&amp;gt; x.productIterator.mkString("\t") }.saveAsTextFile("path-to-store")&lt;/PRE&gt;&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11581/donjernigan.html" nodeid="11581"&gt;@Don Jernigan&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2016 21:23:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126349#M34574</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-07-14T21:23:02Z</dc:date>
    </item>
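A plain-Python sketch of the Scala answer above: the tab-join that mkString performs is what you would put in the function passed to rdd.map. The names here are illustrative, and str() is applied so non-string fields join cleanly (an assumption; the thread's tuples are all strings):

```python
def to_tsv_row(row):
    """Join the fields of a tuple into one tab-separated line.

    Mirrors Scala's x.productIterator.mkString("\t"): each field is
    converted to a string, then the fields are joined with tabs.
    """
    return "\t".join(str(field) for field in row)

# With a real RDD this would be:
#   rdd.map(to_tsv_row).saveAsTextFile("path-to-store")
# Here it is applied to a plain list to show the per-row result.
rows = [("a", "b", "c"), ("x", 1, 2.5)]
lines = [to_tsv_row(r) for r in rows]
print(lines)  # ['a\tb\tc', 'x\t1\t2.5']
```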
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126350#M34575</link>
      <description>&lt;P&gt;Essentially I have a Python tuple ('a','b','c','x','y','z') whose elements are all strings. I could just map each tuple into a single tab-joined string ('a\tb\tc\tx\ty\tz'), then saveAsTextFile(path). But I was wondering whether there is a better way, since an external package may just be encapsulating that .map(lambda x: "\t".join(x)).&lt;/P&gt;</description>
      <pubDate>Thu, 14 Jul 2016 23:04:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126350#M34575</guid>
      <dc:creator>don_jernigan</dc:creator>
      <dc:date>2016-07-14T23:04:12Z</dc:date>
    </item>
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126351#M34576</link>
      <description>&lt;P&gt;I guess if the data set does not contain a '\t' character, then '\t'.join and saveAsTextFile should work for you. Otherwise, you just need to wrap the strings in double quotes ("), as with normal CSVs.&lt;/P&gt;</description>
      <pubDate>Fri, 15 Jul 2016 01:48:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126351#M34576</guid>
      <dc:creator>arunak</dc:creator>
      <dc:date>2016-07-15T01:48:18Z</dc:date>
    </item>
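The quoting suggested above can be had from Python's standard csv module, which accepts a tab delimiter and, with the default minimal quoting, wraps only the fields that actually contain a tab. A small sketch with made-up data:

```python
import csv
import io

rows = [("a", "b", "c"), ("has\ttab", "plain", "x")]

buf = io.StringIO()
# QUOTE_MINIMAL (the default) puts double quotes around a field only when
# it contains the delimiter, the quote character, or a newline.
writer = csv.writer(buf, delimiter="\t", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
writer.writerows(rows)

print(repr(buf.getvalue()))  # 'a\tb\tc\n"has\ttab"\tplain\tx\n'
```

The first row comes out as a bare tab-joined line; only the field containing an embedded tab gets quoted, matching the "wrap the strings in double quotes" advice.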
    <item>
      <title>Re: How do you write an RDD as a tab-delimited file in pyspark?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126352#M34577</link>
      <description>&lt;P&gt;Try this; note that it requires Spark 1.5 and up:&lt;/P&gt;&lt;PRE&gt;data.write.format('com.databricks.spark.csv').options(delimiter="\t", codec="org.apache.hadoop.io.compress.GzipCodec").save('s3a://myBucket/myPath')&lt;/PRE&gt;</description>
      <pubDate>Mon, 18 Jul 2016 22:42:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-do-you-write-a-RDD-as-a-tab-delimited-file-in-pyspark/m-p/126352#M34577</guid>
      <dc:creator>doug_mengistu</dc:creator>
      <dc:date>2016-07-18T22:42:41Z</dc:date>
    </item>
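For comparison, the end result of the spark-csv call above (a gzip-compressed tab-delimited file) can be sketched with the standard library alone. This is an illustrative stand-in, not spark-csv's implementation; the file name and data are made up:

```python
import csv
import gzip
import os
import tempfile

rows = [("a", "b", "c"), ("x", "y", "z")]

# Write a gzip-compressed tab-delimited file, analogous to saving with
# delimiter="\t" and the Hadoop GzipCodec in spark-csv.
path = os.path.join(tempfile.mkdtemp(), "part-00000.tsv.gz")
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

# Read it back to confirm the round trip.
with gzip.open(path, "rt", newline="") as f:
    back = [tuple(r) for r in csv.reader(f, delimiter="\t")]

print(back == rows)  # True
```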
  </channel>
</rss>

