<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question What's the best way to read multiline CSV and transpose it to columns in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157685#M33027</link>
    <description>&lt;P&gt;I need to ingest a multiline CSV file of semi-structured records in which some rows must be converted to columns and other rows must become both rows and columns.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Below is what the input CSV file looks like:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;a,a1,a11,7/1/2008&lt;/P&gt;&lt;P&gt;b,b1,b11,8:53:00&lt;/P&gt;&lt;P&gt;c,c1,c11,25&lt;/P&gt;&lt;P&gt;d,d1,d11,1&lt;/P&gt;&lt;P&gt;e,e1,e11, ABCDEF&lt;/P&gt;&lt;P&gt;f,f1,f11,&lt;/P&gt;&lt;P&gt;sn1,msg,ref_sn_01,abc&lt;/P&gt;&lt;P&gt;sn2,msg,ref_sn_02,def&lt;/P&gt;&lt;P&gt;sn3,msg,ref_sn_02,ghi&lt;/P&gt;&lt;P&gt;sn4,msg,ref_sn_04,jkl&lt;/P&gt;&lt;P&gt;sn5,msg,ref_sn_05,mno&lt;/P&gt;&lt;P&gt;sn6,msg,ref_sn_06,pqr&lt;/P&gt;&lt;P&gt;sn7,msg,ref_sn_07,stu&lt;/P&gt;&lt;P&gt;sn8,msg,ref_sn_08,vwx&lt;/P&gt;&lt;P&gt;sn9,msg,ref_sn_09,yza&lt;/P&gt;&lt;P&gt;sn9,msg,ref_sn_09,yza&lt;/P&gt;&lt;P&gt;sn10,msg,ref_sn_010,&lt;/P&gt;&lt;P&gt;sn11,msg,ref_sn_011&lt;/P&gt;&lt;P&gt;cp1,ana,pw01,1.1&lt;/P&gt;&lt;P&gt;cp2,ana,pw02,1.1&lt;/P&gt;&lt;P&gt;cp3,ana,pw03,1.1&lt;/P&gt;&lt;P&gt;cp4,ana,pw04,1.1&lt;/P&gt;&lt;P&gt;cp5,ana,pw05,1.1&lt;/P&gt;&lt;P&gt;cp6,ana,pw06,1.1&lt;/P&gt;&lt;P&gt;cp7,ana,pw07,1.1&lt;/P&gt;&lt;P&gt;cp8,ana,pw08,1.1&lt;/P&gt;&lt;P&gt;cp9,ana,pw09,1.1&lt;/P&gt;&lt;P&gt;cp10,ana,pw10,1.1&lt;/P&gt;&lt;P&gt;cp11,ana,pw11,1.1&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Below is the expected output:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="5241-screen-shot-2016-06-25-at-43154-pm.png" style="width: 652px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20844i55184CB9230E74C3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="5241-screen-shot-2016-06-25-at-43154-pm.png" alt="5241-screen-shot-2016-06-25-at-43154-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Please let me know the best way to read it and load it into HDFS/Hive.&lt;/P&gt;&lt;BR /&gt;&lt;IMG 
src="https://community.cloudera.com/t5/image/serverpage/image-id/6330i185C16C56B804614/image-size/large?v=1.0&amp;amp;px=999" border="0" alt="screen-shot-2016-06-25-at-44836-pm.png" title="screen-shot-2016-06-25-at-44836-pm.png" /&gt;</description>
    <pubDate>Sun, 18 Aug 2019 12:14:25 GMT</pubDate>
    <dc:creator>GeeKay2015</dc:creator>
    <dc:date>2019-08-18T12:14:25Z</dc:date>
    <item>
      <title>What's the best way to read multiline CSV and transpose it to columns</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157685#M33027</link>
      <description>&lt;P&gt;I need to ingest a multiline CSV file of semi-structured records in which some rows must be converted to columns and other rows must become both rows and columns.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Below is what the input CSV file looks like:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;a,a1,a11,7/1/2008&lt;/P&gt;&lt;P&gt;b,b1,b11,8:53:00&lt;/P&gt;&lt;P&gt;c,c1,c11,25&lt;/P&gt;&lt;P&gt;d,d1,d11,1&lt;/P&gt;&lt;P&gt;e,e1,e11, ABCDEF&lt;/P&gt;&lt;P&gt;f,f1,f11,&lt;/P&gt;&lt;P&gt;sn1,msg,ref_sn_01,abc&lt;/P&gt;&lt;P&gt;sn2,msg,ref_sn_02,def&lt;/P&gt;&lt;P&gt;sn3,msg,ref_sn_02,ghi&lt;/P&gt;&lt;P&gt;sn4,msg,ref_sn_04,jkl&lt;/P&gt;&lt;P&gt;sn5,msg,ref_sn_05,mno&lt;/P&gt;&lt;P&gt;sn6,msg,ref_sn_06,pqr&lt;/P&gt;&lt;P&gt;sn7,msg,ref_sn_07,stu&lt;/P&gt;&lt;P&gt;sn8,msg,ref_sn_08,vwx&lt;/P&gt;&lt;P&gt;sn9,msg,ref_sn_09,yza&lt;/P&gt;&lt;P&gt;sn9,msg,ref_sn_09,yza&lt;/P&gt;&lt;P&gt;sn10,msg,ref_sn_010,&lt;/P&gt;&lt;P&gt;sn11,msg,ref_sn_011&lt;/P&gt;&lt;P&gt;cp1,ana,pw01,1.1&lt;/P&gt;&lt;P&gt;cp2,ana,pw02,1.1&lt;/P&gt;&lt;P&gt;cp3,ana,pw03,1.1&lt;/P&gt;&lt;P&gt;cp4,ana,pw04,1.1&lt;/P&gt;&lt;P&gt;cp5,ana,pw05,1.1&lt;/P&gt;&lt;P&gt;cp6,ana,pw06,1.1&lt;/P&gt;&lt;P&gt;cp7,ana,pw07,1.1&lt;/P&gt;&lt;P&gt;cp8,ana,pw08,1.1&lt;/P&gt;&lt;P&gt;cp9,ana,pw09,1.1&lt;/P&gt;&lt;P&gt;cp10,ana,pw10,1.1&lt;/P&gt;&lt;P&gt;cp11,ana,pw11,1.1&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Below is the expected output:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="5241-screen-shot-2016-06-25-at-43154-pm.png" style="width: 652px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/20844i55184CB9230E74C3/image-size/medium?v=v2&amp;amp;px=400" role="button" title="5241-screen-shot-2016-06-25-at-43154-pm.png" alt="5241-screen-shot-2016-06-25-at-43154-pm.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Please let me know the best way to read it and load it into HDFS/Hive.&lt;/P&gt;&lt;BR /&gt;&lt;IMG 
src="https://community.cloudera.com/t5/image/serverpage/image-id/6330i185C16C56B804614/image-size/large?v=1.0&amp;amp;px=999" border="0" alt="screen-shot-2016-06-25-at-44836-pm.png" title="screen-shot-2016-06-25-at-44836-pm.png" /&gt;</description>
      <pubDate>Sun, 18 Aug 2019 12:14:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157685#M33027</guid>
      <dc:creator>GeeKay2015</dc:creator>
      <dc:date>2019-08-18T12:14:25Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to read multiline CSV and transpose it to columns</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157686#M33028</link>
      <description>&lt;P&gt;This is quite a custom requirement: you are converting some rows to columns and other rows to both rows and columns. You will have to write much of the code yourself, but you can take advantage of the pivot functionality in Spark. See the following link.&lt;/P&gt;&lt;P&gt;&lt;A href="https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html" target="_blank"&gt;https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;For a plain transpose of a dataset small enough to collect on the driver:&lt;/P&gt;&lt;P&gt;sc.parallelize(rdd.collect.toSeq.transpose)&lt;/P&gt;&lt;P&gt;See the link &lt;A href="http://stackoverflow.com/questions/29390717/how-to-transpose-an-rdd-in-spark"&gt;here&lt;/A&gt; for more details.&lt;/P&gt;</description>
      <pubDate>Sun, 26 Jun 2016 12:18:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157686#M33028</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2016-06-26T12:18:52Z</dc:date>
    </item>
    <item>
      <title>Re: What's the best way to read multiline CSV and transpose it to columns</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157687#M33029</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969"&gt;@mqureshi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks for your response. Yes, it is quite a custom requirement. I thought it would be better to check with the community in case anyone has implemented something like this.&lt;/P&gt;&lt;P&gt;I am trying to use either a custom Hadoop input format or Python UDFs to get this done. There seems to be no straightforward way of doing this in Spark. I also cannot use Spark pivot, since it only supports columns as of now, right?&lt;/P&gt;</description>
      <pubDate>Mon, 27 Jun 2016 22:43:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Whats-the-best-way-to-read-multiline-cvs-and-transpose-it-to/m-p/157687#M33029</guid>
      <dc:creator>GeeKay2015</dc:creator>
      <dc:date>2016-06-27T22:43:09Z</dc:date>
    </item>
  </channel>
</rss>

