<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Hi, Is there a way to load xlsx file into hive table? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135721#M27532</link>
    <description>&lt;P&gt;Not directly, I am afraid. You can write a MapReduce job that transforms the files into normal delimited data, similar to the way it was done with Tika here (assuming you have lots of small files):&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html" target="_blank"&gt;https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You would, however, need to use a Java library like POI instead of Tika:&lt;/P&gt;&lt;P&gt;&lt;A href="https://poi.apache.org/" target="_blank"&gt;https://poi.apache.org/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;To read the files directly in Hive, you need to write a HiveInputFormat. You can use this InputFormat class as an example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/" target="_blank"&gt;https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you return a delimited row for each record and present it to the Hive SerDe as if it came from a text InputFormat, you might be able to get it working.&lt;/P&gt;</description>
    <pubDate>Fri, 06 May 2016 23:26:08 GMT</pubDate>
    <dc:creator>bleonhardi</dc:creator>
    <dc:date>2016-05-06T23:26:08Z</dc:date>
    <item>
      <title>Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135718#M27529</link>
      <description />
      <pubDate>Fri, 06 May 2016 23:18:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135718#M27529</guid>
      <dc:creator>akilavel</dc:creator>
      <dc:date>2016-05-06T23:18:27Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135719#M27530</link>
      <description>&lt;P&gt;It's better to export the file as CSV or another delimited format and then load it into a Hive table.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 23:21:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135719#M27530</guid>
      <dc:creator>rajkumar_singh</dc:creator>
      <dc:date>2016-05-06T23:21:02Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135720#M27531</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3805/akilavel.html" nodeid="3805"&gt;@AKILA VEL&lt;/A&gt;
&lt;/P&gt;&lt;P&gt;I don't think there is any direct method to do that; however, there are a few workarounds.&lt;/P&gt;&lt;P&gt;One way is to write a custom Java MapReduce job to convert xls to CSV; another is to create your own custom SerDe to access xls.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 23:21:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135720#M27531</guid>
      <dc:creator>jyadav</dc:creator>
      <dc:date>2016-05-06T23:21:44Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135721#M27532</link>
      <description>&lt;P&gt;Not directly, I am afraid. You can write a MapReduce job that transforms the files into normal delimited data, similar to the way it was done with Tika here (assuming you have lots of small files):&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html" target="_blank"&gt;https://community.hortonworks.com/repos/4576/apache-tika-integration-with-mapreduce.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;You would, however, need to use a Java library like POI instead of Tika:&lt;/P&gt;&lt;P&gt;&lt;A href="https://poi.apache.org/" target="_blank"&gt;https://poi.apache.org/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;To read the files directly in Hive, you need to write a HiveInputFormat. You can use this InputFormat class as an example:&lt;/P&gt;&lt;P&gt;&lt;A href="https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/" target="_blank"&gt;https://sreejithrpillai.wordpress.com/2014/11/06/excel-inputformat-for-hadoop-mapreduce/&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you return a delimited row for each record and present it to the Hive SerDe as if it came from a text InputFormat, you might be able to get it working.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 23:26:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135721#M27532</guid>
      <dc:creator>bleonhardi</dc:creator>
      <dc:date>2016-05-06T23:26:08Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135722#M27533</link>
      <description>&lt;P&gt;There are multiple options:&lt;/P&gt;&lt;P&gt;1. You can use Apache Tika (from a programming language like Java) to read the xlsx and load it into Hive.&lt;/P&gt;&lt;P&gt;2. If it's a single xls sheet, you can use Pig's CSVExcelStorage() and insert into the Hive table using HCatStorer().&lt;/P&gt;&lt;P&gt;3. Convert to a delimited CSV and load it.&lt;/P&gt;</description>
      <pubDate>Fri, 06 May 2016 23:26:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135722#M27533</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-05-06T23:26:42Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135723#M27534</link>
      <description>&lt;P&gt;My files are xlsx files with a single sheet each. Can you please explain how to use Pig's CSVExcelStorage() and insert into a Hive table using HCatStorer()?&lt;/P&gt;</description>
      <pubDate>Sat, 07 May 2016 00:58:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135723#M27534</guid>
      <dc:creator>akilavel</dc:creator>
      <dc:date>2016-05-07T00:58:24Z</dc:date>
    </item>
    <item>
      <title>Re: Hi, Is there a way to load xlsx file into hive table?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135724#M27535</link>
      <description>&lt;P&gt;Here's an example of CSVExcelStorage: &lt;A href="https://community.hortonworks.com/questions/5775/best-practice-for-extractoutput-data-generated-by.html" target="_blank"&gt;https://community.hortonworks.com/questions/5775/best-practice-for-extractoutput-data-generated-by.html&lt;/A&gt;. You can then execute SQL commands in Pig using &lt;A href="https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Sat, 07 May 2016 01:06:59 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Hi-Is-there-a-way-to-load-xlsx-file-into-hive-table/m-p/135724#M27535</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-05-07T01:06:59Z</dc:date>
    </item>
  </channel>
</rss>

