<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Apache PIG - Create a Schema or the Schema is already created? in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169165#M37222</link>
    <description>&lt;P&gt;Hi experts,

This is probably a dumb question 🙂.

I want to know how Pig reads the headers from the following dataset, which is stored as a .csv file:

&lt;/P&gt;&lt;P&gt;ID,Name,Function

1,Johnny,Student

2,Peter,Engineer

3,Cloud,Teacher

4,Angel,Consultant

I want the first row to be treated as the header of my file. Do I need to write:
A = LOAD 'file' USING PigStorage(',') AS (ID:int,....etc) ?

Or do I only need to write:

A = LOAD 'file' USING PigStorage(',')&lt;/P&gt;&lt;P&gt;so that, with this alone, Apache Pig already knows that the first line contains the headers of my table?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
    <pubDate>Mon, 08 Aug 2016 23:30:23 GMT</pubDate>
    <dc:creator>Stewart12586</dc:creator>
    <dc:date>2016-08-08T23:30:23Z</dc:date>
    <item>
      <title>Apache PIG - Create a Schema or the Schema is already created?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169165#M37222</link>
      <description>&lt;P&gt;Hi experts,

This is probably a dumb question 🙂.

I want to know how Pig reads the headers from the following dataset, which is stored as a .csv file:

&lt;/P&gt;&lt;P&gt;ID,Name,Function

1,Johnny,Student

2,Peter,Engineer

3,Cloud,Teacher

4,Angel,Consultant

I want the first row to be treated as the header of my file. Do I need to write:
A = LOAD 'file' USING PigStorage(',') AS (ID:int,....etc) ?

Or do I only need to write:

A = LOAD 'file' USING PigStorage(',')&lt;/P&gt;&lt;P&gt;so that, with this alone, Apache Pig already knows that the first line contains the headers of my table?&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 23:30:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169165#M37222</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-08-08T23:30:23Z</dc:date>
    </item>
    <item>
      <title>Re: Apache PIG - Create a Schema or the Schema is already created?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169166#M37223</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10082/vilaresantonio.html" nodeid="10082"&gt;@Pedro Rodgers&lt;/A&gt; &lt;/P&gt;&lt;P&gt;Pig won't automatically interpret the header line of your file, so you need to specify the "as (field1:type, field2:type)" definition.  If you just load the file, you will get the header line as a row of data, which you don't want.  There are a couple of ways you can deal with that, but using the CSVExcelStorage module from PiggyBank allows you to skip the header row.&lt;/P&gt;&lt;PRE&gt;REGISTER '/tmp/piggybank.jar';

A  = LOAD 'input.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER') AS (field1: int, field2: chararray);

DUMP A;
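
-- A sketch for the dataset in the question, assuming it is saved as 'file.csv'
-- (hypothetical path) and that piggybank.jar is registered as above.
-- The alias and field names are illustrative, not taken from the header row.
people = LOAD 'file.csv' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER') AS (id: int, name: chararray, role: chararray);

-- DESCRIBE prints the declared schema, which should confirm the names and types.
DESCRIBE people;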

&lt;/PRE&gt;&lt;P&gt;Another way to do it is:&lt;/P&gt;&lt;PRE&gt;input_file = load 'input' USING PigStorage(',') as (row1:chararray, row2:chararray);
ranked = rank input_file;
NoHeader = Filter ranked by (rank_input_file &amp;gt; 1);
New_input_file = foreach NoHeader generate row1, row2;&lt;/PRE&gt;</description>
      <pubDate>Mon, 08 Aug 2016 23:41:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169166#M37223</guid>
      <dc:creator>myoung</dc:creator>
      <dc:date>2016-08-08T23:41:42Z</dc:date>
    </item>
    <item>
      <title>Re: Apache PIG - Create a Schema or the Schema is already created?</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169167#M37224</link>
      <description>&lt;P&gt;Try using CSVExcelStorage instead of the regular PigStorage; CSVExcelStorage has an option to read or skip the header row.&lt;/P&gt;&lt;P&gt;See: &lt;A href="https://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html" target="_blank"&gt;https://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 23:48:04 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Apache-PIG-Create-a-Schema-or-the-Schema-is-already-created/m-p/169167#M37224</guid>
      <dc:creator>kkopparapu</dc:creator>
      <dc:date>2016-08-08T23:48:04Z</dc:date>
    </item>
  </channel>
</rss>

