<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: How to remove header and footer from a CSV file in PIG in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133274#M43635</link>
    <description>&lt;P&gt;That's correct. The full file is whatever you load initially. &lt;/P&gt;&lt;P&gt;Try the following: &lt;/P&gt;&lt;PRE&gt;filterHeader = STREAM fullFile THROUGH `tail -n +10 | head -n -1`;&lt;/PRE&gt;&lt;P&gt; and then run &lt;/P&gt;&lt;PRE&gt;DUMP filterHeader;&lt;/PRE&gt;&lt;P&gt;to verify the result.&lt;/P&gt;</description>
    <pubDate>Sun, 16 Oct 2016 22:22:15 GMT</pubDate>
    <dc:creator>grajagopal</dc:creator>
    <dc:date>2016-10-16T22:22:15Z</dc:date>
    <item>
      <title>How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133270#M43631</link>
      <description>&lt;P&gt;I have a CSV file that looks like this:&lt;/P&gt;&lt;TABLE&gt;
	&lt;TBODY&gt;&lt;TR&gt;
		&lt;TD&gt;Report Name: XYZ&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Report Time: 11/11/1111&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Time Zone: (GMT+05:30) i&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Last Completed &lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Last Completed Available Hour: &lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Report Aggregation: Daily&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Report Filter: &lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Potential Incomplete Data: true&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;Rows: 1&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;GregorianDate&lt;/TD&gt;
		&lt;TD&gt;AccountId&lt;/TD&gt;
		&lt;TD&gt;AccountName&lt;/TD&gt;
		&lt;TD&gt;Clicks&lt;/TD&gt;
		&lt;TD&gt;Impressions&lt;/TD&gt;
		&lt;TD&gt;Ctr&lt;/TD&gt;
		&lt;TD&gt;AverageCpc&lt;/TD&gt;
		&lt;TD&gt;Spend&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;10/15/2016&lt;/TD&gt;
		&lt;TD&gt;1234556&lt;/TD&gt;
		&lt;TD&gt;ABC&lt;/TD&gt;
	&lt;/TR&gt;
	&lt;TR&gt;
		&lt;TD&gt;©2016 Microsoft Corporation. All rights reserved.&lt;/TD&gt;
	&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;I need all of the header and footer lines removed so that only the actual data, with the column names, stays in this file. How do I do this in Pig? I need to map the file to a Hive table, so I cannot keep it in its current form.&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:00:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133270#M43631</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-16T22:00:22Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133271#M43632</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/10486/simrank.html" nodeid="10486"&gt;@Simran Kaur&lt;/A&gt; - If the headers and trailers are static, you can eliminate them using Pig's STREAM operator.  &lt;/P&gt;&lt;P&gt;For example, once you load the file into a relation, you can stream it through tail to drop the first 9 lines (tail -n +10 starts output at line 10, which keeps the column-name row) as follows: &lt;/P&gt;&lt;PRE&gt;filterHeader = STREAM fullFile THROUGH `tail -n +10`;&lt;/PRE&gt;&lt;P&gt;Hope this helps! &lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:14:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133271#M43632</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-10-16T22:14:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133272#M43633</link>
      <description>&lt;P&gt;That helps. What about the footer? Yes, headers and footers are static. &lt;A rel="user" href="https://community.cloudera.com/users/33/grajagopal.html" nodeid="33"&gt;@grajagopal&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:17:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133272#M43633</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-16T22:17:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133273#M43634</link>
      <description>&lt;P&gt;Also, I am going to be loading the file through a CSV loader, so is fullFile here the following?&lt;/P&gt;&lt;PRE&gt;fullFile = LOAD 'Path_to_File' USING PigStorage(',');&lt;/PRE&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:18:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133273#M43634</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-16T22:18:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133274#M43635</link>
      <description>&lt;P&gt;That's correct. The full file is whatever you load initially. &lt;/P&gt;&lt;P&gt;Try the following: &lt;/P&gt;&lt;PRE&gt;filterHeader = STREAM fullFile THROUGH `tail -n +10 | head -n -1`;&lt;/PRE&gt;&lt;P&gt; and then run &lt;/P&gt;&lt;PRE&gt;DUMP filterHeader;&lt;/PRE&gt;&lt;P&gt;to verify the result.&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:22:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133274#M43635</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-10-16T22:22:15Z</dc:date>
    </item>
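    <!--
A minimal sketch of how the tail and head stages of the pipeline in the reply above behave, run outside Pig. It assumes GNU coreutils (head -n -1 is a GNU extension and is not available in every head implementation). The sample function and its contents are hypothetical: 9 metadata lines, one column-name line, one data row, and one footer line, mirroring the shape of the report in the question.

```shell
# Hypothetical sample shaped like the report in the question:
# 9 metadata lines, 1 column-name line, 1 data row, 1 footer line.
sample() {
  for i in 1 2 3 4 5 6 7 8 9; do
    printf 'Header line %s\n' "$i"
  done
  printf 'GregorianDate,AccountId,AccountName\n'
  printf '10/15/2016,1234556,ABC\n'
  printf '(c)2016 footer line\n'
}

# tail -n +10 starts printing AT line 10, so it drops lines 1 through 9.
# head -n -1 prints everything except the last line (GNU coreutils only).
trimmed=$(sample | tail -n +10 | head -n -1)
printf '%s\n' "$trimmed"
```

With this sample, only the column-name line and the data row remain. If the real file had a different number of header or footer lines, the two counts would need to change accordingly.
    -->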
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133275#M43636</link>
      <description>&lt;P&gt;Thanks. Also, where can I find out exactly how tail and head work here? It looks a little confusing to me. Any good resources?&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:23:56 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133275#M43636</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-16T22:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133276#M43637</link>
      <description>&lt;P&gt;I can iterate over filterHeader using FOREACH, as we usually do for a file loaded using PigStorage, right? There should be no difference?&lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:26:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133276#M43637</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-16T22:26:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133277#M43638</link>
      <description>&lt;P&gt;Just DUMP the relation filterHeader to verify it. &lt;/P&gt;</description>
      <pubDate>Sun, 16 Oct 2016 22:30:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133277#M43638</guid>
      <dc:creator>grajagopal</dc:creator>
      <dc:date>2016-10-16T22:30:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133278#M43639</link>
      <description>&lt;P&gt;Awesome! It worked perfectly. Thanks &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 12:58:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133278#M43639</guid>
      <dc:creator>simran_k</dc:creator>
      <dc:date>2016-10-17T12:58:37Z</dc:date>
    </item>
    <item>
      <title>Re: How to remove header and footer from a CSV file in PIG</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133279#M43640</link>
      <description>&lt;P&gt;Hi. After filtering the file, I am not able to load it into Hive. Please help. &lt;/P&gt;&lt;P&gt;Pig Stack Trace&lt;BR /&gt;---------------&lt;BR /&gt;ERROR 1002: Unable to store alias C&lt;/P&gt;&lt;P&gt;org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias C&lt;BR /&gt;  at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1647)&lt;BR /&gt;  at org.apache.pig.PigServer.registerQuery(PigServer.java:587)&lt;BR /&gt;  at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1093)&lt;BR /&gt;  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)&lt;BR /&gt;  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:198)&lt;BR /&gt;  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)&lt;BR /&gt;  at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)&lt;BR /&gt;  at org.apache.pig.Main.run(Main.java:547)&lt;BR /&gt;  at org.apache.pig.Main.main(Main.java:158)&lt;BR /&gt;  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)&lt;BR /&gt;  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)&lt;BR /&gt;  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)&lt;BR /&gt;  at java.lang.reflect.Method.invoke(Method.java:606)&lt;BR /&gt;  at org.apache.hadoop.util.RunJar.run(RunJar.java:221)&lt;BR /&gt;  at org.apache.hadoop.util.RunJar.main(RunJar.java:136)&lt;BR /&gt;Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 0:&lt;BR /&gt;&amp;lt;line 51, column 0&amp;gt; Output Location Validation Failed for: 'haasbat0200_10215.dslam_dlm_table_nokia_test More info to follow:&lt;BR /&gt;Pig 'bytearray' type in column 0(0-based) cannot map to HCat 'STRING'type.  
Target filed must be of HCat type {BINARY}&lt;BR /&gt;  at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:75)&lt;BR /&gt;  at org.apache.pig.newplan.logical.relational.LOStore.accept(LOStore.java:66)&lt;BR /&gt;  at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)&lt;BR /&gt;  at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)&lt;BR /&gt;  at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)&lt;BR /&gt;  at org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:66)&lt;BR /&gt;  at org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)&lt;BR /&gt;  at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)&lt;BR /&gt;  at org.apache.pig.newplan.logical.rules.InputOutputFileValidator.validate(InputOutputFileValidator.java:45)&lt;BR /&gt;  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:311)&lt;BR /&gt;  at org.apache.pig.PigServer.compilePp(PigServer.java:1392)&lt;BR /&gt;  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1317)&lt;BR /&gt;  at org.apache.pig.PigServer.execute(PigServer.java:1309)&lt;BR /&gt;  at org.apache.pig.PigServer.access$400(PigServer.java:122)&lt;BR /&gt;  at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1642)&lt;BR /&gt;  ... 14 more&lt;BR /&gt;Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 0: Pig 'bytearray' type in column 0(0-based) cannot map to HCat 'STRING'type.  
Target filed must be of HCat type {BINARY}&lt;BR /&gt;  at org.apache.hive.hcatalog.pig.HCatBaseStorer.throwTypeMismatchException(HCatBaseStorer.java:602)&lt;BR /&gt;  at org.apache.hive.hcatalog.pig.HCatBaseStorer.validateSchema(HCatBaseStorer.java:558)&lt;BR /&gt;  at org.apache.hive.hcatalog.pig.HCatBaseStorer.doSchemaValidations(HCatBaseStorer.java:495)&lt;BR /&gt;  at org.apache.hive.hcatalog.pig.HCatStorer.setStoreLocation(HCatStorer.java:201)&lt;BR /&gt;  at org.apache.pig.newplan.logical.rules.InputOutputFileValidator$InputOutputFileVisitor.visit(InputOutputFileValidator.java:68)&lt;BR /&gt;  ... 28 more&lt;/P&gt;</description>
      <pubDate>Sat, 09 Dec 2017 02:44:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/How-to-remove-header-and-footer-from-a-CSV-file-in-PIG/m-p/133279#M43640</guid>
      <dc:creator>nikhilkasturkar</dc:creator>
      <dc:date>2017-12-09T02:44:01Z</dc:date>
    </item>
  </channel>
</rss>

