<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Structured Unstructured Data for Pig and Hive in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148415#M28327</link>
    <description>&lt;P&gt;Good points. Some of these "corner cases" in CSV files (especially files generated by tools like Excel) are discussed in &lt;A href="https://martin.atlassian.net/wiki/x/WYBmAQ" target="_blank"&gt;https://martin.atlassian.net/wiki/x/WYBmAQ&lt;/A&gt;.&lt;/P&gt;</description>
    <pubDate>Wed, 18 May 2016 03:51:22 GMT</pubDate>
    <dc:creator>LesterMartin</dc:creator>
    <dc:date>2016-05-18T03:51:22Z</dc:date>
    <item>
      <title>Structured Unstructured Data for Pig and Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148412#M28324</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Can anyone explain why Pig is considered better suited to unstructured data and Hive to structured data?&lt;/P&gt;&lt;P&gt;My understanding is that structured data is data that follows a particular schema; beyond that, I know very little.&lt;/P&gt;&lt;P&gt;Is there a limitation with CSV files and variable-length fields that Pig handles more easily?&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 17:32:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148412#M28324</guid>
      <dc:creator>jgarrigan</dc:creator>
      <dc:date>2016-05-14T17:32:34Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Unstructured Data for Pig and Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148413#M28325</link>
      <description>&lt;P&gt;Pig is great for data discovery because you don't need a schema on top of the data: you can tell Pig how to consume the raw data by specifying delimiters. With Hive, you still need to cleanse the data a bit before you can apply a schema. So for dirty data, Pig is the first tool to use, and then you can switch to Hive for familiar SQL functionality. Both can consume the same datasets.&lt;/P&gt;&lt;P&gt;Hadoop is designed as a general-purpose data processing framework, so the schema is applied when the data is read (schema-on-read), as opposed to relational databases, where the schema is applied when the data is written (schema-on-write). Check out a few of our introductory Pig and Hive tutorials and you will see these concepts in action right away. It's unlike anything you've worked with before.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 18:45:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148413#M28325</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-05-14T18:45:11Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Unstructured Data for Pig and Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148414#M28326</link>
      <description>&lt;P&gt;As far as CSV is concerned, the limitation in Pig is that the default PigStorage loader function handles only a limited set of delimiters and corner cases; for example, it splits on every occurrence of the delimiter, even inside quoted fields. For a wider range of cases, use a CSV-aware loader function such as CSVExcelStorage from Piggybank.&lt;/P&gt;</description>
      <pubDate>Sat, 14 May 2016 18:50:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148414#M28326</guid>
      <dc:creator>aervits</dc:creator>
      <dc:date>2016-05-14T18:50:13Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Unstructured Data for Pig and Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148415#M28327</link>
      <description>&lt;P&gt;Good points. Some of these "corner cases" in CSV files (especially files generated by tools like Excel) are discussed in &lt;A href="https://martin.atlassian.net/wiki/x/WYBmAQ" target="_blank"&gt;https://martin.atlassian.net/wiki/x/WYBmAQ&lt;/A&gt;.&lt;/P&gt;</description>
      <pubDate>Wed, 18 May 2016 03:51:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148415#M28327</guid>
      <dc:creator>LesterMartin</dc:creator>
      <dc:date>2016-05-18T03:51:22Z</dc:date>
    </item>
    <item>
      <title>Re: Structured Unstructured Data for Pig and Hive</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148416#M28328</link>
      <description>&lt;P&gt;Can someone describe a scenario where Pig is the only option, and one where Hive is the only option?&lt;/P&gt;</description>
      <pubDate>Wed, 01 May 2019 22:18:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Structured-Unstructured-Data-for-Pig-and-Hive/m-p/148416#M28328</guid>
      <dc:creator>arujit_das</dc:creator>
      <dc:date>2019-05-01T22:18:14Z</dc:date>
    </item>
  </channel>
</rss>

