<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Pig Statement its taking a long time in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42719#M32330</link>
    <description>&lt;P&gt;I'll need to install a notebook to use Spark with Python (is there a tutorial for that?). After that, I think I will use your idea &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 08 Jul 2016 17:14:24 GMT</pubDate>
    <dc:creator>Stewart12586</dc:creator>
    <dc:date>2016-07-08T17:14:24Z</dc:date>
    <item>
      <title>Pig Statement its taking a long time</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42092#M32328</link>
      <description>&lt;P&gt;I have 45 text files with 5 columns, and I'm using Pig to add a new column to each file based on its filename.&lt;BR /&gt;&lt;BR /&gt;First question: I uploaded all the files into HDFS manually. Do you think it would be better to upload a compressed file instead?&lt;BR /&gt;&lt;BR /&gt;Second question: my code is below. In your opinion, is this the best way to add a new column to my files?&lt;BR /&gt;&lt;BR /&gt;I submitted this code and it has been processing for hours... All of my files are in the Data directory...&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Data = LOAD '/user/data' USING PigStorage(' ', '-tagFile');&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;STORE Data INTO '/user/data/Data_Transformation/SourceFiles' USING PigStorage(' ');&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Thanks!!!&lt;/P&gt;</description>
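For readers unfamiliar with the `-tagFile` option: PigStorage prepends the name of the source file as the first field of every tuple it loads. The minimal, stdlib-only Python sketch below reproduces that same transformation on local files, assuming space-delimited input as in the question. The helper names `tag_file_rows` and `write_tagged` are hypothetical, and the sketch only illustrates the output shape; it is not how Pig executes the job on HDFS.

```python
import os
from typing import Iterator, List


def tag_file_rows(paths: List[str], delimiter: str = " ") -> Iterator[List[str]]:
    """Yield each row with its source file name prepended as the first field.

    Hypothetical helper that mimics the output of PigStorage(' ', '-tagFile');
    it reads plain local files, not HDFS.
    """
    for path in paths:
        # -tagFile uses the bare file name; -tagPath would use the full path
        file_name = os.path.basename(path)
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue  # simplification: skip blank lines entirely
                yield [file_name] + line.split(delimiter)


def write_tagged(paths: List[str], out_path: str, delimiter: str = " ") -> None:
    """Write every tagged row to one file, space-delimited like the STORE statement."""
    with open(out_path, "w", encoding="utf-8") as out:
        for row in tag_file_rows(paths, delimiter):
            out.write(delimiter.join(row) + "\n")
```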
      <pubDate>Fri, 16 Sep 2022 10:26:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42092#M32328</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2022-09-16T10:26:05Z</dc:date>
    </item>
    <item>
      <title>Re: Pig Statement its taking a long time</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42598#M32329</link>
      <description>&lt;P&gt;If I may suggest a different approach to your problem, I would use Spark to do the job. Load the data of each file into a separate Spark DataFrame, add a new column with the desired value, and write everything back to HDFS, preferably in a format such as Parquet compressed with Snappy.&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jul 2016 15:54:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42598#M32329</guid>
      <dc:creator>MVERVUURT</dc:creator>
      <dc:date>2016-07-04T15:54:49Z</dc:date>
    </item>
    <item>
      <title>Re: Pig Statement its taking a long time</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42719#M32330</link>
      <description>&lt;P&gt;I'll need to install a notebook to use Spark with Python (is there a tutorial for that?). After that, I think I will use your idea &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 08 Jul 2016 17:14:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42719#M32330</guid>
      <dc:creator>Stewart12586</dc:creator>
      <dc:date>2016-07-08T17:14:24Z</dc:date>
    </item>
    <item>
      <title>Re: Pig Statement its taking a long time</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42725#M32331</link>
      <description>&lt;P&gt;The easiest way I know to get Spark working with IPython and the Jupyter Notebook is to set the following two environment variables, as described in the book "Learning Spark":&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;IPYTHON=1&lt;/P&gt;&lt;P&gt;IPYTHON_OPTS="notebook"&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Afterwards, run ./bin/pyspark.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NB: you can pass further Jupyter options through IPYTHON_OPTS; a quick search will turn them up.&lt;/P&gt;</description>
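The IPYTHON and IPYTHON_OPTS variables apply to Spark 1.x only. Spark 2.0 removed them, and bin/pyspark instead reads PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS. The small Python sketch below picks the matching pair for a given Spark major version and launches pyspark with it. The function names and the `spark_home` argument are illustrative assumptions.

```python
import os
import subprocess
from typing import Dict


def notebook_env(spark_major: int) -> Dict[str, str]:
    """Return the variables that make bin/pyspark start a Jupyter notebook."""
    if spark_major == 1:
        # Spark 1.x recipe, as described in "Learning Spark"
        return {"IPYTHON": "1", "IPYTHON_OPTS": "notebook"}
    # Spark 2.0 and later: IPYTHON/IPYTHON_OPTS were removed
    return {"PYSPARK_DRIVER_PYTHON": "jupyter", "PYSPARK_DRIVER_PYTHON_OPTS": "notebook"}


def launch_pyspark_notebook(spark_home: str, spark_major: int) -> subprocess.Popen:
    """Start bin/pyspark under spark_home with the notebook variables added."""
    env = dict(os.environ)
    env.update(notebook_env(spark_major))
    return subprocess.Popen([os.path.join(spark_home, "bin", "pyspark")], env=env)
```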
      <pubDate>Sat, 09 Jul 2016 00:04:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42725#M32331</guid>
      <dc:creator>MVERVUURT</dc:creator>
      <dc:date>2016-07-09T00:04:22Z</dc:date>
    </item>
    <item>
      <title>Re: Pig Statement its taking a long time</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42728#M32332</link>
      <description>&lt;P&gt;Dear Stewart,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here you can read about Spark notebooks:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="http://www.cloudera.com/documentation/enterprise/latest/topics/spark_ipython.html" target="_blank"&gt;http://www.cloudera.com/documentation/enterprise/latest/topics/spark_ipython.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp;Gabor&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 09 Jul 2016 07:31:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Pig-Statement-its-taking-a-long-time/m-p/42728#M32332</guid>
      <dc:creator>roczei</dc:creator>
      <dc:date>2016-07-09T07:31:17Z</dc:date>
    </item>
  </channel>
</rss>

