<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Transformation with Pig in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137978#M27721</link>
    <description>&lt;P&gt;Well, this is now a matter of bash programming. The part between the lines "cat &amp;lt;&amp;lt;EOF" and "EOF" is a so-called "here doc" that writes the actual Pig script. Everything starting with $ is a variable: $0, $1, ... are predefined in bash, with $0 containing the script/program name, $1 the first actual parameter, $2 the second, and so on; $@ expands to all provided parameters (as separate words when quoted), while "$*" joins them into one string, separated by a space ' ' by default. Note: variables (e.g. FUN, TAB) are set without $ and referenced with $.&lt;/P&gt;&lt;P&gt;So you can add any logic before "cat &amp;lt;&amp;lt;EOF" to set variables based on your input parameters, then reference them in the "here doc" to get the Pig script you want.&lt;/P&gt;&lt;P&gt;For more, see e.g. &lt;A href="http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html" target="_blank"&gt;http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html&lt;/A&gt; and &lt;A href="http://www.tldp.org/LDP/abs/html/index.html" target="_blank"&gt;http://www.tldp.org/LDP/abs/html/index.html&lt;/A&gt; (and many other places).&lt;/P&gt;</description>
    <pubDate>Mon, 09 May 2016 23:36:47 GMT</pubDate>
    <dc:creator>bwalter1</dc:creator>
    <dc:date>2016-05-09T23:36:47Z</dc:date>
    <item>
      <title>Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137973#M27716</link>
      <description>&lt;P&gt;Hi, I imported data from MS SQL Server.
Currently I am struggling to apply transformations (sum, average, ...), because I would like to create a method driven by parameters: the name of the database, the column or columns to transform, and the transformation operator. Currently I use Pig, but it does not let me do that.&lt;/P&gt;&lt;P&gt;Pig code:&lt;/P&gt;&lt;P&gt;grunt &amp;gt; donneees = LOAD '/xxx/xxx/xxx/'$input_dbb'/'$input_table'' USING ParquetLoader; &lt;/P&gt;&lt;P&gt;grunt &amp;gt; Blg_avg = FOREACH donneees GENERATE '$col1' , '$col2';&lt;/P&gt;&lt;P&gt;grunt &amp;gt; SET parquet.compression gzip; &lt;/P&gt;&lt;P&gt;
grunt &amp;gt; STORE Blg_avg INTO '/xxx/xxx/xxx/xxx/$input_table' USING ParquetStorer; &lt;/P&gt;&lt;P&gt;grunt &amp;gt; DUMP Blg_avg;&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 16:15:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137973#M27716</guid>
      <dc:creator>nanyim_alain</dc:creator>
      <dc:date>2016-05-09T16:15:35Z</dc:date>
    </item>
    <item>
      <title>Re: Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137974#M27717</link>
      <description>&lt;P&gt;So, what's wrong? Do ParquetLoader and Storer work? &lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 17:13:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137974#M27717</guid>
      <dc:creator>pminovic</dc:creator>
      <dc:date>2016-05-09T17:13:33Z</dc:date>
    </item>
    <item>
      <title>Re: Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137975#M27718</link>
      <description>&lt;P&gt;The difficulty is that I would like to pass the generic function (AVG, SUM, ...) as a parameter, so that when running with Oozie the user can specify the operation (AVG, SUM, ...) to be performed on a column, which will also be specified as a parameter.&lt;/P&gt;&lt;P&gt;Basically, what I want to implement is a transformation function that is called whenever the user needs it. This function will take as parameters the table to be processed, the columns to transform, and the operator (AVG, SUM, ...).&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 19:40:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137975#M27718</guid>
      <dc:creator>nanyim_alain</dc:creator>
      <dc:date>2016-05-09T19:40:25Z</dc:date>
    </item>
    <item>
      <title>Re: Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137976#M27719</link>
      <description>&lt;P&gt;Have you considered wrapping it in a shell script? Here is a simple example (test.sh):&lt;/P&gt;&lt;PRE&gt;#!/bin/bash

TMPFILE="/tmp/script.pig"

FUN=$1 # pass the pig function as first parameter
TAB=$2 # pass the column as second parameter

cat &amp;lt;&amp;lt;EOF &amp;gt; "$TMPFILE"
iris = load '/tmp/iris.data' using PigStorage(',') 
       as (sl:double, sw:double, pl:double, pw:double, species:chararray);
by_species = group iris by species;
result = foreach by_species generate group as species, $FUN(iris.$TAB);
dump result;
EOF

pig -x tez "$TMPFILE"&lt;/PRE&gt;&lt;P&gt;You can call it, for example, as&lt;/P&gt;&lt;PRE&gt;bash ./test.sh MAX sw&lt;/PRE&gt;&lt;P&gt;to get the maximum of column "sw", or&lt;/P&gt;&lt;PRE&gt;bash ./test.sh AVG sl&lt;/PRE&gt;&lt;P&gt;to get the average of column "sl".&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 21:02:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137976#M27719</guid>
      <dc:creator>bwalter1</dc:creator>
      <dc:date>2016-05-09T21:02:47Z</dc:date>
    </item>
    <item>
      <title>Re: Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137977#M27720</link>
      <description>&lt;P&gt;Thank you, &lt;A href="https://community.hortonworks.com/users/452/bwalter.html"&gt;Bernhard Walter&lt;/A&gt;, for your suggestions.&lt;/P&gt;&lt;P&gt;But I have a few more questions for you, please:&lt;/P&gt;&lt;P&gt;1. Since I need to be able to upload multiple files one after another, the schemas will be totally different. How can I handle the column definitions in the 'as' clause in that case?&lt;/P&gt;&lt;P&gt;2. Can I replace TAB=$2 with TAB=$* in order to pass several columns as parameters?&lt;/P&gt;&lt;P&gt;Thank you!&lt;/P&gt;</description>
      <pubDate>Mon, 09 May 2016 23:05:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137977#M27720</guid>
      <dc:creator>nanyim_alain</dc:creator>
      <dc:date>2016-05-09T23:05:22Z</dc:date>
    </item>
    <item>
      <title>Re: Transformation with Pig</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137978#M27721</link>
      <description>&lt;P&gt;Well, this is now a matter of bash programming. The part between the lines "cat &amp;lt;&amp;lt;EOF" and "EOF" is a so-called "here doc" that writes the actual Pig script. Everything starting with $ is a variable: $0, $1, ... are predefined in bash, with $0 containing the script/program name, $1 the first actual parameter, $2 the second, and so on; $@ expands to all provided parameters (as separate words when quoted), while "$*" joins them into one string, separated by a space ' ' by default. Note: variables (e.g. FUN, TAB) are set without $ and referenced with $.&lt;/P&gt;&lt;P&gt;So you can add any logic before "cat &amp;lt;&amp;lt;EOF" to set variables based on your input parameters, then reference them in the "here doc" to get the Pig script you want.&lt;/P&gt;&lt;P&gt;For more, see e.g. &lt;A href="http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html" target="_blank"&gt;http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html&lt;/A&gt; and &lt;A href="http://www.tldp.org/LDP/abs/html/index.html" target="_blank"&gt;http://www.tldp.org/LDP/abs/html/index.html&lt;/A&gt; (and many other places).&lt;/P&gt;</description>
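A minimal sketch of the "add any logic before the here doc" idea, for the case where several column names are passed after the function name. The helper name build_terms and its variable names are hypothetical (they do not appear in the thread), and the relation name iris is borrowed from the earlier test.sh example. It uses shift and "$@" to loop over the remaining parameters and builds one FUN(iris.col) term per column:

```shell
#!/bin/bash

# Hypothetical helper (an assumption, not part of the original test.sh):
# prints one "FUN(iris.col)" term per column name, joined with ", ".
# Note: the variables below are global, which is fine for a short script.
build_terms() {
  fun=$1   # first parameter: the Pig function, e.g. AVG or MAX
  shift    # drop it, so "$@" now holds only the column names
  terms=""
  for col in "$@"; do
    # append ", " only when terms already holds an earlier term
    terms="${terms:+$terms, }$fun(iris.$col)"
  done
  printf '%s\n' "$terms"
}

build_terms AVG sl sw   # prints: AVG(iris.sl), AVG(iris.sw)
```

The resulting string could be stored in a variable before the here doc (for example TERMS=$(build_terms "$@"), a hypothetical name) and referenced inside it in place of $FUN(iris.$TAB).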
      <pubDate>Mon, 09 May 2016 23:36:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Transformation-with-Pig/m-p/137978#M27721</guid>
      <dc:creator>bwalter1</dc:creator>
      <dc:date>2016-05-09T23:36:47Z</dc:date>
    </item>
  </channel>
</rss>

