<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Is Python Script better or Hive UDF? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108781#M71634</link>
    <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1479/michaellyoung.html" nodeid="1479"&gt;@Michael Young&lt;/A&gt; Given the complexity, going with Python would be better than Java. Thank you for the suggestion.&lt;/P&gt;</description>
    <pubDate>Sun, 03 Jul 2016 14:30:28 GMT</pubDate>
    <dc:creator>vijaysinghparma</dc:creator>
    <dc:date>2016-07-03T14:30:28Z</dc:date>
    <item>
      <title>Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108775#M71628</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a job that needs to pull a JSON file from a Hive table. After reading the file, some business logic (calculations) needs to be applied to it. Once the processing is done, the result needs to be captured in a JSON file and stored back in a Hive table. During processing (in the code), every ID taken in will generate 100 to 5,000 records, which need to be captured in a JSON file and inserted back into Hive. To accomplish this, would it be better to write a Python script or a Hive UDF (Java code)?
Business wants it to be done in Hive. Any help or suggestion is highly appreciated.&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jul 2016 00:27:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108775#M71628</guid>
      <dc:creator>vijaysinghparma</dc:creator>
      <dc:date>2016-07-02T00:27:41Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108776#M71629</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/11083/vijaysinghparmar.html" nodeid="11083"&gt;@Vijay Parmar&lt;/A&gt;&lt;P&gt;If I were solving this problem, I would look at using Pig for the job.&lt;/P&gt;&lt;P&gt;Use HCatLoader to load the data from the Hive table, then perform whatever operations you need, however complex :)&lt;/P&gt;&lt;P&gt;Then store the result back to Hive using HCatStorer.&lt;/P&gt;&lt;P&gt;See: &lt;A href="https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-HCatLoader"&gt;https://cwiki.apache.org/confluence/display/Hive/HCatalog+LoadStore#HCatalogLoadStore-HCatLoader&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Why Pig? Three main reasons:&lt;/P&gt;&lt;P&gt;1. It is very easy to program, and for the same reason easy to maintain.&lt;/P&gt;&lt;P&gt;2. Optimized code execution. This is my personal favorite. It means Pig will execute even a badly written series of steps (think duplicate operations, unnecessary variable allocation, etc.) in an optimized way.&lt;/P&gt;&lt;P&gt;3. You can go as complex as you want by using the PiggyBank custom functions or by writing your own UDFs.&lt;/P&gt;&lt;P&gt;I am not saying Hive or Python cannot do the job, but Pig is a specialist in this kind of situation.&lt;/P&gt;&lt;P&gt;Keep in mind that I suggest all this because you asked about writing UDFs, which made me assume the job involves a fair bit of complexity. If the transformation is simple, meaning you can fit it into a single Hive query, I would use that without hesitation.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jul 2016 01:36:32 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108776#M71629</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-07-02T01:36:32Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108777#M71630</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902"&gt;@rbiswas&lt;/A&gt; Thank you for the detailed explanation. Yes, you are correct: there is a lot of complexity involved, as the JSON itself is in a very complex format. During processing (in the code), every ID taken in will generate 100 to 5,000 records, which need to be captured in a JSON file and inserted back into Hive. The situation is that I have to choose between Python and Hive. Of these two, which one would be more helpful in terms of performance and complexity?&lt;/P&gt;</description>
      <pubDate>Sat, 02 Jul 2016 01:44:15 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108777#M71630</guid>
      <dc:creator>vijaysinghparma</dc:creator>
      <dc:date>2016-07-02T01:44:15Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108778#M71631</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/11083/vijaysinghparmar.html" nodeid="11083"&gt;@Vijay Parmar&lt;/A&gt;&lt;/P&gt;&lt;P&gt;First, try to fit the transformation into a single Hive query using the built-in functions. If that is not possible, or it becomes very complicated, go with a Hive UDF, since it will be better in terms of reusability. You can write the UDF in either Python or Java.&lt;/P&gt;&lt;P&gt;It is hard to say which one would be faster, since that depends on the implementation.&lt;/P&gt;&lt;P&gt;Go with the language you are more comfortable with.&lt;/P&gt;&lt;P&gt;Here is an example of a Python UDF:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/Azure/azure-content/blob/master/articles/hdinsight/hdinsight-python.md"&gt;https://github.com/Azure/azure-content/blob/master/articles/hdinsight/hdinsight-python.md&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
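      <content:encoded><![CDATA[
A minimal sketch of the Python route, assuming the UDF runs as a Hive streaming script through TRANSFORM (the style used in the linked HDInsight example). With the default settings Hive passes each row to the script on stdin as tab-separated text, writes NULL as \N, and reads tab-separated rows back from stdout, so one input ID can produce many output rows (the 100 to 5,000 records per ID described in the question). The input layout (an ID column plus a JSON string column) and the expand_records() business logic are hypothetical placeholders, not code from this thread.

```python
#!/usr/bin/env python
"""Sketch of a Hive TRANSFORM (streaming) script.

Assumptions (hypothetical, not from the thread):
  * Hive sends two tab-separated columns per row: an ID and a JSON document.
  * expand_records() stands in for the real business logic.
"""
import json
import sys

NULL = "\\N"  # Hive's default text representation of NULL in TRANSFORM


def expand_records(doc):
    """Hypothetical business logic: one output record per entry in doc["items"]."""
    for index, item in enumerate(doc.get("items", [])):
        yield {"index": index, "value": item}


def process(lines):
    """Yield output lines of the form 'id<TAB>record_json' for each input line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        record_id, _, payload = line.partition("\t")
        if not payload or payload == NULL:
            continue  # skip rows whose JSON column is NULL or empty
        doc = json.loads(payload)
        for record in expand_records(doc):
            # json.dumps escapes tabs/newlines inside strings, so each
            # output record stays on one line with a single field separator
            yield record_id + "\t" + json.dumps(record, sort_keys=True)


def main():
    # Stream results back to Hive one row per line
    for out in process(sys.stdin):
        sys.stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

After shipping the script with ADD FILE, it could be invoked as SELECT TRANSFORM (id, json_col) USING 'python process_json.py' AS (id STRING, record_json STRING) FROM source_table; where process_json.py, id, json_col and source_table are illustrative names only. Keeping each output record as a single-line JSON string is what lets the one-to-many output fit Hive's line-based protocol.
]]></content:encoded>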
      <pubDate>Sat, 02 Jul 2016 01:52:24 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108778#M71631</guid>
      <dc:creator>rbiswas1</dc:creator>
      <dc:date>2016-07-02T01:52:24Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108779#M71632</link>
      <description>&lt;P&gt;You can always write the Hive UDF in Python. A Java UDF may yield better performance overall, but I prefer Python UDFs for their ease of development and maintenance.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Jul 2016 04:43:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108779#M71632</guid>
      <dc:creator>myoung</dc:creator>
      <dc:date>2016-07-03T04:43:08Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108780#M71633</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/3902/rbiswas.html" nodeid="3902"&gt;@rbiswas&lt;/A&gt; Thank you. As it involves a lot of complexity, the best solution for now is to write a UDF.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Jul 2016 14:24:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108780#M71633</guid>
      <dc:creator>vijaysinghparma</dc:creator>
      <dc:date>2016-07-03T14:24:11Z</dc:date>
    </item>
    <item>
      <title>Re: Is Python Script better or Hive UDF?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108781#M71634</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/1479/michaellyoung.html" nodeid="1479"&gt;@Michael Young&lt;/A&gt; Given the complexity, going with Python would be better than Java. Thank you for the suggestion.&lt;/P&gt;</description>
      <pubDate>Sun, 03 Jul 2016 14:30:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Is-Python-Script-better-or-Hive-UDF/m-p/108781#M71634</guid>
      <dc:creator>vijaysinghparma</dc:creator>
      <dc:date>2016-07-03T14:30:28Z</dc:date>
    </item>
  </channel>
</rss>

