Member since
10-01-2015
3933
Posts
1150
Kudos Received
374
Solutions
My Accepted Solutions
Views | Posted
---|---
3363 | 05-03-2017 05:13 PM
2796 | 05-02-2017 08:38 AM
3072 | 05-02-2017 08:13 AM
3003 | 04-10-2017 10:51 PM
1514 | 03-28-2017 02:27 AM
02-13-2017
11:54 AM
1 Kudo
Yes you can, and coincidentally I just wrote two articles about Python and Python3: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html and https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html There are many resources on this site about working with Python. The new kid on the block is PySpark, which gives you the familiar Python API with the benefit of Spark, a fast in-memory analytics engine. Again, take a look at our site: https://community.hortonworks.com/topics/pyspark.html
02-12-2017
06:06 PM
@sudarshan kumar you can also just move it out of the method but leave it in the class https://github.com/dbist/URLCount/blob/master/src/main/java/com/hortonworks/mapreduce/URLCountR.java#L33
02-12-2017
05:52 PM
@sudarshan kumar
You should avoid creating a StringBuilder in every call to reduce(), especially with 80000 as the initial-capacity argument. If you do need one, move it out into the setup() method and initialize it there; that will force object reuse. As written, you're creating one StringBuilder per call with an initial capacity of 80000. How long do you expect each row to be? Surely not 80000 characters? My suggestion, again, is to move the StringBuilder out of reduce() and let Java figure out the capacity instead of passing such a large size; it will resize as needed: StringBuilder sb = new StringBuilder(); You're also calling the toString() method far too often. I recommend assigning final String valueStr = value.toString(); once at the top of the for loop body and then referencing valueStr instead of calling toString() repeatedly.
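The suggestion can be sketched in plain Java (the class and method names here are illustrative, not the poster's actual URLCount code); the buffer is a reused field, reset per call, and toString() is called only once per value:

```java
import java.util.Arrays;

public class ReuseDemo {
    // One StringBuilder per instance with the default capacity; it grows as
    // needed, and setLength(0) resets it between calls so the buffer is reused.
    private final StringBuilder sb = new StringBuilder();

    public String join(Iterable<String> values) {
        sb.setLength(0);
        for (String value : values) {
            // In a real reducer this is where you'd cache value.toString() once.
            sb.append(value).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ReuseDemo demo = new ReuseDemo();
        System.out.println(demo.join(Arrays.asList("a", "b", "c"))); // prints a,b,c,
    }
}
```

The same pattern applies inside a Hadoop Reducer: make the StringBuilder an instance field, reset it at the top of reduce(), and cache each value.toString() in a local variable inside the loop.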
02-12-2017
03:45 PM
12 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, we're going to run a Pig script against Hive tables via HCatalog integration. You can find more information in the following HDP document: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/ch_data_movement_using_oozie.html#errata_pig_with_oozie

The first thing you need to do in WFM is create a new Pig action. Now you can start editing the properties of the action. Since we're going to run a Pig script, let's add that property to the wf. This just says that we're going to execute a script; you still need to add the <file> attribute to the wf. It expects a file in the workflow directory, so we will need to upload a Pig script later.
Next, since we're going to run Pig against Hive, we need to provide the thrift metastore information for it or include a hive-site.xml file in the wf directory. Since that usually changes, it's probably best to add the property as part of the wf. You can find it in Ambari > Hive > Configs by searching for hive.metastore.uris. Now in WFM, add it to the configuration section of the Pig action. I also want to compress the output coming from the mappers to improve performance for intermediate IO, so I'm going to use the MapReduce property mapreduce.map.output.compress and set it to true.

At this point, I'd like to see how I'm doing, so I will preview the workflow in XML form; you can find the preview under the workflow action. This is also a good time to confirm your thrift URI and the commonly forgotten <script> and <file> properties. Now, finally, let's add the script to the wf directory. Use your favorite editor, paste the Pig code below, and save the file as sample.pig:

set hcat.bin /usr/bin/hcat;
sql show tables;
A = LOAD 'data' USING org.apache.hive.hcatalog.pig.HCatLoader();
B = LIMIT A 1;
DUMP B;

I have a Hive table called 'data', and that's what I'm going to load as part of Pig; I'm going to peek into the table and dump one relation to the console. In the second line of the script, I'm also executing a Hive "show tables;" command. I also recommend executing this script manually to make sure it works; the command for it is:

pig -x tez -f sample.pig -useHCatalog

Once it executes, you can see the output on the console. For brevity, I will only show the output we're looking for:

2017-02-12 14:29:52,403 [main] INFO org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: show tables;
OK
data
wfd
2017-02-12 14:30:09,205 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(abc,xyz)
Notice the output of show tables, and then (abc,xyz): that's the data I have in my 'data' table. Finally, upload the file to the wf directory. Save the wf first in WFM to create the directory, or point the wf at an existing directory with the script in it:

hdfs dfs -put sample.pig oozie/pig-hcatalog/

We are finally ready to execute the wf. As the last step, we need to tell the wf that we're going to use Pig with Hive and HCatalog, so we add the property oozie.action.sharelib.for.pig=hive,pig,hcatalog. This property tells Oozie that we need more than just the Pig libraries to execute the action.

Let's check the status of the wf: click the Dashboard button. The wf succeeded. Let's click on the job and go to the flow graph tab. All nodes appear in green, which means it succeeded, but we already knew that. Navigate to the action tab; from there we can drill into the Resource Manager job history. Click the arrow facing up to continue to the RM. Go through the logs in the job history; in the stdout log you can find what we're looking for: the output of show tables and the output of the dump command. Looks good to me. Thanks!
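Assembled, the Pig action in the workflow XML would look roughly like the sketch below; the thrift host, node names, and transition targets are placeholders, and the exact XML WFM generates may differ:

```xml
<action name="pig-node">
    <pig>
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <property>
                <name>hive.metastore.uris</name>
                <value>thrift://metastore-host:9083</value>
            </property>
            <property>
                <name>mapreduce.map.output.compress</name>
                <value>true</value>
            </property>
        </configuration>
        <script>sample.pig</script>
        <file>sample.pig</file>
    </pig>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

The <script> element names the script to run and the <file> element ships sample.pig from the workflow directory to the action's working directory.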
02-11-2017
09:16 PM
1 Kudo
This is not the answer you're looking for, but for your own knowledge: HBase does not like sequential row keys. You will cause what's called a hot-spotting issue, a.k.a. monotonically increasing keys, which essentially directs all your rows to a single region server and causes bottlenecks. Avoid it at all costs; row keys should be composite (consist of different parts) and random in nature. All that said, you can probably generate these keys with an ExecuteScript processor or something else.
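As an illustration of a composite, spread-out key, here is one common salting approach sketched in Java; the bucket count, MD5 prefix, and key format are my own assumptions for the example, not something HBase prescribes:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKey {
    // Prefix a sequential id with a hash-derived bucket so consecutive ids
    // spread across regions instead of all landing on one region server.
    static String saltedRowKey(long sequentialId, int buckets) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(Long.toString(sequentialId).getBytes(StandardCharsets.UTF_8));
            int bucket = (digest[0] & 0xFF) % buckets; // stable bucket for this id
            return String.format("%02d-%d", bucket, sequentialId);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // Consecutive ids get scattered across buckets rather than sorting together.
        for (long id = 100; id < 105; id++) {
            System.out.println(saltedRowKey(id, 16));
        }
    }
}
```

The trade-off is on the read side: a scan now has to fan out across all buckets, which is the usual price for avoiding write hot-spotting.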
02-11-2017
08:35 PM
@Ram Ghase What if you run the command like this: sudo -u spark hdfs dfs -ls /tmp/data/ Also, make sure /tmp/data exists: hdfs dfs -ls /tmp/
02-11-2017
08:27 PM
9 Kudos
This is the third in the series of articles on WFM.

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will import an existing Python3 workflow and modify it to work with WFM. You need to make sure that Python3 exists on every node where the YARN NodeManager is installed and where Oozie is allowed to execute (in essence, you want to make sure there are no YARN queue limitations on the nodes with Python3 installed).
Installing Python3 is beyond the scope of this tutorial; I am using CentOS 6 and I followed this tutorial: http://ask.xmodulo.com/install-python3-centos.html

Once Python3 is deployed across all of your NodeManagers, you can import the workflow the same way I've shown before and configure the python3-node to your liking. Nothing is out of the ordinary yet. We learned from the mistake in the previous tutorial and let WFM assign the Resource Manager property rather than using the inherited $jobTracker. I am ready to submit; notice I left the inherited queue property in the wf, and WFM prompts me to input it. On submission, we can navigate to the dashboard to track the status. My job succeeded, and I want to look at the result, so I'm going to click the arrow on the right and navigate to the YARN job. Deep in the logs, I can find my desired output. My Python3 code, by the way, is below:

#! /usr/bin/env /usr/local/bin/python3
import os, pwd, sys
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
As usual, my repo has more samples you can play around with: https://github.com/dbist/oozie
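For reference, a shell action wrapping a script file like the one above might serialize roughly as follows; the node name, schema version, and transition targets are illustrative, and the exact XML WFM produces may differ:

```xml
<action name="python3-node">
    <shell xmlns="uri:oozie:shell-action:0.3">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>script.py</exec>
        <file>script.py</file>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

The <file> element ships the script from the workflow directory to the container, and <capture-output/> is what lets you read the script's stdout back from the action.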
02-11-2017
07:53 PM
13 Kudos
This is the second in the series of articles on WFM.

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, we're going to import an existing workflow with a Python script wrapped in a shell action. Existing workflows can live on HDFS or in your local file system; let's fetch one onto the canvas. My workflow is already on HDFS, so that's the option I select. The WFM view is integrated with the WebHDFS browser, which makes navigating the directory tree very easy. Navigate to the directory in HDFS with the desired workflow and hit Select. Once imported, WFM runs validation on the syntax and presents the workflow for further modification. Now you can modify the python-node by hovering over it and clicking the gear icon.
Once clicked, you can configure the rest of the action to your liking; it inherits all of the old properties of your workflow. Notice you can specify a directory and script file in the File text box. My Oozie workflow also has old properties, like the specification of a YARN queue; WFM correctly parses and inherits that property. Also notice I have capture output checked, as I'd like to see my script's output on the console.

At this point, I'm ready to preview my workflow; WFM comes with a handy XML preview. It looks all right to me, so I'm ready to submit. Notice WFM doesn't know what $jobTracker is and prompts me to fill it out along with the queue. We can now navigate to the WFM Dashboard tab, as you've seen in my previous tutorial, and track the job status. My job failed, and I can debug it directly from WFM. It turns out the issue is with my parameter $jobTracker: in WFM, it was renamed to $resourceManager and comes by default, so I need to remove my custom parameter and let WFM do what it does best. Here's a preview of my XML after the change.

Back in the dashboard, I can click on the job and investigate the status. My job completed successfully, and I can navigate to the YARN job status straight from WFM: click on the succeeded wf and then on the arrow icon, right there on the right, in the same row as the python-node. Finally, navigate to the logs of your YARN job to view the output.

And that's all for this tutorial; you learned how to import an existing Python Oozie workflow and further edit it via WFM. My Python script, by the way, has the following code:

#! /usr/bin/env python
import os, pwd, sys
print "who am I? " + pwd.getpwuid(os.getuid())[0]
print "this is a Python script"
print "Python Interpreter Version: " + sys.version

You can find my workflow along with other samples on my GitHub page: https://github.com/dbist/oozie Stay tuned!
02-11-2017
06:51 PM
19 Kudos
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

Apache Ambari 2.5 will have a slew of new views that were not available before. One of them is called Workflow Manager, a new experience for working with Apache Oozie workflows. It is meant to be easy to use and to help users build powerful workflows without ever touching XML or the dated Oozie UI. Let's get started!

We will begin by authoring a simple shell action. The first thing we need to do is log in to the Ambari views instance with a user that has permission to work with Workflow Manager; in my case the user is centos.
Once logged in, you will be greeted with the selection of views your particular user has permissions to. We're going to select WFD; don't worry about the acronym, as that's configurable by your Ambari admin and can be named anything you prefer. Also, going forward, for brevity, I will refer to Workflow Manager as WFM.

Once in the WFM view, you're greeted with a canvas that you'll be using for the rest of the tutorial. We're going to create a new workflow by selecting it in the options menu. Notice there are options to create coordinator and bundle types of workflows as well. Since it's a shell action, we're going to rename the workflow to shell-action. You can see in the picture above that any action in WFM has a beginning and an end. If you hover over the arrow connecting two nodes, you will notice a plus sign; this is what you're going to use to add new actions. The list of all built-in actions is visible once you click the plus sign. Select the shell action, as that's what we're going to use in this scenario.

Once clicked, you can edit the shell action node to rename it or configure it for your specific needs. Click the gear icon to modify the configuration. I'm going to run a simple echo command and capture its output, as this is meant to be a simple example. You can also enter the name of a script file you want to execute; I'm going to demonstrate that in a future tutorial. Then you can scroll down to the advanced properties section to modify parameters and arguments to the command. Notice my argument is just a statement I want to echo to the console. Since I also want to capture the output of the command, for further processing or just for visual inspection, I check the capture output box at the bottom of the dialog.

At this point my workflow is ready and I can submit it. Click Submit next, and you will see a few more options. You need to specify the location of the workflow: the path must exist on HDFS, but the XML must not; if the XML already exists, you need to choose to overwrite it.
Then you can select run on submit. Another option in the dialog is to validate your workflow; clicking it will tell you whether your wf is valid or not. Finally, let's submit the wf. Notice anything familiar? WFM uses the Oozie REST API to work with Oozie and doesn't introduce anything new to the formula, except for an awesome UI: when you submit a wf, you get back an Oozie job ID. So what do you do next, go to the Oozie UI, right? No, WFM has you covered there too: it has a dashboard tab that loads the whole Oozie wf job history as well as anything currently running. You can now monitor your job status in this UI! Once the job completes, you can navigate to the output and view the result of the wf. The output will be in the STDOUT logs of the associated YARN job.
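For reference, the shell action authored in this tutorial serializes to XML along these lines; the node name, echo argument, and transition targets are illustrative, and the exact XML WFM produces may differ:

```xml
<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.3">
        <job-tracker>${resourceManager}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>echo</exec>
        <argument>hello from Workflow Manager</argument>
        <capture-output/>
    </shell>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

The <capture-output/> element is what the capture output checkbox adds; it makes the command's stdout available to the workflow and visible in the YARN logs.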