Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
02-23-2017
08:57 PM
6 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will cover how to create Oozie coordinators via Workflow Manager View. I'm also going to leverage the Publish/Import Asset functionality demonstrated in my last tutorial, only now using the local asset database rather than HDFS.

We're going to publish the two action nodes from part 9 (a shell action doing an echo, and an email action) using the next-to-last icon that appears when you click on an action node. The steps are the same as publishing an asset to HDFS in part 9, except there is no HDFS path. You're greeted with a dialog to give the asset a name and a description. I already published the shell action to the asset database in the same manner.

I'd like to glance at what assets I have in my local Ambari Views instance. To do that, I click the Manage Assets button in the top right-hand corner. You will see a list of all assets saved so far. You can delete assets by clicking the red trash icon to the right of them, and the asset manager also lets you search through all saved assets. Keep in mind that the local asset database is exactly that, local; it is not shared across instances of Ambari Views nodes. For that, use the publish/import from HDFS functionality, just like in part 9.

We're ready to tie it all together. Create a new workflow, name it part-10, and begin adding nodes, except now, instead of adding pre-existing nodes, click Import Asset. You'll get a pop-up to select an asset from the asset manager; click on it and hit Import. Now we're ready to import the second asset: select it and import it. Your workflow should look like so. I gave the action nodes more meaningful names. We essentially rebuilt the workflow from part 9 using publish/import assets.

We can now submit the job. (The path mistakenly points to /user/centos/email; I then submitted this workflow and saved the path as /user/centos/part-10.) Great, now we know it works and we're ready to create a coordinator. On the WFM page, in the top right-hand corner, find the Create button and select Coordinator. You'll be prompted to fill out the details.
This beats working with XML: all I need to do is fill out five fields and I have a working coordinator for an existing workflow. The button next to Browse gives you the option to create a brand new workflow; since we already have one, we enter the path of the existing one. I'm ready to submit the coordinator. I prefer to save the coordinator and the workflow in the same directory; although my screenshots do not show it, I chose /user/centos/part-10 as the HDFS path for both the workflow and the coordinator. This is what my directory looks like:

hdfs dfs -ls part-10
Found 2 items
-rw-r--r-- 3 centos hdfs 364 2017-02-23 17:58 part-10/coordinator.xml
-rw-r--r-- 3 centos hdfs 971 2017-02-23 17:09 part-10/workflow.xml

Let's preview the coordinator XML (a sketch of such a coordinator appears at the end of this article). Ignore the app-path in my XML; I have two workflows, one in /user/centos/email and another in /user/centos/part-10, and I grabbed the wrong screenshot :).

Let's look at our coordinator running; this will let me demonstrate a few more cool features of WFM, such as chaining search tags. Click the Dashboard button to see our workflow in action, and notice the clock icon to the left of it that identifies it as part of a coordinator. This view still shows workflows; if you click the Workflows drop-down on the left of the page and select Coordinators instead, you will see only coordinators. This makes it easy to separate coordinators from workflows (and, as you will soon see, bundles) by toggling the drop-down to the type of job you're looking for. Here's an example of what a coordinator triggered every 5 minutes will do to your dashboard.

Another cool dashboard feature is multi-tag search. In the Oozie UI you can click on Name and it will sort ascending/descending; here we can instead filter with pre-defined tags to narrow the output to what's relevant. Notice I added a name filter. What if I also want to filter by status:SUCCEEDED and not just the workflow name? I can keep adding tags: now I want to show only running workflows, and then also filter by user name. The other available options are shown below.

Finally, since my coordinator is configured to execute the part-10 workflow every 5 minutes, I'm getting an email every time it runs and succeeds. I want to kill the coordinator, and I can do it directly from the dashboard. To the right of the running coordinator there is a kill option highlighted in red; click that, and the coordinator goes into the Killed state.
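Since the coordinator preview above came from a screenshot, here is a minimal sketch of the kind of coordinator.xml WFM generates for this setup. The app name, schema version, and the ${startTime}/${endTime} properties are illustrative placeholders rather than values from the original post:

<coordinator-app name="part-10-coordinator" frequency="${coord:minutes(5)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that holds workflow.xml for the part-10 workflow -->
      <app-path>/user/centos/part-10</app-path>
    </workflow>
  </action>
</coordinator-app>

Submitting something like this with job properties that define startTime and endTime would be roughly equivalent to filling out the five fields in the WFM coordinator dialog.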
04-28-2019
02:56 PM
1 Kudo
We can use the rank approach, which is faster than using max, because the max approach scans the table twice. Here, the partition column is load_date:

select ld_dt.txnno,
       ld_dt.txndate,
       ld_dt.custno,
       ld_dt.amount,
       ld_dt.productno,
       ld_dt.spendby,
       ld_dt.load_date
from (select *, dense_rank() over (order by load_date desc) dt_rnk
      from datastore_s2.transactions) ld_dt
where ld_dt.dt_rnk = 1;
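For comparison, a sketch of the max-based approach this answer refers to, assuming the same datastore_s2.transactions table (the m alias and max_load_date column name are just for illustration); it reads the table once for the subquery and again for the outer select:

-- max-based latest-partition lookup: two scans of the table
select t.txnno, t.txndate, t.custno, t.amount, t.productno, t.spendby, t.load_date
from datastore_s2.transactions t
join (select max(load_date) as max_load_date from datastore_s2.transactions) m
  on t.load_date = m.max_load_date;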
02-24-2017
12:05 AM
1 Kudo
Great to hear you resolved the issue
02-23-2017
01:32 PM
Hi Ayub, thanks a lot for your help. It worked 🙂
02-23-2017
02:29 PM
@Anders Boje Yes, that is a Workflow Manager view, but an old version that is not meant to be production-ready. Workflow Manager View, a.k.a. Oozie View, will be available in Ambari 2.5. Ambari 2.5 is not released yet; it will be released in a month or two. As you noted, there are bugs, and we are still working through fixing some issues so it is stable. My tutorials do cover a lot of WFM, but the ideas can be applied to working with Oozie XML directly. You may want to just wait for the Ambari 2.5 release if this is more what you want to do.
02-23-2017
12:31 AM
You can create an email alias with many accounts associated with it.
02-18-2017
07:36 PM
5 Kudos
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html
Part 12: https://community.hortonworks.com/articles/131389/apache-ambari-workflow-manager-view-for-apache-ooz-7.html

In this tutorial, I will walk you through creating a Sqoop action using WFM on HDP 2.5+. First we need a table; we're going to use MySQL as the source database:

create table imported (rowkey int, value varchar(25));
insert into imported (rowkey, value) values (1, "john doe");
insert into imported (rowkey, value) values (2, "jane doe");
I want to make sure that all cluster nodes can access this table, so I'm going to grant access to user centos on the LAN. You may have different restrictions on your network, so by all means consult your DBAs.

GRANT ALL PRIVILEGES ON *.* TO 'centos'@'172.22.65.%'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'centos'@'localhost'
IDENTIFIED BY 'password'
WITH GRANT OPTION;
FLUSH PRIVILEGES;
I want to make sure user centos can access the table:

mysql -u centos -p
Enter password: password
mysql> select * from test.imported;
+--------+----------+
| rowkey | value    |
+--------+----------+
|      1 | john doe |
|      2 | jane doe |
+--------+----------+
Finally, I'd like to test that my Sqoop setup works:

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password password
17/02/18 15:13:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
Also, in the case of Oozie with HCatalog and Sqoop, every node that may execute job attempts must have the HCat and Sqoop clients installed. I want to save the password in a file so that I can access it without a prompt and without passing it in clear text on the command line:

echo -n "password" > .password
hdfs dfs -put .password /user/$USER/
hdfs dfs -chmod 400 /user/$USER/.password
rm .password
[centos@source-1 ~]$ hdfs dfs -ls
Found 1 items
-r-------- 3 centos hdfs 8 2017-02-18 15:13 .password
[centos@source-1 ~]$ hdfs dfs -cat .password
password[centos@source-1 ~]$
Let's run the list command again, referencing the file instead of the --password argument:

sqoop list-tables --connect jdbc:mysql://source-1/test --username centos --password-file /user/centos/.password
17/02/18 15:14:43 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
imported
You can find more details in our comprehensive documentation on data movement: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/sqoop_hcatalog_integration.html

Also, make sure mysql-connector-java is at an appropriate version. RHEL 6 bundles version 5.1.17, which does not work with later versions of HDP; we bundle 5.1.37 in HDP-UTILS, and the only way to activate it is to run the following:

yum downgrade mysql-connector-java

Then, in your /usr/share/java directory, you should see the correct connectors:

lrwxrwxrwx. 1 root root 31 Feb 18 15:29 jdbc-mysql.jar -> mysql-connector-java-5.1.37.jar
lrwxrwxrwx. 1 root root 31 Feb 18 15:29 mysql-connector-java.jar -> mysql-connector-java-5.1.37.jar
You have a choice: update the Oozie sharelib with this connector, or bundle it as part of the workflow's lib directory. I'm going to do the latter for time's sake. Before I start authoring a workflow, I'd like to confirm my Sqoop import works, so I will execute it on the command line first:

sqoop import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/$USER/.password --table imported --hcatalog-table imported --create-hcatalog-table --hcatalog-storage-stanza "STORED AS ORCFILE" --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey
I'm choosing an HCatalog import because it is more efficient than --hive-import: with the latter, Sqoop takes the extra step of moving the imported data from a staging directory into Hive, spawning an extra container, whereas with --hcatalog-table everything happens in one shot. Another benefit is that you can create an ORC table directly from the command line instead of going into Hive and altering the table to set it to ORC. Let's see what we got as a result:

Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=213
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=837
CPU time spent (ms)=8890
Physical memory (bytes) snapshot=718036992
Virtual memory (bytes) snapshot=9154256896
Total committed heap usage (bytes)=535298048
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Transferred 628 bytes in 70.6267 seconds (8.8918 bytes/sec)
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Retrieved 2 records.
17/02/18 15:32:05 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
Let's see what it looks like in Hive:

[centos@source-1 ~]$ beeline
Beeline version 1.2.1000.2.6.0.0-493 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 "" ""
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 1.2.1000.2.6.0.0-493)
Driver: Hive JDBC (version 1.2.1000.2.6.0.0-493)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> select * from default.imported;
+------------------+-----------------+--+
| imported.rowkey  | imported.value  |
+------------------+-----------------+--+
| 1                | john doe        |
| 2                | jane doe        |
+------------------+-----------------+--+
2 rows selected (6.414 seconds)
Let's truncate the table in order to prepare for the Oozie imports, and additionally describe the table to demonstrate it is in fact in ORC:

0: jdbc:hive2://localhost:10000> truncate table default.imported;
No rows affected (0.4 seconds)
0: jdbc:hive2://localhost:10000> describe formatted imported;
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
I'm ready to start working on a workflow. Let's import a Sqoop action and save the workflow to create a directory for it. I want to make sure I have a valid directory so I can upload a few files that are necessary for this to complete successfully:

hdfs dfs -mkdir /user/centos/sqoop/lib
hdfs dfs -put /usr/share/java/mysql-connector-java-5.1.37.jar /user/centos/sqoop/lib/
hdfs dfs -put /etc/hive/conf/hive-site.xml /user/centos/sqoop/lib/
hdfs dfs -put /etc/tez/conf/tez-site.xml /user/centos/sqoop/lib/
I'm going to use my own MySQL driver rather than the one in the sharelib, and therefore I'm uploading it to my workflow's lib directory; again, if you update the sharelib with the associated jar, you don't have to do that. Secondly, I'm going to include hive-site.xml and tez-site.xml. Until HDP 2.5, you only needed hive-site.xml, but now we also need tez-site.xml. It is a small fact that will save you many hours of debugging, trust me. Your workflow lib directory should look like so:

hdfs dfs -ls /user/centos/sqoop/lib/
Found 3 items
-rw-r--r-- 3 centos hdfs 19228 2017-02-18 15:38 /user/centos/sqoop/lib/hive-site.xml
-rw-r--r-- 3 centos hdfs 977873 2017-02-18 15:37 /user/centos/sqoop/lib/mysql-connector-java-5.1.37.jar
-rw-r--r-- 3 centos hdfs 6737 2017-02-18 15:38 /user/centos/sqoop/lib/tez-site.xml
Finally, I want to modify my Sqoop command: I no longer need --create-hcatalog-table, and I want to replace the $USER argument with my actual username (you can also use Oozie EL functions for string replacement).

import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/centos/.password --table imported --hcatalog-table imported --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey

That's what my command looks like in Oozie; notice the missing "sqoop" keyword, which is inferred when you select a Sqoop action in WFM. Edit the Sqoop action on the WFM canvas and enter the command. We are working on refreshing the UI before WFM is released, so your dialog box may look slightly different, but the fields should remain the same. Let's tell WFM that we also expect the tez-site.xml and hive-site.xml files. Finally, we need to tell Oozie that we will pull in the HCatalog and Hive jars for this to work. At this point my workflow is finished; let's inspect the XML (a sketch of what it can look like appears at the end of this article). When you submit the job, it should succeed and you can look at the results. Again, this approach is more efficient and actually works on HDP 2.5+; I highly recommend checking out WFM and the HCatalog options in Sqoop.
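To make the XML inspection concrete, here is a minimal sketch of what a generated workflow.xml for this Sqoop action can look like. The node names, schema versions, and the ${jobTracker}/${nameNode} properties are illustrative, and the file WFM actually produces may differ:

<workflow-app name="sqoop-import-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- Same command as above; the leading "sqoop" keyword is omitted -->
      <command>import --connect jdbc:mysql://172.22.65.123/test --username centos --password-file /user/centos/.password --table imported --hcatalog-table imported --hcatalog-home /usr/hdp/current/hive-webhcat --map-column-hive value=STRING --split-by rowkey</command>
      <!-- Ship the site files the job needs alongside the action -->
      <file>/user/centos/sqoop/lib/hive-site.xml</file>
      <file>/user/centos/sqoop/lib/tez-site.xml</file>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Sqoop import failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>

Pulling in the HCatalog and Hive jars is typically handled by setting the oozie.action.sharelib.for.sqoop property (for example, to sqoop,hive,hcatalog) in the job configuration.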
05-23-2017
11:10 AM
The docs say "MariaDB 10" but RHEL7 comes with "MariaDB 5".
02-27-2017
08:02 AM
I was able to import all tables in the following format:

sqoop import -connect jdbc:oracle:thin:@<fqdn>/<server> -username <username> -P -table CUST_NAV -columns "<column names separated by commas>" -hive-import -hive-table databasenameinhive.New_CUST_NAV -target-dir 'location in hdfs'

@bpreachuk I understood the workaround for my problem using your suggestion. I'll import all tables as-is from the Oracle DB and create different views, which I can then use in my select statements. Thanks, guys.
10-05-2017
01:50 PM
It doesn't work. I tried many times, but the result was always the same: KILLED. I got the following error:

Launcher exception: org/apache/spark/deploy/SparkSubmit
java.lang.NoClassDefFoundError: org/apache/spark/deploy/SparkSubmit