Member since: 10-01-2015
3933 Posts | 1150 Kudos Received | 374 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3362 | 05-03-2017 05:13 PM |
| | 2791 | 05-02-2017 08:38 AM |
| | 3067 | 05-02-2017 08:13 AM |
| | 3002 | 04-10-2017 10:51 PM |
| | 1510 | 03-28-2017 02:27 AM |
08-29-2017
03:10 PM
Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html

I get a lot of questions about doing distcp, so I figured I'd write another article in the WFM series. There's a common assumption that the FS action should be able to do a copy within a cluster. It's not obvious, but you can leverage the distcp action to do an intra-cluster copy instead. The reason the FS action is missing copy functionality is that such a copy would not be distributed: it would run inside the Oozie server and effectively DoS it until the action completes. The distcp action, by contrast, is built for distributed operations and is decoupled from the Oozie launcher, so it completes without tying up the server. Despite the naming, the functionality is the same for a copy within a single cluster.
We're going to start by adding a new workflow and naming it distcp-wf. Next, add a distcp node to the flow. I prefer to name nodes something other than the default, so I'll call it distcp_example and hit the gear button to configure it. In the distcp arguments field, I'm going to use Oozie XML variable substitution to supply the full HDFS paths of the source and target, which happen to be in the same cluster; they could just as well be two separate clusters.

If you're familiar with how Oozie and MapReduce work, you'll quickly realize that this workflow will run once and fail the second time around. The reason is that my destination never changes, and if the output already exists, the next run fails. To handle that, we add a prepare step to delete the destination file/directory: copy the second argument to the clipboard, paste it into advanced properties, and change the mkdir drop-down to delete.

We're almost ready to submit the workflow; first I have to create an HDFS directory (distcp-wf) that will contain my distcp workflow, plus the file I'd like copied.

hdfs dfs -mkdir distcp-wf
hdfs dfs -touchz file
hdfs dfs -ls
Found 4 items
drwx------ - centos hdfs 0 2017-08-29 14:35 .Trash
drwx------ - centos hdfs 0 2017-08-29 14:33 .staging
drwxr-xr-x - centos hdfs 0 2017-08-29 14:35 distcp-wf
-rw-r--r-- 3 centos hdfs 10 2017-08-29 01:26 file

Now I'm ready to save and submit my workflow. Enter the HDFS path of the workflow directory you just created; notice that the job properties have the fully-expanded nameNode and resourceManager addresses, which is what is used for variable substitution. I'm going to submit the job and use filtering in the dashboard to find the workflow by name.

Now let's switch back to the distcp action, as I'd like to demonstrate a few other things about distcp that you can leverage. If you refer to the distcp user guide, you'll notice there are many arguments we didn't cover, like -append, -update, etc. What if you'd like to use them in your distcp? WFM has you covered: eagle-eyed users will have seen the tool-tip the first time we configured the distcp action node, noting that you can pass these arguments in the same field as the source and destination. So in addition to the two path arguments, I'm going to add -update and -skipcrccheck in front of the existing ones. When I execute with the new arguments, everything should still be green.

On a side note, our documentation team has done a phenomenal job adding resources to our WFM section; I encourage everyone interested in WFM to review them. One caveat with distcp is that in some cases you cannot do distcp via Oozie from a secure to an insecure cluster or vice versa. There are parameters you can specify to make it work in some cases, but overall it is not supported in heterogeneous clusters. Other issues crop up when you distcp between HA-enabled clusters: you have to specify the nameservices for both clusters. Please leverage HCC to find resources on getting that working. Hope this was useful!
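For reference, the generated workflow XML for the steps above looks roughly like the sketch below. This is a hand-written approximation, not the exact output of WFM: the user/path names (/user/centos/file, /user/centos/copy-of-file) and node names are assumptions matching the example, and the prepare delete plus the -update/-skipcrccheck flags correspond to the configuration described in the walkthrough.

```xml
<workflow-app name="distcp-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="distcp_example"/>
  <action name="distcp_example">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- prepare step deletes the destination so reruns don't fail -->
      <prepare>
        <delete path="${nameNode}/user/centos/copy-of-file"/>
      </prepare>
      <!-- optional distcp flags go before the source/destination paths -->
      <arg>-update</arg>
      <arg>-skipcrccheck</arg>
      <arg>${nameNode}/user/centos/file</arg>
      <arg>${nameNode}/user/centos/copy-of-file</arg>
    </distcp>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The ${nameNode} and ${resourceManager} variables are resolved from the job properties at submit time, which is the variable substitution mentioned above.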
05-03-2019
03:35 PM
Hi, I'm looking for the same version and I'm facing the same problem you did. Do you know where I can find this? brs,
01-04-2018
05:18 PM
Make sure this error isn't caused by the tables that Hive creates in your MySQL database. Check whether there is something that looks like this error: Error: Index column size too large. The maximum column size is 767 bytes. (state=HY000,code=1709)
or just:
The maximum column size is 767 bytes
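This error typically appears when the Hive metastore schema lands in a MySQL database using a multi-byte character set (e.g. utf8), pushing index key sizes past InnoDB's 767-byte limit. A commonly used workaround, sketched below, is to switch the metastore database to a single-byte character set; the database name "hive" is an assumption, substitute your actual metastore database name.

```sql
-- Hypothetical metastore database name: "hive".
-- With latin1 (1 byte/char), a VARCHAR(767) index key fits within
-- InnoDB's 767-byte limit; with utf8 (up to 3 bytes/char) it does not.
ALTER DATABASE hive CHARACTER SET latin1;
```

Run this before (re)initializing the metastore schema so the Hive tables are created with the new default character set.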
03-17-2017
12:50 AM
Hello Artem,
thanks, adding an interpreter line worked. I don't know how I could forget that... I think I'm doing a lot of multitasking. Also, I don't have Python 3 installed, so I was running on Python 2. Once again, thank you for the quick response. Really appreciate it. Sam
03-11-2017
02:57 AM
That's good to know! Many restrictions with Oozie...
03-10-2017
02:21 PM
I got rid of SPNEGO on this cluster and set oozie.authentication.type=simple.
Since I'm accessing from a Mac, I don't need SPNEGO. I'm able to access the Oozie UI now.
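For anyone applying the same change by hand rather than through Ambari, the setting above corresponds to a property entry like this in oozie-site.xml (a sketch; where you set it depends on how your cluster is managed):

```xml
<property>
  <name>oozie.authentication.type</name>
  <value>simple</value>
</property>
```

Restart Oozie after changing it for the setting to take effect.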
03-09-2017
09:51 PM
I'll post this as a separate question.
03-08-2017
10:39 AM
1 Kudo
You can pass an ESCAPED BY clause. Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\'). Escaping is needed if you want to work with data that can contain these delimiter characters. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

Alternatively, use the OpenCSVSerde and configure its characters explicitly:

CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = "\t",
  "quoteChar" = "'",
  "escapeChar" = "\\"
)
STORED AS TEXTFILE;

The default properties for this SerDe match a comma-separated (CSV) file:
DEFAULT_ESCAPE_CHARACTER \
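For the plain-delimited (non-SerDe) variant mentioned first, the ESCAPED BY clause fits into the DDL like the sketch below. The table and column names are made up for illustration; the clause itself is from the Hive DDL language manual linked above.

```sql
-- Hypothetical table: tab-delimited text with backslash escaping,
-- so a field value containing a literal tab can be stored as "\<TAB>"
-- instead of being split into two fields.
CREATE TABLE my_escaped_table (a STRING, b STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  ESCAPED BY '\\'
STORED AS TEXTFILE;
```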
03-08-2017
09:46 PM
You can upgrade Ambari to 2.4.2 and then follow the standard upgrade process to upgrade HDP from 2.3 to 2.5.3.