Member since 10-01-2015

3933 Posts
1150 Kudos Received
374 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3479 | 05-03-2017 05:13 PM |
| | 2857 | 05-02-2017 08:38 AM |
| | 3123 | 05-02-2017 08:13 AM |
| | 3085 | 04-10-2017 10:51 PM |
| | 1573 | 03-28-2017 02:27 AM |
			
    
	
		
		
08-29-2017 03:10 PM

Part 1: https://community.hortonworks.com/articles/82964/getting-started-with-apache-ambari-workflow-design.html
Part 2: https://community.hortonworks.com/articles/82967/apache-ambari-workflow-designer-view-for-apache-oo.html
Part 3: https://community.hortonworks.com/articles/82988/apache-ambari-workflow-designer-view-for-apache-oo-1.html
Part 4: https://community.hortonworks.com/articles/83051/apache-ambari-workflow-designer-view-for-apache-oo-2.html
Part 5: https://community.hortonworks.com/articles/83361/apache-ambari-workflow-manager-view-for-apache-ooz.html
Part 6: https://community.hortonworks.com/articles/83787/apache-ambari-workflow-manager-view-for-apache-ooz-1.html
Part 7: https://community.hortonworks.com/articles/84071/apache-ambari-workflow-manager-view-for-apache-ooz-2.html
Part 8: https://community.hortonworks.com/articles/84394/apache-ambari-workflow-manager-view-for-apache-ooz-3.html
Part 9: https://community.hortonworks.com/articles/85091/apache-ambari-workflow-manager-view-for-apache-ooz-4.html
Part 10: https://community.hortonworks.com/articles/85354/apache-ambari-workflow-manager-view-for-apache-ooz-5.html
Part 11: https://community.hortonworks.com/articles/85361/apache-ambari-workflow-manager-view-for-apache-ooz-6.html

I get a lot of questions about doing distcp, so I figured I'd write yet another article in the series on WFM. There's a common assumption that the FS action should be able to do a copy within a cluster. Unfortunately, it's not obvious that you can leverage the distcp action to do a copy within a cluster instead. The reason the FS action lacks copy functionality is that such a copy is not distributed and would DoS your Oozie server until the action completes. What you need is the distcp action: it is meant for distributed operations and, being decoupled from the Oozie launcher, it completes without a DoS. The functionality is the same, even if the naming convention is a bit off.

We're going to start by adding a new workflow and naming it distcp-wf.

Now we're going to add a distcp node to the flow.

I prefer to name the nodes something other than the default, so I'll name it distcp_example and hit the gear button to configure it.

Now, in the distcp arguments field, I'm going to use Oozie XML variable replacement to add the full HDFS paths of the source and target, which happen to be in the same cluster. They could just as well be in two separate clusters.

If you're familiar with how Oozie and MapReduce work, you'll quickly realize that this workflow will run only once and fail the second time around. The reason is that my destination never changes, and if the output already exists, the next run fails. To handle that, we're going to add a prepare action that deletes the destination file/directory. Copy the second argument to the clipboard, paste it into advanced properties, and change the mkdir drop-down to delete.

We're almost ready to submit our workflow. I first have to create an HDFS directory (distcp-wf) that will contain my distcp workflow, along with the file I'd like copied.

    hdfs dfs -mkdir distcp-wf
    hdfs dfs -touchz file
    hdfs dfs -ls
    Found 4 items
    drwx------   - centos hdfs          0 2017-08-29 14:35 .Trash
    drwx------   - centos hdfs          0 2017-08-29 14:33 .staging
    drwxr-xr-x   - centos hdfs          0 2017-08-29 14:35 distcp-wf
    -rw-r--r--   3 centos hdfs         10 2017-08-29 01:26 file

Now I'm ready to save and submit my workflow. Enter the HDFS path of the workflow directory you just created.

Notice that the job properties contain the fully expanded nameNode and resourceManager addresses; that is what is used for variable substitution.

Now I am going to submit the job and use filtering in the dashboard to find the workflow by name.

Let's switch back to the distcp action, as I'd like to demonstrate a few other things about distcp that you can leverage. If you refer to the distcp user guide, you'll notice that there are many arguments we didn't cover, like -append, -update, etc. What if you would like to use them in your distcp? Well, WFM has you covered: eagle-eyed users will have seen the tool-tip the first time we configured the distcp action node, which says that you can pass the arguments in the same field as the source and destination.

So in addition to the two arguments, I'm going to add -update and -skipcrccheck in front of the existing ones.

My workflow XML now includes the two new arguments ahead of the source and destination.

When I execute with the new arguments, everything should still be green.

On a side note, our documentation team has done a phenomenal job adding resources to our WFM section. I encourage everyone interested in WFM to review them. One caveat with distcp is that in some cases you cannot distcp via Oozie from a secure cluster to an insecure one or vice versa. There are parameters you can specify to make it work in some cases, but overall it is not supported in heterogeneous clusters. Other issues crop up when you distcp from HA-enabled clusters: you have to specify the nameservices for both clusters. Please leverage HCC to find resources on how to get that working. Hope this was useful!
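
For readers who cannot see the screenshots, the workflow built up in the article above can be sketched roughly as follows. This is a minimal Python sketch that assembles the XML as text, not the exact output of WFM. The schema versions (uri:oozie:workflow:0.5 and uri:oozie:distcp-action:0.2), the /user/centos paths, and the node names other than distcp_example are assumptions made for illustration; the ${nameNode} and ${resourceManager} variables are the ones mentioned in the job properties.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_distcp_workflow(source, target, options=()):
    """Return workflow XML with one distcp action and a prepare/delete step.

    Schema versions and the kill/end node names are assumptions, not WFM output.
    """
    # distcp options go first, then source, then destination.
    args = "\n".join(
        f"            <arg>{escape(a)}</arg>" for a in (*options, source, target)
    )
    return f"""<workflow-app xmlns="uri:oozie:workflow:0.5" name="distcp-wf">
    <start to="distcp_example"/>
    <action name="distcp_example">
        <distcp xmlns="uri:oozie:distcp-action:0.2">
            <job-tracker>${{resourceManager}}</job-tracker>
            <name-node>${{nameNode}}</name-node>
            <prepare>
                <delete path={quoteattr(target)}/>
            </prepare>
{args}
        </distcp>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>${{wf:errorMessage(wf:lastErrorNode())}}</message>
    </kill>
    <end name="end"/>
</workflow-app>"""


# Hypothetical paths: the article does not show the exact source and target.
xml_text = build_distcp_workflow(
    "${nameNode}/user/centos/file",
    "${nameNode}/user/centos/distcp-wf/file",
    options=("-update", "-skipcrccheck"),
)
ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
print(xml_text)
```

The prepare/delete path is the same string as the last argument, which mirrors the "copy the second argument and paste it" step in the article.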
						
					

05-03-2019 03:35 PM

Hi,

I am looking for the same version and I am facing the same problem you did. Do you know where I can find it?

Best regards,
						
					

01-04-2018 05:18 PM

Make sure that this error isn't caused by the tables that Hive creates in your MySQL database. Check whether there is something that looks like this error:

Error: Index column size too large. The maximum column size is 767 bytes. (state=HY000,code=1709)

or just:

The maximum column size is 767 bytes
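
As background on why this limit gets hit: the byte size of an indexed VARCHAR column depends on the character set, so the same column definition can fit under one character set and overflow under another. Below is a small arithmetic sketch in Python. The bytes-per-character values (latin1 = 1, utf8 = 3, utf8mb4 = 4) are the usual MySQL maximums, and the VARCHAR(255) length is a hypothetical example, since the post does not say which metastore column triggered the error.

```python
# Limit quoted in the error message above.
INDEX_PREFIX_LIMIT_BYTES = 767

# Maximum bytes per character for three common MySQL character sets.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def index_key_bytes(varchar_length, charset):
    """Worst-case byte size of an indexed VARCHAR(varchar_length) column."""
    return varchar_length * MAX_BYTES_PER_CHAR[charset]


def fits_index_limit(varchar_length, charset):
    """True if the worst-case key size stays within the 767-byte limit."""
    return index_key_bytes(varchar_length, charset) <= INDEX_PREFIX_LIMIT_BYTES


# Hypothetical column length, used only for illustration.
for charset in MAX_BYTES_PER_CHAR:
    size = index_key_bytes(255, charset)
    verdict = "fits" if fits_index_limit(255, charset) else "too large"
    print(f"VARCHAR(255) in {charset}: {size} bytes, {verdict}")
```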
 
						
					

03-17-2017 12:50 AM

Hello Artem,

Thanks, adding an interpreter line worked. I don't know how I could forget that. I think I'm doing a lot of multitasking. Also, I don't have Python 3 installed, so I was running on Python 2. Once again, thank you for the quick response. Really appreciate it.

Sam
						
					

03-11-2017 02:57 AM

That's good to know! Many restrictions with Oozie...
						
					

03-10-2017 02:21 PM

I got rid of SPNEGO on this cluster and set oozie.authentication.type=simple. As I'm accessing it from a Mac, I don't need SPNEGO. I'm able to access the Oozie UI now.
						
					

03-09-2017 09:51 PM

I'll post this as a separate question.
						
					

03-08-2017 10:39 AM

1 Kudo

You can pass an ESCAPED BY clause:

Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\'). Escaping is needed if you want to work with data that can contain these delimiter characters.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL

    CREATE TABLE my_table(a string, b string, ...)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
       "separatorChar" = "\t",
       "quoteChar"     = "'",
       "escapeChar"    = "\\"
    )
    STORED AS TEXTFILE;

The default properties for this SerDe are those of a comma-separated (CSV) file:

    DEFAULT_ESCAPE_CHARACTER \
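
To see what an escape character buys you, here is a small sketch using Python's csv module with the same three settings as the table definition above (tab separator, single-quote quote character, backslash escape). It is only an analogy: Python's csv reader is not OpenCSVSerde and may differ in edge cases, and the sample row is made up.

```python
import csv
import io

# One made-up row: the first field contains a literal tab,
# protected by a backslash so it is not treated as a separator.
raw = "left\\\tright\tsecond\n"


def parse(text, escapechar):
    """Parse tab-separated text, optionally honouring an escape character."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter="\t",
        quotechar="'",
        escapechar=escapechar,
    )
    return list(reader)


with_escape = parse(raw, "\\")      # escaped tab stays inside the first field
without_escape = parse(raw, None)   # every tab splits, backslash is kept as data

print(len(with_escape[0]), "fields with escaping")
print(len(without_escape[0]), "fields without escaping")
```

With escaping enabled the row parses into two fields; without it the same row splits into three.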
 
						
					

03-08-2017 09:46 PM

You can upgrade Ambari to 2.4.2 and then follow the standard upgrade process to upgrade HDP from 2.3 to 2.5.3.
						
					