Member since 09-24-2015

178 Posts | 113 Kudos Received | 28 Solutions

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 4638 | 05-25-2016 02:39 AM |
| | 4589 | 05-03-2016 01:27 PM |
| | 1194 | 04-26-2016 07:59 PM |
| | 16763 | 03-24-2016 04:10 PM |
| | 3135 | 02-02-2016 11:50 PM |

12-31-2015 02:12 PM

@Hefei Li Great! Can you accept the answer then, so we can close this question and others with a similar issue can benefit?

12-31-2015 03:20 AM | 2 Kudos

In general, you have the following options when running R on Hortonworks Data Platform (HDP):

o RHadoop (rmr) - the R program is written in the MapReduce paradigm. MapReduce is not a vendor-specific API, so any program written with MapReduce is portable across Hadoop distributions. https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr

o Hadoop Streaming - the R program is written to make use of Hadoop Streaming, but the program structure still aligns with MapReduce, so the same portability benefit applies (a rough example of the launch command is sketched after this list).

o RJDBC - this approach does not require the R programs to be written using MapReduce and remains 100% native R, without any third-party packages. Here is a tutorial with a video, sample data and an R script: http://hortonworks.com/hadoop-tutorial/using-revolution-r-enterprise-tutorial-hortonworks-sandbox/ Using RJDBC, the R program can have Hadoop parallelize pre-processing and filtering: R submits a query to Hive or SparkSQL, which runs it with distributed and parallel processing, and then applies existing R models as-is, without any changes or proprietary APIs. Typically, any data science application involves a ton of data prep, which is usually 75% of the work; RJDBC allows pushing that work to Hive to take advantage of distributed computing.

o SparkR - lastly, the SparkR interface, which is a newer component in Spark. SparkR is an R package that provides a lightweight frontend to use Apache Spark from R; it has been available since Spark 1.4.1 (current version 1.5.2). Details: https://spark.apache.org/docs/latest/sparkr.html and the available API: https://spark.apache.org/docs/latest/api/R/
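
For the Hadoop Streaming option, the launch could look roughly like the sketch below. The jar path, the HDFS directories and the mapper.R / reducer.R script names are placeholders rather than anything from the answer above, and the R scripts are assumed to read records from stdin and write key/value pairs to stdout, with Rscript available on every node.

# Hypothetical Hadoop Streaming job driven by R scripts -- adjust paths to your cluster.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-streaming.jar \
  -files mapper.R,reducer.R \
  -mapper "Rscript mapper.R" \
  -reducer "Rscript reducer.R" \
  -input /user/guest/input \
  -output /user/guest/output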
						
					

12-31-2015 03:12 AM | 1 Kudo

For the sake of discussion, let's say the system is running at peak load 24 hours a day. Out of the 20K operations per second, there are 10K reads and 10K inserts. So after the first 30 minutes of running, the system will add another 10K deletes per second, for a total of 30K hits. It's definitely not that straightforward, and HBase will batch the actual deletes internally somehow. 30K TPS is not a lot for HBase, but the question is: how big a cluster are we talking about? The other thing to consider is the memory available to the RegionServer; it makes sense to keep as much data in memory as possible so that I/O is minimal, since the data is going to be deleted after 30 minutes anyway. So the next set of questions is: what is the memory available on the box and to the RegionServer? How big is each message?

12-31-2015 03:04 AM

Just to add to that: if you want to make changes, it would be easier to follow the code by looking at the source here - https://github.com/apache/ambari/ The UI code is here - https://github.com/apache/ambari/tree/trunk/ambari-web/app You can just fork the project and build it after making your changes (a rough sketch of that flow is below).
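
A minimal sketch of that fork-and-build flow, assuming a standard top-level Maven build; the exact flags and the UI-only build steps (ambari-web has its own Node-based build) are not from the answer above, so check the project's build documentation:

# Hypothetical build steps -- see the Ambari repo's docs for the authoritative ones.
git clone https://github.com/apache/ambari.git
cd ambari
mvn clean install -DskipTests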
						
					

12-31-2015 03:00 AM | 1 Kudo

@jarhyeon cho Ambari UI is an Angular app that uses the REST API on the Ambari Server. The UI code (app.js) is here - /usr/lib/ambari-server/web/javascripts

[root@sandbox javascripts]# pwd
/usr/lib/ambari-server/web/javascripts
[root@sandbox javascripts]# ls
app.js.gz  vendor.js.gz
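
Since the UI is only a client of the Ambari REST API, you can hit the same endpoints it uses directly. A minimal sketch, assuming the sandbox defaults of port 8080 and admin/admin credentials:

# Hypothetical example: list the clusters the UI would display, straight from the REST API.
curl -u admin:admin http://localhost:8080/api/v1/clusters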
 
						
					

12-30-2015 07:47 PM | 1 Kudo

@Grace Xu Here is another approach for you (a script will definitely work as well, but it will not make use of Oozie). I believe we can make use of a decision node and do the following to get this done via Oozie.

Assumption: the log table has some kind of id (log_entry_id) associated with the table names, along with other attributes the Sqoop job can use (HDFS/Hive destination, columns to import if not all, etc.).

This flow can be adjusted to match whatever constraints and existing design you are working with. For example, you can use the table name from the log if you do not have an id. Also, you may have some tables that were relevant (active) in the past but are not any more, so an active flag in the log table can be used in the shell action (step 2 below) to decide whether that table has to be imported. This is a general structure that can be customized.

The workflow will have the following key steps:
1) Java action - a small Java program that uses JDBC to connect to SQL Server, reads the data from the log table and creates a comma-delimited file on HDFS, e.g. /tmp/ingest-tables.txt.
2) Shell action (input parameter: log_entry_id) - reads the file from HDFS and gets the line starting with (log_entry_id + 1). The script outputs the values in Java Properties format (param1=value1, param2=value2, ...), and the workflow node captures that output for use in the subsequent steps. If the script does not find the next row, it returns -1. A rough sketch of this script is shown after the list.
3) Decision node - if the value is less than 0, go to END; otherwise go to the Sqoop node.
4) Sqoop node - execute the Sqoop task using the table name, destination, etc. received from the shell action's captured output. On success go back to the shell action, otherwise go to END.

Workflow execution: when executing the workflow, a default starting value of 0 is provided for log_entry_id.
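
As an illustration of step 2, the shell action could look roughly like the script below. The file path and the id,table,target-dir column layout are assumptions carried over from the example above, not something prescribed by Oozie; the only Oozie-specific assumption is that the workflow's shell action uses capture-output to pick up the key=value lines printed to stdout.

#!/bin/bash
# Hypothetical shell action: emit the next table to ingest as key=value pairs.
# $1 = id of the last table already processed (log_entry_id).
LOG_ENTRY_ID=$1
# Assumed file layout: id,table_name,target_dir (written by the Java action).
NEXT_LINE=$(hdfs dfs -cat /tmp/ingest-tables.txt | grep "^$((LOG_ENTRY_ID + 1)),")
if [ -z "$NEXT_LINE" ]; then
  # nothing left to ingest; the decision node checks for this
  echo "next_id=-1"
else
  IFS=',' read -r NEXT_ID TABLE_NAME TARGET_DIR <<< "$NEXT_LINE"
  echo "next_id=$NEXT_ID"
  echo "table_name=$TABLE_NAME"
  echo "target_dir=$TARGET_DIR"
fi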
						
					

12-30-2015 05:44 PM

Currently we don't provide a MIB, so users can choose to organize their structure in any way they want. Moving forward, Ambari may provide an Apache MIB; I believe that's what the ticket you referenced was opened for (along with the patch, README and other information necessary for the implementation): https://issues.apache.org/jira/browse/AMBARI-1320..

Today, we use a single OID for all of the alerts, and the body of the trap looks like this:

2015-07-22 13:28:41 0.0.0.0(via UDP: [172.16.204.221]:41891->[172.16.204.221]) TRAP, SNMP v1, community public
SNMPv2-SMI::zeroDotZero Cold Start Trap (0) Uptime: 0:00:00.00
SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkUp
IF-MIB::linkUp = STRING: "
[Alert] NameNode Last Checkpoint
[Service] HDFS
[Component] NAMENODE
[Host] revo2.hortonworks.local
Last Checkpoint: [0 hours, 43 minutes, 44 transactions]
" IF-MIB::linkUp = STRING: "
[OK] NameNode Last Checkpoint

Hope this helps!
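
On the receiving end, something along these lines can be used to watch those traps; this assumes net-snmp's snmptrapd and the 'public' community from the example above, and is not specific to Ambari:

# Hypothetical receiver: accept and log traps sent with the 'public' community.
echo 'authCommunity log public' >> /etc/snmp/snmptrapd.conf
# run in the foreground and print incoming traps to stdout
snmptrapd -f -Lo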
						
					

12-30-2015 05:50 AM

@Hefei Li Can you set the directory permissions to 755 (all the directories, including parents, in the path that contains the jar files)?

12-30-2015 03:35 AM | 1 Kudo

@Kumar Datla The error message does say "Unable to validate the location with path: /stmp", which means either the path does not exist or, due to a permissions issue, the program/process was unable to access the directory. In any case, it looks like you have been able to move past the issue, so I will close this thread.

12-30-2015 03:32 AM | 1 Kudo

@Hefei Li It appears that you have the jars in the right place, but it could be a permissions issue. I see in your screenshots that the jar files have 644 permissions, but how about the directories containing the jar files? The directories are recommended to have 755. This issue can also occur due to an incorrect umask value (the recommended value is 0022). Fix: try setting the permissions of all directories in the path to 755 and try again, along the lines of the sketch below. Let us know how it goes.
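
A minimal sketch of that fix; the jar directory below is a placeholder, so point it at wherever your jars actually live:

# Hypothetical path -- replace with the directory that holds your jars.
JAR_DIR=/usr/lib/example/lib
# walk up from the jar directory and give every directory on the path 755
d=$JAR_DIR
while [ "$d" != "/" ]; do
  chmod 755 "$d"
  d=$(dirname "$d")
done
# keep the jars themselves world-readable and fix the umask going forward
chmod 644 "$JAR_DIR"/*.jar
umask 0022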
						
					