Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
12-31-2015
02:12 PM
@Hefei Li Great! Can you accept the answer then, so we can close this question and others with a similar issue can benefit?
12-31-2015
03:20 AM
2 Kudos
In general, you have the following options when running R on Hortonworks Data Platform (HDP):

o RHadoop (rmr) - R programs written in the MapReduce paradigm. MapReduce is not a vendor-specific API, so any program written with MapReduce is portable across Hadoop distributions. https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr

o Hadoop Streaming - R programs written to make use of Hadoop Streaming; the program structure still aligns with MapReduce, so the portability benefit above still applies.

o RJDBC - This option does not require the R programs to be written using MapReduce, and the code remains 100% native R APIs without any third-party packages. Here is a tutorial with a video, sample data, and an R script: http://hortonworks.com/hadoop-tutorial/using-revolution-r-enterprise-tutorial-hortonworks-sandbox/ Using RJDBC, the R program can have Hadoop parallelize pre-processing and filtering: R submits a query to Hive or SparkSQL, making use of distributed and parallel processing, and then uses existing R models as-is, without any changes or any proprietary APIs. Typically speaking, any data science application involves a ton of data prep, which is usually around 75% of the work; RJDBC allows pushing that work to Hive to take advantage of distributed computing (see the JDBC sketch after this list).

o SparkR - Lastly, the SparkR interface is a newer component in Spark: an R package that provides a lightweight frontend to use Apache Spark from R. This component has been available since Spark 1.4.1 (current version 1.5.2). Here are some details on it - https://spark.apache.org/docs/latest/sparkr.html And the available API - https://spark.apache.org/docs/latest/api/R/
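For context, here is a minimal Java sketch of the HiveServer2 JDBC call that RJDBC wraps from R. The host, port, credentials, table, and query are placeholders for illustration, not values from this thread.

// Minimal sketch of a HiveServer2 JDBC query; this is the same kind of connection
// RJDBC opens from R. All connection details and the table/query are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://sandbox.hortonworks.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // Push the filtering/aggregation down to Hive so only the reduced
             // result set travels back to the client (or to R via RJDBC).
             ResultSet rs = stmt.executeQuery(
                 "SELECT event_type, COUNT(*) FROM events GROUP BY event_type")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}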
12-31-2015
03:12 AM
1 Kudo
For the sake of discussion, let's say the system is running at peak load 24 hours a day. Out of the 20K operations per second, there are 10K reads and 10K inserts. So, after the first 30 minutes of running, the system will add an additional 10K deletes per second, for a total of 30K operations per second. It's definitely not that straightforward, and HBase is going to batch the actual deletes internally somehow. 30K TPS is not a lot for HBase, but the question is: how big a cluster are we talking about? The other thing is the memory available to the RegionServer; it makes sense to keep as much data in memory as possible so that I/O is minimal, since the data is to be deleted after 30 minutes anyway. So the next set of questions is: what is the memory available on the box and to the RegionServer? How big is each message?
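To make the memory question concrete, a quick back-of-the-envelope sketch using the numbers above; the ~1 KB message size is purely an assumption, not something stated in this thread.

public class HBaseLoadEstimate {
    public static void main(String[] args) {
        long readsPerSec   = 10_000;
        long insertsPerSec = 10_000;
        long deletesPerSec = 10_000;              // kicks in once rows start aging out after 30 minutes
        long totalOpsPerSec = readsPerSec + insertsPerSec + deletesPerSec;   // 30K, as discussed above

        long windowSeconds   = 30 * 60;           // each row lives roughly 30 minutes
        long assumedMsgBytes = 1024;              // ASSUMPTION: ~1 KB per message
        long hotDataBytes    = insertsPerSec * windowSeconds * assumedMsgBytes;

        System.out.printf("Total ops/sec at steady state: %,d%n", totalOpsPerSec);
        System.out.printf("Data written per 30-minute window: ~%.1f GB%n",
                hotDataBytes / (1024.0 * 1024 * 1024));
    }
}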
12-31-2015
03:04 AM
Just to add to that: if you want to make changes, it would be easier to follow the code by looking at the source here - https://github.com/apache/ambari/ The UI code is here - https://github.com/apache/ambari/tree/trunk/ambari-web/app You can just fork the project and build it after making your changes.
12-31-2015
03:00 AM
1 Kudo
@jarhyeon cho Ambari UI is an Angular app that uses the REST API on the Ambari Server. The UI code (app.js) is here: /usr/lib/ambari-server/web/javascripts
[root@sandbox javascripts]# pwd
/usr/lib/ambari-server/web/javascripts
[root@sandbox javascripts]# ls
app.js.gz vendor.js.gz
[root@sandbox javascripts]#
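For what it's worth, the same REST API the UI calls can be hit directly. A small Java sketch follows; the sandbox host, port 8080, and the default admin/admin credentials are assumptions to adjust for your cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AmbariRestSketch {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint and default sandbox credentials.
        URL url = new URL("http://sandbox.hortonworks.com:8080/api/v1/clusters");
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Basic " + auth);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON listing the clusters Ambari manages
            }
        }
    }
}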
12-30-2015
07:47 PM
1 Kudo
@Grace Xu Here is another approach for you (a script will definitely work as well, but it will not make use of Oozie). I believe we can make use of a decision node and do the following to get this done via Oozie.

Assumption: I am assuming the log table has some kind of id (log_entry_id) associated with the table names, along with some other attributes that can be used by the Sqoop job (HDFS/Hive destination, columns to import if not all, etc.). This flow can be adjusted to match whatever constraints and existing design you are working with. For example, you can use the table name from the log if you do not have an id. Also, you may have some tables that were relevant (active) in the past but are not any more, so you may have an active flag in the log table that the shell action in step 2 can use to decide whether that table has to be imported. You get the idea: this is a general structure that can be customized. The workflow will have the following key steps:
1) Java action - Write a small Java program that uses JDBC to connect to SQL Server, reads the data from the log table, and creates a comma-delimited file on HDFS, like /tmp/ingest-tables.txt (a hedged sketch of this step follows below).
2) Shell action (input parameter: log_entry_id) - Read the file from HDFS and get the line starting with (log_entry_id + 1). The script will output the values in Java properties format (like param1=value1,param2=value2). The workflow node will capture the output values for use in the subsequent steps. If the shell script does not find the next row, it will return -1.
3) Decision node - If the value is less than 0, go to END; else go to the Sqoop node.
4) Sqoop node - Execute the Sqoop task using the table name, destination, etc. received from the shell action's captured output. On success, go back to the previous shell action; on failure, go to END.
Workflow execution: When executing the workflow, a default starting value of 0 will be provided for log_entry_id.
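To make step 1 concrete, here is a hedged Java sketch of that action. The JDBC URL, credentials, and log-table columns are placeholders, not details from the original question; only the HDFS file name comes from the description above.

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExportLogTableToHdfs {
    public static void main(String[] args) throws Exception {
        // Placeholder SQL Server connection details.
        String jdbcUrl = "jdbc:sqlserver://sqlhost:1433;databaseName=ingest_db";
        FileSystem fs = FileSystem.get(new Configuration());

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = conn.createStatement();
             // Placeholder column names for the log table described above.
             ResultSet rs = stmt.executeQuery(
                 "SELECT log_entry_id, table_name, hdfs_destination FROM ingest_log");
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 fs.create(new Path("/tmp/ingest-tables.txt"), true)))) {
            while (rs.next()) {
                // One comma-delimited line per table to be imported.
                out.write(rs.getLong(1) + "," + rs.getString(2) + "," + rs.getString(3));
                out.newLine();
            }
        }
    }
}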
12-30-2015
05:44 PM
Currently we don't provide a MIB, so users can choose to organize their structure in any way they want. Moving forward, Ambari may provide an Apache MIB; I believe that's what the ticket you referenced was opened for (along with the patch, README, and other information necessary for the implementation): https://issues.apache.org/jira/browse/AMBARI-1320 Today, we use a single OID for all alerts, and the body of the trap looks like this:
2015-07-22 13:28:41 0.0.0.0(via UDP: [172.16.204.221]:41891->[172.16.204.221]) TRAP, SNMP v1, community public
SNMPv2-SMI::zeroDotZero Cold Start Trap (0) Uptime: 0:00:00.00
SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkUp
IF-MIB::linkUp = STRING: "
[Alert] NameNode Last Checkpoint
[Service] HDFS
[Component] NAMENODE
[Host] revo2.hortonworks.local
Last Checkpoint: [0 hours, 43 minutes, 44 transactions]
" IF-MIB::linkUp = STRING: "
[OK] NameNode Last Checkpoint
Hope this helps!
12-30-2015
05:50 AM
@Hefei Li Can you set the directory permissions to 755? (All the directories, including parents, in the path that contains the jar files.)
12-30-2015
03:35 AM
1 Kudo
@Kumar Datla The error message does say "Unable to validate the location with path: /stmp", which means either the path does not exist or, due to a permissions issue, the program/process was unable to access the directory. In any case, it looks like you have been able to move past the issue, so I will close this thread.
12-30-2015
03:32 AM
1 Kudo
@Hefei Li It appears that you have the jars in the right place, but it could be a permissions issue. I see in your screenshots that the jar files have 644 permissions, but what about the directories containing the jar files? The directories are recommended to have 755. This issue can occur due to an incorrect umask value (the recommended value is 0022). Fix: Try setting the permissions of all directories in the path to 755 and try again. Let us know how it goes.
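If you would rather script the fix than run a recursive chmod by hand, here is a hypothetical Java sketch that applies 755 to every directory under an assumed root; the path below is a placeholder, not one from your screenshots.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;
import java.util.stream.Stream;

public class FixDirPermissions {
    public static void main(String[] args) throws IOException {
        // Placeholder path -- point this at the directory tree that holds the jars.
        Path root = Paths.get("/path/to/jar/parent/dir");
        Set<PosixFilePermission> dir755 = PosixFilePermissions.fromString("rwxr-xr-x"); // 755

        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isDirectory).forEach(dir -> {
                try {
                    Files.setPosixFilePermissions(dir, dir755);
                } catch (IOException e) {
                    System.err.println("Could not update " + dir + ": " + e.getMessage());
                }
            });
        }
    }
}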