Member since: 09-24-2015
Posts: 178
Kudos Received: 113
Solutions: 28
12-31-2015
02:12 PM
@Hefei Li Great! Can you accept the answer then, so we can close this question and others with a similar issue can benefit?
12-31-2015
03:20 AM
2 Kudos
In general, you have the following options when running R on Hortonworks Data Platform (HDP):

o RHadoop (rmr) - R programs written in the MapReduce paradigm. MapReduce is not a vendor-specific API, so any program written with MapReduce is portable across Hadoop distributions. https://github.com/RevolutionAnalytics/RHadoop/wiki/rmr

o Hadoop Streaming - R programs written to make use of Hadoop Streaming; the program structure still aligns with MapReduce, so the portability benefit above still applies.

o RJDBC - This option does not require the R programs to be written using MapReduce, and the code remains 100% native R APIs without any third-party packages. Here is a tutorial with a video, sample data, and an R script: http://hortonworks.com/hadoop-tutorial/using-revolution-r-enterprise-tutorial-hortonworks-sandbox/ Using RJDBC, the R program can have Hadoop parallelize pre-processing and filtering: R submits a query to Hive or SparkSQL, making use of distributed and parallel processing, and then uses existing R models as-is, without any changes or any proprietary APIs. Typically speaking, any data science application involves a ton of data prep, which is usually around 75% of the work; RJDBC allows pushing that work to Hive to take advantage of distributed computing (see the JDBC sketch after this list).

o SparkR - Lastly, the SparkR interface is a newer component in Spark: an R package that provides a lightweight frontend to use Apache Spark from R. This component has been available since Spark 1.4.1 (current version 1.5.2). Here are some details on it - https://spark.apache.org/docs/latest/sparkr.html And the available API - https://spark.apache.org/docs/latest/api/R/
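For context, here is a minimal Java sketch of the HiveServer2 JDBC call that RJDBC wraps from R. The host, port, credentials, table, and query are placeholders for illustration, not values from this thread.

// Minimal sketch of a HiveServer2 JDBC query; this is the same kind of connection
// RJDBC opens from R. All connection details and the table/query are placeholders.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://sandbox.hortonworks.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement();
             // Push the filtering/aggregation down to Hive so only the reduced
             // result set travels back to the client (or to R via RJDBC).
             ResultSet rs = stmt.executeQuery(
                 "SELECT event_type, COUNT(*) FROM events GROUP BY event_type")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}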
12-31-2015
03:12 AM
1 Kudo
For the sake of discussion, let's say the system is running at peak load 24 hours a day. Out of the 20K operations per second, there are 10K reads and 10K inserts. So, after the first 30 minutes of running, the system will add an additional 10K deletes per second, for a total of 30K operations per second. It's definitely not that straightforward, and HBase is going to batch the actual deletes internally somehow. 30K TPS is not a lot for HBase, but the question is: how big a cluster are we talking about? The other thing is the memory available to the RegionServer; it makes sense to keep as much data in memory as possible so that I/O is minimal, since the data is to be deleted after 30 minutes anyway. So the next set of questions is: what is the memory available on the box and to the RegionServer? How big is each message?
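To make the memory question concrete, a quick back-of-the-envelope sketch using the numbers above; the ~1 KB message size is purely an assumption, not something stated in this thread.

public class HBaseLoadEstimate {
    public static void main(String[] args) {
        long readsPerSec   = 10_000;
        long insertsPerSec = 10_000;
        long deletesPerSec = 10_000;              // kicks in once rows start aging out after 30 minutes
        long totalOpsPerSec = readsPerSec + insertsPerSec + deletesPerSec;   // 30K, as discussed above

        long windowSeconds   = 30 * 60;           // each row lives roughly 30 minutes
        long assumedMsgBytes = 1024;              // ASSUMPTION: ~1 KB per message
        long hotDataBytes    = insertsPerSec * windowSeconds * assumedMsgBytes;

        System.out.printf("Total ops/sec at steady state: %,d%n", totalOpsPerSec);
        System.out.printf("Data written per 30-minute window: ~%.1f GB%n",
                hotDataBytes / (1024.0 * 1024 * 1024));
    }
}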
12-31-2015
03:04 AM
Just to add to that: if you want to make changes, it would be easier to follow the code by looking at the source here - https://github.com/apache/ambari/ The UI code is here - https://github.com/apache/ambari/tree/trunk/ambari-web/app You can just fork the project and build it after making your changes.
12-31-2015
03:00 AM
1 Kudo
@jarhyeon cho Ambari UI is an Angular app that uses the REST API on the Ambari Server. The UI code (app.js) is here: /usr/lib/ambari-server/web/javascripts
[root@sandbox javascripts]# pwd
/usr/lib/ambari-server/web/javascripts
[root@sandbox javascripts]# ls
app.js.gz vendor.js.gz
[root@sandbox javascripts]#
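For what it's worth, the same REST API the UI calls can be hit directly. A small Java sketch follows; the sandbox host, port 8080, and the default admin/admin credentials are assumptions to adjust for your cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AmbariRestSketch {
    public static void main(String[] args) throws Exception {
        // Assumed endpoint and default sandbox credentials.
        URL url = new URL("http://sandbox.hortonworks.com:8080/api/v1/clusters");
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Authorization", "Basic " + auth);

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);   // JSON listing the clusters Ambari manages
            }
        }
    }
}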
12-30-2015
07:47 PM
1 Kudo
@Grace Xu Here is another approach for you (a script will definitely work as well, but it will not make use of Oozie). I believe we can make use of a decision node and do the following to get this done via Oozie.

Assumption: I am assuming the log table has some kind of id (log_entry_id) associated with the table names, along with some other attributes that can be used by the Sqoop job (HDFS/Hive destination, columns to import if not all, etc.). This flow can be adjusted to match whatever constraints and existing design you are working with. For example, you can use the table name from the log if you do not have an id. Also, you may have some tables that were relevant (active) in the past but are not any more, so you may have an active flag in the log table that the shell action in step 2 can use to decide whether that table has to be imported. You get the idea: this is a general structure that can be customized. The workflow will have the following key steps:
1) Java action - Write a small Java program that uses JDBC to connect to SQL Server, reads the data from the log table, and creates a comma-delimited file on HDFS, like /tmp/ingest-tables.txt (a hedged sketch of this step follows below).
2) Shell action (input parameter: log_entry_id) - Read the file from HDFS and get the line starting with (log_entry_id + 1). The script will output the values in Java properties format (like param1=value1,param2=value2). The workflow node will capture the output values for use in the subsequent steps. If the shell script does not find the next row, it will return -1.
3) Decision node - If the value is less than 0, go to END; else go to the Sqoop node.
4) Sqoop node - Execute the Sqoop task using the table name, destination, etc. received from the shell action's captured output. On success, go back to the previous shell action; on failure, go to END.
Workflow execution: When executing the workflow, a default starting value of 0 will be provided for log_entry_id.
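To make step 1 concrete, here is a hedged Java sketch of that action. The JDBC URL, credentials, and log-table columns are placeholders, not details from the original question; only the HDFS file name comes from the description above.

import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExportLogTableToHdfs {
    public static void main(String[] args) throws Exception {
        // Placeholder SQL Server connection details.
        String jdbcUrl = "jdbc:sqlserver://sqlhost:1433;databaseName=ingest_db";
        FileSystem fs = FileSystem.get(new Configuration());

        try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             Statement stmt = conn.createStatement();
             // Placeholder column names for the log table described above.
             ResultSet rs = stmt.executeQuery(
                 "SELECT log_entry_id, table_name, hdfs_destination FROM ingest_log");
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                 fs.create(new Path("/tmp/ingest-tables.txt"), true)))) {
            while (rs.next()) {
                // One comma-delimited line per table to be imported.
                out.write(rs.getLong(1) + "," + rs.getString(2) + "," + rs.getString(3));
                out.newLine();
            }
        }
    }
}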
12-30-2015
05:44 PM
Currently we don't provide a MIB, so users can choose to organize their structure in any way they want. Moving forward, Ambari may provide an Apache MIB; I believe that's what the ticket you referenced was opened for (along with the patch, README, and other information necessary for the implementation): https://issues.apache.org/jira/browse/AMBARI-1320 Today, we use a single OID for all alerts, and the body of the trap looks like this:
2015-07-22 13:28:41 0.0.0.0(via UDP: [172.16.204.221]:41891->[172.16.204.221]) TRAP, SNMP v1, community public
SNMPv2-SMI::zeroDotZero Cold Start Trap (0) Uptime: 0:00:00.00
SNMPv2-MIB::snmpTrapOID.0 = OID: IF-MIB::linkUp
IF-MIB::linkUp = STRING: "
[Alert] NameNode Last Checkpoint
[Service] HDFS
[Component] NAMENODE
[Host] revo2.hortonworks.local
Last Checkpoint: [0 hours, 43 minutes, 44 transactions]
" IF-MIB::linkUp = STRING: "
[OK] NameNode Last Checkpoint
Hope this helps!
12-30-2015
05:50 AM
@Hefei Li Can you set the directory permissions to 755? (All the directories, including parents, in the path that contains the jar files.)
12-30-2015
03:35 AM
1 Kudo
@Kumar Datla The error message does say "Unable to validate the location with path: /stmp", which means either the path does not exist or, due to a permissions issue, the program/process was unable to access the directory. In any case, it looks like you have been able to move past the issue, so I will close this thread.
12-30-2015
03:32 AM
1 Kudo
@Hefei Li It appears that you have the jars in the right place, but it could be a permissions issue. I see in your screenshots that the jar files have 644 permissions, but what about the directories containing the jar files? The directories are recommended to have 755. This issue can occur due to an incorrect umask value (the recommended value is 0022). Fix: Try setting the permissions of all directories in the path to 755 and try again. Let us know how it goes.
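If you would rather script the fix than run a recursive chmod by hand, here is a hypothetical Java sketch that applies 755 to every directory under an assumed root; the path below is a placeholder, not one from your screenshots.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;
import java.util.stream.Stream;

public class FixDirPermissions {
    public static void main(String[] args) throws IOException {
        // Placeholder path -- point this at the directory tree that holds the jars.
        Path root = Paths.get("/path/to/jar/parent/dir");
        Set<PosixFilePermission> dir755 = PosixFilePermissions.fromString("rwxr-xr-x"); // 755

        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isDirectory).forEach(dir -> {
                try {
                    Files.setPosixFilePermissions(dir, dir755);
                } catch (IOException e) {
                    System.err.println("Could not update " + dir + ": " + e.getMessage());
                }
            });
        }
    }
}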