Member since
05-30-2018
1322
Posts
715
Kudos Received
148
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4033 | 08-20-2018 08:26 PM |
| | 1933 | 08-15-2018 01:59 PM |
| | 2366 | 08-13-2018 02:20 PM |
| | 4096 | 07-23-2018 04:37 PM |
| | 5003 | 07-19-2018 12:52 PM |
12-28-2016
06:25 AM
Hi Sunile, could you please attach the JSON structure you used to create the entity in Apache Atlas? I want to create a Hive table entity along with two column entities in it. How can I do that using the REST API? Please post an example with the full command and JSON body structure if you have one.
08-24-2016
08:15 PM
2 Kudos
After starting the sandbox, run the following to check the status of Ambari. If either of them is not running, start it again and you should be fine:
ambari-agent status
ambari-server status
ambari-agent start
ambari-server start
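The check-then-start routine above can be sketched as a small Python helper. This is only an illustration: the function name `ensure_running` is hypothetical, and it assumes that `ambari-server status` and `ambari-agent status` exit with a non-zero code when the process is not running, which you should verify on your sandbox version.

```python
import subprocess


def ensure_running(service, run=subprocess.run):
    """Run `<service> status`; if it fails, run `<service> start`.

    Returns True if a start was attempted, False if the service was
    already reported as running.
    Assumption: a non-zero exit code from `status` means "not running".
    """
    status = run([service, "status"])
    if status.returncode == 0:
        return False
    # check=True raises CalledProcessError if the start itself fails.
    run([service, "start"], check=True)
    return True
```

Calling `ensure_running("ambari-server")` and then `ensure_running("ambari-agent")` as root on the sandbox mirrors the four commands above. The `run` parameter exists only so the logic can be exercised without Ambari installed.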
08-23-2016
04:30 PM
@Ayub Pathan I will try soon. I tried hundreds of approaches last night and none of the combinations worked. I will update today.
08-22-2016
06:55 PM
@Sunile Manjee Have you seen this article on tuning? https://community.hortonworks.com/articles/38591/hadoop-and-ldap-usage-load-patterns-and-tuning.html
This paper provides good background on the performance scaling of LDAP: http://researchweb.watson.ibm.com/people/d/dverma/papers/sigmetrics2001.pdf
08-18-2016
09:40 PM
1 Kudo
Hi, it currently is not. You can manually scale up or down, but you cannot set up auto-scaling via Hortonworks Data Cloud. We are considering this as a roadmap item. Thanks.
08-17-2016
09:02 PM
2 Kudos
@jbarnett You install the client on a node when you need to interface with a service (HBase, Hive, YARN, etc.) from that node. In typical cluster setups, you dedicate one node, called an "edge node", where you install all your client libraries. This then becomes your single entry point for running your services. You can add more edge nodes to scale out accordingly. As @Constantin Stanca explained, it simply installs the client libraries for your specific version of Hadoop and its services, which makes things very easy for the end user. Hope that helps.
08-16-2016
09:56 PM
4 Kudos
I am a junkie for faster and cheaper data processing, which is exactly why I love IaaS. My personal real-world experience with the typical IaaS providers has generally been slow performance. That is not to say Hadoop/HBase/Spark/etc. jobs will not perform; however, you need to be familiar with what you're getting into and set realistic expectations. Recently I met the IaaS vendor bigstep. Their liquid metal offering provides all the greatness that comes with bare-metal on-prem installations, but in the cloud. Options for bonded NICs and DAS had me at hello. I decided to run the same performance test I ran on AWS (article here) on bigstep. All the details of the scripts I ran are in that article.

Just a quick note: these performance articles do not advocate for or against any specific IaaS provider, nor do they reflect on the HDP software. I simply want to run a repeatable processing test with similar IaaS hardware profiles and gather performance statistics. Interpret the numbers as you wish.

1x Master Node Hardware Profile
CPU: 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (8 x 2.40 GHz)
RAM: 128 GB DDR3 ECC
Local storage disks: 1 NVMe
Disk size: 745 GB
Network bandwidth: 40 Gbps

3x Data Nodes Hardware Profile
CPU: 2 x Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (8 x 2.40 GHz)
RAM: 256 GB DDR3 ECC
Local storage disks: 12 HDD
Disk size: 1863 GB
Network bandwidth: 40 Gbps

Teragen results: 11 mins 49 secs
I want to remain as objective as possible, but WOW. That is simply one of the fastest teragen results I have ever seen.

TeraSort results: 51 mins 12 secs
Fastest I have seen on the cloud so far. On-prem with one additional node I was able to get it down to 40 mins, so 51 mins with one fewer node is pretty good.

TeraValidate results: 4 mins 42 secs
This again was the fastest performance I have seen on 1 TB using teravalidate.

I hope this provides some basic insight into the similar tests I have performed so far on various IaaS providers. In the coming weeks/months I plan on publishing performance test results using Azure and GCP.

It is extremely important to understand that zero performance tweaking has been done. Nor does this reflect how HDP runs on IaaS providers, or anything about the IaaS provider itself. I simply want to run the teragen/terasort/teravalidate tests with minimal tweaking, the same parameters, and similar hardware profiles, and document the results. That's it. Keep it simple.
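To put the three timings on a common scale, the elapsed times can be converted into aggregate throughput. The sketch below is not part of the original test scripts; the helper name `throughput_mb_per_s` is made up for illustration, and it assumes the data set is exactly one decimal terabyte (10^12 bytes, i.e. 10^10 TeraSort rows of 100 bytes each), since the post only says "1TB".

```python
# Assumption: 1 TB means 10**12 bytes (10**10 rows x 100 bytes per row).
ONE_TB_BYTES = 10**12


def throughput_mb_per_s(total_bytes, minutes, seconds):
    """Return average throughput in MB/s (1 MB = 10**6 bytes)."""
    elapsed_seconds = minutes * 60 + seconds
    return total_bytes / elapsed_seconds / 1e6


# Elapsed times as reported in the post: (minutes, seconds).
results = {
    "teragen": (11, 49),
    "terasort": (51, 12),
    "teravalidate": (4, 42),
}

for phase, (minutes, seconds) in results.items():
    rate = throughput_mb_per_s(ONE_TB_BYTES, minutes, seconds)
    print(f"{phase}: {rate:.0f} MB/s")
```

These are cluster-wide averages over the whole run, so they say nothing about per-disk or per-node peaks.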
08-23-2017
07:52 PM
I should have mentioned I was using VirtualBox v5.1.14
08-12-2016
07:05 PM
1 Kudo
Just use doAs=true and make sure only hive can read the warehouse folder, and you are done. The Hive CLI can start but cannot access anything.
08-11-2016
07:44 PM
1 Kudo
@Sunile Manjee As @SBandaru states, you will need to make sure that proper group membership is maintained for the non-standard users. If you specify the users at cluster creation time, Ambari will take care of this for you. If you create them after the fact, then you will need to verify group membership. You may also need to modify the auth_to_local filters if the non-standard users are in AD/LDAP and you need to map them to local users. Another thing to consider is if you run the Ambari agent as non-root. There are a number of sudo rules that need to be put in place for the ambari user that allow execution of commands as the various service accounts for purposes of starting/stopping the services, installing packages, etc. You'll need to modify the customizable users sudo entry to suit your environment.