Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4055 | 08-20-2018 08:26 PM |
| | 1954 | 08-15-2018 01:59 PM |
| | 2379 | 08-13-2018 02:20 PM |
| | 4114 | 07-23-2018 04:37 PM |
| | 5024 | 07-19-2018 12:52 PM |
08-19-2016
06:20 PM
2 Kudos
@vpemawat I am not sure why anyone downvoted your question; it makes no sense to me. On to your question: you are seeing a typical challenge with an RDBMS. For ingestion I would say start with Apache NiFi, which will ingest data directly into the Hortonworks platform. Then you need to decide where to land the data:

- Hive: a great tool for unknown query patterns, i.e. BI.
- HAWQ: also a great tool for unknown query patterns (BI), and highly optimized for it.
- Phoenix: great for known query patterns; SQL on HBase.
- HDFS: store your raw data here and federate out to Phoenix, HAWQ, Hive, etc.

I hope this helps start the conversation on landing data in the platform using the right tool for the use case.
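As a hedged illustration of the "known query patterns" case, here is a minimal Phoenix sketch, assuming a Phoenix Query Server reachable at http://localhost:8765/ and the python-phoenixdb client; the table and column names are hypothetical.

```python
import phoenixdb

# Connect through the Phoenix Query Server (URL is an assumption;
# adjust to your cluster). Phoenix exposes SQL over HBase.
conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()

# Key the table for the query you already know you will run:
# point lookups by event id.
cur.execute(
    "CREATE TABLE IF NOT EXISTS events ("
    "id BIGINT PRIMARY KEY, source VARCHAR, payload VARCHAR)"
)
cur.execute("UPSERT INTO events VALUES (?, ?, ?)", (1, "nifi", "raw record"))
cur.execute("SELECT payload FROM events WHERE id = ?", (1,))
print(cur.fetchone())
```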
08-18-2016
08:24 PM
1 Kudo
Is autoscaling possible using the new product Hortonworks Connected Data Cloud on AWS? I understand the plumbing of this product uses Cloudbreak, which supports autoscaling. So is autoscaling supported on Hortonworks Connected Data Cloud?
Labels:
- Hortonworks Cloudbreak
08-17-2016
09:11 PM
@Constantin Stanca Nope.
08-17-2016
09:02 PM
2 Kudos
@jbarnett When you need to interface with a service (HBase, Hive, YARN, etc.), that is when you decide to install the client node. Typically in cluster setups you dedicate one node, called an "edge node", where you install all your client libraries. This then becomes your single entry point for running your services, and you can add more edge nodes to scale out accordingly. As @Constantin Stanca explained, it simply installs the client libraries for your specific version of Hadoop and its services, which makes things very easy on the end user. Hope that helps.
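As a rough sketch of what "single entry point" means in practice (the commands and paths here are generic examples, not a prescribed setup), each installed client exposes its usual CLI from the edge node:

```python
import subprocess

# Run from the edge node, where all the client libraries live.
# Each command goes through the standard client for that service.
for cmd in (
    ["hdfs", "dfs", "-ls", "/user"],    # HDFS client
    ["yarn", "application", "-list"],   # YARN client
    ["hive", "-e", "SHOW DATABASES;"],  # Hive client
):
    subprocess.run(cmd, check=True)
```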
08-17-2016
03:27 AM
@mqureshi did this answer your question?
08-16-2016
09:56 PM
4 Kudos
I am a junkie for faster & cheaper data processing, which is exactly why I love IaaS. My personal real-world experience with the typical IaaS providers has been that performance is generally slow. That is not to say Hadoop/HBase/Spark/etc. jobs will not perform; you just need to be familiar with what you are getting into and set realistic expectations. Recently I met the IaaS vendor Bigstep. Their liquid metal offering provides all the greatness that comes with bare-metal on-prem installations, but in the cloud. Options for bonded NICs & DAS had me at hello. I decided to run on Bigstep the same performance test I ran on AWS (article here). All the details of the scripts I ran are in that article.

Just a quick note: these performance articles do not advocate for or against any specific IaaS provider, nor do they reflect on the HDP software. I simply want to run a repeatable processing test with near-similar IaaS hardware profiles and gather performance statistics. Interpret the numbers as you wish.

1x Master Node Hardware Profile
- CPU: 2x Intel(R) Xeon(R) E5-2630 v3 @ 2.40 GHz (8 x 2.40 GHz)
- RAM: 128 GB DDR3 ECC
- Local storage disks: 1 NVMe
- Disk size: 745 GB
- Network bandwidth: 40 Gbps

3x Data Nodes Hardware Profile
- CPU: 2x Intel(R) Xeon(R) E5-2630 v3 @ 2.40 GHz (8 x 2.40 GHz)
- RAM: 256 GB DDR3 ECC
- Local storage disks: 12 HDD
- Disk size: 1863 GB
- Network bandwidth: 40 Gbps

TeraGen results: 11 mins 49 secs. I want to remain as objective as possible, but WOW. That is simply one of the fastest TeraGen results I have ever seen.

TeraSort results: 51 mins 12 secs. Fastest I have seen in the cloud so far. On-prem, with one additional node, I was able to get it down to 40 mins, so 51 mins on one fewer node is pretty good.

TeraValidate results: 4 mins 42 secs. This again was the fastest performance I have seen on 1 TB using TeraValidate.

I hope this gives some basic insight into the similar tests I have performed so far on various IaaS providers. In the coming weeks/months I plan on publishing performance test results using Azure and GCP.

It is extremely important to understand that zero performance tweaking has been done. These results do not reflect how HDP runs on IaaS providers in general, nor do they say anything definitive about the IaaS provider. I simply want to run the teragen/terasort/teravalidate tests with minimum tweaking, the same parameters, and similar hardware profiles, and document the results. That's it. Keep it simple.
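For readers who want to reproduce the workload, below is a minimal sketch of the standard 1 TB TeraGen/TeraSort/TeraValidate sequence, assuming the stock hadoop-mapreduce-examples jar; the jar path and output directories are assumptions, and the exact scripts behind the numbers above are in the linked AWS article.

```python
import subprocess

# Jar path is an assumption (typical HDP layout); adjust for your cluster.
EXAMPLES_JAR = "/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar"

def run_step(step, args):
    # Each benchmark step is a stock MapReduce example job.
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, step, *args], check=True)

# TeraGen writes 100-byte rows, so 10 billion rows ~= 1 TB of input.
run_step("teragen", ["10000000000", "/benchmarks/teragen"])
run_step("terasort", ["/benchmarks/teragen", "/benchmarks/terasort"])
run_step("teravalidate", ["/benchmarks/terasort", "/benchmarks/teravalidate"])
```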
08-16-2016
04:42 AM
@mqureshi To keep the curly braces, use this: [^0-9a-zA-Z{}]+
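A quick Python check of the brace-preserving variant (the sample string is made up):

```python
import re

text = "{user: name@host!}"
# { and } are added to the whitelist, so the braces survive.
print(re.sub(r"[^0-9a-zA-Z{}]+", "", text))  # -> {usernamehost}
```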
08-16-2016
02:29 AM
1 Kudo
@mqureshi you have tons of options. For example, [^0-9a-zA-Z]+ whitelists characters, or [^\w\d]+ (since \w already covers digits, this is equivalent to [^\w]+ and matches any character that is not a letter, digit, or underscore), among many other ways. The first one works for me to remove special characters.
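A quick Python check of the whitelist approach (the sample string is made up):

```python
import re

text = "name@host: 42!"
# Any run of characters outside 0-9, a-z, A-Z is removed.
print(re.sub(r"[^0-9a-zA-Z]+", "", text))  # -> namehost42
```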
08-15-2016
04:34 AM
1 Kudo
@Edgar Daeds I do not recommend using Zeppelin LDAP+AD (0.6.0) on a stack less than 2.4.x. Can it be done? I can't say for sure, and your path may be riddled with trouble. Do it with 2.4.x.
08-12-2016
03:23 PM
1 Kudo
It is in technical preview in 2.4.x. Your question was about 2.3.2, so you should be okay testing it out in 2.4.x.