Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4055 | 08-20-2018 08:26 PM |
| | 1954 | 08-15-2018 01:59 PM |
| | 2379 | 08-13-2018 02:20 PM |
| | 4114 | 07-23-2018 04:37 PM |
| | 5024 | 07-19-2018 12:52 PM |
08-19-2016
06:20 PM
2 Kudos
@vpemawat I am not sure why anyone downvoted your question; it makes no sense to me. On to your question: you are seeing a typical challenge with an RDBMS. For ingestion I would say start with Apache NiFi, which will ingest data directly into the Hortonworks platform. Then you need to decide where to land the data:

- Hive: a great tool for unknown query patterns, i.e. BI.
- HAWQ: also a great tool for unknown query patterns (BI), and highly optimized for it.
- Phoenix: great for known query patterns; SQL on HBase.
- HDFS: store your raw data here and federate out to Phoenix, HAWQ, Hive, etc.

I hope this helps start the conversation on landing data in the platform using the right tool for the use case.
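As a hedged illustration of the "known query patterns" case, here is a minimal Phoenix sketch, assuming a Phoenix Query Server reachable at http://localhost:8765/ and the python-phoenixdb client; the table and column names are hypothetical.

```python
import phoenixdb

# Connect through the Phoenix Query Server (URL is an assumption;
# adjust to your cluster). Phoenix exposes SQL over HBase.
conn = phoenixdb.connect("http://localhost:8765/", autocommit=True)
cur = conn.cursor()

# Key the table for the query you already know you will run:
# point lookups by event id.
cur.execute(
    "CREATE TABLE IF NOT EXISTS events ("
    "id BIGINT PRIMARY KEY, source VARCHAR, payload VARCHAR)"
)
cur.execute("UPSERT INTO events VALUES (?, ?, ?)", (1, "nifi", "raw record"))
cur.execute("SELECT payload FROM events WHERE id = ?", (1,))
print(cur.fetchone())
```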
08-18-2016
08:24 PM
1 Kudo
Is autoscaling possible using the new product Hortonworks Connected Data Cloud on AWS? I understand the plumbing of this product uses Cloudbreak, which supports autoscaling. So is autoscaling supported on Hortonworks Connected Data Cloud?
Labels:
- Hortonworks Cloudbreak
08-17-2016
09:11 PM
@Constantin Stanca Nope.
08-17-2016
09:02 PM
2 Kudos
@jbarnett When you need to interface with a service (HBase, Hive, YARN, etc.), that is when you decide to install the client node. Typically in cluster setups you dedicate one node, called an "edge node", where you install all your client libraries. This then becomes your single entry point for running your services, and you can add more edge nodes to scale out accordingly. As @Constantin Stanca explained, it simply installs the client libraries for your specific version of Hadoop and its services, which makes things very easy on the end user. Hope that helps.
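As a rough sketch of what "single entry point" means in practice (the commands and paths here are generic examples, not a prescribed setup), each installed client exposes its usual CLI from the edge node:

```python
import subprocess

# Run from the edge node, where all the client libraries live.
# Each command goes through the standard client for that service.
for cmd in (
    ["hdfs", "dfs", "-ls", "/user"],    # HDFS client
    ["yarn", "application", "-list"],   # YARN client
    ["hive", "-e", "SHOW DATABASES;"],  # Hive client
):
    subprocess.run(cmd, check=True)
```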
08-17-2016
03:27 AM
@mqureshi did this answer your question?
08-16-2016
09:56 PM
4 Kudos
I am a junkie for faster & cheaper data processing, which is exactly why I love IaaS. My personal real-world experience with the typical IaaS providers has been that performance is generally slow. That is not to say Hadoop/HBase/Spark/etc. jobs will not perform; you just need to be familiar with what you are getting into and set realistic expectations. Recently I met the IaaS vendor Bigstep. Their liquid metal offering provides all the greatness that comes with bare-metal on-prem installations, but in the cloud. Options for bonded NICs & DAS had me at hello. I decided to run on Bigstep the same performance test I ran on AWS (article here). All the details of the scripts I ran are in that article.

Just a quick note: these performance articles do not advocate for or against any specific IaaS provider, nor do they reflect on the HDP software. I simply want to run a repeatable processing test with near-similar IaaS hardware profiles and gather performance statistics. Interpret the numbers as you wish.

1x Master Node Hardware Profile
- CPU: 2x Intel(R) Xeon(R) E5-2630 v3 @ 2.40 GHz (8 x 2.40 GHz)
- RAM: 128 GB DDR3 ECC
- Local storage disks: 1 NVMe
- Disk size: 745 GB
- Network bandwidth: 40 Gbps

3x Data Nodes Hardware Profile
- CPU: 2x Intel(R) Xeon(R) E5-2630 v3 @ 2.40 GHz (8 x 2.40 GHz)
- RAM: 256 GB DDR3 ECC
- Local storage disks: 12 HDD
- Disk size: 1863 GB
- Network bandwidth: 40 Gbps

TeraGen results: 11 mins 49 secs. I want to remain as objective as possible, but WOW. That is simply one of the fastest TeraGen results I have ever seen.

TeraSort results: 51 mins 12 secs. Fastest I have seen in the cloud so far. On-prem, with one additional node, I was able to get it down to 40 mins, so 51 mins on one fewer node is pretty good.

TeraValidate results: 4 mins 42 secs. This again was the fastest performance I have seen on 1 TB using TeraValidate.

I hope this gives some basic insight into the similar tests I have performed so far on various IaaS providers. In the coming weeks/months I plan on publishing performance test results using Azure and GCP.

It is extremely important to understand that zero performance tweaking has been done. These results do not reflect how HDP runs on IaaS providers in general, nor do they say anything definitive about the IaaS provider. I simply want to run the teragen/terasort/teravalidate tests with minimum tweaking, the same parameters, and similar hardware profiles, and document the results. That's it. Keep it simple.
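For readers who want to reproduce the workload, below is a minimal sketch of the standard 1 TB TeraGen/TeraSort/TeraValidate sequence, assuming the stock hadoop-mapreduce-examples jar; the jar path and output directories are assumptions, and the exact scripts behind the numbers above are in the linked AWS article.

```python
import subprocess

# Jar path is an assumption (typical HDP layout); adjust for your cluster.
EXAMPLES_JAR = "/usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar"

def run_step(step, args):
    # Each benchmark step is a stock MapReduce example job.
    subprocess.run(["hadoop", "jar", EXAMPLES_JAR, step, *args], check=True)

# TeraGen writes 100-byte rows, so 10 billion rows ~= 1 TB of input.
run_step("teragen", ["10000000000", "/benchmarks/teragen"])
run_step("terasort", ["/benchmarks/teragen", "/benchmarks/terasort"])
run_step("teravalidate", ["/benchmarks/terasort", "/benchmarks/teravalidate"])
```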
08-16-2016
04:42 AM
@mqureshi To keep the curly braces, use this: [^0-9a-zA-Z{}]+
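A quick Python check of the brace-preserving variant (the sample string is made up):

```python
import re

text = "{user: name@host!}"
# { and } are added to the whitelist, so the braces survive.
print(re.sub(r"[^0-9a-zA-Z{}]+", "", text))  # -> {usernamehost}
```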
08-16-2016
02:29 AM
1 Kudo
@mqureshi you have tons of options. For example, [^0-9a-zA-Z]+ whitelists characters, or [^\w\d]+ (since \w already covers digits, this is equivalent to [^\w]+ and matches any character that is not a letter, digit, or underscore), among many other ways. The first one works for me to remove special characters.
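A quick Python check of the whitelist approach (the sample string is made up):

```python
import re

text = "name@host: 42!"
# Any run of characters outside 0-9, a-z, A-Z is removed.
print(re.sub(r"[^0-9a-zA-Z]+", "", text))  # -> namehost42
```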
08-15-2016
04:34 AM
1 Kudo
@Edgar Daeds I do not recommend using Zeppelin LDAP+AD (0.6.0) on a stack less than 2.4.x. Can it be done? I can't say for sure, and your path may be riddled with trouble. Do it with 2.4.x.
08-12-2016
03:23 PM
1 Kudo
It is in technical preview in 2.4.x. Your question was about 2.3.2, so you should be okay testing it out in 2.4.x.