About sunile_manjee

sunile_manjee · ‎07-09-2016

Short Description: Teragen and Terasort performance testing on AWS

Article

This article should be used with extreme care. Do not use it as a benchmark. I ran a quick 1 terabyte teragen test on AWS simply to see what kind of performance I could get from MapReduce on AWS with VERY LITTLE configuration tweaking or tuning.

On my github page here you will find the following: the teragen script, and the hadoop, yarn, mapred, and capacity scheduler configurations used during testing.

Hardware (Master & Datanode): 1 master, 3 data nodes. d2.4xlarge, 16 vCPU, 122 GB RAM, (max) 12 x 2000 GB storage.

TeraGen Results: 1 hr, 6 min, 38 sec
Job Counters:
Terasort Results: 1 hr, 34 min, 20 sec
Teravalidate Results: 25 min, 27 sec
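To put the reported wall-clock times in perspective, the short Python sketch below converts each one into an approximate aggregate throughput. It assumes the data set is exactly 10^12 bytes (the post only says "1 terabyte"), and the function name `throughput_mb_per_s` is made up for illustration. The same caveat applies as for the timings themselves: these are rough figures, not benchmark numbers.

```python
# Assumption: "1 terabyte" is taken as exactly 10**12 bytes.
DATASET_BYTES = 10**12


def throughput_mb_per_s(hours, minutes, seconds, dataset_bytes=DATASET_BYTES):
    """Return aggregate throughput in MB/s (1 MB = 10**6 bytes) for a job duration."""
    elapsed_seconds = hours * 3600 + minutes * 60 + seconds
    return dataset_bytes / elapsed_seconds / 10**6


# Durations as reported in the post: (hours, minutes, seconds).
REPORTED_TIMES = {
    "teragen": (1, 6, 38),
    "terasort": (1, 34, 20),
    "teravalidate": (0, 25, 27),
}

if __name__ == "__main__":
    for job, duration in REPORTED_TIMES.items():
        print(f"{job}: {throughput_mb_per_s(*duration):.1f} MB/s")
```

Dividing each figure by the three data nodes gives a per-node estimate, which is easier to compare against what the disks on a single instance can sustain.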

sunile_manjee · ‎07-08-2016

@Sri Bandaru yes and then put your jars in that directory.

sunile_manjee · ‎07-08-2016

I have launched a cluster on Cloudbreak and want to start using HDP services. The documentation at http://sequenceiq.com/cloudbreak-docs/latest/operations/#ssh-to-the-hosts doesn't provide details on how to do so. Any help?

sunile_manjee · ‎07-08-2016

@Faisal Hussain please take a look at this post. @Bryan Bende mentioned this:

Incoming CSV like:

    h1,h2,h3,h4
    v1,v2,v3,v4

You could capture that in ExtractText with a pattern of:

    (.+),(.+),(.+),(.+)\n(.+),(.+),(.+),(.+)

Then in ReplaceText:

    { "${csv.1}" : "${csv.5}", "${csv.2}" : "${csv.6}", "${csv.3}" : "${csv.7}", "${csv.4}" : "${csv.8}" }

Would produce:

    { "h1" : "v1", "h2" : "v2", "h3" : "v3", "h4" : "v4" }
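To see why the eight capture groups line up as header/value pairs, here is a minimal Python sketch that applies the same regular expression outside NiFi. It is not NiFi code: the function name `csv_pair_to_json` is hypothetical, and it assumes exactly four columns, one header line, one data line, and a plain `\n` line ending.

```python
import json
import re

# Same pattern as quoted for ExtractText: four header fields, a newline,
# then four value fields. Groups 1-4 are headers, groups 5-8 are values.
PATTERN = re.compile(r"(.+),(.+),(.+),(.+)\n(.+),(.+),(.+),(.+)")


def csv_pair_to_json(text):
    """Turn a 4-column header line plus one data line into a JSON object string."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError("expected a 4-column header line followed by one data line")
    groups = match.groups()
    headers, values = groups[:4], groups[4:]
    # Pair group 1 with group 5, group 2 with group 6, and so on,
    # mirroring the ${csv.1} : ${csv.5} layout in the ReplaceText template.
    return json.dumps(dict(zip(headers, values)))


if __name__ == "__main__":
    print(csv_pair_to_json("h1,h2,h3,h4\nv1,v2,v3,v4"))
```

Because the pattern is fixed at four columns and a single data row, this approach only suits small, rigidly shaped inputs. Wider or multi-row CSV would need a different pattern or a real CSV parser.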

sunile_manjee · ‎07-08-2016

@Sri Bandaru you can set hive.aux.jars.path=/path/to/jar to add the jars to a global location. Then you don't need admin access.

sunile_manjee · ‎07-08-2016

I am running an HDP cluster on AWS, and EBS is getting very expensive. Reading about heterogeneous storage, is it possible to use AWS S3 as, let's say, cold storage and AWS EBS as warm/hot storage? If so, how would I do that? I don't see any documentation. S3 is much cheaper, so I want to use EBS as hot storage and S3 as cold storage.

sunile_manjee · ‎07-08-2016

@Jay Johnson use Username: root Password: hawq2016

sunile_manjee · ‎07-08-2016

@slachterman Good catch. Fixed.

sunile_manjee · ‎07-08-2016

@rkovacs so does that mean that to use cloudbreak.sequenceiq.com with Azure I must launch an Azure VM?

sunile_manjee · ‎07-07-2016

Cloudbreak with cloudbreak.sequenceiq.com seems very easy on AWS. I simply create a role and use that role in Cloudbreak to deploy instances. For Azure I am not sure where to get the App Id, Password, and App Owner Tenant Id. Where do I find this info to launch clusters from cloudbreak.sequenceiq.com? Do I have to launch the Cloudbreak deployer on Azure to use cloudbreak.sequenceiq.com? I don't have to do this with AWS.

Last Visited	‎05-25-2022 10:07 AM

Member Since	‎05-30-2018 10:40 PM
Posts	1,322
Kudos received	713

Cloudera Community

Re: Iterate over ADLS files using spark?

Re: Install NiFi CA service post nifi cluster inst...

Re: Which storage format is optimum for training m...

Re: Ambari custom alert failing

Re: df.cache() is not working on jdbc table

Teragen, Terasort, and Teravalidate Performance te...

Re: Adding hive auxiliary jar files

Accessing HDP client services on cloudbreak

Re: Nifi - AVRO to CSV or Json to CSV,NIFI - conve...

Re: Adding hive auxiliary jar files

S3 as HDFS Heterogeneous storage?

Re: I can't D/L HDP with Pivotal HAWQ

Re: Assign IP to HDP 2.5 Sandbox on Virtualbox

Re: Cloudbreak on azure using cloudbreak.sequencei...

Cloudbreak on azure using cloudbreak.sequenceiq.co...