About learninghuman

Ashish03 · 09-02-2022

May I ask one question, please, @gkeys? If I buy HDInsight as PaaS, what will the role and responsibilities of a Hadoop admin be, or will the admin job role be removed? Since we can't upgrade Hadoop versions and the service will be one-click ready, what else remains? Performance tuning can be done by developers directly. I hope you understand my concern.

Ryanp · 08-07-2019

With the advent of heterogeneous storage for HDFS, can we now look at NAS in a new light? Potentially we could label NAS mounts on the data nodes as archive storage and have HDFS move data there when it becomes cold. I would like to hear opinions on this.

TimothySpann · 01-11-2017

YARN is designed for Hadoop and is very mature and stable. Mesos is very new, is written in C++, and has CPU scheduling. This presentation is pretty good: http://www.slideshare.net/mKrishnaKumar1/mesos-vs-yarn-an-overview

tmccuch · 01-03-2017

@learninghuman If this answer helps, please accept it. Otherwise, I'd be happy to answer any remaining questions you have. Thanks! _Tom

tmccuch · 12-30-2016

@learninghuman You can read more about this in "Hadoop Azure Support: Azure Blob Storage" in the Apache docs for Hadoop 2.7.2. You'd need to check with the vendors behind the other distros to see whether they support this.

learninghuman · 08-25-2016

@Tom McCuch Thanks a lot for the views and inputs. It definitely helps.

learninghuman · 06-29-2016

@Benjamin Leonhardi Thanks, makes sense

ravi1 · 05-13-2016

Yes. Once you specify STORED AS ORC, OrcSerde is what is used, and it ignores those. The SerDe decides which of the clauses in the CREATE TABLE script are actually used.

axie · 07-11-2018

Thank you @Krish E. Did you get it sorted out? I am having the same issue. What is your table's size?

learninghuman · 03-25-2016

@Joseph Niemiec You mentioned "Left outerjoin and test for null in the WHERE is probably better for scaling then UNION DISTINCT if you are worried about a reducer problem. Same join syntax as the example below..." How does a left outer join avoid a reducer (unless it's a map join)? Do you recommend a left outer join over UNION DISTINCT? And regarding the point "We have found a fun case where if you try to use this to dedupe or clean.....", my understanding is that if a partition has 5 records that are duplicates (the initial master load already had them), there is no way to remove them unless a 6th record that duplicates those 5 comes in with the staging load. Am I right? If so, what is your recommendation for removing duplicates in the initial load itself?
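
For readers following the question above, here is a minimal sketch of the two query shapes being compared, run against an in-memory SQLite database from Python as a stand-in for Hive. The table names `master` and `staging`, the columns, and the sample rows are all hypothetical, and the sketch says nothing about how Hive plans reducers for either query. It only shows the difference in results: a plain `UNION` (distinct) collapses every duplicate, while the left outer join with an `IS NULL` test only filters incoming staging rows and leaves duplicates that were already in the master table.

```python
import sqlite3

# In-memory SQLite stands in for Hive here; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE master  (id INTEGER, val TEXT);
CREATE TABLE staging (id INTEGER, val TEXT);
-- master already contains a duplicate of (1, 'a') from the initial load
INSERT INTO master  VALUES (1, 'a'), (1, 'a'), (2, 'b');
INSERT INTO staging VALUES (2, 'b'), (3, 'c');
""")

# Shape 1: UNION (distinct) over both tables removes all duplicates.
union_rows = cur.execute("""
    SELECT id, val FROM master
    UNION
    SELECT id, val FROM staging
    ORDER BY id
""").fetchall()

# Shape 2: left outer join plus NULL test keeps only staging rows
# that have no match in master.
new_rows = cur.execute("""
    SELECT s.id, s.val
    FROM staging s
    LEFT OUTER JOIN master m ON s.id = m.id AND s.val = m.val
    WHERE m.id IS NULL
    ORDER BY s.id
""").fetchall()

# Appending those new rows to master leaves master's own duplicates in place.
merged = cur.execute("""
    SELECT id, val FROM master
    UNION ALL
    SELECT s.id, s.val
    FROM staging s
    LEFT OUTER JOIN master m ON s.id = m.id AND s.val = m.val
    WHERE m.id IS NULL
    ORDER BY id
""").fetchall()

print(union_rows)
print(new_rows)
print(merged)
```

In this toy data the duplicate `(1, 'a')` survives in `merged` but not in `union_rows`, which matches the concern raised in the question about duplicates that are already present after the initial load.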

Member Since	06-29-2016 09:30 AM
Last Visited	07-05-2016 08:52 AM
Posts	81
Kudos received	43

Cloudera Community

Re: Mapreduce and Hcatalog Integration fails to us...

Re: HDInsight Vs HDP Service on Azure Vs HDP on Az...

Re: Feasibility and recommendation for running HDF...

Re: HDP on Mesos using Marathon, Docker

Re: Scaling and Auto-scaling of HDP on AWS and Azu...

Re: HDP on Cloud (Azure, AWS) - Storage options an...

Re: Hadoop for Operational data store

Re: Hadoop data linking from multiple sources

Re: Hive ORC or AVRO format with field delimiters

Re: Sqoop --split-by on a string /varchar column

Re: Remove duplicates Using Map reduce or Hive