About sunile_manjee

sunile_manjee · ‎03-30-2016

@Mats Johansson Understood. However, core services like Hive/Pig use MapReduce. Do they have the same constraints for node labeling? It seems node labeling is only applicable to Storm/Spark/Kafka/HBase/etc., that is, services that do not use MapReduce as their engine.

sunile_manjee · ‎03-30-2016

I am not able to specify a node label when I submit my MapReduce job. Only YARN distributed shell jobs are allowed to use node labels. How do I run the MapReduce job as a YARN distributed shell job?

sunile_manjee · ‎03-29-2016

Does Ranger provide TDE for HBase or Solr as it does for HDFS?

sunile_manjee · ‎03-29-2016

@Vadim Thanks for sharing. This article is about R on Hadoop. I am interested in R on Spark.

sunile_manjee · ‎03-28-2016

@Vadim Are you suggesting R cannot run in distributed mode?

sunile_manjee · ‎03-28-2016

Also verify that CBO is turned on and that you are using vectorization (it helps with memory). Run an explain plan on the query to determine what types of joins are being used.
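
As a rough sketch of the check above, the statements could be generated like this before pasting them into a Hive session. The helper name `hive_diagnostic_statements` and the sample query are hypothetical; `hive.cbo.enable` and `hive.vectorized.execution.enabled` are the standard Hive properties for CBO and vectorization, but confirm the values your cluster actually uses.

```python
from typing import List


def hive_diagnostic_statements(query: str) -> List[str]:
    """Build HiveQL statements that turn on CBO and vectorization
    for the session and then ask Hive for the query plan.

    Hypothetical helper for illustration only; it just builds strings.
    """
    # Drop surrounding whitespace and any trailing semicolon so that
    # EXPLAIN wraps exactly one statement.
    cleaned = query.strip().rstrip(";")
    return [
        "SET hive.cbo.enable=true;",                    # cost-based optimizer
        "SET hive.vectorized.execution.enabled=true;",  # vectorized execution
        f"EXPLAIN {cleaned};",                          # shows the join types chosen
    ]


if __name__ == "__main__":
    # Placeholder query; the table and column names are assumptions.
    sample = "SELECT a.id FROM orders a JOIN customers b ON a.cust_id = b.id"
    for statement in hive_diagnostic_statements(sample):
        print(statement)
```

In the EXPLAIN output, look at the join operators (for example a map join versus a shuffle/merge join) to see what the optimizer picked.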

sunile_manjee · ‎03-26-2016

@azeltov @Paul Hargis R is used for model training. Can this be done in a distributed fashion across your HDP cluster? Right now I am seeing a few data scientists run strictly on the edge node and not utilize their Spark data nodes. What am I missing? Is it difficult to get all the R libraries onto each Spark data node? If the R libraries are pushed to the data nodes, will the model training run in distributed mode? Will it execute in parallel?

sunile_manjee · ‎03-24-2016

Both sites I shared have files over 1 GB. The flight data by year is definitely more than 1 GB.

sunile_manjee · ‎03-24-2016

When a flush occurs, adjacent families are flushed as well. Does that mean all regions on the region server are flushed, or all CFs for that specific table?

sunile_manjee · ‎03-24-2016

I agree with @Benjamin Leonhardi. Provide the log file, because vectorization occurs during operations like scans, filters, aggregates, and joins.

Last Visited	‎05-25-2022 10:07 AM

Member Since	‎05-30-2018 10:40 PM
Posts	1,322
Kudos received	713

Cloudera Community

Re: Iterate over ADLS files using spark?

Re: Install NiFi CA service post nifi cluster inst...

Re: Which storage format is optimum for training m...

Re: Ambari custom alert failing

Re: df.cache() is not working on jdbc table

Re: Yarn Distributed Shell - MapReduce job

Yarn Distributed Shell - MapReduce job

Does Ranger provide TDE for hbase or solr?

Re: Model training outside of edge node?

Re: Model training outside of edge node?

Re: HIVE job failed on TEZ

Re: Model training outside of edge node?

Re: i need more than 1gb csv file, could anybody h...

HBase memstore adjacent families flush

Re: What is the term 'Vectorization' used while up...