Member since: 05-30-2018
Posts: 1322
Kudos Received: 715
Solutions: 148
03-30-2016
06:25 PM
@Mats Johansson Understood. However, core services like Hive and Pig use MapReduce. Do they have the same constraints for node labeling? It seems node labeling is only applicable to Storm, Spark, Kafka, HBase, etc., that is, services that do not use MapReduce as their engine.
03-30-2016
04:29 AM
1 Kudo
I am not able to specify a node label when I submit my MapReduce job. Only YARN distributed shell jobs are allowed to use node labels. How can I run the MapReduce job as a YARN distributed shell job?
Labels: Apache YARN
03-29-2016
04:30 PM
1 Kudo
Does Ranger provide TDE for HBase or Solr as it does for HDFS?
Labels: Apache HBase, Apache Ranger, Apache Solr
03-29-2016
02:29 AM
@Vadim Thanks for sharing. This article is about R on Hadoop; I am interested in R on Spark.
03-28-2016
02:28 PM
Also verify that CBO is turned on and that you are using vectorization (it helps with memory). Run an explain plan on the query to determine what types of joins are being used.
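To make those checks concrete, here is a rough sketch of a small Python helper that builds the HiveQL statements you could paste into beeline or the Hive CLI. It assumes the two standard Hive properties `hive.cbo.enable` and `hive.vectorized.execution.enabled`; the helper name `build_diagnostic_statements` and the sample query are made up for illustration and are not from this thread.

```python
from typing import List

# Hive properties relevant to the checks above.
CHECKED_PROPERTIES = [
    "hive.cbo.enable",                    # cost-based optimizer (CBO)
    "hive.vectorized.execution.enabled",  # vectorized query execution
]


def build_diagnostic_statements(query: str) -> List[str]:
    """Return HiveQL statements that print each setting and explain the query."""
    # "SET <property>;" with no value prints the current setting in Hive.
    statements = [f"SET {name};" for name in CHECKED_PROPERTIES]
    # EXPLAIN prints the query plan, including the join operators Hive chose.
    statements.append(f"EXPLAIN {query.strip().rstrip(';')};")
    return statements


if __name__ == "__main__":
    # Hypothetical query; table and column names are placeholders.
    sample = (
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON o.customer_id = c.id"
    )
    print("\n".join(build_diagnostic_statements(sample)))
```

In the EXPLAIN output, look at the join operators in the plan to see which join types are in use. The exact plan layout depends on your Hive version.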
03-26-2016
01:19 AM
@azeltov @Paul Hargis R is used for model training. Can this be done in a distributed fashion across the HDP cluster? Right now I am seeing a few data scientists run strictly on the edge node without utilizing their Spark data nodes. What am I missing? Is it difficult to get all the R libraries onto each Spark data node? If the R libraries are pushed to the data nodes, will model training run in distributed mode, with parallel execution?
03-24-2016
05:58 PM
Both sites I shared have files over 1 GB. The flight data by year is definitely more than 1 GB.
03-24-2016
03:10 PM
1 Kudo
When a flush occurs, adjacent column families are flushed as well. Does that mean all regions on the region server are flushed, or all column families for that specific table?
Labels: Apache HBase
03-24-2016
02:46 PM
I agree with @Benjamin Leonhardi. Provide the log file, because vectorization occurs during operations like scans, filters, aggregates, and joins.