Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
05-24-2016
04:28 PM
Yes, you would need to configure user sync with LDAP/AD in the Ranger UI. Alternatively, use UNIX user sync in Ranger to sync with the local operating system (works as well).
05-24-2016
12:48 PM
1 Kudo
Alternatively, use Kerberos and kerberize the HDFS UI. In that case only SPNEGO-enabled browsers will be able to access the UI, and users will have the same filesystem access restrictions as when accessing HDFS directly.
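As a sketch, SPNEGO for the Hadoop web UIs is driven by a few `core-site.xml` properties; the principal and keytab values below are environment-specific placeholders, not values from this thread:

```xml
<!-- core-site.xml: switch web UI authentication to Kerberos (SPNEGO).
     Principal and keytab paths are example values for your environment. -->
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
```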
06-09-2016
10:34 AM
Can you give me the top 50 keys plus the min, max, and average? Also, did you try the query? What was the behaviour? The reason I am asking is that if your query runs very long while using only a small number of reducers, for example, that may indicate skew, and one way to maximize usage of the cluster is to look at surrogate key creation.
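To make the skew question concrete, here is a small illustrative Python sketch (the per-key counts are made-up numbers, not from this thread): when the maximum key count dwarfs the average, one reducer ends up doing most of the work.

```python
# Hypothetical per-key row counts, e.g. the result of something like:
#   SELECT join_key, COUNT(*) FROM t GROUP BY join_key
key_counts = {"a": 12, "b": 9, "c": 950_000, "d": 15, "e": 11}

counts = sorted(key_counts.values(), reverse=True)
top50 = counts[:50]                      # top 50 keys by row count
lo, hi = min(counts), max(counts)
avg = sum(counts) / len(counts)

# A max far above the average suggests one reducer handles most rows.
skew_ratio = hi / avg
print(f"min={lo} max={hi} avg={avg:.1f} skew_ratio={skew_ratio:.1f}")
```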
05-16-2016
01:15 PM
Thanks Eric 🙂 I think I will have some "trouble" analyzing and segmenting the data in the Spark step, because I will need to create some rules to make that division.
05-16-2016
02:38 PM
2 Kudos
Get data into the cluster? The easiest way is to have a delimited file and do `hadoop fs -put file <hdfs location>`. You can then read those files with sc.textFile. I think you should go through a couple of basic Hadoop tutorials first: http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
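Once the file is in HDFS, the Spark side is just sc.textFile plus a split on the delimiter. As a runnable sketch of that parsing step (plain Python standing in for the RDD operations, with a made-up two-column file):

```python
# Stand-in for: rdd  = sc.textFile("hdfs:///data/people.csv")
#               rows = rdd.map(lambda line: line.split(","))
lines = ["alice,30", "bob,25", "carol,41"]   # textFile yields one string per line

rows = [line.split(",") for line in lines]   # same map(...) logic, locally
ages = [int(age) for _name, age in rows]

print(rows[0])    # ['alice', '30']
print(sum(ages))  # 96
```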
05-16-2016
11:14 AM
1) Yes, you can see the "Tez session was closed ..." message. 2) In anything after HDP 2, Tez is enabled by default; MapReduce might be going away as an option anyway. 3) You can still set the execution engine per query: `set hive.execution.engine=mr;` or `set hive.execution.engine=tez;`. 4) Not sure what you mean by utility. The Tez view in Ambari would provide that functionality; I am not completely sure about the out-of-the-box integration with the ResourceManager. https://www.youtube.com/watch?v=xyqct59LxLY
05-10-2016
05:11 PM
2 Kudos
Here is a great write-up on file compression in Hadoop: http://comphadoop.weebly.com/
05-10-2016
11:02 AM
Contrary to popular belief, Spark is not in-memory only.

a) Simple read, no shuffle (no joins, ...): For the initial reads, Spark, like MapReduce, reads the data in a stream and processes it as it comes along. I.e., unless there is a reason to, Spark will NOT materialize the full RDDs in memory (you can tell it to do so, however, if you want to cache a small dataset). An RDD is resilient because Spark knows how to recreate it (re-read a block from HDFS, for example), not because it is stored in memory in different locations (though that can be done too). So if you filter out most of your data, or do an efficient aggregation that aggregates on the map side, you will never have the full table in memory.

b) Shuffle: This is done very similarly to MapReduce: the map outputs are written to disc and the reducers read them through HTTP. However, Spark uses an aggressive filesystem buffer strategy on the Linux filesystem, so if the OS has memory available the data will not actually be written to physical disc.

c) After shuffle: RDDs after a shuffle are normally cached by the engine (otherwise a failed node or RDD would require a complete re-run of the job); however, as abdelkrim mentions, Spark can spill these to disc unless you overrule that.

d) Spark Streaming: This is a bit different. Spark Streaming expects all data to fit in memory unless you override settings.
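The point in (a), that a filter-heavy pipeline never holds the full dataset, can be illustrated outside Spark with plain Python generators, which pass one record at a time through the pipeline much like a narrow RDD lineage (the record shape and sizes here are invented for the sketch):

```python
# A lazy "read": yields one record at a time, similar to Spark streaming
# blocks in from HDFS. Nothing is materialized until a downstream
# stage pulls records through the chain.
def read_records(n):
    for i in range(n):
        yield {"id": i, "value": i % 10}

records = read_records(1_000_000)                     # no data read yet
filtered = (r for r in records if r["value"] == 0)    # drops 90% lazily
total = sum(r["id"] for r in filtered)                # data flows only now

print(total)
```

At no point does the full million-record dataset exist in memory; only the single record currently flowing through the chain does.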
05-10-2016
01:42 PM
Hi Ed, it would be useful to know whether you are aiming for HA or performance. Since it is a small cluster, you may use it as a POC and not care much about HA; I don't know. One option not mentioned below is going with 3 masters and 3 slaves in a small HA cluster setup. That allows you to balance services across the masters more and/or dedicate one to be mostly an edge node. If security is a topic, that may come in handy. Cheers, Christian