Member since: 09-24-2015
Posts: 98
Kudos Received: 76
Solutions: 18

My Accepted Solutions
Title | Views | Posted |
---|---|---|
  | 2854 | 08-29-2016 04:42 PM |
  | 5696 | 08-09-2016 08:43 PM |
  | 1743 | 07-19-2016 04:08 PM |
  | 2466 | 07-07-2016 04:05 PM |
  | 7414 | 06-29-2016 08:25 PM |
11-12-2019
06:16 AM
Hi, could you please let us know what you mean by "not yet deployed"? Do you mean the jobs that have not been kicked off after you run the spark-submit command? Or could you please explain in more detail? Thanks, Akr
09-13-2016
08:25 PM
1 Kudo
@Kirk Haslbeck Michael is correct; you will get 5 total executors.
01-31-2018
09:20 AM
How do I set the JAVA_HOME path?
08-16-2016
11:00 AM
3 Kudos
Apache Zeppelin (version 0.6.0) includes the ability to securely authenticate users and require logins. It uses the Apache Shiro security framework to accomplish this. Note: prior versions of Zeppelin did not force users to log in.

After launching the HDP 2.5 Tech Preview Sandbox on a virtual machine, make sure the Zeppelin service is up and running via Ambari. Next, open the Zeppelin UI either by clicking on: Services (tab) -> Zeppelin notebook (left-hand panel) -> Quick Links (tab) -> "Zeppelin UI" (button), or just by opening a browser at http://sandbox.hortonworks.com:9995/ (or http://127.0.0.1:9995/).

The Zeppelin welcome page should appear in the browser, with a "Login" button in the upper right-hand corner. Clicking it brings up a pop-up window with text entries for username and password. Enter one of the username/password pairs below (these are the defaults listed in the "shiro.ini" file located in the "conf" sub-directory of Zeppelin). Username/password pairs:
admin/password1
user1/password2
user2/password3
user3/password4
If you want to change these passwords or add more users, edit the [users] section of the shiro.ini file. After entering valid credentials, you will be logged in and the existing notebooks will display on the left-hand side of the Zeppelin screen. If you enter the wrong username or password, you will be directed back to the Welcome page.

FYI: For more information about Zeppelin security, see this link: https://github.com/apache/zeppelin/blob/master/SECURITY-README.md
FYI: For more detailed information about Apache Shiro configuration options, see this link: http://shiro.apache.org/configuration.html#Configuration-INISections
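For reference, the [users] section of conf/shiro.ini that defines these defaults looks roughly like the sketch below (Shiro's INI format is username = password[, role1, role2, ...]; the exact contents vary by Zeppelin version):

```
# conf/shiro.ini -- [users] section only
# each line maps a username to a password (roles may follow, comma-separated)
[users]
admin = password1
user1 = password2
user2 = password3
user3 = password4
```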
08-09-2016
09:22 PM
4 Kudos
Just a few months ago, Apache Storm announced release 1.0 of the distribution. The bullet points below summarize the new features. For more detailed descriptions, you can read the full release notes at: http://storm.apache.org/2016/04/12/storm100-released.html

Apache Storm 1.0 highlights:

- Performance: Apache Storm 1.0 is "up to 16 times faster than previous versions, with latency reduced up to 60%."
- Pacemaker (Heartbeat Server): Pacemaker is an optional Storm daemon designed to process heartbeats from workers, overcoming the scaling problems of ZooKeeper.
- Distributed Cache API: files in the distributed cache can be updated at any time from the command line, without the need to redeploy a topology.
- HA Nimbus: multiple instances of the Nimbus service run in a cluster and perform leader election when a Nimbus node fails.
- Native Streaming Window API: Storm supports sliding and tumbling windows based on time duration and/or event count (see the sketch after this list).
- Automatic Backpressure: Storm now has an automatic backpressure mechanism based on configurable high/low watermarks expressed as a percentage of a task's buffer size.
- Resource Aware Scheduler: the new resource-aware scheduler (aka the "RAS Scheduler") allows users to specify the memory and CPU requirements for individual topology components.
- Easier debugging: dynamic log levels, tuple sampling and debugging, and dynamic worker profiling.
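To make the window API concrete, here is a minimal sketch in Scala against the Storm 1.0 Java API; it is not from the release notes. The bolt and topology names are placeholders of mine, TestWordSpout is Storm's bundled test spout, and the window sizes are arbitrary:

```scala
import org.apache.storm.{Config, LocalCluster}
import org.apache.storm.testing.TestWordSpout
import org.apache.storm.topology.TopologyBuilder
import org.apache.storm.topology.base.BaseWindowedBolt
import org.apache.storm.topology.base.BaseWindowedBolt.Duration
import org.apache.storm.windowing.TupleWindow

// Placeholder bolt: reports how many tuples fall inside each window.
class WindowCountBolt extends BaseWindowedBolt {
  override def execute(window: TupleWindow): Unit = {
    // window.get() returns every tuple currently inside the window
    println(s"window holds ${window.get().size()} tuples")
  }
}

object WindowedTopology {
  def main(args: Array[String]): Unit = {
    val builder = new TopologyBuilder()
    builder.setSpout("words", new TestWordSpout())
    builder
      // 10-second sliding window that advances every 5 seconds
      .setBolt("counter",
        new WindowCountBolt().withWindow(Duration.seconds(10), Duration.seconds(5)), 1)
      .shuffleGrouping("words")

    val cluster = new LocalCluster()
    cluster.submitTopology("windowed-demo", new Config(), builder.createTopology())
    Thread.sleep(60000)   // let the topology run for a minute
    cluster.shutdown()
  }
}
```

BaseWindowedBolt also accepts Count arguments in place of Duration for event-count-based windows.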
08-09-2016
08:43 PM
First, try to take advantage of your data being stored in a splittable format (Snappy, LZO, bzip2, etc.). If it is, you can instruct Spark to split the data into multiple partitions on read. In Scala:

val file = sc.textFile(path, minPartitions)

You will also need to tune your YARN container sizes to work with your executor allocation. Make sure the YARN maximum allocation ('yarn.scheduler.maximum-allocation-mb') is bigger than what you are asking for per executor, including the default overhead of 384 MB. The following parameters are used to allocate Spark executors and driver memory:

spark.executor.instances -- number of Spark executors
spark.executor.memory -- memory per Spark executor (plus the 384 MB overhead)
spark.driver.memory -- memory for the Spark driver

A 6 MB file is pretty small, much smaller than the HDFS block size, so you are probably getting a single partition until you do something to repartition it. I would call one of these repartition methods on your DataFrame:

def repartition(numPartitions: Int, partitionExprs: Column*): DataFrame
Returns a new DataFrame partitioned by the given partitioning expressions into numPartitions. The resulting DataFrame is hash partitioned.

Or:

def repartition(numPartitions: Int): DataFrame
Returns a new DataFrame that has exactly numPartitions partitions.
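Putting these pieces together, here is a minimal sketch using the Spark 2.x SparkSession entry point; the HDFS path and the partition count of 8 are placeholder values, not from the question:

```scala
import org.apache.spark.sql.SparkSession

object RepartitionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-example").getOrCreate()
    val sc = spark.sparkContext

    // Ask for at least 8 partitions on read; the second argument to
    // textFile is minPartitions, a lower bound rather than an exact count.
    val lines = sc.textFile("hdfs:///data/small_file.txt", 8)
    println(s"partitions after read: ${lines.getNumPartitions}")

    // For a DataFrame, call repartition so the work spreads across executors.
    val df = spark.read.text("hdfs:///data/small_file.txt")
    val df8 = df.repartition(8)
    println(s"partitions after repartition: ${df8.rdd.getNumPartitions}")

    spark.stop()
  }
}
```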
07-28-2017
08:19 AM
We ran into the same scenario: Zeppelin was always launching 3 containers in YARN, even with the dynamic allocation parameters enabled in Spark, because Zeppelin was not picking up those parameters.

To get Zeppelin to launch more than the 3 containers it launches by default, we needed to configure the following in the Zeppelin Spark interpreter:

spark.dynamicAllocation.enabled=true
spark.shuffle.service.enabled=true
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.minExecutors=2
spark.dynamicAllocation.maxExecutors=10

Start spark.dynamicAllocation.minExecutors at a low value. Otherwise YARN launches the specified minimum number of containers but uses only the containers (memory and vcores) it actually needs; the rest of the memory and vcores are marked as reserved, which causes memory issues.

It is also generally better to start with less executor memory (e.g., 10-15 GB) and more executors (20-30). In our scenario, with 50-100 GB of executor memory and 5-10 executors, a query took 3 min 48 sec (228 sec), which is expected because parallelism was very low; with executor memory reduced to 10-15 GB and executors increased to 25-30, the same query took only 54 sec. Please note that the number of executors and the executor memory are use-case dependent; we ran a few trials before finding the optimal settings for our scenario.
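For comparison, the same properties can be passed directly to spark-submit with --conf when running outside Zeppelin; the class and jar names below are placeholders. Note that spark.shuffle.service.enabled=true assumes the external shuffle service is running on the YARN NodeManagers:

```
# placeholders: com.example.MyApp and myapp.jar are illustrative names
spark-submit \
  --master yarn \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=0 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --class com.example.MyApp myapp.jar
```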
05-19-2016
08:05 AM
Very big thank you!
05-06-2016
07:06 PM
Thanks for your help. Also, do you know what the DAG visualization shows for the jobs executed after we run a query? Does that visualization show the physical plan or the logical plan?
11-02-2017
03:15 PM
This problem was noted while running HDP 2.5. Apparently it was fixed in version 2.6; however, that also required updating Oracle VM VirtualBox to version 5.1.30. The problem has not reappeared since.