Member since 
    
	
		
		
		09-24-2015
	
	
	
	
	
	
	
	
	
	
	
	
	
	
			
      
                98
            
            
                Posts
            
        
                76
            
            
                Kudos Received
            
        
                18
            
            
                Solutions
            
        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| 3358 | 08-29-2016 04:42 PM | |
| 6355 | 08-09-2016 08:43 PM | |
| 2380 | 07-19-2016 04:08 PM | |
| 3019 | 07-07-2016 04:05 PM | |
| 8398 | 06-29-2016 08:25 PM | 
			
    
	
		
		
		11-12-2019
	
		
		06:16 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Hi,     Could you please let us know what do you mean by not yet deployed? Does you mean that the jobs that has not been kicked off after you running the spark submit command (or) Could you please explain in detail.        Thanks  Akr 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		09-13-2016
	
		
		08:25 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		1 Kudo
		
	
				
		
	
		
					
							 @Kirk Haslbeck Michael is correct you will get 5 total executors 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		01-31-2018
	
		
		09:20 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 how to set JAVA_HOME path ? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		08-16-2016
	
		
		11:00 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		3 Kudos
		
	
				
		
	
		
					
							 Apache Zeppelin (version 0.6.0) includes the ability to securely authenticate users and require logins. It uses the Apache Shiro security framework to accomplish this objective. Note: prior versions of Zeppelin did not force users to login.   After launching the HDP 2.5 Tech Preview Sandbox on a virtual machine, make sure the Zeppelin service is up and running via Ambari. Next, open the Zeppelin UI either by clicking on:  Services (tab) -> Zeppelin notebook (left-hand panel) -> Quick Links (tab) -> "Zeppelin UI" (button)  or just by opening a browser at: http://sandbox.hortonworks.com:9995/ (or http://127.0.0.1:9995/)  The Zeppelin welcome page should show in the browser, and you should notice a "Login" button in the upper right-hand corner. This will bring up a pop-up window with text entries for username and password. Enter one of the username/password pairs below (these are the defaults listed in the "shiro.ini" file located in the "conf" sub-directory of zeppelin):  Username/Password pairs: 
admin/password1 
user1/password2 
user2/password3 
user3/password4
  If you want to change these passwords or add more users, you can use the "Credentials" tab of the Zeppelin notebook to create additional usernames.   After entering the credentials, you will be logged in and the existing notebooks will display on the left-hand side of the Zeppelin screen. If you enter the wrong username or password, you will be directed back to the Welcome page.  FYI: For more information about Zeppelin security, see this link:  https://github.com/apache/zeppelin/blob/master/SECURITY-README.md  FYI: For more detailed information about Apache Shiro configuration options, see this link:  http://shiro.apache.org/configuration.html#Configuration-INISections  
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
		08-09-2016
	
		
		09:22 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
	
		4 Kudos
		
	
				
		
	
		
					
							 Just a few months ago, Apache Storm announced release 1.0 for the distribution. The bullet points below summarize the new features available. For more detailed descriptions, you can go to this link to read the full release notes:  http://storm.apache.org/2016/04/12/storm100-released.html  Apache Storm 1.0
Release  Apache Storm 1.0 is *up to 16 times faster than
previous versions, with latency reduced up to 60%.”  Pacemaker – Heartbeat
Server  Pacemaker is an optional Storm daemon designed
to process heartbeats from workers. (overcomes scaling problems of
Zookeeper)  Distributed
Cache API  Files in the distributed cache can be updated
at any time from the command line, without the need to redeploy a
topology.   HA
Nimbus  Multiple instances of the Nimbus service run in
a cluster and perform leader election when a Nimbus node fails  Native
Streaming Window API  Storm has support for sliding and tumbling
windows based on time duration and/or event count.  Automatic
Backpressure  Storm now has an automatic backpressure
mechanism based on configurable high/low watermarks expressed as a percentage
of a task's buffer size.   Resource
Aware Scheduler  The new resources aware scheduler (AKA "RAS
Scheduler") allows users to specify the memory and CPU requirements for
individual topology components  Storm makes it easier to debug, with…  Dynamic Log Levels  Tuple Sampling and
Debugging  Dynamic Worker
Profiling 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
							Labels:
						
						
		
	
					
			
		
	
	
	
	
				
		
	
	
			
    
	
		
		
		08-09-2016
	
		
		08:43 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 First, you should try to take advantage if your data is stored in splittable formats (snappy, LZO, bzip2, etc). If so, then instruct Spark to split the data into multiple partitions upon read. In Scala, you can do this:  file = sc.textFile(Path, numPartitions)  You will also need to tune your YARN container sizes to work with your executor allocation. Make sure your Max Yarn Mem Alloc ('yarn.scheduler.maximum-allocation-mb') is bigger than what you are asking for per executor (this will include the default overhead of 384 MB).  The following parameters are used to allocate Spark executors and driver memory:  spark.executor.instances -- number of spark executors
spark.executor.memory -- memory per spark executors (plus 384 MB overhead)
spark.driver.memory -- memory per spark driver  6MB file is pretty small, much smaller than HDFS block size, so
you are probably getting a single partition until you do something to
repartition it. You can also set numPartitions parameter like this:  I would probably call one of these repartition methods on
your DataFrame:  def repartition(numPartitions: Int, partitionExprs: Column*): DataFrame
Returns a new DataFrame partitioned by the given partitioning expressions into numPartitions. The resulting DataFrame is hash partitioned.
  OR this:  def repartition(numPartitions: Int): DataFrame
Returns a new DataFrame that has exactly numPartitions partitions.
 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		07-28-2017
	
		
		08:19 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 
	We were into the same scenario where Zeppelin was always launching the 3 Containers in YARN even after having the Dynamic allocation parameters enabled from Spark but Zeppelin is not able to pick these parameters,   
	To get the Zeppelin to launch more than 3 containers (the default it is launching) we need to configure in the Zeppelin Spark interpreter  spark.dynamicAllocation.enabled=true   
	spark.shuffle.service.enabled=true  
	spark.dynamicAllocation.initialExecutors=0   
	spark.dynamicAllocation.minExecutors=2  --> Start this value with the lower number, if not it will launch number of the minimum containers specified and will only use the required containers (memory and VCores) and rest of the memory and VCores will be marked as reserved memory and causes memory issues  
	spark.dynamicAllocation.maxExecutors=10  
	And it is always good to start with less executor memory  (e.g 10/15g) and more executors (20/30)  Our scenario we have observed that giving the executor memory (50/100g) and executors as (5/10) the query took 3min 48secs (228sec) --> which is obvious as the parallelism is very less and reducing the executor memory (10/15g) and increasing the executors (25/30) the same query took on 54secs.   Please note the number of executors and executor memory are usecase dependent and we have done few trails before getting the optimal performance for our scenario. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		05-19-2016
	
		
		08:05 AM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Very big thnak you 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		05-06-2016
	
		
		07:06 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 Thanks for your help. And do you know if the diagram of the jobs executed after we execute a query, the DAG visualization is about what? That visualization shows the physical or logical plan? 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
		11-02-2017
	
		
		03:15 PM
	
	
	
	
	
	
	
	
	
	
	
	
	
	
		
	
				
		
			
					
				
		
	
		
					
							 This problem was noted while running HDP 2.5.  Apparently this problem was fixed in version 2.6.  However, this required also updating the Oracle VM VirtualBox to version 5.1.30.  This problem has not reappeared. 
						
					
					... View more
				
			
			
			
			
			
			
			
			
			
		 
         
					
				













