Member since: 09-02-2016

Posts: 523
Kudos Received: 89
Solutions: 42

        My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
| | 2723 | 08-28-2018 02:00 AM |
| | 2695 | 07-31-2018 06:55 AM |
| | 5677 | 07-26-2018 03:02 AM |
| | 2979 | 07-19-2018 02:30 AM |
| | 6459 | 05-21-2018 03:42 AM |

04-16-2018 04:05 AM

@null_pointer For some reason I cannot see the image you uploaded, but I think I get your point, so let me try to answer your question. We cannot always match/compare the memory usage reported by CM against what Linux reports, for a couple of reasons:

1. Yes, as you said, CM only counts the memory used by Hadoop components; it does not include any other applications running on the same Linux host, because CM is designed to monitor only Hadoop and its dependent services.

2. (I am not sure whether you are getting the CM report from the Host Monitor.) There are practical difficulties in getting the memory usage of every client node in a single report. For example, if you have 100+ nodes and each node has a different memory capacity (100 GB, 200 GB, 250 GB, 300 GB, etc.), it is difficult to generate one report that covers the memory usage of each of them.

Still, if the default reports available in CM do not meet your requirement, you can try building a custom chart from CM -> Charts (menu) -> your tsquery:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/admin_cluster_util_custom.html
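As a rough sketch of what such a custom chart query could look like (the metric names and the HOST category here are assumptions based on the tsquery documentation, so verify them against the metric browser in your CM version):

select physical_memory_used, physical_memory_total where category = HOST

A host-level query like this should chart overall host memory rather than only the Hadoop role memory, which is closer to what Linux itself reports.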

04-15-2018 09:19 AM
1 Kudo

@Aedulla Here you go:

http://www.bayareabikeshare.com/open-data
https://grouplens.org/datasets/movielens/
https://www.nyse.com/market-data/historical

You can also use the free Hue demo below (login uid: demo, pwd: demo), where you can find some pre-existing data for Hive, Impala, HBase, etc. Note: if you get any exception after login, please try again after some time or raise a ticket so that someone from the Hue team can fix the issue.

http://demo.gethue.com

04-11-2018 05:08 AM

@bukangarii As long as you have JDBC connectivity to your legacy system, it is possible to export the Parquet Hive table to it using Sqoop.

Please check the Sqoop user guide to understand the supported data types.
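A minimal sketch of what such an export could look like, assuming Sqoop's HCatalog integration is used to read the Parquet-backed Hive table (the JDBC URL, credentials, and table names are placeholders, and Parquet support through HCatalog can vary by Sqoop/Hive version, so test on a small table first):

# all connection details and table names below are placeholders
$ sqoop export \
    --connect "jdbc:oracle:thin:@//legacy-host:1521/LEGACYDB" \
    --username etl_user -P \
    --table TARGET_TABLE \
    --hcatalog-database default \
    --hcatalog-table my_parquet_table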

04-10-2018 10:55 PM
1 Kudo

@ludof No need to do it every time. In general, once you have done kinit, the ticket is valid for 24 hours (you can customize this if you want), so do it once a day manually, or automate it with a cron job in some scenarios, e.g. when you have jobs running round the clock, or when more than one user shares the same user/batch id for a project.
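A minimal sketch of such a cron entry, assuming the batch user has a keytab (the schedule, keytab path, and principal are placeholders):

# renew the Kerberos ticket every 12 hours; keytab path and principal are placeholders
0 */12 * * * kinit -kt /home/batchuser/batchuser.keytab batchuser@EXAMPLE.COM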

04-09-2018 11:58 AM

@hedy Can you try running the second pyspark command from a different user id?

It seems this is a known issue, according to the link below:

https://support.datastax.com/hc/en-us/articles/207356773-FAQ-Warning-message-java-net-BindException-Address-already-in-use-when-launching-Spark-shell
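If you want to confirm which process is already holding the port before retrying, a quick check like the one below may help (4040 is Spark's default UI port; substitute whichever port shows up in your BindException, and use ss -ltnp if netstat is not installed):

# show the process currently listening on port 4040
$ netstat -tlnp | grep :4040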

04-09-2018 11:46 AM
1 Kudo

@ludof All you have to do is run the kinit command and enter your Kerberos password before you start your Spark session, then continue with your steps; that should fix it.
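A minimal sketch of that flow, assuming a password-based kinit and a pyspark session (the principal is a placeholder):

# get a Kerberos ticket first (principal is a placeholder), then start the Spark session
$ kinit your_user@EXAMPLE.COM
$ pyspark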

04-09-2018 10:57 AM
1 Kudo

@hedy In general, one port allows one session (one connection) at a time, so your first session binds to the default port 4040, while your second session tries to bind to the same port, gets the bind exception, and then tries the next port, which is not working either.

There are two things you need to check:
1. Please make sure port 4041 is open.
2. For your second session, pass an available port as a parameter when you run pyspark. For example, a long time back I used spark-shell with a different port as a parameter; please try the similar option for pyspark (a pyspark version is sketched below):
session1: $ spark-shell --conf spark.ui.port=4040
session2: $ spark-shell --conf spark.ui.port=4041

If 4041 is not working, you can try up to 4057; I think these are the ports available to Spark by default.
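A rough sketch of the same idea with pyspark, assuming port 4041 is free (spark.ui.port is the same property the spark-shell examples above use):

# start the second pyspark session on an explicit, free UI port (4041 is just an example)
$ pyspark --conf spark.ui.port=4041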

04-09-2018 10:20 AM

@RajeshBodolla I am not sure I get your intention in having multiple DataNodes on the same machine.

If you want the DataNode to store data in different/multiple directories on the same machine, you can use CM -> HDFS -> Configuration -> dfs.datanode.data.dir (DataNode Data Directory) and specify your directories there.
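To confirm what the DataNode is currently using, something like the sketch below may help (the directory paths are placeholders; the value is a comma-separated list of local directories):

# print the directories the DataNode is configured to use
$ hdfs getconf -confKey dfs.datanode.data.dir
# a typical multi-directory value looks like: /data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn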

02-12-2018 08:07 PM

srinivas ?? 🙂

@Cloudera learning Is it stuck when only 1 or 2 blocks are left over?

As mentioned earlier, you can monitor this from CM -> HDFS -> Web UI -> NameNode Web UI -> a new window will open -> 'Datanodes' menu -> scroll down to Decommissioning (keep refreshing this page to track the progress; there is also a command-line check sketched below).

If your answer to my question above is yes: I have hit similar issues a few times and have overcome them as follows:

1. CM -> Hosts -> abort the decommission process
2. CM -> HDFS -> Instances -> node -> Stop
3. Try to decommission the same node again for the leftover blocks

Note: Sometimes it may get stuck again; retry a couple of times.
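If you prefer to check from the command line instead of the NameNode web UI, a report like the one below should list the DataNodes that are still decommissioning along with their remaining block counts (the -decommissioning filter is available on reasonably recent HDFS versions; treat the exact output fields as version-dependent):

# list only the DataNodes that are currently decommissioning
$ hdfs dfsadmin -report -decommissioning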