Member since 10-25-2016
- Posts: 18
- Kudos Received: 2
- Solutions: 3
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 3776 | 08-24-2017 07:32 AM |
|  | 52098 | 08-14-2017 08:00 AM |
|  | 13442 | 04-18-2017 10:11 AM |
			
    
	
		
		
11-07-2017 02:37 PM

Below is the error message I received:

2017-11-07 03:55:11,536 [INFO ] There are no more tasks to run at this time
Starting Impala Shell without Kerberos authentication
Server version: impalad version 2.6.0-cdh5.8.4 RELEASE (build 207450616f75adbe082a4c2e1145a2384da83fa6)
Invalidating Metadata
Query: invalidate metadata
Fetched 0 row(s) in 4.11s
Query: use `DBNAME`
Query: insert overwrite table Table partition(recordtype) select adid,seg,profile,livecount,
count(distinct mc) as nofs,stbnt,1 from table1 where livecount<>0 group by adid,seg,profile,livecount,stbnt
WARNINGS:
CatalogException: Table 'dbname.table' was modified while operation was in progress, aborting execution.
			
    
	
		
		
11-07-2017 10:17 AM

Did anyone look into this issue? I am also facing the same issue. I am using CDH 5.10.2.
			
    
	
		
		
08-24-2017 07:32 AM

The link below queries the API for the last 30 days of jobs, with the limit raised to 10,000 jobs at a time. Note: all times (e.g. endTime=1503438687495) are epoch timestamps in milliseconds, so adjust the time filters to your requirements. Also set the limit to however many jobs you want displayed.

http://cloudera-manager-host-ip/cmf/yarn/completedApplications?startTime=1500758462000&endTime=1503438687495&offset=0&limit=10000&serviceName=yarn&histogramAttributes=allocated_memory_seconds%2Callocated_vcore_seconds%2Ccpu_milliseconds%2Capplication_duration%2Chdfs_bytes_read%2Chdfs_bytes_written
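For convenience, here is a minimal shell sketch of how such a URL could be built for a rolling 30-day window. The host name and the admin:admin credentials are placeholders, and the authentication your Cloudera Manager instance requires may differ:

```bash
#!/usr/bin/env bash
# Sketch only: compute a 30-day window in epoch milliseconds and fetch the
# completed-applications list. CM_HOST and the credentials are placeholders.
CM_HOST="cloudera-manager-host-ip"
END=$(( $(date +%s) * 1000 ))             # now, in epoch milliseconds
START=$(( END - 30 * 24 * 3600 * 1000 ))  # 30 days earlier

curl -s -u admin:admin \
  "http://${CM_HOST}/cmf/yarn/completedApplications?startTime=${START}&endTime=${END}&offset=0&limit=10000&serviceName=yarn"
```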
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-22-2017 01:40 PM

Hi All

I am trying to get all Hive queries run on my cluster in the last 30 days. I selected all the jobs in the Cloudera YARN applications UI, filtered for the last 30 days, and also selected the hive_query_string attribute, which lets me see the actual query. The only issue is that Cloudera restricts the UI to showing just the last 100 jobs at a time, so I can't get details on all the other queries. I tried hitting the API http://cluster:8088/ws/v1/cluster/apps to get all the details, but there are two issues: there is no filter, and there is no hive_query_string attribute to filter on; it just shows me all the job details. Is there an API exposed by Cloudera where I can filter and get all the hive_query_string values, or can I configure the Cloudera UI to show me more than 100 jobs (or just export them)?

Let me know.

Thanks
AB

Labels:
- Cloudera Manager
- Cloudera Search
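For reference, a hedged sketch against the ResourceManager REST API mentioned above. Depending on the Hadoop version, its documented time-window parameters (finishedTimeBegin, in epoch milliseconds) and limit can narrow the result set, but the response carries no hive_query_string field, so this alone cannot recover the query text; the host is a placeholder:

```bash
# Sketch: RM REST API with a 30-day time filter (host is a placeholder;
# finishedTimeBegin/limit are standard Hadoop RM parameters, version permitting).
BEGIN=$(( ($(date +%s) - 30 * 24 * 3600) * 1000 ))
curl -s "http://cluster:8088/ws/v1/cluster/apps?finishedTimeBegin=${BEGIN}&limit=10000"
```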
 
			
    
	
		
		
08-14-2017 08:00 AM

2 Kudos

Hi All

Thanks @Yuexin Zhang for the response. I figured out the solution: below is the actual submit that worked for me. The catch is that a cluster-mode submit uploads the file to a staging dir on HDFS, so the path and name of the file on HDFS differ from what the program expects. To make the file available to the program, you have to give it an alias with '#', as shown below (that's the only trick); then everywhere you need to refer to that file, just use the alias from the spark-submit command. The links below, which I referred to, give the complete walkthrough and how I reached the solution.

- Issue also discussed here: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149 (didn't actually help me resolve it, so I posted it separately)
- Section "Important notes" in http://spark.apache.org/docs/latest/running-on-yarn.html (kinda have to read between the lines)
- Blog explaining the reason: http://progexc.blogspot.com/2014/12/spark-configuration-mess-solved.html (nice blog 🙂)

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf#application.conf,/home/abhig/log4.properties#log4j \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=log4j" \
--conf spark.driver.extraJavaOptions="-Dconfig.file=application.conf -Dlog4j.configuration=log4j" \
/local/project/gateway/mypgm.jar

Hope this helps the next person facing a similar issue!
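To make the alias mechanics explicit, here is a minimal hedged sketch (the paths, alias, and class names are placeholders, not taken from the post above): the path before '#' is the file on the submitting host, and the name after '#' is the link each YARN container sees in its working directory:

```bash
# Sketch of the '#' alias (all paths and names are placeholders).
# /local/path/my.conf is shipped to the YARN staging dir; inside each
# container it appears as ./app.conf, so JVM options refer to 'app.conf'.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class example.Main \
  --files /local/path/my.conf#app.conf \
  --conf "spark.driver.extraJavaOptions=-Dconfig.file=app.conf" \
  /local/path/example.jar
```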
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
08-11-2017 10:15 AM

Hi All

I have been trying to submit the Spark job below in cluster mode through a bash shell. A client-mode submit works perfectly fine, but when I switch to cluster mode it fails with an error that no app file is present. The app file refers to the missing application.conf.

spark-submit \
--master yarn \
--deploy-mode cluster \
--class myCLASS \
--properties-file /home/abhig/spark.conf \
--files /home/abhig/application.conf \
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
--driver-java-options "-Dconfig.file=/home/abhig/application.conf -Dlog4j.configuration=/home/abhig/log4.properties" \
/local/project/gateway/mypgm.jar

I followed this similar post: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Spark-File-not-found-error-works-fine-in-local-mode-but-failed/m-p/32306#M1149 but the solution mentioned there is still not clear. I even tried

--files $CONFIG_FILE#application.conf

and it still doesn't work. Any help will be appreciated.

Thanks
AB

Labels:
- Apache Spark
 
			
    
	
		
		
04-18-2017 10:11 AM

Changing the default HDFS umask from 022 to 002 through the Cloudera Manager HDFS properties got child directories to inherit the permissions from the parent directory.
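A rough illustration of the effect (the path is a placeholder): with a 002 umask, the group write bit survives on newly created directories, which is what lets children keep the parent's group permissions:

```bash
# Sketch: override the umask for one command to compare the outcomes.
# With 002 the new dir comes out drwxrwxr-x instead of drwxr-xr-x.
hdfs dfs -Dfs.permissions.umask-mode=002 -mkdir /test/child
hdfs dfs -ls -d /test/child
```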
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
04-05-2017 01:51 PM

Hi All

How do I ensure that child dirs and files created by a member of a group with rwx permissions on HDFS get the same rwx permissions as the parent? I tried both chmod and ACLs, as suggested by Apache and Cloudera. All new dirs created by a user in a group with write permission still come out with r-x permissions instead of the rwx I want. I have also set dfs.namenode.posix.acl.inheritance.enabled to true and dfs.permissions to true, as mentioned in https://issues.apache.org/jira/browse/HDFS-6962.

fs.permissions.umask-mode=000
dfs.umaskmode, fs.permissions.umask-mode=022

[root@dev ~]# id abhig
uid=515(abhig) gid=519(abhig) groups=519(abhig),525(low_priority),528(devgrp)

[abhig@dev ~]$ hdfs dfs -setfacl -m default:group:devgrp:rwx /test
[abhig@dev ~]$ hdfs dfs -getfacl /test
# file: /test
# owner: abhig
# group: devgrp
user::rwx
group::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:devgrp:rwx
default:mask::rwx
default:other::r-x

[abhig@dev ~]$ hdfs dfs -mkdir /test/tst1
[abhig@dev ~]$ hdfs dfs -getfacl /test/tst1
# file: /test/tst1
# owner: abhig
# group: devgrp
user::rwx
group::r-x
group:devgrp:rwx #effective:r-x
mask::r-x
other::r-x
default:user::rwx
default:group::r-x
default:group:devgrp:rwx
default:mask::rwx
default:other::r-x

This doesn't help much: https://community.cloudera.com/t5/Storage-Random-Access-HDFS/HDFS-ACL-Inheritance/m-p/25494#M1092

Please give a workaround if any.

Labels:
- HDFS
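For context on the transcript above: the child's mask::r-x entry, derived from the create-time umask, is what caps the named group:devgrp entry to an effective r-x. A hedged stopgap for dirs that already exist is to widen the mask by hand (the umask change in the 04-18-2017 post above is the durable fix):

```bash
# Sketch: widen the ACL mask so group:devgrp:rwx becomes effective again.
hdfs dfs -setfacl -m mask::rwx /test/tst1
hdfs dfs -getfacl /test/tst1   # the '#effective:r-x' annotation should be gone
```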
 
			
    
	
		
		
04-04-2017 10:43 AM

Hi

Has anyone faced an issue with the Impala catalog server creating jar files and storing them in the /tmp dir on the node where the catalog server runs, like below? Every time I run invalidate metadata on any database, it creates two jar files like the ones listed and doesn't delete them after it finishes. These files keep increasing and over time start causing the issue below:

Query: invalidate metadata
ERROR:
FSError: java.io.IOException: No space left on device
CAUSED BY: IOException: No space left on device

This also eventually results in failure of the catalog server and the Impala daemon. It happens on only one of my clusters; all my clusters run the same version, and nothing was changed recently. Using CDH 5.8.3, Impala Shell v2.6.0-cdh5.8.3.

Jar files created by running the invalidate metadata command:

-rw-r--r-- 1 impala impala 68246195 Apr 4 12:54 0079d271-f044-46be-9580-7d98cd4fced2.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 1dd950d5-2db9-4a88-9043-02dff869697c.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 2c910b59-83fc-4cde-96d3-3ac086f9fcb2.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 13:02 318ef680-2ea2-47fb-b688-8df439af676c.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:50 3b15f36a-5353-4553-bee6-96c5ac807703.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 43e13701-dad8-4892-a19c-43125dbaf1e1.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:50 51a5b806-fb0c-444a-a989-87e832501711.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:55 54ef3cd7-ea8d-4932-a032-9b3f63f5a60b.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:58 81dd054d-d720-4d11-826a-eba2518e4381.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:53 93cc7752-c80c-4f70-a47c-14681fd5c487.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:58 9a25bded-1cd7-4773-8962-0fc41705f728.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:52 a70b6d71-191e-494a-a78e-63a03642e0e9.jar
-rw-r--r-- 1 impala impala 68246195 Apr 4 12:54 a7a102f8-9abe-4388-90a4-8abd85bb9c09.jar

Labels:
- Apache Impala
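A hedged monitoring sketch matching the listing above; it only measures the buildup, it does not address the root cause:

```bash
# Sketch: count the leaked jars owned by impala and total their size.
find /tmp -maxdepth 1 -name '*.jar' -user impala | wc -l
du -ch /tmp/*.jar 2>/dev/null | tail -1
```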
 
			
    
	
		
		
03-15-2017 06:24 PM

I was unable to redirect completed Spark jobs from YARN to the Spark history server even though all permissions and the Spark conf were set correctly. This might be useful.

The issue was that we were passing a spark.conf file while submitting the Spark job, expecting the config changes to be merged with the default parameters from the default spark.conf. It turns out it overrides the default Spark config file: even if you pass a blank Spark conf, the default spark.conf is not considered for the job. We had to add the three lines below to the custom Spark conf file to enable log aggregation at the Spark history server and make the URL at the resource manager point to the history server. This has to be done for every Spark job: if a job is submitted without the three params below, it will not be available in the Spark history server even if you restart everything.

```
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.yarn.historyServer.address=http://sparkhist-dev.visibleworld.com:18088
```

https://community.cloudera.com/t5/CDH-Manual-Installation/Permission-denied-user-mapred-access-WRITE-inode-quot-quot-hdfs/td-p/16318/page/2
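A minimal usage sketch (the conf path, class, and jar names are placeholders): a properties file passed via --properties-file replaces spark-defaults.conf rather than merging with it, so the custom file has to carry the three settings itself:

```bash
# Sketch: the custom properties file must repeat the history-server settings,
# because --properties-file replaces spark-defaults.conf for this job.
cat > /tmp/job-spark.conf <<'EOF'
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory
spark.yarn.historyServer.address=http://sparkhist-dev.visibleworld.com:18088
EOF

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --properties-file /tmp/job-spark.conf \
  --class example.Main \
  /local/path/example.jar
```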