Member since: 09-24-2015
178 Posts | 113 Kudos Received | 28 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4652 | 05-25-2016 02:39 AM |
|  | 4591 | 05-03-2016 01:27 PM |
|  | 1197 | 04-26-2016 07:59 PM |
|  | 16799 | 03-24-2016 04:10 PM |
|  | 3156 | 02-02-2016 11:50 PM |


12-11-2015 11:21 PM
I see 'Connection Refused', which means either a service is down or the connection is going to the wrong port. As Deepesh said, it appears to be the former and the History Server is down.
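A quick way to confirm which of the two it is: on the node that should be running the History Server, check whether anything is listening on its ports. The sketch below assumes the default MapReduce JobHistory Server ports (10020 for IPC, 19888 for the web UI); adjust to whatever mapreduce.jobhistory.address and mapreduce.jobhistory.webapp.address are set to in your cluster.

# Rough check, assuming the default JobHistory Server ports; adjust to your config
for port in 10020 19888; do
  if netstat -tln | grep -q ":${port} "; then
    echo "Port ${port} is listening - check that the client points at the right host and port."
  else
    echo "Nothing is listening on ${port} - the History Server process is most likely down."
  fi
done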

12-11-2015 07:42 PM (3 Kudos)
@Matthew bird You need a home directory for the user in HDFS, so here is what is needed:

# Log in as root to the sandbox, then switch to the hdfs superuser
su - hdfs
hdfs dfs -mkdir /user/root
hdfs dfs -chown root:hadoop /user/root
hdfs dfs -chmod 755 /user/root

Try to run the Pig script again after you've done the above steps.
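Once that is done, a quick sanity check (still as the hdfs user) would look something like this:

hdfs dfs -ls /user              # /user/root should now appear in the listing
hdfs dfs -ls -d /user/root      # expect owner root, group hadoop, permissions drwxr-xr-x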

12-11-2015 06:45 PM (1 Kudo)
@Amit Jain Atlas has a ton of exciting features on the roadmap, and it definitely plans two-way metadata exchange with other metadata management tools. As of right now (and this may change), the plan is to exchange lineage information with other tools as well, to provide end-to-end lineage of data from the source system all the way to the final destination. With that said, it seems very unlikely that in a large enterprise setting you would replace all other metadata tools with one magical tool.

Typically, governance tools are expected to tap into data processes automatically and non-intrusively to gather lineage information, and this requires native hooks into those processes. Atlas has, and will continue to expand, native hooks for processing that takes place in a Hadoop cluster, but I doubt there is any interest in tapping natively into processes running in other systems such as data warehousing, transactional, operational, and reporting systems. For those pieces (metadata and lineage) from external systems, Atlas will continue to rely on and integrate with other metadata tools.

Just like Hadoop, the other components in an overall data architecture have their roles and place, so they will continue to exist, and so will the governance tools for those components. Vendors need to, and most likely will, work together to provide a seamless experience to customers.

If you haven't watched this presentation from Andrew Ahn, PM for Governance Tools at HWX, I would highly recommend it to better understand where Atlas is going - https://www.youtube.com/watch?time_continue=3&v=LZ...

Hope this helps. Let me know if you have any follow-up questions.

12-11-2015 02:34 PM (1 Kudo)
There are a few solutions:

1. The easy solution - grant permission on the files to the root user. In this case it looks like the file itself has wide-open permissions, but because it sits under another user's home directory, the root user may not have access to the guest home directory. So check the permissions on /user/guest and adjust them if needed.

2. Use the correct user for the job - I like to create a service ID for data processing rather than using local superusers (root) or HDFS superusers (hdfs). You can use users like guest or the built-in test user ambari-qa. The user is identified based on their local OS identity, so switch to guest before running the process.
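A rough command-line sketch of both options, assuming the file in question lives under /user/guest (adjust the paths to match your job):

# Option 1: inspect and, if needed, open up the guest home directory (run as the hdfs superuser)
su - hdfs
hdfs dfs -ls -d /user/guest        # check the current owner and permissions
hdfs dfs -chmod 755 /user/guest    # allow other users to traverse and read it

# Option 2: run the job as the owning user instead of root
su - guest
# ...then launch the job as guest so HDFS sees the matching identity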

12-10-2015 02:39 AM
@Hajime - The best way to find the NodeManager heap size and other memory settings is to calculate them specifically for your cluster size and hardware spec. Here is the utility that you can use - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-...

Snippet:

hdp-configuration-utils.sh options

where options are as follows:

Table 1.1. hdp-configuration-utils.sh Options

| Option | Description |
|---|---|
| -c CORES | The number of cores on each host. |
| -m MEMORY | The amount of memory on each host, in GB. |
| -d DISKS | The number of disks on each host. |
| -k HBASE | "True" if HBase is installed, "False" if not. |

The output recommendation is in this format:

Using cores=16 memory=64GB disks=4 hbase=True
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4 
Num Container=8
Container Ram=6144MB 
Used Ram=48GB
Unused Ram=16GB
yarn.scheduler.minimum-allocation-mb=6144 
yarn.scheduler.maximum-allocation-mb=49152 
yarn.nodemanager.resource.memory-mb=49152 
mapreduce.map.memory.mb=6144 
mapreduce.map.java.opts=-Xmx4096m 
mapreduce.reduce.memory.mb=6144 
mapreduce.reduce.java.opts=-Xmx4096m 
yarn.app.mapreduce.am.resource.mb=6144 
yarn.app.mapreduce.am.command-opts=-Xmx4096m 
mapreduce.task.io.sort.mb=1792 
tez.am.resource.memory.mb=6144 
tez.am.launch.cmd-opts =-Xmx4096m 
hive.tez.container.size=6144 
hive.tez.java.opts=-Xmx4096m
hive.auto.convert.join.noconditionaltask.size=1342177000 
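
For reference, assuming the flags from the options table above, the run that produced this sample output would be invoked roughly like this (swap in your own core, memory, and disk counts):

# Hypothetical invocation for hosts with 16 cores, 64 GB RAM, 4 data disks, and HBase installed
hdp-configuration-utils.sh -c 16 -m 64 -d 4 -k True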

12-04-2015 08:16 PM
@Neeraj Sabharwal It's not the same error. The exception stack trace pasted by the OP originates in Atlas (org.apache.atlas.web.filters.AuditFilter.doFilter), whereas the one in the JIRA is within Hadoop. Same exception class, different applications.

12-04-2015 06:23 PM
Looking at the ExecuteSQL code here, the capability description reads:

@CapabilityDescription("Execute provided SQL select query. Query result will be converted to Avro format." + " Streaming is used so arbitrarily large result sets are supported. This processor can be scheduled to run on " + "a timer, or cron expression, using the standard scheduling methods, or it can be triggered by an incoming FlowFile. " + "If it is triggered by an incoming FlowFile, then attributes of that FlowFile will be available when evaluating the " + "select query. " + "FlowFile attribute 'executesql.row.count' indicates how many rows were selected." )

Even though the description says "Streaming is used so arbitrarily large result sets are supported", it appears that this is not referring to JDBC streaming, but to the fact that the ResultSet is broken down into smaller tuples and sent to the next processor as a stream.

Here is the code path that backs that assessment: the query execution in ExecuteSQL calls JdbcCommon.convertToAvroStream, and convertToAvroStream reads the data using the getObject method. The getObject method does not appear to support a streaming alternative like getAsciiStream, as described here - https://docs.oracle.com/cd/B28359_01/java.111/b312...

12-04-2015 06:08 PM
Can you help me understand the scenario where this is needed? So the Hive shell is started, but the AM is not created until a query is executed... does this mean there are situations where the Hive shell is started and then exited without ever running a query? Wouldn't that be an exceptional scenario, or is it so frequent and regular in your case that a workaround is required? Sorry, I am just trying to understand when such a configuration would be needed.

12-04-2015 04:57 AM
This should be updated/corrected then?

"Partitioning Recommendations for Slave Nodes - Hadoop slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be partitioned using ext3, ext4, or XFS, in that order of preference. HDFS on ext3 has been publicly tested on the Yahoo cluster, which makes it the safest choice for the underlying file system. The ext4 file system may have potential data loss issues with default options because of the 'delayed writes' feature. XFS reportedly also has some data loss issues upon power failure. Do not use LVM; it adds latency and causes a bottleneck."

Source: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_cluster-planning-guide/content/ch_partitioning_chapter.html

A lot of this conflicts with reality (Paul's SmartSense statistics) and with what we are all discussing here.
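For anyone comparing their own nodes against this guidance, a couple of standard Linux commands (nothing Hadoop-specific) show what the data mounts are actually using:

# Filesystem type for every mounted partition (check your DataNode data directories)
df -Th
# Block devices, their filesystems, and whether LVM is in the picture
lsblk -f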

12-04-2015 01:09 AM
My response to your comment was longer than what's allowed for comments, so I am adding it as a new answer.