Member since: 07-31-2013

Posts: 1924
Kudos Received: 462
Solutions: 311

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1968 | 07-09-2019 12:53 AM |
| | 11843 | 06-23-2019 08:37 PM |
| | 9131 | 06-18-2019 11:28 PM |
| | 10105 | 05-23-2019 08:46 PM |
| | 4559 | 05-20-2019 01:14 AM |

03-21-2019 05:48 PM

The search-based HDFS find tool has been removed and is superseded in CDH 6 by the native "hdfs dfs -find" command, documented here: https://hadoop.apache.org/docs/r3.1.2/hadoop-project-dist/hadoop-common/FileSystemShell.html#find
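For reference, a quick usage sketch of the native command (the path and pattern below are hypothetical):

```bash
# Recursively find .log files under a hypothetical HDFS directory
hdfs dfs -find /user/example -name '*.log' -print
```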
						
					

03-20-2019 04:12 AM

Flume scripts need to be run under a Bash shell environment, but it appears that you are trying to run them from PowerShell on Windows.
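For illustration, a minimal launch sketch, assuming a standard Flume tarball layout and an agent named a1 defined in conf/example.conf (both hypothetical); run it from a Bash shell (e.g. WSL or a Linux edge host), not PowerShell:

```bash
# Start the (hypothetical) agent a1 with console logging, under Bash
bin/flume-ng agent --conf conf --conf-file conf/example.conf \
  --name a1 -Dflume.root.logger=INFO,console
```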
						
					

03-17-2019 06:49 PM · 1 Kudo

Thank you for confirming the details.

Does the subject part of your klist output precisely match the username added to the HBase Superusers configuration?

If your user is in a different realm than the cluster services, is that realm name listed under HDFS -> Configuration -> 'Trusted Realms'?

Are all commands run as the superuser failing? Which HBase shell command/operation specifically leads to the quoted error?

As to adding groups, it can be done in the same field, except you need to add an '@' prefix to the name. For example, if your group is cluster_administrators, add it as '@cluster_administrators' in the HBase Superusers config. When using usernames, the '@' must not be specified. Both approaches should work, though.

P.S. If you'll be relying on groups, ensure all cluster hosts return consistent group lookup output for 'id <user>' commands, as the authorization check is distributed across the cluster roles for HBase.
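To illustrate (the names are hypothetical; the field maps to the hbase.superuser property):

```bash
# HBase Superusers field value - users plain, groups with '@' prefix:
#   admin_user,@cluster_administrators
#
# Run on EVERY cluster host and compare; group memberships must be
# consistent for group-based authorization checks to work:
id admin_user
```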
						
					

03-13-2019 06:52 PM

Please create a new thread for distinct questions instead of bumping an older, resolved thread.

As to your question, the error is clear, as is the documentation, quoted below:

"""
Spooling Directory Source

This source lets you ingest data by placing files to be ingested into a “spooling” directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear. The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).

Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:

- If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
- If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
""" - https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source

It appears that you can get around this by using ExecSource with a script or command that reads the files, but you'll have to sacrifice reliability. It may instead be worth investing in an approach that makes filenames unique (`uuidgen`-named softlinks in another folder, etc.), as sketched below.
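A minimal sketch of that idea, assuming hypothetical /incoming (producer output) and /flume-spool (SpoolDir source target) directories:

```bash
# Expose each fully-written file under a unique name for the SpoolDir
# source; files must be immutable by the time they are linked here
for f in /incoming/*; do
  ln -s "$f" "/flume-spool/$(uuidgen)-$(basename "$f")"
done
```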
						
					

03-13-2019 06:46 PM

Could you share your CDH version? I'm unable to reproduce this with a username added (without the '@' character prefix) to the config you've mentioned, on the recent CDH 6.x releases.

By 're-deployed', did you mean restarted? I had to restart the service for all hosts to see the new superuser config.
						
					

03-07-2019 09:08 PM

It appears that you're trying to use Sqoop's internal handling of DATE/TIMESTAMP data types, instead of the Strings the Oracle connector converts them to.

Have you tried the option specified at https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_java_sql_timestamp?

-Doraoop.timestamp.string=false

You shouldn't need to map the column types manually with this approach.
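For example (the connection details and table below are hypothetical; note that the generic -D option goes right after the tool name):

```bash
sqoop import -Doraoop.timestamp.string=false \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username scott -P \
  --table SALES.ORDERS
```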
						
					

03-07-2019 06:23 PM · 1 Kudo

You'll need to use lsof with a PID specifier (lsof -p PID). The PID must be your target RegionServer's java process (find it via 'ps aux | grep REGIONSERVER' or similar).

In the output, you should be able to classify the items as network (sockets) / filesystem (files) / etc., and the interest is in whatever holds the highest share. For example, if you see a lot more sockets hanging around, check their state (CLOSE_WAIT, etc.). Or if it is local filesystem files, investigate whether those files appear relevant.

If you can pastebin your lsof result somewhere, I can take a look.
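A sketch of the kind of breakdown I mean, assuming pgrep is available and the RegionServer's command line contains HRegionServer:

```bash
# Summarize the RegionServer's open descriptors by TYPE (REG, IPv4, ...)
RS_PID=$(pgrep -f HRegionServer | head -1)
lsof -p "$RS_PID" | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn
```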
						
					

03-06-2019 11:42 PM · 1 Kudo

MapReduce jobs can be submitted with ease, as mostly all they require is the correct config on the classpath (such as under src/main/resources for Maven projects).

Spark/PySpark relies heavily on its script tooling to submit to a remote cluster, so achieving this is a little more involved. IntelliJ IDEA has a remote execution option in its run targets that can be configured to copy over the built jar and invoke an arbitrary command on an edge host. This can perhaps be combined with remote debugging to get an experience equal to MR's.

Another option is to use a web-interface-based editor such as CDSW.
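As a rough sketch of what such a run target would execute under the hood (the host, paths, and class name below are hypothetical):

```bash
# Copy the built jar to an edge host, then submit to YARN from there
scp target/myapp.jar user@edge-host:/tmp/myapp.jar
ssh user@edge-host \
  spark-submit --master yarn --deploy-mode cluster \
    --class com.example.Main /tmp/myapp.jar
```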
						
					

03-06-2019 07:30 PM · 1 Kudo

> can we deploy the HttpFS role on more than one node? is it best practice?

Yes, the HttpFS service is an end-point for REST API access to HDFS, so you can deploy multiple instances and also consider load balancing (you might need sticky sessions for data-read paging).

> we can see that new logs are created in opt/hadoop/dfs/nn/current on the active NameNode on node01, but no new files on the standby NameNode on node02 - is it OK?

Yes, this is normal. The new edit logs are redundantly written to that directory only while the NameNode is active. At all times, the edits are primarily written into the JournalNode directories.
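To illustrate the end-point nature, any HttpFS instance (or a balancer in front of them) answers standard WebHDFS REST calls; the hostname below is hypothetical:

```bash
# Directory listing via HttpFS (default port 14000); user.name works on
# simple-auth clusters, kerberized ones need SPNEGO (curl --negotiate)
curl "http://httpfs-host.example.com:14000/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=alice"
```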
						
					

03-06-2019 06:33 PM

It is not normal to see the file descriptor limit run out, or come close to it, unless you have an overload problem of some form. I'd recommend checking via 'lsof' what the major contributor to the FD count of your RegionServer process is - chances are it is avoidable (a bug, a flawed client, etc.).

The number should be proportional to your total region store file count and the number of connecting clients. While the article at https://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/ focuses on DN data transceiver threads in particular, the formulae at the end can be applied similarly to file descriptors in general too.
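A quick way to watch that count over time (a sketch, assuming the process command line contains HRegionServer):

```bash
# Count open FDs straight from /proc for the RegionServer process
ls "/proc/$(pgrep -f HRegionServer | head -1)/fd" | wc -l
```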