Member since: 01-08-2014

Posts: 88
Kudos Received: 15
Solutions: 11

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7097 | 10-29-2015 10:12 AM |
| | 8030 | 11-27-2014 11:02 AM |
| | 7148 | 11-03-2014 01:49 PM |
| | 3929 | 09-30-2014 11:26 AM |
| | 11203 | 09-21-2014 11:24 AM |
			
    
	
		
		
04-05-2016 06:23 AM
Please open a new discussion thread for your issue. Older solved threads are unlikely to receive an appropriate amount of attention.

I'd recommend you post your MapReduce issue over in the batch processing forum. Be sure to include your version of CDH, a complete stack trace, and the command you used to launch the job.
						
					
10-29-2015 10:12 AM
They are. 5.3.8 (Oct 20th) happened after 5.4.7 (Sep 18th). The next release of 5.4 after the 5.3.8 release will have the fix.
						
					
02-03-2015 07:58 AM (1 Kudo)
Each file uses a minimum of one block entry (though that block will only be the size of the actual data).

So if you are adding 2736 folders, each with 200 files, that's 2736 * 200 = 547,200 blocks.

Do the folders represent some particular partitioning strategy? Can the files within a particular folder be combined into a single larger file?

Depending on your source data format, you may be better off looking at something like Kite to handle the dataset management for you.
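A quick way to sanity-check those counts is hdfs dfs -count; the /data/incoming path below is only a placeholder for wherever your folders actually land:

# Output columns are DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME,
# so FILE_COUNT should line up with the 547,200 figure above.
$ hdfs dfs -count /data/incoming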
						
					
11-27-2014 11:02 AM
As Mike previously mentioned, those configuration files don't exist when the cluster is handled by CM.

It sounds like the underlying problem might be incorrect host name resolution. Accumulo and Hadoop require forward and reverse DNS to be set up correctly. You should not have IP addresses in your configuration files.

If the problem is incorrect host names, you can check a few things:

1) What does CM think the names of the hosts are?

If you go to http://cm.example.com:7180/cmf/hardware/hosts (where "cm.example.com" is the name of your CM host), what is listed in the "Name" column? It should be all fully qualified domain names.

2) What does each host think its name is?

Log into each of the cluster machines and run the "hostname" command. It should return a fully qualified domain name, and this name should match the one found in the "Name" column above.

3) What do Accumulo processes think the host names are?

You can see this by looking inside of ZooKeeper. Because ZooKeeper is used to maintain critical information for Accumulo, you should be very careful while dealing with it directly. It's also important to note that this information is deep in the internals of Accumulo; you must not presume it will be the same across versions. Below I'll show an example from a cluster running Accumulo 1.6.0-cdh5.1.0.

Connect to ZooKeeper and see what shows up for tablet servers in the /accumulo/%UUID%/tservers node:

$ zookeeper-client -server zoo1.example.com,zoo2.example.com,zoo3.example.com
Connecting to zoo1.example.com,zoo2.example.com,zoo3.example.com
... SNIP ...
2014-11-27 10:50:11,499 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=zoo1.example.com,zoo2.example.com,zoo3.example.com sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@8d80be3
Welcome to ZooKeeper!
2014-11-27 10:50:11,535 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@975] - Opening socket connection to server zoo2.example.com/10.17.72.3:2181. Will not attempt to authenticate using SASL (java.lang.SecurityException: Unable to locate a login configuration)
JLine support is enabled
2014-11-27 10:50:11,546 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@852] - Socket connection established to zoo2.example.com/10.17.72.3:2181, initiating session
2014-11-27 10:50:11,560 [myid:] - INFO  [main-SendThread(zoo2.example.com:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server zoo2.example.com/10.17.72.3:2181, sessionid = 0x349ace5c95e63c4, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/
users               problems            monitor             root_tablet         hdfs_reservations   gc
table_locks         namespaces          recovery            fate                tservers            tables
next_file           tracers             config              masters             bulk_failed_copyq   dead
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 0] ls /accumulo/e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1/tservers
[tserver1.example.com:10011,tserver2.example.com:10011,tserver3.example.com:10011,tserver4.example.com:10011,tserver5.example.com:10011]
[zk: zoo1.example.com,zoo2.example.com,zoo3.example.com(CONNECTED) 1]

The UUID in the top-level /accumulo node is the internal id used to track your Accumulo instance. If there are multiple of these, you can find the one for your current cluster by listing all instance information (presuming you have an Accumulo gateway on the node). This utility is also an Accumulo internal, so neither its name, usage, nor output format can be counted on across versions.

$ accumulo org.apache.accumulo.server.util.ListInstances
INFO : Using ZooKeepers zoo1.example.com,zoo2.example.com,zoo3.example.com
 Instance Name       | Instance ID                          | Master
---------------------+--------------------------------------+-------------------------------
          "accumulo" | e8f3afdf-a59c-4ebd-ae0c-15d47b9dd5e1 | master2.example.com:10010
         "dedicated" | 496b74ab-316c-41bc-badb-4b908039f725 |
         "dedicatum" | e49b451b-4607-4a0e-9dda-b49dc938080e |

4) Is HDFS confused?

Can you use hdfs commands from inside / outside of the cluster? E.g. can you list the root directory? Can the Accumulo user list their home directory or the /accumulo directory?
						
					
11-03-2014 01:49 PM (1 Kudo)
Current versions of Spark don't have a spark-assembly jar artifact (see, for example, Maven Central for the upstream artifacts). The assembly is used internally by distributions when executing Spark.

Instead, you should declare a dependency on whichever part of Spark you make use of, e.g. spark-core.
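As a sketch of what that looks like in a pom.xml (the Scala build suffix and the version string below are only placeholders; use the ones that match your cluster):

<!-- depend on the Spark module you actually use, not on an assembly jar -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId> <!-- placeholder Scala build suffix -->
  <version>1.1.0-cdh5.2.0</version>        <!-- placeholder version string -->
</dependency>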
						
					
11-03-2014 12:36 PM
maven.jenkins.cloudera.com is an internal repository used by our build and publishing process. It is currently online, but it is not accessible outside of Cloudera's internal network.
						
					
11-03-2014 11:06 AM
You should only rely on released (i.e. non-SNAPSHOT) versions in your own projects. CDH 5.2.0 was released on 14 Oct, so there are no longer SNAPSHOT versions of the artifacts.

See the CDH documentation on using Maven for the proper version string for the component you wish to use: CDH 5.2.0 Maven artifact coordinates.
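For illustration, a pom.xml entry for the CDH 5.2.0 build of hadoop-client could look like the following; treat the repository URL and the 2.5.0-cdh5.2.0 version string as assumptions to verify against the coordinates page above:

<!-- Cloudera's public Maven repository -->
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

<!-- a released, non-SNAPSHOT CDH artifact -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.5.0-cdh5.2.0</version>
</dependency>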
						
					
09-30-2014 11:26 AM
This error is because you have configured a minimum required replication rather than a default level of replication.

Some systems, like Sqoop 2, purposefully set a low replication level for temporary files that they aren't worried about losing. With a required minimum replication, the namenode will reject these requests as invalid.

The fix is to update the minimum required replication back to 1. Do this by resetting the property dfs.namenode.replication.min.
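A minimal sketch of that change in hdfs-site.xml (or the equivalent NameNode safety valve if the cluster is managed by Cloudera Manager):

<!-- restore the default minimum required replication -->
<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>
</property>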
						
					
09-22-2014 05:22 PM
Hi!

I'd be happy to help you with this new problem. To make things easier for future users, how about we mark my answer for the original thread topic and start a new one for this issue?
						
					