Member since: 05-02-2019

319 Posts | 145 Kudos Received | 59 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 8179 | 06-03-2019 09:31 PM |
| | 2188 | 05-22-2019 02:38 AM |
| | 2820 | 05-22-2019 02:21 AM |
| | 1685 | 05-04-2019 08:17 PM |
| | 2104 | 04-14-2019 12:06 AM |

Posted 04-26-2018 10:59 PM

SANDBOX VERSION AFFECTED

HDP 2.6.0.3 Sandbox, as identified below.

# wget https://downloads-hortonworks.akamaized.net/sandbox-hdp-2.6/HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
# md5sum HDP_2.6_docker_05_05_2017_15_01_40.tar.gz
886845a5e2fc28f773c59dace548e516  HDP_2.6_docker_05_05_2017_15_01_40.tar.gz

ISSUE

When using the classic Hive CLI, after a while the following error surfaces.

[root@sandbox demos]# hive
log4j:WARN No such property [maxFileSize] in org.apache.log4j.DailyRollingFileAppender.
Logging initialized using configuration in file:/etc/hive/2.6.0.3-8/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:547)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: sandbox
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:438)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:690)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:631)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:160)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:179)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:530)
    ... 8 more
Caused by: java.net.UnknownHostException: sandbox
    ... 21 more
[root@sandbox demos]#

RESOLUTION

Modify /etc/hosts to allow sandbox to be resolved just as sandbox.hortonworks.com is.

[root@sandbox ~]# cat /etc/hosts
127.0.0.1    localhost
::1          localhost ip6-localhost ip6-loopback
fe00::0      ip6-localnet
ff00::0      ip6-mcastprefix
ff02::1      ip6-allnodes
ff02::2      ip6-allrouters
172.17.0.2   sandbox.hortonworks.com
[root@sandbox ~]# cp /etc/hosts /tmp/
[root@sandbox ~]# vi /etc/hosts
[root@sandbox ~]# diff /etc/hosts /tmp/hosts
7c7
< 172.17.0.2   sandbox.hortonworks.com sandbox
---
> 172.17.0.2   sandbox.hortonworks.com
[root@sandbox ~]#
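
A quick sanity check after saving the change (these two commands are just verification I would run, not part of the captured session):

# the short hostname should now resolve
[root@sandbox ~]# ping -c 1 sandbox
# and the Hive CLI should come up without the UnknownHostException
[root@sandbox ~]# hive -e 'show databases;'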
						
					

Posted 02-13-2018 10:44 PM | 1 Kudo

I'm guessing you've already seen http://hbase.apache.org/0.94/book/secondary.indexes.html, which basically tells you that you'll need a second table whose rowkey is your "secondary index" and is only used to find the rowkey needed for the actual table. The coprocessor strategy, as I understand it, just formalizes and automates that "dual-write secondary index" approach. Good luck and happy Hadooping!
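
To make the dual-write idea concrete, here is a rough sketch from the hbase shell; the table, column family, and rowkey layout are made up for illustration only:

# 'orders' is the data table, 'orders_by_cust_idx' is the hand-rolled secondary index
create 'orders', 'cf'
create 'orders_by_cust_idx', 'cf'

# dual-write: every put to the data table is paired with a put to the index table,
# whose rowkey is the secondary value plus the data table's rowkey
put 'orders', 'order0001', 'cf:customer', 'cust42'
put 'orders_by_cust_idx', 'cust42|order0001', 'cf:rowkey', 'order0001'

# lookup by customer: prefix-scan the index, then GET the real row it points at
scan 'orders_by_cust_idx', {STARTROW => 'cust42|', STOPROW => 'cust42|~'}
get 'orders', 'order0001'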
						
					

Posted 02-13-2018 10:38 PM | 1 Kudo

I don't want to oversimplify this process (Hortonworks Professional Services does these conversions with customers all the time, and there is often more at play than simply moving the data, such as testing apps before and after), but you can leverage DistCp, https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html, as your tool to move the data from your original cluster to your new one. For the HBase data, I'd look to its Snapshots feature, http://hbase.apache.org/book.html#ops.snapshots, including its ability to export a snapshot to another cluster, as a solid approach. Good luck and happy Hadooping!
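
As a rough sketch of both steps (the cluster hostnames, paths, and table names below are placeholders, not from your environment):

# move plain HDFS data from the old cluster to the new one with DistCp
hadoop distcp hdfs://old-nn:8020/apps/warehouse hdfs://new-nn:8020/apps/warehouse

# HBase: snapshot the table on the source cluster...
echo "snapshot 'orders', 'orders_snap'" | hbase shell

# ...export the snapshot to the target cluster's HBase root directory...
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot orders_snap \
  -copy-to hdfs://new-nn:8020/apps/hbase/data \
  -mappers 16

# ...then clone a live table from it on the target cluster
echo "clone_snapshot 'orders_snap', 'orders'" | hbase shell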
						
					

Posted 02-13-2018 09:14 PM | 1 Kudo

It looks like you are only letting YARN use 25 GB of your worker nodes' 64 GB, as well as only 6 of your 16 CPU cores, so these values should be raised. Check out https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_command-line-installation/content/determine-hdp-memory-config.html for a script that can help you set some baseline values for these properties. As for the Spark jobs: interestingly enough, each of these jobs is requesting a certain size and number of containers, and I'm betting each job is a bit different. Since Spark jobs get their resources first, it would seem normal that a specific job (as long as the resource request doesn't change, nor does the fundamental input dataset size) takes a comparable time to run from invocation to invocation. That isn't necessarily the case across different Spark jobs, which may be doing entirely different things. Good luck and happy Hadooping/Sparking!
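
If memory serves, that page walks you through a small helper script along these lines; treat the flags and property names here as a from-memory sketch and double-check them against the doc:

# feed the script your node profile: -c cores, -m memory (GB), -d data disks, -k HBase installed?
python hdp-configuration-utils.py -c 16 -m 64 -d 8 -k False

# it prints recommended baselines for properties such as
#   yarn.nodemanager.resource.memory-mb   (RAM YARN may hand out per node)
#   yarn.scheduler.maximum-allocation-mb  (largest single container)
#   mapreduce.map.memory.mb / mapreduce.reduce.memory.mb
# which you then apply in Ambari (YARN -> Configs) and restart the affected services.
# yarn.nodemanager.resource.cpu-vcores is where you raise the 6-of-16 core limit.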
						
					

Posted 01-12-2018 09:02 PM

https://stackoverflow.com/questions/45100487/how-data-is-split-into-part-files-in-sqoop can start to explain more, but ultimately (and thanks to the power of open source) you'll have to go look for yourself; the source code is at https://github.com/apache/sqoop. Good luck and happy Hadooping!
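
For a bit of background while you dig: the number of part files generally tracks the mapper count and the split column, both of which you can set yourself. A hypothetical import (connection string, table, and column invented for illustration):

# four mappers splitting on a numeric key => roughly four part-m-* files in the target dir
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl -P \
  --table orders \
  --split-by order_id \
  --num-mappers 4 \
  --target-dir /user/etl/orders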
						
					

Posted 12-09-2017 08:45 PM

From looking at your RM UI, it sure looks like both of these jobs are basically fighting each other to get running. Meaning, the AppMaster containers are running, but they can't get any more containers from YARN. My recommendation would be to give the VM 10GB of memory (that's how I run it on my 16GB laptop) when you restart it. I'd also try to run it from the command line just to take the Ambari View out of the picture, but if you want to run it in Ambari, then kill any lingering application via the RM UI should it hang again. Good luck and happy Hadooping!
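
The same cleanup can also be done from the command line instead of the RM UI (the application id below is just a placeholder):

# see what is stuck in RUNNING or ACCEPTED
yarn application -list -appStates RUNNING,ACCEPTED
# kill the hung job by its application id
yarn application -kill application_1512345678901_0007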
						
					

Posted 11-01-2017 08:26 PM

Unfortunately, it is a bit more complicated than all of that. In general, Spark is lazily executed, so depending on what you do, even the "temp view" tables/DataFrame(Set)s may not stay around from DAG to DAG. There is an explicit cache method you can use on a DataFrame(Set), but even then you may be trying to cache something that simply won't fit in memory. No worries, Spark assumes that your DF(S)/RDD collections won't fit and it inherently handles this. I'm NOT trying to sell you on anything, but probably some deeper learning could help you. I'm a trainer here at Hortonworks (and again, not really trying to sell you something, just pointing to a resource/opportunity), and we spend several days building up this knowledge in our https://hortonworks.com/services/training/class/hdp-developer-enterprise-spark/ class. Again, apologies for sounding like a salesperson, but my general thought is that there's still a bit more for you to learn about Spark internals that might take some more interactive ways of building up that knowledge.
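
I can't squeeze a full DataFrame walkthrough in here, but as a minimal sketch of explicit caching at the SQL layer (CACHE TABLE is the SQL counterpart of calling .cache() on a DataFrame; the table name is invented, and I'm assuming the Spark 2 spark-sql client on your cluster):

# cache a hypothetical Hive-backed table for this session, then hit it twice;
# the second query reuses the cached copy instead of re-reading the source data
spark-sql --master yarn -e "
CACHE TABLE sales_orders;
SELECT region, COUNT(*) FROM sales_orders GROUP BY region;
SELECT region, SUM(amount) FROM sales_orders GROUP BY region;
"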
						
					

Posted 10-31-2017 09:51 PM

Generically speaking, yes, I'd just run the query that is built upon your Hive tables, as Spark SQL is going to "figure out" what it needs to do in its optimizer before doing any work anyway. If the performance is within your SLA, then I'd just go with that, but of course you could always use that as a baseline for comparison if/when you try some other approaches in your code. Happy Hadooping (ahem... Sparking!) and good luck!
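
If you want to see what the optimizer decides before committing to anything, you can ask for the plan up front; the query and table names below are placeholders:

# print the optimized plan for a query over Hive tables without running it
spark-sql --master yarn -e "
EXPLAIN
SELECT c.region, SUM(o.amount)
FROM sales.orders o
JOIN sales.customers c ON o.cust_id = c.cust_id
GROUP BY c.region;
"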
						
					

Posted 09-04-2017 09:20 PM

Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training. Additionally, Hadoop: The Definitive Guide, https://smile.amazon.com/Definitive-version-revised-English-Chinese/dp/7564159170/, is still a very good resource.
						
					

Posted 09-04-2017 09:19 PM

Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training as a good start.
						
					