Member since: 09-29-2015

67 Posts | 45 Kudos Received | 10 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2588 | 05-25-2016 10:24 AM |
|  | 13177 | 05-19-2016 11:24 AM |
|  | 9483 | 05-13-2016 10:09 AM |
|  | 3472 | 05-13-2016 06:41 AM |
|  | 10139 | 03-25-2016 09:15 AM |
			
    
	
		
		
06-09-2021 06:42 AM
You can try setting the parameters below: set hive.vectorized.execution.reduce.enabled=false; and set hive.vectorized.execution.enabled=true;
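For reference, a minimal sketch of applying those settings in a Beeline session before re-running the query (the JDBC URL, database, and query below are placeholders, not taken from this thread):

```bash
# Hypothetical session: adjust the JDBC URL and the query to your environment.
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  -- turn vectorization off on the reduce side, keep it on for the map side
  SET hive.vectorized.execution.reduce.enabled=false;
  SET hive.vectorized.execution.enabled=true;
  SELECT some_col, COUNT(*) FROM some_db.some_table GROUP BY some_col;
"
```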
						
					
05-19-2016 07:09 AM
Other very good ways to load data into HDFS are Flume or NiFi. "hadoop fs -put" is good, but it has some limitations and a lack of flexibility that might make it difficult to use in a production environment. If you look at the documentation of the Flume HDFS sink, for instance (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink), you'll see that Flume lets you define how to rotate the files, how to name the written files, etc. Other options can be defined for the source (your local text files) or for the channel. "hadoop fs -put" is more basic and doesn't offer those possibilities.
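As a rough illustration (the agent name, directories, and rotation thresholds below are made-up examples, not from this answer), a Flume agent that picks up local text files from a spool directory and writes them to HDFS with explicit rotation and naming rules could look like this:

```bash
# Hypothetical Flume agent config: adjust paths and thresholds to your environment.
cat > spool-to-hdfs.conf <<'EOF'
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

# Source: pick up local text files dropped into a spool directory
a1.sources.r1.type     = spooldir
a1.sources.r1.spoolDir = /data/incoming
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type     = memory
a1.channels.c1.capacity = 10000

# Sink: HDFS, with explicit file naming and rotation rules
a1.sinks.k1.type    = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path       = /landing/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = events
a1.sinks.k1.hdfs.fileType   = DataStream
# Roll a new file every 5 minutes or every 128 MB, whichever comes first
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize     = 134217728
a1.sinks.k1.hdfs.rollCount    = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
EOF

# Start the agent
flume-ng agent --conf ./conf --conf-file spool-to-hdfs.conf --name a1
```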
						
					
07-18-2019 07:12 AM
The Hive import completes, but the next line logs this INFO message: "hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory." I have made changes to the Sqoop query because I am fetching data from Oracle. When I log into Hive, the database doesn't have any tables. Please suggest an appropriate solution. Hoping to hear from you.
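For context, a typical Oracle-to-Hive Sqoop invocation of the kind being described might look like the sketch below; the connection string, credentials, table, and database names are hypothetical placeholders, not the poster's actual command:

```bash
# Hypothetical Oracle -> Hive import; adjust host, service name, schema, and table.
sqoop import \
  --connect jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1 \
  --username SCOTT -P \
  --table EMPLOYEES \
  --hive-import \
  --hive-database staging \
  --create-hive-table \
  --num-mappers 4 \
  --target-dir /tmp/sqoop/employees
```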
						
					
03-18-2016 11:06 AM
@Robin Dong As mentioned by Ancil, you might want a script to do the Sqoop download in parallel, and you need to control quite well how big your parallelism is, above all if you want to avoid the typical "No more spool space in...". Here's a script to do that: https://community.hortonworks.com/articles/23602/sqoop-fetching-lot-of-tables-in-parallel.html
Another problem I saw with Teradata is that some data types are not supported when you try to insert the data directly into Hive from Sqoop. So the solution I took was the traditional one:
1) Sqoop to HDFS.
2) Build external tables on top of the files.
3) Create ORC tables and then insert the data from the external tables.
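A minimal sketch of that three-step flow, with hypothetical connection details, paths, table names, and columns (and assuming the Teradata connector for Sqoop is installed):

```bash
# 1) Sqoop the Teradata table to plain delimited files on HDFS
sqoop import \
  --connect jdbc:teradata://td-host/DATABASE=sales \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /landing/sales/orders \
  --fields-terminated-by '\001' \
  --num-mappers 8

# 2) Build an external table over the landed files, then
# 3) create an ORC table and insert the data from the external table
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  CREATE EXTERNAL TABLE staging.orders_ext (
    order_id BIGINT, customer_id BIGINT, amount DECIMAL(18,2), order_ts STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
  LOCATION '/landing/sales/orders';

  CREATE TABLE sales.orders STORED AS ORC
  AS SELECT * FROM staging.orders_ext;
"
```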
						
					
02-02-2016 02:28 AM
							 @vbhoomireddy are you still having issues with this? Can you accept the best answer or provide your own solution? 
						
					
11-11-2015 04:08 PM (1 Kudo)
@Sourygna Luangsay We used syslogtcp for our project, and it struggles at between 500-1000 events/second. It looks like multiport_syslogtcp uses Apache Mina (https://mina.apache.org/), a high-performance asynchronous TCP library, which provides better throughput on multicore machines even when using a single TCP port.
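A minimal sketch of what switching the source to multiport_syslogtcp looks like in a Flume agent config (the agent name, host, ports, and channel settings are placeholders):

```bash
# Hypothetical fragment of a Flume agent config using the Mina-based syslog source.
cat > syslog-agent.conf <<'EOF'
a1.sources  = r1
a1.channels = c1

# One source can listen on several TCP ports at once
a1.sources.r1.type     = multiport_syslogtcp
a1.sources.r1.host     = 0.0.0.0
a1.sources.r1.ports    = 5140 5141 5142
a1.sources.r1.channels = c1

a1.channels.c1.type     = memory
a1.channels.c1.capacity = 100000
EOF
```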
						
					
06-16-2016 10:29 AM (1 Kudo)
I got it working on Ambari 2.2.1:
1. Create the mount points:
# mkdir /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
# chown hdfs:hadoop /hadoop/hdfs/data1 /hadoop/hdfs/data2 /hadoop/hdfs/data3
(We are using this configuration for test purposes only, so no disks are actually mounted.)
2. Log in to Ambari > HDFS > Settings.
3. Add the DataNode directories as shown below (DataNode > DataNode directories):
[DISK]/hadoop/hdfs/data,[SSD]/hadoop/hdfs/data1,[RAMDISK]/hadoop/hdfs/data2,[ARCHIVE]/hadoop/hdfs/data3
Restart the HDFS service, then restart all other affected services.
4. Create a directory /cold and set the COLD storage policy on it:
# su hdfs
[hdfs@hdp-qa2-n1 ~]$ hadoop fs -mkdir /cold
[hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -setStoragePolicy -path /cold -policy COLD
Set storage policy COLD on /cold
5. Run get storage policy:
[hdfs@hdp-qa2-n1 ~]$ hdfs storagepolicies -getStoragePolicy -path /cold
The storage policy of /cold:
BlockStoragePolicy{COLD:2, storageTypes=[ARCHIVE], creationFallbacks=[], replicationFallbacks=[]}
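As an optional check (a sketch, reusing the /cold path from the steps above; the test file is arbitrary), you can write a file under /cold and confirm where its blocks actually land:

```bash
# Hypothetical verification: copy a small test file into /cold and inspect block placement
hadoop fs -put /etc/hosts /cold/hosts.txt
hdfs fsck /cold/hosts.txt -files -blocks -locations

# List all available storage policies for reference
hdfs storagepolicies -listPolicies
```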
						
					
10-21-2015 09:47 AM (2 Kudos)
@cliu@hortonworks.com These are very helpful benchmarks posted by AMPLab.
						
					
06-28-2019 05:53 AM
Hi, thanks for the script; it solves most of my automation problems where I need to compare Hive tables. There are a few things I am trying to modify, and they're not working for me. We have a cluster with Hive installed on multiple nodes (a load balancer is enabled for HiveServer2), and we are using Beeline (instead of the Hive CLI) to execute the queries that fetch the data locally. Because the load balancer is enabled, the two queries execute on two different nodes, the local data ends up split across those two nodes, and the script cannot get the actual data and fails. I am not sure how to make it run on a single node using Beeline. The cluster is Kerberized, uses Sentry, and has HS2 behind a load balancer.
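One possible workaround, sketched below with hypothetical host names, database/table names, and a placeholder Kerberos principal, is to point Beeline at one specific HiveServer2 instance instead of the load-balanced address, so both queries run, and write their local output, on the same node:

```bash
# Hypothetical: connect to a single HS2 node directly, bypassing the load balancer,
# so every query in the comparison runs (and lands its output) on the same host.
kinit etl_user@EXAMPLE.COM

HS2_URL="jdbc:hive2://hs2-node1.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"

beeline -u "$HS2_URL" --silent=true --outputformat=tsv2 \
        -e "SELECT COUNT(*) FROM db1.table_a;" > /tmp/table_a.count

beeline -u "$HS2_URL" --silent=true --outputformat=tsv2 \
        -e "SELECT COUNT(*) FROM db2.table_a;" > /tmp/table_b.count

diff /tmp/table_a.count /tmp/table_b.count
```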
						
					