Member since 04-07-2016

22 Posts
1 Kudos Received
1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 4244 | 03-20-2017 03:06 PM |

Posted 06-21-2017 04:39 PM

Hello, we installed HDP 2.6.1 and would like to set up SSL for Zeppelin. On the server where Zeppelin is installed, port 8443 is already used by another service. How do I change the SSL port for Zeppelin?

Labels:
- Apache Zeppelin
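
For reference, a minimal sketch of the zeppelin-site.xml properties involved (settable through Ambari on HDP), assuming the stock Zeppelin SSL settings apply here; the port 9443 below is only an illustrative free port, and Zeppelin needs a restart afterwards:

<!-- enable SSL and move Zeppelin off the default 8443, which is already taken on this host -->
<property>
  <name>zeppelin.ssl</name>
  <value>true</value>
</property>
<property>
  <name>zeppelin.server.ssl.port</name>
  <value>9443</value>
</property>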
			
    
	
		
		

Posted 06-19-2017 08:15 PM

Hello @Dominika Bialek, thanks for the response. That was the issue; after adding the S3 location, the problem was resolved.

Posted 06-19-2017 04:31 PM (1 Kudo)

Hello, when I run the following command:

alter table btest.testtable add IF NOT EXISTS partition (load_date='2017-06-19') location 's3a://testbucket/data/xxx/load_date=2017-06-19';

I get this error:

Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [READ] privilege on [s3a://testbucket/data/xxx/load_date=2017-06-19]

FYI: SELECT statements work fine; I can query data that lives in S3. It is only this add-partition statement that fails. We are using Ranger for authorization, but the hive user has full permissions on all databases and tables.

Labels:
- Apache Hive
- Apache Ranger
			
    
	
		
		

Posted 04-19-2017 08:23 PM

Thank you for the response. I did it by creating a temporary Hive table.

Posted 04-17-2017 04:09 AM

Hello, is there a way to split a 2 GB ORC file into 50 MB files? We have many ORC files (larger than 1 GB) in HDFS. We are planning to move those files to S3 and configure a Hive external table over S3. With the larger files, performance is significantly affected; if I split them into files of 50 MB or less and copy those to S3, performance is comparable to HDFS. (To test this, I created another table stored as ORC and inserted the existing table's data, which produced multiple smaller files, but that is not a viable solution because I have tables with many partitions and many tables.) Is it possible to split the ORC files into multiple files?

Labels:
- Apache Hadoop
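
A hedged sketch of the rewrite approach mentioned in the question, i.e. copying the data into a new ORC table while capping the size of each output file. The table names are hypothetical and 52428800 is simply an illustrative ~50 MB target:

-- aim for roughly 50 MB of input per reducer, and therefore per output ORC file
SET hive.exec.reducers.bytes.per.reducer=52428800;

-- DISTRIBUTE BY rand() forces a reduce stage so the setting above controls the file count
CREATE TABLE test.months_small STORED AS ORC AS
SELECT * FROM test.months_large
DISTRIBUTE BY rand();

This is still the "second table plus insert" workaround described above, so it would have to be repeated per table and partition; it only automates the file-size control rather than splitting the existing ORC files in place.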
			
    
	
		
		

Posted 03-26-2017 10:30 PM

For example: I have a table in my source database with columns monthid (int) and monthshort (string). I copy the table's data daily using NiFi and store it in HDFS.

I created an external table in Hive:

CREATE EXTERNAL TABLE IF NOT EXISTS test.s3amonths (
  monthid    int,
  monthshort string
)
PARTITIONED BY (load_date string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months';

Every day I load the data with statements like:

ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-10') LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-10';
ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-11') LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-11';

Now the schema of the source table has changed to: monthid (int), monthlong (string), monthshort (string). When loading the new partitions, how do I ensure that the existing data is not affected and that the new data, which carries the additional column, is also loaded successfully?
 
 
						
					

Labels:
- Apache Hadoop
- Apache Hive
- Apache NiFi
- HDFS
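
For illustration, a hedged sketch of one way to surface the new column on the Hive side, assuming ALTER TABLE ... ADD COLUMNS applies here; the partition date below is hypothetical:

-- append the new source column to the Hive table definition (it goes after the existing columns)
ALTER TABLE test.s3amonths ADD COLUMNS (monthlong string);

-- new partitions are then registered exactly as before
ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-12')
  LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-12';

Note that ADD COLUMNS appends monthlong after monthshort, while the source now has it in the middle, so whether old and new ORC files both read back correctly depends on how the files are written; this is a sketch of the mechanics, not a confirmed answer to the question.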
			
    
	
		
		

Posted 03-20-2017 03:06 PM

Thanks for the reply. I am on HDP 2.4.2 with Hive 1.2.1.2.4. The cluster is built on EC2 instances. I found the solution to the proxy issue: when the cluster was initially set up, it was configured with the fs.s3a.proxy.host and fs.s3a.proxy.port properties. We have since set up an S3 endpoint in the route table, so it no longer needs the proxy to connect to S3. I removed those two properties from the HDFS XML configuration file, and that resolved the performance issue.
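
For reference, a hedged sketch of the two s3a proxy properties that were removed; the host and port values below are placeholders, not the real ones:

<!-- deleting these entries makes s3a connect directly (via the VPC S3 endpoint) instead of through the proxy -->
<property>
  <name>fs.s3a.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8080</value>
</property>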
						
					
    
	
		
		
    
	
		
		

Posted 03-16-2017 03:47 AM

Hello, I have created an external table pointing to an S3 location. When I run a query using Hive or Beeline, it takes a very long time to return the result. The file format I use is ORC. To give some perspective: an external table over an ORC object in HDFS returns results in under 30 seconds, while the same query takes more than 30 minutes when the object is in S3. I also see a lot of the error below in the task logs (and I think that is what is taking so long, since the same error appears constantly in the log for more than 30 minutes):

2017-03-16 02:41:28,153 [INFO] [TezChild] |http.AmazonHttpClient|: Unable to execute HTTP request: Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
	at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
	at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
	at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
	at org.apache.http.impl.client.DefaultRequestDirector.createTunnelToTarget(DefaultRequestDirector.java:902)
	at org.apache.http.impl.client.DefaultRequestDirector.establishRoute(DefaultRequestDirector.java:821)
	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:647)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
	at org.apache.hadoop.fs.s3a.S3AInputStream.seek(S3AInputStream.java:115)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
	at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:111)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:245)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:831)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:802)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1013)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1046)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1101)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:120)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:54)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:328)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745) 
						
					

Labels:
- Apache Hive
 
        





