Member since 04-07-2016

22 Posts
1 Kudos Received
1 Solution

My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 4244 | 03-20-2017 03:06 PM |

Posted 06-21-2017 04:39 PM

Hello, we installed HDP 2.6.1 and would like to set up SSL for Zeppelin. On the server where Zeppelin is installed, port 8443 is already used by another service. How do I change the SSL port for Zeppelin?

Labels:
- Apache Zeppelin
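
For reference, a minimal sketch of the zeppelin-site.xml properties involved (settable through Ambari on HDP), assuming the stock Zeppelin SSL settings apply here; the port 9443 below is only an illustrative free port, and Zeppelin needs a restart afterwards:

<!-- enable SSL and move Zeppelin off the default 8443, which is already taken on this host -->
<property>
  <name>zeppelin.ssl</name>
  <value>true</value>
</property>
<property>
  <name>zeppelin.server.ssl.port</name>
  <value>9443</value>
</property>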
			
    
	
		
		

Posted 06-19-2017 08:15 PM

Hello @Dominika Bialek, thanks for the response. That was the issue; after adding the S3 location, the problem was resolved.

Posted 06-19-2017 04:31 PM (1 Kudo)

Hello, when I run the following command:

alter table btest.testtable add IF NOT EXISTS partition (load_date='2017-06-19') location 's3a://testbucket/data/xxx/load_date=2017-06-19';

I get this error:

Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hive] does not have [READ] privilege on [s3a://testbucket/data/xxx/load_date=2017-06-19]

FYI: SELECT statements work fine; I can query data that lives in S3. It is only this add-partition statement that fails. We are using Ranger for authorization, but the hive user has full permissions on all databases and tables.

Labels:
- Apache Hive
- Apache Ranger
			
    
	
		
		

Posted 04-19-2017 08:23 PM

Thank you for the response. I did it by creating a temporary Hive table.

Posted 04-17-2017 04:09 AM

Hello, is there a way to split a 2 GB ORC file into 50 MB files? We have many ORC files (larger than 1 GB) in HDFS. We are planning to move those files to S3 and configure a Hive external table over S3. With the larger files, performance is significantly affected; if I split them into files of 50 MB or less and copy those to S3, performance is comparable to HDFS. (To test this, I created another table stored as ORC and inserted the existing table's data, which produced multiple smaller files, but that is not a viable solution because I have tables with many partitions and many tables.) Is it possible to split the ORC files into multiple files?

Labels:
- Apache Hadoop
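
A hedged sketch of the rewrite approach mentioned in the question, i.e. copying the data into a new ORC table while capping the size of each output file. The table names are hypothetical and 52428800 is simply an illustrative ~50 MB target:

-- aim for roughly 50 MB of input per reducer, and therefore per output ORC file
SET hive.exec.reducers.bytes.per.reducer=52428800;

-- DISTRIBUTE BY rand() forces a reduce stage so the setting above controls the file count
CREATE TABLE test.months_small STORED AS ORC AS
SELECT * FROM test.months_large
DISTRIBUTE BY rand();

This is still the "second table plus insert" workaround described above, so it would have to be repeated per table and partition; it only automates the file-size control rather than splitting the existing ORC files in place.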
			
    
	
		
		

Posted 03-26-2017 10:30 PM

For example: I have a table in my source database with columns monthid (int) and monthshort (string). I copy the table's data daily using NiFi and store it in HDFS.

I created an external table in Hive:

CREATE EXTERNAL TABLE IF NOT EXISTS test.s3amonths (
  monthid    int,
  monthshort string
)
PARTITIONED BY (load_date string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months';

Every day I load the data with statements like:

ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-10') LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-10';
ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-11') LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-11';

Now the schema of the source table has changed to: monthid (int), monthlong (string), monthshort (string). When loading the new partitions, how do I ensure that the existing data is not affected and that the new data, which carries the additional column, is also loaded successfully?
 
 
						
					

Labels:
- Apache Hadoop
- Apache Hive
- Apache NiFi
- HDFS
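
For illustration, a hedged sketch of one way to surface the new column on the Hive side, assuming ALTER TABLE ... ADD COLUMNS applies here; the partition date below is hypothetical:

-- append the new source column to the Hive table definition (it goes after the existing columns)
ALTER TABLE test.s3amonths ADD COLUMNS (monthlong string);

-- new partitions are then registered exactly as before
ALTER TABLE test.s3amonths ADD IF NOT EXISTS PARTITION (load_date='2016-12-12')
  LOCATION 'hdfs://xxx/bshah-s3hdptest/test/months/load_date=2016-12-12';

Note that ADD COLUMNS appends monthlong after monthshort, while the source now has it in the middle, so whether old and new ORC files both read back correctly depends on how the files are written; this is a sketch of the mechanics, not a confirmed answer to the question.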
			
    
	
		
		

Posted 03-20-2017 03:06 PM

Thanks for the reply. I am on HDP 2.4.2 with Hive 1.2.1.2.4. The cluster is built on EC2 instances. I found the solution to the proxy issue: when the cluster was initially set up, it was configured with the fs.s3a.proxy.host and fs.s3a.proxy.port properties. We have since set up an S3 endpoint in the route table, so it no longer needs the proxy to connect to S3. I removed those two properties from the HDFS XML configuration file, and that resolved the performance issue.
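
For reference, a hedged sketch of the two s3a proxy properties that were removed; the host and port values below are placeholders, not the real ones:

<!-- deleting these entries makes s3a connect directly (via the VPC S3 endpoint) instead of through the proxy -->
<property>
  <name>fs.s3a.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>fs.s3a.proxy.port</name>
  <value>8080</value>
</property>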
						
					
    
	
		
		
    
	
		
		

Posted 03-16-2017 03:47 AM

Hello, I have created an external table pointing to an S3 location. When I run a query using Hive or Beeline, it takes a very long time to return the result. The file format I use is ORC. To give some perspective: an external table over an ORC object in HDFS returns results in under 30 seconds, while the same query takes more than 30 minutes when the object is in S3. I also see a lot of the error below in the task logs (and I think that is what is taking so long, since the same error appears constantly in the log for more than 30 minutes):

2017-03-16 02:41:28,153 [INFO] [TezChild] |http.AmazonHttpClient|: Unable to execute HTTP request: Read timed out
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:170)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
	at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
	at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:92)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
	at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
	at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
	at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
	at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:66)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
	at org.apache.http.impl.client.DefaultRequestDirector.createTunnelToTarget(DefaultRequestDirector.java:902)
	at org.apache.http.impl.client.DefaultRequestDirector.establishRoute(DefaultRequestDirector.java:821)
	at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:647)
	at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:384)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1111)
	at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:91)
	at org.apache.hadoop.fs.s3a.S3AInputStream.seek(S3AInputStream.java:115)
	at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:62)
	at org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:111)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:245)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.beginReadStripe(RecordReaderImpl.java:831)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripe(RecordReaderImpl.java:802)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1013)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1046)
	at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1101)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:120)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.next(VectorizedOrcInputFormat.java:54)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
	at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
	at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
	at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:141)
	at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:61)
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:328)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
	at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745) 
						
					

Labels:
- Apache Hive
 
        





