Member since: 06-13-2016
Posts: 20
Kudos Received: 0
Solutions: 0
01-30-2018
08:49 AM
Any suggestions?
01-19-2018
03:14 AM
Hi, I am running an ETL job that populates data from Hive into MySQL. The SELECT query has multiple UNIONs, and the job takes about 1 hour to execute. Most of the time the transformation (ETL) succeeds, but occasionally it fails with the exception below. The Hive JDBC connection is to a non-kerberized HDP cluster, and no error is logged in HiveServer2 or the Hive metastore: org.apache.thrift.transport.TTransportException
2018/01/18 15:11:21 - Table input 3.0 -
2018/01/18 15:11:21 - Table input 3.0 - at org.pentaho.di.core.database.Database.openQuery(Database.java:1768)
2018/01/18 15:11:21 - Table input 3.0 - at org.pentaho.di.trans.steps.tableinput.TableInput.doQuery(TableInput.java:236)
2018/01/18 15:11:21 - Table input 3.0 - at org.pentaho.di.trans.steps.tableinput.TableInput.processRow(TableInput.java:140)
2018/01/18 15:11:21 - Table input 3.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2018/01/18 15:11:21 - Table input 3.0 - at java.lang.Thread.run(Thread.java:748)
2018/01/18 15:11:21 - Table input 3.0 - Caused by: java.sql.SQLException: org.apache.thrift.transport.TTransportException
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:365)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:242)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:437)
2018/01/18 15:11:21 - Table input 3.0 - at org.pentaho.di.core.database.Database.openQuery(Database.java:1757)
2018/01/18 15:11:21 - Table input 3.0 - ... 4 more
2018/01/18 15:11:21 - Table input 3.0 - Caused by: org.apache.thrift.transport.TTransportException
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_GetOperationStatus(TCLIService.java:413)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.service.cli.thrift.TCLIService$Client.GetOperationStatus(TCLIService.java:400)
2018/01/18 15:11:21 - Table input 3.0 - at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
2018/01/18 15:11:21 - Table input 3.0 - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2018/01/18 15:11:21 - Table input 3.0 - at java.lang.reflect.Method.invoke(Method.java:498)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.jdbc.HiveConnection$SynchronizedHandler.invoke(HiveConnection.java:1374)
2018/01/18 15:11:21 - Table input 3.0 - at com.sun.proxy.$Proxy59.GetOperationStatus(Unknown Source)
2018/01/18 15:11:21 - Table input 3.0 - at org.apache.hive.jdbc.HiveStatement.waitForOperationToComplete(HiveStatement.java:332)
2018/01/18 15:11:21 - Table input 3.0 - ... 7 more
Is there any workaround or configuration, in terms of the Hive JDBC connection, to get rid of this error? I appreciate your feedback/suggestions. Thanks!!!
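One thing worth checking, offered as a hedged suggestion rather than a confirmed fix: intermittent TTransportExceptions during a long-running statement are sometimes caused by HiveServer2, or something in between (firewall, load balancer), dropping connections or operations it considers idle while the query is still executing. The HiveServer2 properties below (set in hive-site.xml) control idle-operation and idle-session cleanup; the values are only illustrative and would need to be sized against your roughly one-hour job:
hive.server2.idle.operation.timeout=7200000
hive.server2.idle.session.timeout=28800000
hive.server2.session.check.interval=3600000
If the failures remain intermittent after that, retrying the Table input step at the ETL level is a pragmatic workaround, since the query usually succeeds on a rerun.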
Labels:
- Apache Hive
11-07-2017
07:16 AM
When a couple of users run Hive queries through the AUTO_HIVE20_VIEW instance in Ambari, even simple queries take a long time to respond. I have tried tuning the following parameters in the ambari.properties file:
client.threadpool.size.max = 100
views.ambari.request.read.timeout.millis=12000
views.request.read.timeout.millis=120000
views.ambari.hive.<HIVE_VIEW_INSTANCE_NAME>.result.fetch.timeout=120000
However, it does not help. Moreover, memory utilization on the ambari-server instance is also low. I have observed the following lines in hive20-view.log:
07 Nov 2017 07:16:04,703 ERROR [HiveViewActorSystem-akka.actor.default-dispatcher-1829] [HIVE 2.0.0 AUTO_HIVE20_INSTANCE] OperationController:174 - Cannot update Dag Information for job. Job with id: 271 for instance: AUTO_HIVE20_INSTANCE has either not started or has expired.
07 Nov 2017 07:16:07,716 INFO [ambari-client-thread-16477] [HIVE 2.0.0 AUTO_HIVE20_INSTANCE] Aggregator:328 - Saving DAG information via actor system for job id: 271
07 Nov 2017 07:16:07,716 ERROR [HiveViewActorSystem-akka.actor.default-dispatcher-1829] [HIVE 2.0.0 AUTO_HIVE20_INSTANCE] OperationController:174 - Cannot update Dag Information for job. Job with id: 271 for instance: AUTO_HIVE20_INSTANCE has either not started or has expired.
Kindly help me resolve this issue. Thanks!!
11-07-2017
06:42 AM
Thanks for your reply. So I need to use the INSERT OVERWRITE command to periodically load the data from the orig_log table into the orc_log table, right after creating the partition on the day column.
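For what it's worth, a minimal HiveQL sketch of that approach, with the table names taken from this thread and the column lists left as placeholders (this assumes a day value already exists in, or can be derived from, orig_log):
CREATE TABLE orc_log (<same columns as orig_log, without day>)
PARTITIONED BY (day STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

INSERT OVERWRITE TABLE orc_log PARTITION (day='2017-11-04')
SELECT <same columns, without day>
FROM orig_log
WHERE day='2017-11-04';
Because INSERT OVERWRITE replaces only the named partition, rerunning the load for a given day is idempotent, which is convenient for a periodic sync job.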
11-04-2017
06:07 PM
Hi, I have used org.openx.data.jsonserde.JsonSerDe to load log data (fields such as Map, etc.) into a Hive external table, and I am able to query the data properly. However, the query response time is high, so I created an ORC-derived table from the staging table using:
CREATE EXTERNAL TABLE <orc_log> stored as ORC tblproperties("orc.compress"="SNAPPY") AS SELECT * from orig_log;
There are two questions:
1. How do I keep the ORC table in sync with the orig_log table, into which data is loaded incrementally?
2. The ANALYZE TABLE statement fails for both the orig_log and orc_log tables because complex JSON data types such as Map are not supported.
It would be great if you could suggest a way to overcome/resolve this issue. Thanks in advance!!!
Labels:
- Apache Hive
11-01-2017
08:32 AM
Thank you Jay, I will post an update on how it goes. Meanwhile, is it possible to audit the Hive queries executed by users through the Ambari view, or do we need to install Ranger for audit logs?
11-01-2017
02:48 AM
Problem: The scenario is to allow multiple users (created using the Ambari console) to access HiveServer2 installed in the cluster. We therefore created the users/groups using the admin login and also created home directories /user/<username> in HDFS. When a user logs in to the console, opens the Ambari Hive View 2.0 instance, and tries to execute queries, after a few queries there is no response at all, even when trying to stop the execution. But the same set of users can log in to Beeline and execute queries, and there is always a response. Since we need to provide a GUI for users to execute queries (rather than the Hive CLI or Beeline), we thought of using Ambari views. Is there any difference in terms of accessing HiveServer2 through Ambari views versus Beeline? Any help with tuning the usage of the Ambari view is appreciated. Thanks in advance!!!
Labels:
- Apache Ambari
- Apache Hive
11-01-2017
02:23 AM
Thanks so much for your detailed reply, it really helps!!!
10-27-2017
03:07 AM
We have a 2-node cluster (1 master: 4 CPU, 16 GB RAM + 1 data node: 8 CPU, 30 GB RAM). However, in the Ambari console I can see that the total cluster memory is only 22 GB. Is there a way to allocate more cluster memory (around 36 GB) out of the 46 GB of physical memory we have across the master and data nodes? Moreover, the number of containers is only 5, whereas 8 vcores are already available. I have attached a screenshot for your reference. Please suggest a way to improve cluster resource utilization. Thank you in advance.
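Not a definitive answer, but a hedged pointer: the total that Ambari shows is not physical RAM, it is the sum of yarn.nodemanager.resource.memory-mb across the NodeManagers, and the container count follows from that total divided by the container size. Assuming both nodes run a NodeManager (using Ambari config groups if the two nodes need different values), settings along these lines in yarn-site.xml would raise the cluster total toward the ~36 GB you mention; the numbers are illustrative only and leave headroom for the OS and the master services:
yarn.nodemanager.resource.memory-mb=8192   (on the 16 GB master)
yarn.nodemanager.resource.memory-mb=26624  (on the 30 GB data node)
yarn.scheduler.minimum-allocation-mb=2048
yarn.scheduler.maximum-allocation-mb=8192
A smaller minimum allocation generally yields more, smaller containers, which is what lets the 8 available vcores actually be used.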
Labels:
- Apache Ambari
- Apache YARN
10-25-2017
01:24 AM
We have a 2-node cluster (1 master: 4 CPU, 16 GB RAM + 1 data node: 8 CPU, 30 GB RAM), and the estimated amount of data being processed through Hive tables is 100 GB. We are using an Ambari Hive 2.0 view instance running on the master, and the estimated number of support/analytics users is around 15-20. When we access the Hive instance separately for each user (per session), all Hive queries (using Tez) are processed via the YARN default queue. The expectation is to get Hive results in parallel for each session, but these Tez jobs are executed in sequence, and performance is the major constraint here. We don't want to add more nodes, as the data being processed is still in the GBs, and we want to improve the parallelism of Hive query execution with the current hardware configuration. We have also applied Hive tuning parameters such as:
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
along with converting the table into ORC format. Even then, query response time and parallelism have not improved. Any help related to this is highly appreciated. Thanks!!!
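One direction to look at, offered as a suggestion rather than a verified fix for this cluster: HiveServer2 can keep several Tez sessions open against the default queue so that queries from different users do not have to wait behind a single session. These are standard HiveServer2/Tez properties; the values are illustrative only:
hive.server2.tez.initialize.default.sessions=true
hive.server2.tez.default.queues=default
hive.server2.tez.sessions.per.default.queue=4
hive.tez.container.size=2048
Smaller Tez containers matter here because, with only a handful of containers available on a 2-node cluster, a single query can otherwise occupy the whole queue and force the other sessions to wait.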
Labels:
- Apache Hive
10-12-2017
04:15 PM
Thanks much
10-12-2017
10:49 AM
Thanks for your response. Is it possible to minimize the response time by converting my table to ORC or Parquet format?
10-12-2017
10:48 AM
Thanks for your reply. Yes, true, we cannot compare with an RDBMS, as Hive and an RDBMS are meant for different purposes. However, it is evident that Hive is still useful for batch analytics, but not for interactive analytics (at least for now).
10-11-2017
04:07 AM
My use case is to perform interactive analytics on top of log data (JSON format) stored in HDFS and in a Hive table (TEXTFILE format). We have around 30 million records, and the size of the dataset is around 60 GB. Since Tez is the default query engine for my Hive version, I expected the query results to be fast enough, but the response time for even a count() was around 30 seconds. What would be the best practice or recommendation for performing interactive log analytics using Hive? Do I need to use a Hive table in RC/ORC format rather than TEXT? My customer is comparing the query response time with an RDBMS in this case. I appreciate your suggestions on an approach/solution to satisfy my use case. Thanks!!!
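A rough sketch of the usual direction, with the caveat that whether it is fast enough for a 60 GB dataset is something you would have to measure: keep the JSON in the external text table for landing, maintain an ORC copy partitioned by date for the interactive queries, and enable vectorized execution (which in this Hive generation effectively requires a columnar format like ORC rather than TEXTFILE). Table and column names below are placeholders:
CREATE TABLE log_orc (<columns parsed from the JSON>)
PARTITIONED BY (dt STRING)
STORED AS ORC;

INSERT OVERWRITE TABLE log_orc PARTITION (dt='2017-10-10')
SELECT <columns> FROM log_text WHERE dt='2017-10-10';

set hive.vectorized.execution.enabled=true;
set hive.vectorized.execution.reduce.enabled=true;
Partition pruning plus ORC's built-in statistics are what typically bring simple counts and filters down from tens of seconds, though Hive on Tez will still not match an indexed RDBMS for point lookups.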
Labels:
- Apache Hive
- Apache Tez
07-14-2016
01:14 AM
Hello, is there a way to restrict/protect access to the following service URLs through a browser? As of now, all of these URLs are accessible without authentication, and our security assessment team has listed them as vulnerabilities.
http://domainame:50070/logs/
http://domainame:50070/explorer.html#/
http://domainame:50070/dfshealth.html#tab-datanode
http://domainame:16030/rs-status
http://domainame:8088/cluster/cluster
http://domainame:8188/applicationhistory
http://domainame:8042/node
http://secondarynamenode:16010/logs/
http://datanode:61310/logs/
Your speedy response is highly appreciated. Thanks
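I cannot say which option your assessment team would accept, but one commonly used approach for the Hadoop web UIs (NameNode, ResourceManager, timeline/application history, DataNode/NodeManager pages) is to enable HTTP authentication through Hadoop's AuthenticationFilter in core-site.xml. SPNEGO/Kerberos is the usual production setup; the property names below are the standard ones, while the values only show the shape of the configuration and assume a Kerberos environment:
hadoop.http.filter.initializers=org.apache.hadoop.security.AuthenticationFilterInitializer
hadoop.http.authentication.type=kerberos
hadoop.http.authentication.kerberos.principal=HTTP/_HOST@EXAMPLE.COM
hadoop.http.authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab
hadoop.http.authentication.simple.anonymous.allowed=false
The HBase UIs (the rs-status and port 16010 pages) are configured separately in hbase-site.xml, and restricting the ports at the network/firewall level is the usual fallback if enabling Kerberos is not an option on this cluster.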
Tags:
- Security
Labels:
- Security
06-14-2016
09:25 AM
Thanks, Deepesh, for your very helpful response.
06-13-2016
10:56 PM
We are using the HDP 2.2 stack, and MySQL 5.6.x is bundled by default. The security team's assessment report shows that there are vulnerabilities in MySQL 5.6, and the remedy could be to upgrade to MySQL 5.6.30 or above. I am not sure how to upgrade MySQL alone in HDP 2.2, so it would be good to hear from anyone who has experience upgrading MySQL 5.6.x to 5.6.30 or above. Please reply.
Labels: