Member since: 03-16-2020
337 Posts
3 Kudos Received
1 Solution

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5244 | 02-18-2022 12:59 AM
08-13-2024 04:33 AM (1 Kudo)
Could you please try the steps detailed at https://hub.docker.com/r/apache/hive

If you want to use your own core-site.xml/hdfs-site.xml/yarn-site.xml or hive-site.xml for the service, you can pass the environment variable HIVE_CUSTOM_CONF_DIR to the command. For example, put the custom configuration files under the directory /opt/hive/conf and run:

docker run -d -p 9083:9083 --env SERVICE_NAME=metastore \
  --env DB_DRIVER=postgres \
  -v /opt/hive/conf:/hive_custom_conf \
  --env HIVE_CUSTOM_CONF_DIR=/hive_custom_conf \
  --name metastore apache/hive:${HIVE_VERSION}
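As a minimal preparation sketch before running the command above (which site files you copy and the HIVE_VERSION value are yours to substitute; 4.0.0 below is only an illustrative tag):

# stage your custom Hadoop/Hive configuration files
mkdir -p /opt/hive/conf
cp core-site.xml hdfs-site.xml hive-site.xml /opt/hive/conf/
# assumption: substitute the image tag you actually intend to run
export HIVE_VERSION=4.0.0

After starting the container, "docker logs metastore" should show whether the custom configuration was picked up.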
04-24-2023 06:12 AM
Please check https://docs.cloudera.com/cdp-private-cloud-upgrade/latest/migrate-hive-workloads/topics/hive-repl-load-overview.html and try REPL between HDP 3 and CDP 7 to replicate HMS metadata, with or without the actual data.
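A minimal sketch of the flow (the database name and dump path are placeholders, and the exact WITH clauses depend on your source and target versions; see the linked doc):

-- on the source (HDP 3) cluster: produces a dump directory and the last replication id
REPL DUMP sample_db;

-- on the target (CDP 7) cluster, pointing at the dump directory returned above
REPL LOAD sample_db FROM '<dump_directory>';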
04-24-2023 06:00 AM
Please check:
1. the number of open connections for each HiveServer2/HiveMetastore instance from the CM UI, to see if the HS2/HMS instances are overloaded with a high number of client connections
2. whether the HMS backend database is performing optimally or not
3. the performance of the KDC or AD server, if authentication is enabled
4. whether there are high JVM pauses in the CM charts for HS2/HMS, or look for traces matching "Detected pause in JVM or host machine (eg GC)"

If there is no abnormality in any of the above cases, then we might need to collect jstacks for all three processes, i.e. beeline, HS2 and HMS, to confirm where the slowness occurs that leaves the connection hung; a collection sketch follows below.
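A minimal jstack-collection sketch (the pid lookups and file names are illustrative; run each jstack as the user owning that process, and take a few samples some seconds apart):

# locate the processes (grep patterns are illustrative)
ps -ef | grep -i hiveserver2
ps -ef | grep -i hivemetastore
# capture a thread dump per process, e.g. for HS2
jstack -l <HS2_PID> > hs2_$(date +%s).jstack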
01-23-2023 08:41 AM
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)

The above error will be thrown if there is a schema mismatch between the table metadata and the ORC file, e.g. the table metadata is

create table test(str string); -- table metadata

while the ORC file dump looks like

Type: struct<str:int> ...

Please correct the schema and try again.
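A quick way to compare the two sides (the data file path below is a placeholder for one of the table's actual files):

-- table-side schema, from beeline
DESCRIBE FORMATTED test;

# file-side schema, from the shell
hive --orcfiledump /path/to/table/000000_0 | grep 'Type:'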
01-20-2023 08:52 AM
In case of HWC, the user query is processed by the HWC API connecting to the HS2 server, where HS2 executes the query either within HS2 itself or in the Tez/LLAP daemons. In case of the Spark API, Spark's own framework executes the query after getting the necessary metadata about the table from HMS. Please refer to the below articles to know more about HWC:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
https://community.cloudera.com/t5/Community-Articles/Integrating-Apache-Hive-with-Apache-Spark-Hive-Warehouse/ta-p/249035
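To complement the articles above, a minimal HWC launch sketch (the assembly jar path and the HS2 JDBC URL are placeholders; the jar name varies by distribution):

spark-shell --jars /path/to/hive-warehouse-connector-assembly.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url='jdbc:hive2://<hs2-host>:10000/'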
01-20-2023 08:06 AM
Invalid OperationHandle: OperationHandle

This exception occurs when there are multiple HiveServer2 instances and clients access them through Zookeeper/Knox with failover configured. When a query (irrespective of the number of rows) takes more time and HS2 is not able to respond within the defined timeout, ZK/Knox will fail over to the next available HS2. Since that other HS2 is unaware of the query/operation handle, it throws the Invalid OperationHandle exception.

To solve this problem:
1. Check if the query can be optimized to run faster, either by adding a filter or by splitting the available data into multiple tables and then querying them in separate queries, etc.
2. Check if HS2 is utilized beyond its capacity, e.g. 200 connections at a given point in time against a 24 GB HS2/HMS heap, or the HMS backend database not being able to cope with the requests from HMS.
3. Check that the YARN queue has enough capacity to serve the query; otherwise the query will sit in a waiting state.
4. Check if HDFS is healthy and the NameNode is able to respond to requests without delays; sometimes Ranger needs to check too many files/directories in HDFS before the query gets executed.
5. If a load balancer is used, sticky sessions should be enabled so that a one-to-one relationship is established for opened connections, avoiding failover to another HS2 instance (see the example connection string below).

The above explanation holds good for any version of Hive.
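For reference, a typical ZooKeeper service-discovery connection string looks like the following (hosts and namespace are placeholders for your environment):

beeline -u 'jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2'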
03-11-2022 05:00 AM
@useryy2 I have tried the same query in Hive-3.1.3000.7.1.7.0-551 and did not get any error:

+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `testexplode`( |
| `name` string, |
| `childs` array<string>, |
| `amap` map<string,string>) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'collection.delim'=',', |
| 'field.delim'='\t', |
| 'line.delim'='\n', |
| 'mapkey.delim'=':', |
| 'serialization.format'='\t') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://c1448-xxxx.cloudera.com:8020/warehouse/tablespace/external/hive/testexplode' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1646831327') |
+----------------------------------------------------+

And the query result is:

0: jdbc:hive2://c1448-xxxx.coelab.cloudera.c> select explode(childs) from testexplode;
INFO : Compiling command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9): select explode(childs) from testexplode
INFO : No Stats for default@testexplode, Columns: amap, name, childs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9); Time taken: 2.262 seconds
INFO : Executing command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9): select explode(childs) from testexplode
INFO : Completed executing command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9); Time taken: 0.062 seconds
INFO : OK
+---------+
| col |
+---------+
| child1 |
| child2 |
| child3 |
| child4 |
| child5 |
| child6 |
| child7 |
| child8 |
+---------+
8 rows selected (2.626 seconds)

Could you please try connecting beeline with the --verbose=true flag and share the complete stack trace, so we can see the exact cause of the problem? Please also share the Hive version being used.
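For example (the JDBC URL is a placeholder; select version() reports the Hive build on the server side):

beeline --verbose=true -u 'jdbc:hive2://<hs2-host>:10000/' -e 'select version();'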
02-18-2022 03:45 AM
Thanks for noticing this, I missed it! The table is the result of a Sqoop job from MySQL, so I guess the column name had q5 in it and somehow it ended up defined as a type?!
01-30-2022 10:34 PM
The following parameters control the number of mappers for splittable formats with Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

Adjust the above values to best suit your data file sizes and avoid file-split grouping, leading to an increased number of mappers. If you still don't see the number of mappers increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may need to adjust the below properties as well (a quick check of the current session values follows the references):

set mapreduce.input.fileinputformat.split.maxsize=50000;
set mapreduce.input.fileinputformat.split.minsize=50000;

Please note that data locality w.r.t. nodes also plays a role in determining the number of mappers; for more information please refer to the references below.

References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/
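As mentioned above, a quick way to confirm what the current session is actually using (issuing set with a bare property name prints its current value):

set hive.input.format;
set tez.grouping.min-size;
set tez.grouping.max-size;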
01-05-2022 09:38 AM
I have used CDP (Hive-3.1.3000) to run the below queries.

1. I am getting a value instead of null when I execute the below command:

0: jdbc:hive2://hs2> select from_utc_timestamp(from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX"),"America/New_York");
+------------------------+
| _c0 |
+------------------------+
| 2021-10-02 20:00:00.0 |
+------------------------+

2. from_utc_timestamp does not give output along with the time zone, as it internally returns a LocalDateTime, which has no TZ information. From GenericUDFFromUtcTimestamp.java:

// Now output this timestamp's millis value to the equivalent toTz.
Timestamp result = Timestamp.valueOf(
fromTs.getZonedDateTime().withZoneSameInstant(toTz.toZoneId()).toLocalDateTime().toString());

And from LocalDateTime:

//-----------------------------------------------------------------------
/**
* Outputs this date-time as a {@code String}, such as {@code 2007-12-03T10:15:30}.
* <p>
* The output will be one of the following ISO-8601 formats:
* <ul>
* <li>{@code uuuu-MM-dd'T'HH:mm}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSS}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSSSSS}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSS}</li>
* </ul>
* The format used will be the shortest that outputs the full value of
* the time where the omitted parts are implied to be zero.
*
* @return a string representation of this date-time, not null
*/
@Override
public String toString() {
return date.toString() + 'T' + time.toString();
}

3. Converting the timestamp to the format "2021-10-03T15:10:00-04:00" does not seem to be possible with the existing Hive UDFs; instead, the 'Z' notation can be used:

0: jdbc:hive2://hs2> select from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX");
+-----------------------+
| _c0 |
+-----------------------+
| 2021-10-03T15:10:00Z |
+-----------------------+

Or:

0: jdbc:hive2://hs2> select from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssZZZ");
+---------------------------+
| _c0 |
+---------------------------+
| 2021-10-03T15:10:00+0000 |
+---------------------------+

Reference: https://stackoverflow.com/questions/34723683/simpledateformat-ignores-xxx-if-timezone-is-set-to-utc