Member since: 03-16-2020
Posts: 107
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1193 | 02-18-2022 12:59 AM |
01-23-2023
08:41 AM
java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at org.apache.hadoop.io.Text.set(Text.java:225)
at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:59)
at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
at org.apache.orc.impl.writer.StructTreeWriter.writeFields(StructTreeWriter.java:64)
at org.apache.orc.impl.writer.StructTreeWriter.writeBatch(StructTreeWriter.java:78)
at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:557)

The above error is thrown when there is a schema mismatch between the table metadata and the ORC file. For example, the table metadata is

create table test(str string);   -- column declared as string

while the ORC file dump shows

Type: struct<str:int>

Please correct the schema and try again.
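As a hedged sketch only (assuming the table is named test and the ORC files genuinely contain integers), one way out is to bring the table metadata in line with the files rather than rewrite the data:

-- Inspect what the metastore currently believes about the table.
DESCRIBE FORMATTED test;

-- Align the declared column type with the ORC schema (struct<str:int>).
-- The metastore may reject an incompatible type change unless
-- hive.metastore.disallow.incompatible.col.type.changes is relaxed.
ALTER TABLE test CHANGE COLUMN str str int;

Alternatively, keep the string column and rewrite the data (for example with an INSERT OVERWRITE from a correctly typed staging table) so that the ORC file schema matches the declared type.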
01-20-2023
08:52 AM
In the case of HWC, the user query is processed by the HWC API, which connects to the HiveServer2 (HS2) server; HS2 then executes the query either within HS2 itself or in the Tez/LLAP daemons. In the case of the Spark API, Spark's own framework executes the query after fetching the necessary table metadata from HMS. Please refer to the articles below to learn more about HWC:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
https://community.cloudera.com/t5/Community-Articles/Integrating-Apache-Hive-with-Apache-Spark-Hive-Warehouse/ta-p/249035
01-20-2023
08:06 AM
Invalid OperationHandle: OperationHandle

This exception occurs when there are multiple HiveServer2 (HS2) instances and they are accessed through ZooKeeper/Knox with failover configured. When a query (irrespective of the number of rows) takes too long and HS2 is unable to respond within the defined timeout, ZK/Knox fails over to the next available HS2. Since that other HS2 instance is unaware of the query/operation handle, it throws the Invalid OperationHandle exception.

To solve this problem:
- Check whether the query can be optimized to run faster, for example by adding a filter or by splitting the data into multiple tables and querying them with separate statements (a sketch follows below).
- Check whether HS2 is utilized beyond its capacity, for example 200 concurrent connections at a given point in time against a 24 GB heap for HS2/HMS.
- Check whether the HMS backend database is able to keep up with requests from HMS.
- Check whether the YARN queue has enough capacity to serve the query; otherwise the query will sit in a waiting state.
- Check whether HDFS is healthy and the NameNode is able to respond to requests without delays.
- Check whether Ranger has to authorize too many files/directories in HDFS before the query can execute.
- If a load balancer is used, enable sticky sessions so that a one-to-one relationship is maintained for opened connections, avoiding failover to another HS2 instance.

The above explanation holds good for any version of Hive.
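As an illustration only (the table, columns, and date ranges below are hypothetical), a long-running scan can often be broken into smaller statements bounded by a filter, so that each one finishes well within the ZK/Knox timeout:

-- One large query that may exceed the gateway timeout:
-- SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id;

-- Split into smaller queries bounded by a date/partition filter:
SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date >= '2023-01-01' AND sale_date < '2023-02-01'
GROUP BY customer_id;

SELECT customer_id, SUM(amount)
FROM sales
WHERE sale_date >= '2023-02-01' AND sale_date < '2023-03-01'
GROUP BY customer_id;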
03-11-2022
05:00 AM
@useryy2 I have tried the same query in Hive-3.1.3000.7.1.7.0-551 and did not get any error:

+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `testexplode`( |
| `name` string, |
| `childs` array<string>, |
| `amap` map<string,string>) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'collection.delim'=',', |
| 'field.delim'='\t', |
| 'line.delim'='\n', |
| 'mapkey.delim'=':', |
| 'serialization.format'='\t') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://c1448-xxxx.cloudera.com:8020/warehouse/tablespace/external/hive/testexplode' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1646831327') |
+----------------------------------------------------+

And the query result is:

0: jdbc:hive2://c1448-xxxx.coelab.cloudera.c> select explode(childs) from testexplode;
INFO : Compiling command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9): select explode(childs) from testexplode
INFO : No Stats for default@testexplode, Columns: amap, name, childs
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:col, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9); Time taken: 2.262 seconds
INFO : Executing command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9): select explode(childs) from testexplode
INFO : Completed executing command(queryId=hive_20220311124936_b4346a2e-121e-4d4c-8c8f-b05ce3cfc4c9); Time taken: 0.062 seconds
INFO : OK
+---------+
| col |
+---------+
| child1 |
| child2 |
| child3 |
| child4 |
| child5 |
| child6 |
| child7 |
| child8 |
+---------+
8 rows selected (2.626 seconds)

Could you please connect Beeline with the --verbose=true flag and share the complete stack trace so we can see the exact cause of the problem? Please also share the Hive version being used.
02-18-2022
12:59 AM
1 Kudo
@vladenache The issue seems to be with the field

FieldSchema(name:80t_lab.fan_glo (q5), type:tinyint, comment:null)

Please check the schema of the table for the above field and correct its name. From the attached stack trace, Hive is trying to interpret q5 as a data type and failing because no such type exists.
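As a hedged sketch only (the table name is illustrative and the exact quoting depends on your Hive version and hive.support.quoted.identifiers), the offending field can usually be renamed to a plain identifier once it is located:

-- Locate the problematic column in the table definition.
DESCRIBE FORMATTED my_table;

-- Rename it to a plain identifier so nothing in the name can be mistaken
-- for a type token such as q5.
ALTER TABLE my_table CHANGE COLUMN `fan_glo (q5)` fan_glo_q5 tinyint;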
02-02-2022
12:06 AM
Hi @vladenache Could you please share the next 50 lines after the line below:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: No enum constant org.apache.parquet.schema.OriginalType.q5

or the full output of the Beeline console, which includes the full stack trace.
01-31-2022
10:20 PM
1 Kudo
Hi @vladenache From the pasted stack trace, we can see that the enum constant q5 does not exist in org.apache.parquet.schema.OriginalType; please refer to https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/schema/OriginalType.java

So the problem could be:
- Incompatible versions of Parquet/Hive in use. Please let us know which versions of Hive and Parquet are in use; the Hadoop distribution version also helps.
- Source data format issues. Please share the schema of the source table; sample data would also help.

Please share the full stack trace to understand more about the code execution path while the query is executed.
01-30-2022
10:34 PM
The following parameters control the number of mappers for splittable formats with Tez:

set tez.grouping.min-size=16777216;   -- 16 MB minimum split
set tez.grouping.max-size=1073741824; -- 1 GB maximum split

Adjust the above values to best suit your data file sizes, so that file splits are not grouped together and the number of mappers is not reduced. If you still do not see the number of mappers increase and hive.input.format is set to "org.apache.hadoop.hive.ql.io.CombineHiveInputFormat", you may need to adjust the following properties as well:

set mapreduce.input.fileinputformat.split.maxsize=50000;
set mapreduce.input.fileinputformat.split.minsize=50000;

Please note that data locality with respect to nodes also plays a role in determining the split count; for more information, please refer to the references below.

References:
https://community.cloudera.com/t5/Support-Questions/How-are-number-of-mappers-determined-for-a-query-with-hive/m-p/94915
https://cwiki.apache.org/confluence/display/TEZ/How+initial+task+parallelism+works
https://cloudera.ericlin.me/2015/05/how-to-control-the-number-of-mappers-required-for-a-hive-query/
http://cloudsqale.com/2018/10/22/tez-internals-1-number-of-map-tasks/
http://cloudsqale.com/2018/12/24/orc-files-split-computation-hive-on-tez/
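As a rough, hedged illustration of how the grouping settings above interact (exact task counts also depend on data locality, file layout, and the ORC/Text split computation; the table name is hypothetical):

-- With ~10 GB of input and the defaults below, each grouped split ends up
-- between 16 MB and 1 GB, i.e. roughly between 10 and 640 mappers.
set tez.grouping.min-size=16777216;    -- 16 MB
set tez.grouping.max-size=1073741824;  -- 1 GB

-- Lowering max-size to 256 MB forces at least ~10 GB / 256 MB = 40 mappers.
set tez.grouping.max-size=268435456;   -- 256 MB

select count(*) from my_large_table;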
01-05-2022
09:38 AM
I have used CDP (Hive-3.1.3000) to run the queries below.

1. I get a value instead of null when I execute the command below:

0: jdbc:hive2://hs2> select from_utc_timestamp(from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX"),"America/New_York");
+------------------------+
| _c0 |
+------------------------+
| 2021-10-02 20:00:00.0 |
+------------------------+

2. from_utc_timestamp does not produce output with time-zone information, because internally it returns a LocalDateTime, which carries no TZ information.

From GenericUDFFromUtcTimestamp.java:

// Now output this timestamp's millis value to the equivalent toTz.
Timestamp result = Timestamp.valueOf(
    fromTs.getZonedDateTime().withZoneSameInstant(toTz.toZoneId()).toLocalDateTime().toString());

From java.time.LocalDateTime:

//-----------------------------------------------------------------------
/**
* Outputs this date-time as a {@code String}, such as {@code 2007-12-03T10:15:30}.
* <p>
* The output will be one of the following ISO-8601 formats:
* <ul>
* <li>{@code uuuu-MM-dd'T'HH:mm}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSS}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSSSSS}</li>
* <li>{@code uuuu-MM-dd'T'HH:mm:ss.SSSSSSSSS}</li>
* </ul>
* The format used will be the shortest that outputs the full value of
* the time where the omitted parts are implied to be zero.
*
* @return a string representation of this date-time, not null
*/
@Override
public String toString() {
return date.toString() + 'T' + time.toString();
}

3. Converting the timestamp to the format "2021-10-03T15:10:00-04:00" does not seem possible with the existing Hive UDFs; the 'Z' notation can be used instead:

0: jdbc:hive2://hs2> select from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssXXX");
+-----------------------+
| _c0 |
+-----------------------+
| 2021-10-03T15:10:00Z |
+-----------------------+

Or:

0: jdbc:hive2://hs2> select from_unixtime(unix_timestamp('2021-10-03 15:10:00.0', "yyyy-MM-dd HH:mm:ss.S"),"yyyy-MM-dd'T'HH:mm:ssZZZ");
+---------------------------+
| _c0 |
+---------------------------+
| 2021-10-03T15:10:00+0000 |
+---------------------------+

Reference: https://stackoverflow.com/questions/34723683/simpledateformat-ignores-xxx-if-timezone-is-set-to-utc
01-03-2022
12:37 AM
Hi @Anitauzanna I have tried the below use case as per the problem description, but I could not see '?' in the Hive table.

[root@c1448-node1 ~]# hdfs dfs -cat /tmp/data.csv
1,a,b
2,,c
3,d,
[root@c1448-node1 ~]#
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.4.7.7.1.7.0-551
/_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_232)
scala> val csv_df = spark.read.csv("/tmp/data.csv");
csv_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
scala> csv_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+
scala> csv_df.write.mode("overwrite").saveAsTable("null_test");
22/01/03 08:22:28 WARN conf.HiveConf: HiveConf of name hive.masking.algo does not exist
Hive Session ID = 623fe696-c08a-4bad-b15a-12b6c7db1c52
scala> val null_df=spark.sql("select * from null_test");
null_df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 1 more field]
scala> null_df.show
+---+----+----+
|_c0| _c1| _c2|
+---+----+----+
| 1| a| b|
| 2|null| c|
| 3| d|null|
+---+----+----+
scala>
[root@c1448-node1 ~]#
[root@c1448-node1 ~]# beeline -e "select * from null_test;"
+----------------+----------------+----------------+
| null_test._c0 | null_test._c1 | null_test._c2 |
+----------------+----------------+----------------+
| 1 | a | b |
| 2 | NULL | c |
| 3 | d | NULL |
+----------------+----------------+----------------+
3 rows selected (1.344 seconds)

I suspect there could be some unsupported characters in your text file; we might need to change the encoding to make them display correctly. Please share the sample data along with the Hive table schema and the Spark commands tried so far, to help troubleshoot the issue further.