Member since
08-15-2019
29
Posts
111
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1607 | 09-08-2017 11:30 PM
 | 2484 | 06-08-2017 07:24 PM
 | 6315 | 03-28-2017 05:20 PM
 | 3706 | 03-17-2017 04:27 AM
 | 3135 | 03-09-2017 11:48 PM
09-08-2024
10:36 PM
With Hive (newer than Hive 2.2), you can use MERGE INTO:
MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET
target.name = source.name,
target.age = source.age
WHEN NOT MATCHED THEN
INSERT (id, name, age)
VALUES (source.id, source.name, source.age);
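Note that Hive MERGE requires the target table to be transactional (ACID), i.e. stored as ORC with transactions enabled. A minimal sketch of a compatible target table, with illustrative table and column names matching the statement above:

```sql
-- Illustrative only: names are placeholders taken from the MERGE example.
-- MERGE requires an ACID target: ORC storage with transactions enabled.
CREATE TABLE target_table (
  id   INT,
  name STRING,
  age  INT
)
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
```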
07-23-2020
07:28 AM
Run the msck command for the table you want to truncate in the hive shell:
hive> use <database-name>;
hive> msck repair table <table-name>;
If it shows any error, rectify it first. In our case one of the partitions was missing, so we created that partition directory on the HDFS location and re-ran the msck repair command, after which it reported no issue. The truncate command will then run successfully:
hive> truncate table <table-name>;
[NOTE: Please update the database and table names as per your requirement]
03-13-2018
09:49 PM
5 Kudos
For the full stack trace of the error message from Hive, you could look into the HiveServer2 log, as @Slim mentioned. Since Hive Interactive is enabled, I believe you should look into the hsihiveserver log on the node where HiveServer2 Interactive is running.
09-08-2017
11:41 PM
1 Kudo
That explains it, thanks!
07-23-2017
06:50 AM
4 Kudos
Consolidating below some of the errors thrown by Spark Thrift Server during SQL execution that can be worked around by configuring certain parameters in spark-thrift-sparkconf.conf and hive-site.xml.
Error 1:
Join condition is missing or trivial.Use the CROSS JOIN syntax to allow cartesian products between these relations.;
Resolution: spark.sql.crossJoin.enabled: true
Error 2:
Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method "eval(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" grows beyond 64 KB
Resolution: spark.sql.codegen.wholeStage : false
Error 3:
java.lang.OutOfMemoryError: Java heap space
Resolution: spark.driver.memory: 10g (or a higher value)
spark.sql.ui.retainedExecutions: 5 (or a lower value)
Error 4:
org.apache.spark.SparkException: Exception thrown in awaitResult: (state=,code=0)
Resolution: hive.metastore.try.direct.sql: false (in hive-site.xml)
To enable heap dump collection for the Spark driver and executors (useful for debugging out-of-memory errors):
spark.driver.extraJavaOptions: '-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>'
spark.executor.extraJavaOptions: '-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>'
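Put together, the workarounds above would look like the following in spark-thrift-sparkconf.conf (the values are illustrative; tune them for your workload, and leave the dump path placeholder for your own location):

```
spark.sql.crossJoin.enabled       true
spark.sql.codegen.wholeStage      false
spark.driver.memory               10g
spark.sql.ui.retainedExecutions   5
spark.driver.extraJavaOptions     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>
spark.executor.extraJavaOptions   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path-to-dump-file-location>
```

Remember that hive.metastore.try.direct.sql belongs in hive-site.xml, not in the Spark configuration file.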
07-13-2017
10:39 PM
2 Kudos
If the issue is that the number of rows is too high, starting beeline with beeline --incremental=true will help.
06-29-2017
09:05 PM
Thanks @Kshitij Badani, I will configure a local user as I am about to try some Spark queries.
04-28-2017
11:12 PM
12 Kudos
Hive provides an improved explain plan when hive.explain.user is set to true. The indentation and arrows now clearly depict the execution flow. Retrieving the join order used to require a lot of back-and-forth references in the old style; now it is as easy as following the arrows, even with the details of the mappers and reducers.
Let us take a look at one sample explain plan and retrieve the join order.
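A quick sketch of producing such a plan in the Hive shell (the query itself is whatever statement you want to analyze):

```
hive> set hive.explain.user=true;
hive> explain <your-query>;
```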
Plan optimized by CBO.
Vertex dependency in root stage
Map 2 <- Map 1 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE), Map 8 (BROADCAST_EDGE), Map 9 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 3 vectorized, llap
File Output Operator [FS_101]
Group By Operator [GBY_100] (rows=1 width=736)
Output:["_col0","_col1","_col2","_col3"],aggregations:["min(VALUE._col0)","min(VALUE._col1)","min(VALUE._col2)","min(VALUE._col3)"]
<-Map 2 [SIMPLE_EDGE] llap
SHUFFLE [RS_45]
Group By Operator [GBY_44] (rows=1 width=736)
Output:["_col0","_col1","_col2","_col3"],aggregations:["min(_col1)","min(_col3)","min(_col14)","min(_col19)"]
Map Join Operator [MAPJOIN_92] (rows=80573328660391072 width=114)
Conds:MAPJOIN_91._col11, _col7=RS_41._col0, _col0(Inner),Output:["_col1","_col3","_col14","_col19"]
<-Map 9 [BROADCAST_EDGE] llap
BROADCAST [RS_41]
PartitionCols:_col0, _col0
Select Operator [SEL_23] (rows=1960358 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_86] (rows=1960358 width=104)
predicate:id is not null
TableScan [TS_21] (rows=1960358 width=104)
default@title,t,Tbl:COMPLETE,Col:NONE,Output:["id","title"]
<-Map Join Operator [MAPJOIN_91] (rows=73248479012737488 width=125)
Conds:MAPJOIN_90._col10=RS_38._col0(Inner),Output:["_col1","_col3","_col7","_col11","_col14"]
<-Map 8 [BROADCAST_EDGE] llap
BROADCAST [RS_38]
PartitionCols:_col0
Select Operator [SEL_20] (rows=2 width=40)
Output:["_col0"]
Filter Operator [FIL_85] (rows=2 width=40)
predicate:((UDFToString(role) = 'actress') and id is not null)
TableScan [TS_18] (rows=4 width=40)
default@role_type,rt,Tbl:COMPLETE,Col:NONE,Output:["id","role"]
<-Map Join Operator [MAPJOIN_90] (rows=66589524931926816 width=138)
Conds:MAPJOIN_89._col0, _col6=RS_35._col0, _col0(Inner),Output:["_col1","_col3","_col7","_col10","_col11","_col14"]
<-Map 7 [BROADCAST_EDGE] llap
BROADCAST [RS_35]
PartitionCols:_col0, _col0
Select Operator [SEL_17] (rows=764741 width=105)
Output:["_col0","_col1"]
Filter Operator [FIL_84] (rows=764741 width=105)
predicate:((UDFToString(gender) = 'f') and (name like '%An%') and id is not null)
TableScan [TS_15] (rows=3058966 width=105)
default@name,n,Tbl:COMPLETE,Col:NONE,Output:["id","name","gender"]
<-Map Join Operator [MAPJOIN_89] (rows=60535930444216152 width=152)
Conds:MAPJOIN_88._col7, _col4=RS_32._col0, _col1(Inner),Output:["_col0","_col1","_col3","_col6","_col7","_col10","_col11"]
<-Map 6 [BROADCAST_EDGE] llap
BROADCAST [RS_32]
PartitionCols:_col0, _col1
Select Operator [SEL_14] (rows=11636372 width=8)
Output:["_col0","_col1"]
Filter Operator [FIL_83] (rows=11636372 width=8)
predicate:(movie_id is not null and company_id is not null)
TableScan [TS_12] (rows=11636372 width=8)
default@movie_companies,mc,Tbl:COMPLETE,Col:NONE,Output:["movie_id","company_id"]
<-Map Join Operator [MAPJOIN_88] (rows=55032662847395656 width=167)
Conds:MAPJOIN_87._col0, _col2=RS_29._col0, _col2(Inner),Output:["_col0","_col1","_col3","_col4","_col6","_col7","_col10"]
<-Map 4 [BROADCAST_EDGE] llap
BROADCAST [RS_29]
PartitionCols:_col0, _col2
Select Operator [SEL_8] (rows=6110483 width=116)
Output:["_col0","_col1","_col2","_col4"]
Filter Operator [FIL_81] (rows=6110483 width=116)
predicate:((note) IN ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') and person_id is not null and person_role_id is not null and movie_id is not null and role_id is not null)
TableScan [TS_6] (rows=12220966 width=116)
default@cast_info,ci,Tbl:COMPLETE,Col:NONE,Output:["person_id","movie_id","person_role_id","note","role_id"]
<-Map Join Operator [MAPJOIN_87] (rows=50029692413268030 width=184)
Conds:(Inner),(Inner),Output:["_col0","_col1","_col2","_col3","_col4"]
<-Map 1 [BROADCAST_EDGE] llap
BROADCAST [RS_24]
Select Operator [SEL_2] (rows=701938 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_79] (rows=701938 width=104)
predicate:person_id is not null
TableScan [TS_0] (rows=701938 width=104)
default@aka_name,an,Tbl:COMPLETE,Col:NONE,Output:["person_id","name"]
<-Map 5 [BROADCAST_EDGE] llap
BROADCAST [RS_26]
Select Operator [SEL_11] (rows=34365 width=259)
Output:["_col0"]
Filter Operator [FIL_82] (rows=34365 width=259)
predicate:((UDFToString(country_code) = '[us]') and id is not null)
TableScan [TS_9] (rows=68731 width=259)
default@company_name,cn,Tbl:COMPLETE,Col:NONE,Output:["id","country_code"]
<-Select Operator [SEL_5] (rows=2074019 width=104)
Output:["_col0","_col1"]
Filter Operator [FIL_80] (rows=2074019 width=104)
predicate:id is not null
TableScan [TS_3] (rows=2074019 width=104)
default@char_name,chn,Tbl:COMPLETE,Col:NONE,Output:["id","name"]
1. Start reading from the bottom.
2. Look for the keyword 'TableScan'; the row below it has the details of the table being scanned.
3. Look for the keyword 'Join Operator'; the indentation of this line tells us which tables are part of this join.
4. This joined table group becomes a candidate for another join if the line starts with '<-'.
5. Keep going up in the same way to get the complete join order.
The join order in the above example boils down to (title(t),(((movie_companies(mc),(cast_info(ci),(company_name(cn),aka_name(an),char_name(chn)))),name(n)),role_type(rt)))
04-28-2017
05:10 PM
7 Kudos
Looks like the data was removed from HDFS, but no drop partition query was executed on the Hive side. The output of 'show partitions <table_name>' comes from the Hive metastore DB, which holds the information about table partitions but not the actual data, which is stored in HDFS. As for the data missing from HDFS, it might be worth checking hdfs-audit.log to trace the lifecycle of the directory/file.
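To bring the metastore back in sync with HDFS, one option is to drop the stale partition explicitly. A sketch, where the partition column and value are placeholders for your own spec:

```sql
-- Placeholder partition spec; adjust to your table's partition column.
ALTER TABLE <table_name> DROP IF EXISTS PARTITION (dt='2017-04-28');
```

On newer Hive releases, MSCK REPAIR TABLE with SYNC PARTITIONS can also remove metastore partitions whose directories no longer exist on HDFS; on older versions, MSCK only adds missing partitions, so the explicit DROP above is the safer route.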
02-22-2019
10:32 PM
I am also facing the same issue in HDP 3. Has it been resolved?