Member since
07-16-2018
15
Posts
2
Kudos Received
0
Solutions
02-19-2021
08:22 AM
Hi , Where you able to get this issue sorted out. Regards Sam
... View more
07-16-2018
09:49 AM
1 Kudo
@Satish Anjaneyappa I think when you are running the hive query it's generating some temporary files in HDFS. Can you check the remaining space in hdfs $ hdfs dfsadmin -report When you delete files in hadoop it drops the files to the /.Trash but does not remove them. If you are sure of the files you want to delete its always a good idea to use -skipTrash option $ hdfs dfs -rm -skipTrash /path/to/file This will skip the trash However, if you do not add the -skipTrash flag, files are stored in a trash folder, which by default is: /user/hdfs/.Trash Empty the HDFS Trash by running $ hdfs dfs -expunge HTH
... View more
07-11-2018
02:08 PM
I think you would probably want to take a look at the "MERGE" statement; https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge but afaik it would need to only keep the last operation on the row to make it work; which is what you are also suggesting (I,U,D = D)... -- merge is only supported on transactional tables
set hive.support.concurrency=true;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
drop table if exists target;
drop table if exists upd;
create table target (id string, t timestamp)
stored as orc
tblproperties('transactional'='true');
create table upd (id string, iud string, t timestamp);
insert into target values ('1','2001-11-11'),
('10','2001-11-11');
insert into upd values ('1','D','2001-11-11'),
('2','U','2001-10-11'),
('2','D','2001-11-11'),
('2','I','2001-12-11'),
('3','U','2001-10-11'),
('3','D','2001-11-11'),
('4','I','2001-11-11'),
('10','D','2001-11-11');
-- the following will filter the "final" updates
select * from (
select *,LEAD(id,1) over (partition by id order by t) as lead
from upd
order by t
) tt where lead is null;
-- this merge statement will bake the updates into the target table
merge into target as ta using
(
select * from (
select *,LEAD(id,1) over (partition by id order by t) as lead
from upd
order by t
) tt where lead is null
) as up
ON ta.id=up.id
when matched and up.iud='D' then DELETE
when matched and up.iud='U' then update set t=up.t
when not matched and up.iud='I' then insert values (up.id,up.t);
for more info about using LEAD in this query: https://sqlsunday.com/2016/04/11/last-row-per-group/
... View more
06-28-2018
04:44 PM
Hi @Satish Anjaneyappa! Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS? I made a test here, and it's doing good so far: --TABLE HAS 50 ROWS!
0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`(
0: jdbc:hive2://node3:10000/default> `gender` string,
0: jdbc:hive2://node3:10000/default> `age` int,
0: jdbc:hive2://node3:10000/default> `salary` double,
0: jdbc:hive2://node3:10000/default> `zip` int)
0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED
0: jdbc:hive2://node3:10000/default> FIELDS TERMINATED BY ','
0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.mapred.TextInputFormat'
0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
0: jdbc:hive2://node3:10000/default> LOCATION
0: jdbc:hive2://node3:10000/default> 'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries'
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'COLUMN_STATS_ACCURATE'='true',
0: jdbc:hive2://node3:10000/default> 'numFiles'='1',
0: jdbc:hive2://node3:10000/default> 'numRows'='0',
0: jdbc:hive2://node3:10000/default> 'rawDataSize'='732',
0: jdbc:hive2://node3:10000/default> 'totalSize'='781',
0: jdbc:hive2://node3:10000/default> 'transient_lastDdlTime'='1529819960');
No rows affected (0.443 seconds)
0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries;
+------------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
| Select Operator |
| Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE |
| Group By Operator |
| aggregations: count(1) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| Reduce Output Operator |
| sort order: |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| value expressions: _col0 (type: bigint) |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: count(VALUE._col0) |
| mode: mergepartial |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE |
| table: |
| input format: org.apache.hadoop.mapred.TextInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+------------------------------------------------------------------------------------------------------+--+
42 rows selected (0.255 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.232 seconds)
0: jdbc:hive2://node3:10000/default> desc salaries;
+-----------+------------+----------+--+
| col_name | data_type | comment |
+-----------+------------+----------+--+
| gender | string | |
| age | int | |
| salary | double | |
| zip | int | |
+-----------+------------+----------+--+
4 rows selected (0.426 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+------------------------------------------------------------------------------------------------+--+
| Explain |
+------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: age (type: int) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+------------------------------------------------------------------------------------------------+--+
0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1529940007017_0004
INFO : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/
INFO : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/
INFO : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job -kill job_1529940007017_0004
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2018-06-28 16:38:56,357 Stage-0 map = 0%, reduce = 0%
INFO : 2018-06-28 16:39:02,796 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec
INFO : MapReduce Total cumulative CPU time: 2 seconds 930 msec
INFO : Ended Job = job_1529940007017_0004
INFO : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732]
No rows affected (16.338 seconds)
0: jdbc:hive2://node3:10000/default> explain select * from salaries;
+----------------------------------------------------------------------------------------------------------+--+
| Explain |
+----------------------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) |
| outputColumnNames: _col0, _col1, _col2, _col3 |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+----------------------------------------------------------------------------------------------------------+--+
17 rows selected (0.226 seconds)
0: jdbc:hive2://node3:10000/default> explain select age from salaries;
+-----------------------------------------------------------------------------------------------+--+
| Explain |
+-----------------------------------------------------------------------------------------------+--+
| STAGE DEPENDENCIES: |
| Stage-0 is a root stage |
| |
| STAGE PLANS: |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| TableScan |
| alias: salaries |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: age (type: int) |
| outputColumnNames: _col0 |
| Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE |
| ListSink |
| |
+-----------------------------------------------------------------------------------------------+--+
17 rows selected (0.227 seconds) And also, try to set the set hive.stats.autogather=true; Hope this helps!
... View more
03-09-2018
10:28 AM
@rtrivedi Yes got some hive logs .. 2018-03-06 03:42:42,951 INFO [HiveServer2-Background-Pool: Thread-1132047]: exec.MoveTask (MoveTask.java:execute(309)) - Partition is: {region_code=null, fiscal_year_week_code=null}
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298)
2018-03-06 03:49:24,600 ERROR [HiveServer2-Background-Pool: Thread-1132295]: ql.Driver (SessionState.java:printError(962)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
... View more
04-28-2017
01:46 PM
1 Kudo
Thanks a lot .. It works Consolidated answer
ALTER TABLE abc.Stage_xact_personall_region_of_Sale DROP PARTITION (region_code='TYES'); Then do an overwrite on it insert overwrite table abc.Stage_xact_personall_region_of_Sale partition(region_code) select * from xyz.Region_Sale where region_code='TYES'; -Thanks Satish
... View more