About Bangalore

SamBerchmans · ‎02-19-2021

Hi , Where you able to get this issue sorted out. Regards Sam

Shelton · ‎07-16-2018

@Satish Anjaneyappa I think when you are running the hive query it's generating some temporary files in HDFS. Can you check the remaining space in hdfs $ hdfs dfsadmin -report When you delete files in hadoop it drops the files to the /.Trash but does not remove them. If you are sure of the files you want to delete its always a good idea to use -skipTrash option $ hdfs dfs -rm -skipTrash /path/to/file This will skip the trash However, if you do not add the -skipTrash flag, files are stored in a trash folder, which by default is: /user/hdfs/.Trash Empty the HDFS Trash by running $ hdfs dfs -expunge HTH

zhaindrich · ‎07-11-2018

I think you would probably want to take a look at the "MERGE" statement; https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Merge but afaik it would need to only keep the last operation on the row to make it work; which is what you are also suggesting (I,U,D = D)... -- merge is only supported on transactional tables set hive.support.concurrency=true; set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; drop table if exists target; drop table if exists upd; create table target (id string, t timestamp) stored as orc tblproperties('transactional'='true'); create table upd (id string, iud string, t timestamp); insert into target values ('1','2001-11-11'), ('10','2001-11-11'); insert into upd values ('1','D','2001-11-11'), ('2','U','2001-10-11'), ('2','D','2001-11-11'), ('2','I','2001-12-11'), ('3','U','2001-10-11'), ('3','D','2001-11-11'), ('4','I','2001-11-11'), ('10','D','2001-11-11'); -- the following will filter the "final" updates select * from ( select *,LEAD(id,1) over (partition by id order by t) as lead from upd order by t ) tt where lead is null; -- this merge statement will bake the updates into the target table merge into target as ta using ( select * from ( select *,LEAD(id,1) over (partition by id order by t) as lead from upd order by t ) tt where lead is null ) as up ON ta.id=up.id when matched and up.iud='D' then DELETE when matched and up.iud='U' then update set t=up.t when not matched and up.iud='I' then insert values (up.id,up.t); for more info about using LEAD in this query: https://sqlsunday.com/2016/04/11/last-row-per-group/

vmurakami · ‎06-28-2018

Hi @Satish Anjaneyappa! Hm, what about the ANALYZE TABLE <TBL_NAME> COMPUTE STATISTICS? I made a test here, and it's doing good so far: --TABLE HAS 50 ROWS! 0: jdbc:hive2://node3:10000/default> CREATE EXTERNAL TABLE `salaries`( 0: jdbc:hive2://node3:10000/default> `gender` string, 0: jdbc:hive2://node3:10000/default> `age` int, 0: jdbc:hive2://node3:10000/default> `salary` double, 0: jdbc:hive2://node3:10000/default> `zip` int) 0: jdbc:hive2://node3:10000/default> ROW FORMAT DELIMITED 0: jdbc:hive2://node3:10000/default> FIELDS TERMINATED BY ',' 0: jdbc:hive2://node3:10000/default> STORED AS INPUTFORMAT 0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.mapred.TextInputFormat' 0: jdbc:hive2://node3:10000/default> OUTPUTFORMAT 0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' 0: jdbc:hive2://node3:10000/default> LOCATION 0: jdbc:hive2://node3:10000/default> 'hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries' 0: jdbc:hive2://node3:10000/default> TBLPROPERTIES ( 0: jdbc:hive2://node3:10000/default> 'COLUMN_STATS_ACCURATE'='true', 0: jdbc:hive2://node3:10000/default> 'numFiles'='1', 0: jdbc:hive2://node3:10000/default> 'numRows'='0', 0: jdbc:hive2://node3:10000/default> 'rawDataSize'='732', 0: jdbc:hive2://node3:10000/default> 'totalSize'='781', 0: jdbc:hive2://node3:10000/default> 'transient_lastDdlTime'='1529819960'); No rows affected (0.443 seconds) 0: jdbc:hive2://node3:10000/default> explain select count(1) from salaries; +------------------------------------------------------------------------------------------------------+--+ | Explain | +------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-1 is a root stage | | Stage-0 depends on stages: Stage-1 | | | | STAGE PLANS: | | Stage: Stage-1 | | Map Reduce | | Map Operator Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE | | Select Operator | | Statistics: Num rows: 1 Data size: 732 Basic stats: COMPLETE Column stats: COMPLETE | | Group By Operator | | aggregations: count(1) | | mode: hash | | outputColumnNames: _col0 | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | Reduce Output Operator | | sort order: | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | value expressions: _col0 (type: bigint) | | Reduce Operator Tree: | | Group By Operator | | aggregations: count(VALUE._col0) | | mode: mergepartial | | outputColumnNames: _col0 | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | File Output Operator | | compressed: false | | Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE | | table: | | input format: org.apache.hadoop.mapred.TextInputFormat | | output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | | serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | | | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | ListSink | | | +------------------------------------------------------------------------------------------------------+--+ 42 rows selected (0.255 seconds) 0: jdbc:hive2://node3:10000/default> explain select * from salaries; +----------------------------------------------------------------------------------------------------------+--+ | Explain | +----------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) | | outputColumnNames: _col0, _col1, _col2, _col3 | | Statistics: Num rows: 6 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +----------------------------------------------------------------------------------------------------------+--+ 17 rows selected (0.232 seconds) 0: jdbc:hive2://node3:10000/default> desc salaries; +-----------+------------+----------+--+ | col_name | data_type | comment | +-----------+------------+----------+--+ | gender | string | | | age | int | | | salary | double | | | zip | int | | +-----------+------------+----------+--+ 4 rows selected (0.426 seconds) 0: jdbc:hive2://node3:10000/default> explain select age from salaries; +------------------------------------------------------------------------------------------------+--+ | Explain | +------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: age (type: int) | | outputColumnNames: _col0 | | Statistics: Num rows: 183 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +------------------------------------------------------------------------------------------------+--+ 0: jdbc:hive2://node3:10000/default> analyze table salaries compute statistics ; INFO : Number of reduce tasks is set to 0 since there's no reduce operator INFO : number of splits:1 INFO : Submitting tokens for job: job_1529940007017_0004 INFO : The url to track the job: http://node4:8088/proxy/application_1529940007017_0004/ INFO : Starting Job = job_1529940007017_0004, Tracking URL = http://node4:8088/proxy/application_1529940007017_0004/ INFO : Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job -kill job_1529940007017_0004 INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0 INFO : 2018-06-28 16:38:56,357 Stage-0 map = 0%, reduce = 0% INFO : 2018-06-28 16:39:02,796 Stage-0 map = 100%, reduce = 0%, Cumulative CPU 2.93 sec INFO : MapReduce Total cumulative CPU time: 2 seconds 930 msec INFO : Ended Job = job_1529940007017_0004 INFO : Table default.salaries stats: [numFiles=1, numRows=50, totalSize=781, rawDataSize=732] No rows affected (16.338 seconds) 0: jdbc:hive2://node3:10000/default> explain select * from salaries; +----------------------------------------------------------------------------------------------------------+--+ | Explain | +----------------------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: gender (type: string), age (type: int), salary (type: double), zip (type: int) | | outputColumnNames: _col0, _col1, _col2, _col3 | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +----------------------------------------------------------------------------------------------------------+--+ 17 rows selected (0.226 seconds) 0: jdbc:hive2://node3:10000/default> explain select age from salaries; +-----------------------------------------------------------------------------------------------+--+ | Explain | +-----------------------------------------------------------------------------------------------+--+ | STAGE DEPENDENCIES: | | Stage-0 is a root stage | | | | STAGE PLANS: | | Stage: Stage-0 | | Fetch Operator | | limit: -1 | | Processor Tree: | | TableScan | | alias: salaries | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | Select Operator | | expressions: age (type: int) | | outputColumnNames: _col0 | | Statistics: Num rows: 50 Data size: 732 Basic stats: COMPLETE Column stats: NONE | | ListSink | | | +-----------------------------------------------------------------------------------------------+--+ 17 rows selected (0.227 seconds) And also, try to set the set hive.stats.autogather=true; Hope this helps!

Bangalore · ‎03-09-2018

@rtrivedi Yes got some hive logs .. 2018-03-06 03:42:42,951 INFO [HiveServer2-Background-Pool: Thread-1132047]: exec.MoveTask (MoveTask.java:execute(309)) - Partition is: {region_code=null, fiscal_year_week_code=null} at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:298) 2018-03-06 03:49:24,600 ERROR [HiveServer2-Background-Pool: Thread-1132295]: ql.Driver (SessionState.java:printError(962)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Bangalore · ‎04-28-2017

Thanks a lot .. It works Consolidated answer ALTER TABLE abc.Stage_xact_personall_region_of_Sale DROP PARTITION (region_code='TYES'); Then do an overwrite on it insert overwrite table abc.Stage_xact_personall_region_of_Sale partition(region_code) select * from xyz.Region_Sale where region_code='TYES'; -Thanks Satish

Online	Offline
Last Visited	‎07-18-2018 11:12 PM

Member Since	‎07-16-2018 10:06 AM
Last Visited	‎07-18-2018 11:12 PM
Posts	15
Kudos received	2

Cloudera Community

Re: Restrict memory for multiple hive query

Re: Insufficient space for shared memory file

Re: Skipping rows in hive while querying

Re: Hive explain plan fetching wrong row numbers

Re: Hive : insert into org.apache.hadoop.hive.ql.e...

Re: truncate partition external table