Created on 07-18-2024 01:47 AM - edited 07-18-2024 01:49 AM
Good day,
In SIT, I manually created a Hive table, then manually created a partition on it. Then I copied the bucket file from another table and placed it inside the partition directory. At that point, a SELECT statement returned 0 rows, which I believe is normal because the data had not yet been registered in the metastore.
Then I run the following commands:
ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12') COMPUTE STATISTICS;
After running this command, I was able to select data from the table.
I then applied the same approach in production, but there I still get an empty result.
I am currently trying to recover the entire metastore in production, because someone accidentally deleted all of it.
For your information, the Parquet tables are working fine, but I also have some ORC tables, and those are still pending resolution.
Please help.
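For reference, the manual SIT procedure described above would look roughly like the sketch below. The table name and partition spec come from the thread; the HDFS path and file name are illustrative placeholders, not the exact production values.

```sql
-- 1. Register the partition in the metastore (assuming the table already exists).
ALTER TABLE raw_zone_sit.jxtemp
  ADD IF NOT EXISTS PARTITION (dl_created_yr='2023', dl_created_mth='12');

-- 2. Copy the bucket file into the partition directory (HDFS shell, path illustrative):
--    hdfs dfs -put bucket_00023_0 \
--      /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp/dl_created_yr=2023/dl_created_mth=12/

-- 3. Refresh partition statistics so Hive notices the file.
ANALYZE TABLE raw_zone_sit.jxtemp
  PARTITION (dl_created_yr='2023', dl_created_mth='12')
  COMPUTE STATISTICS;
```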
Created 07-18-2024 03:34 AM
@AhXian, Welcome to our community! To help you get the best possible answer, I have tagged our Hive experts @smruti @asish @Asok who may be able to assist you further.
Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.
Regards,
Vidya Sargur
Created 07-22-2024 08:45 AM
@AhXian These ORC tables we are talking about, are they ACID?
If we copy data files into an ACID table location and run ANALYZE ... COMPUTE STATISTICS, that alone might not resolve the issue. We would like you to share an example. Please share the output of the following command:
describe formatted table_name partition(dl_created_yr='2023', dl_created_mth='12')
and the output of the HDFS command:
hdfs dfs -ls -R <partition location from the above command>
Created 07-22-2024 07:06 PM
Hi @smruti ,
Yes, those ORC tables are ACID.
describe formatted table_name partition(dl_created_yr='2023', dl_created_mth='12')
col_name,data_type,comment
uuid,string,
pymt_inst_trx_uuid,varchar(16),OriSrcDType: hexadecimal
pymt_inst_segment_uuid,varchar(16),OriSrcDType: hexadecimal
trx_ref_no,varchar(50),
trx_status,smallint,OriSrcDType: tinyint unsigned
journal_id,varchar(50),
promo_code,varchar(50),
fee_scheme,varchar(50),
from_account,varchar(50),
from_account_name,varchar(100),
to_account,varchar(50),
to_account_name,varchar(150),
currency,varchar(50),
amount,"decimal(20,6)",
charge,"decimal(20,6)",
tax,"decimal(20,6)",
tax_type,varchar(50),
process_date,timestamp,
process_end_date,timestamp,
error_code,varchar(10),
error_message,varchar(200),
src_ref_no,varchar(20),
ims_message_id,varchar(27),
src_ref_no_2,varchar(20),
created_date,timestamp,
created_by,varchar(16),OriSrcDType: hexadecimal
updated_date,timestamp,
updated_by,varchar(16),OriSrcDType: hexadecimal
info,string,OriSrcDType: json
dq_is_dirty,boolean,
dq_error_code_mandatory,varchar(255),
dq_error_code_optional,varchar(255),
dq_sla,varchar(255),
dq_priority,varchar(255),
dl_source_system,varchar(255),
dl_file_name,varchar(255),
dl_created_dt,timestamp,
dl_created_day,varchar(2),
dl_created_time,varchar(8),
,NULL,NULL
# Partition Information,NULL,NULL
# col_name,data_type,comment
dl_created_yr,varchar(4),
dl_created_mth,varchar(2),
,NULL,NULL
# Detailed Partition Information,NULL,NULL
Partition Value: ,"[2023, 12] ",NULL
Database: ,raw_zone_sit ,NULL
Table: ,jxtemp2 ,NULL
CreateTime: ,Thu Jul 18 17:42:30 MYT 2024,NULL
LastAccessTime: ,UNKNOWN ,NULL
Location: ,hdfs://cdpuat.devambankgroup.devahb.com:8020/warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12,NULL
Partition Parameters:,NULL,NULL
,COLUMN_STATS_ACCURATE,"{\""BASIC_STATS\"":\""true\"",\""COLUMN_STATS\"":{\""amount\"":\""true\"",\""charge\"":\""true\"",\""created_by\"":\""true\"",\""created_date\"":\""true\"",\""currency\"":\""true\"",\""dl_created_day\"":\""true\"",\""dl_created_dt\"":\""true\"",\""dl_created_time\"":\""true\"",\""dl_file_name\"":\""true\"",\""dl_source_system\"":\""true\"",\""dq_error_code_mandatory\"":\""true\"",\""dq_error_code_optional\"":\""true\"",\""dq_is_dirty\"":\""true\"",\""dq_priority\"":\""true\"",\""dq_sla\"":\""true\"",\""error_code\"":\""true\"",\""error_message\"":\""true\"",\""fee_scheme\"":\""true\"",\""from_account\"":\""true\"",\""from_account_name\"":\""true\"",\""ims_message_id\"":\""true\"",\""info\"":\""true\"",\""journal_id\"":\""true\"",\""process_date\"":\""true\"",\""process_end_date\"":\""true\"",\""promo_code\"":\""true\"",\""pymt_inst_segment_uuid\"":\""true\"",\""pymt_inst_trx_uuid\"":\""true\"",\""src_ref_no\"":\""true\"",\""src_ref_no_2\"":\""true\"",\""tax\"":\""true\"",\""tax_type\"":\""true\"",\""to_account\"":\""true\"",\""to_account_name\"":\""true\"",\""trx_ref_no\"":\""true\"",\""trx_status\"":\""true\"",\""updated_by\"":\""true\"",\""updated_date\"":\""true\"",\""uuid\"":\""true\""}}"
,numFiles ,0
,numRows ,0
,rawDataSize ,0
,totalSize ,0
,transient_lastDdlTime,1721295750
,NULL,NULL
# Storage Information,NULL,NULL
SerDe Library: ,org.apache.hadoop.hive.ql.io.orc.OrcSerde,NULL
InputFormat: ,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,NULL
OutputFormat: ,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,NULL
Compressed: ,No ,NULL
Num Buckets: ,31 ,NULL
Bucket Columns: ,"[dl_created_day, dl_created_time]",NULL
Sort Columns: ,[] ,NULL
,NULL,NULL
# Constraints,NULL,NULL
,NULL,NULL
# Primary Key,NULL,NULL
Table: ,raw_zone_sit.jxtemp2,NULL
Constraint Name: ,pk_211785244_1721295718303_0,NULL
Column Name: ,uuid ,NULL
hdfs dfs -ls -R /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12
drwxr-xr-x - root hadoop 0 2024-07-18 17:42 /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12/delta_0000001_0000001_0000
-rw-r--r-- 3 root hadoop 11495 2024-07-18 17:42 /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12/delta_0000001_0000001_0000/bucket_00023_0
Thanks.
Created 07-23-2024 05:25 AM
@AhXian the metadata is still not updated, as I see the following:
,numFiles ,0
,numRows ,0
,rawDataSize ,0
,totalSize ,0
Let's try the following commands one after the other:
MSCK REPAIR TABLE raw_zone_sit.jxtemp SYNC PARTITIONS;
ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12')
COMPUTE STATISTICS;
ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12')
COMPUTE STATISTICS for COLUMNS;
See if the describe command still reflects 0 files.
Also, could you collect the output of the following command from the backend metastore database?
select * from NEXT_WRITE_ID where NWI_TABLE='jxtemp';
Created 07-23-2024 08:27 PM
Hi @smruti, thanks for your response. Yes, I have actually run the three commands you mentioned; they work in the SIT environment but not in the Prod environment.
Unfortunately, we have since copied all the HDFS files out to a backup folder, then dropped and re-created the database for encryption purposes. So I can no longer select anything from NEXT_WRITE_ID, because the table's entry is no longer there.
However, I still have the HDFS bucket files. My plan was to re-create the table and partition, put the bucket files back, and run the three commands, but that is not working. What direction should I take? The same method works in SIT, but I am not sure why Prod does not.
Created 07-23-2024 11:49 PM
@AhXian Copying files into or out of an ACID table location is not supported. In this case we can use a workaround:
1. Create an external table with the same schema (same columns) as the original ACID table.
2. Upload the data files into the external table location.
3. MSCK REPAIR TABLE <external table name>;
4. See if you can read the data from the external table now.
5. If [4] works, insert the data into the managed table using:
insert into table managed_table_name partition(xx,yy) select * from external_table_name;
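The steps above can be sketched in HiveQL as follows. This is a minimal sketch: the external table name, location, and the elided column list are placeholders to be adapted to the real schema.

```sql
-- 1. External table mirroring the ACID table's columns (placeholder schema).
CREATE EXTERNAL TABLE raw_zone_sit.jxtemp_ext (
  uuid STRING
  -- ... remaining columns, matching the ACID table ...
)
PARTITIONED BY (dl_created_yr VARCHAR(4), dl_created_mth VARCHAR(2))
STORED AS ORC
LOCATION '/tmp/jxtemp_ext';

-- 2. Upload the backed-up bucket files under the matching partition path, e.g.
--    hdfs dfs -put bucket_00023_0 /tmp/jxtemp_ext/dl_created_yr=2023/dl_created_mth=12/

-- 3. Register the partitions found on disk.
MSCK REPAIR TABLE raw_zone_sit.jxtemp_ext;

-- 4. Verify the data is readable.
SELECT COUNT(*) FROM raw_zone_sit.jxtemp_ext
WHERE dl_created_yr='2023' AND dl_created_mth='12';

-- 5. Re-insert into the managed ACID table so fresh write IDs are assigned.
-- (Dynamic partition insert; may require
--  SET hive.exec.dynamic.partition.mode=nonstrict;)
INSERT INTO TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr, dl_created_mth)
SELECT * FROM raw_zone_sit.jxtemp_ext;
```

Step 5 is what makes the data valid again: the INSERT runs through a normal transaction, so the new delta files get write IDs the metastore actually knows about.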
Created 07-23-2024 11:53 PM
Hi @smruti, thanks for your response.
Yes, steps 1 through 4 are exactly what I did last time, except that at step [3] I also ran an ANALYZE command.
At [4], I was able to select the data in the SIT environment.
The weird thing is that I did the same thing in production, and there [4] does not work.
Created 07-24-2024 12:15 AM
In Hive ACID tables, the base/delta files are associated with a write ID. My assumption is that the write ID has changed or was removed in the production HMS, so Hive does not recognize the data files. So we will have to follow the workaround I mentioned in my last update.
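To illustrate the assumption: the delta directory name from the earlier listing, delta_0000001_0000001_0000, encodes minWriteId=1, maxWriteId=1, statementId=0. For a reader to treat that delta as committed, the metastore's write-ID record for the table must cover it. A hedged sketch of the check, assuming a standard HMS backend schema:

```sql
-- Run against the backend metastore database, not Hive itself.
-- NWI_NEXT must be greater than the delta's max write ID (here, 1);
-- if the row is missing or reset, readers will ignore the delta files.
SELECT NWI_DATABASE, NWI_TABLE, NWI_NEXT
FROM NEXT_WRITE_ID
WHERE NWI_DATABASE = 'raw_zone_sit' AND NWI_TABLE = 'jxtemp';
```

Since the production database was dropped and re-created, this row is gone, which matches the symptom: the files exist on HDFS, but Hive filters them out as uncommitted.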
Created 07-24-2024 05:35 AM
Hi @smruti, by the workaround, you mean the 5 steps, right?