
ANALYZE command does not write data into the Hive metastore

Explorer

Good day,

In SIT, I manually created a Hive table, then manually created a partition on it. I then copied the bucket file from another table and put it inside the partition. At this point, running a SELECT statement returns 0 rows; I believe this is normal because the data has not been registered in the metastore yet.
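
For reference, the manual sequence above looked roughly like this (the column list is shortened to a placeholder here, not the actual schema):

-- Manually create the table and partition (columns are illustrative only;
-- in CDP a managed ORC table like this is transactional/ACID by default)
CREATE TABLE raw_zone_sit.jxtemp (uuid string, amount decimal(20,6))
PARTITIONED BY (dl_created_yr varchar(4), dl_created_mth varchar(2))
STORED AS ORC;

ALTER TABLE raw_zone_sit.jxtemp ADD PARTITION (dl_created_yr='2023', dl_created_mth='12');

-- Bucket file copied in at the HDFS level, bypassing Hive:
-- hdfs dfs -cp <source bucket file> <partition directory of jxtemp>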

Then I ran the following command:

ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12') 
COMPUTE STATISTICS;

After running this command, I was able to select data from the table.

I then applied the same approach in production, but there I still get an empty result.

Currently I am trying to recover the whole metastore in production, because someone accidentally deleted the entire production metastore.

For your information, the Parquet tables are working fine, but I also have some ORC tables that are still pending resolution.

Please help.


Community Manager

@AhXian, welcome to our community! To help you get the best possible answer, I have tagged our Hive experts @smruti, @asish, and @Asok, who may be able to assist you further.

Please feel free to provide any additional information or details about your query, and we hope that you will find a satisfactory solution to your question.



Regards,

Vidya Sargur,
Community Manager



Master Collaborator

@AhXian Are the ORC tables we are talking about ACID tables?

If we copy data files into an ACID table location and run ANALYZE ... COMPUTE STATISTICS, that alone might not resolve the issue. We would like you to share an example. Please share the output of the following command:

describe formatted table_name partition(dl_created_yr='2023', dl_created_mth='12')

and the output of this HDFS command:

hdfs dfs -ls -R <partition location from the above command>

Explorer

Hi @smruti,
Yes, those ORC tables are ACID.

describe formatted table_name partition(dl_created_yr='2023', dl_created_mth='12')

col_name,data_type,comment
uuid,string,
pymt_inst_trx_uuid,varchar(16),OriSrcDType: hexadecimal
pymt_inst_segment_uuid,varchar(16),OriSrcDType: hexadecimal
trx_ref_no,varchar(50),
trx_status,smallint,OriSrcDType: tinyint unsigned
journal_id,varchar(50),
promo_code,varchar(50),
fee_scheme,varchar(50),
from_account,varchar(50),
from_account_name,varchar(100),
to_account,varchar(50),
to_account_name,varchar(150),
currency,varchar(50),
amount,"decimal(20,6)",
charge,"decimal(20,6)",
tax,"decimal(20,6)",
tax_type,varchar(50),
process_date,timestamp,
process_end_date,timestamp,
error_code,varchar(10),
error_message,varchar(200),
src_ref_no,varchar(20),
ims_message_id,varchar(27),
src_ref_no_2,varchar(20),
created_date,timestamp,
created_by,varchar(16),OriSrcDType: hexadecimal
updated_date,timestamp,
updated_by,varchar(16),OriSrcDType: hexadecimal
info,string,OriSrcDType: json
dq_is_dirty,boolean,
dq_error_code_mandatory,varchar(255),
dq_error_code_optional,varchar(255),
dq_sla,varchar(255),
dq_priority,varchar(255),
dl_source_system,varchar(255),
dl_file_name,varchar(255),
dl_created_dt,timestamp,
dl_created_day,varchar(2),
dl_created_time,varchar(8),
,NULL,NULL
# Partition Information,NULL,NULL
# col_name,data_type,comment
dl_created_yr,varchar(4),
dl_created_mth,varchar(2),
,NULL,NULL
# Detailed Partition Information,NULL,NULL
Partition Value: ,"[2023, 12] ",NULL
Database: ,raw_zone_sit ,NULL
Table: ,jxtemp2 ,NULL
CreateTime: ,Thu Jul 18 17:42:30 MYT 2024,NULL
LastAccessTime: ,UNKNOWN ,NULL
Location: ,hdfs://cdpuat.devambankgroup.devahb.com:8020/warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12,NULL
Partition Parameters:,NULL,NULL
,COLUMN_STATS_ACCURATE,"{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"amount\":\"true\",\"charge\":\"true\",\"created_by\":\"true\",\"created_date\":\"true\",\"currency\":\"true\",\"dl_created_day\":\"true\",\"dl_created_dt\":\"true\",\"dl_created_time\":\"true\",\"dl_file_name\":\"true\",\"dl_source_system\":\"true\",\"dq_error_code_mandatory\":\"true\",\"dq_error_code_optional\":\"true\",\"dq_is_dirty\":\"true\",\"dq_priority\":\"true\",\"dq_sla\":\"true\",\"error_code\":\"true\",\"error_message\":\"true\",\"fee_scheme\":\"true\",\"from_account\":\"true\",\"from_account_name\":\"true\",\"ims_message_id\":\"true\",\"info\":\"true\",\"journal_id\":\"true\",\"process_date\":\"true\",\"process_end_date\":\"true\",\"promo_code\":\"true\",\"pymt_inst_segment_uuid\":\"true\",\"pymt_inst_trx_uuid\":\"true\",\"src_ref_no\":\"true\",\"src_ref_no_2\":\"true\",\"tax\":\"true\",\"tax_type\":\"true\",\"to_account\":\"true\",\"to_account_name\":\"true\",\"trx_ref_no\":\"true\",\"trx_status\":\"true\",\"updated_by\":\"true\",\"updated_date\":\"true\",\"uuid\":\"true\"}}"
,numFiles ,0
,numRows ,0
,rawDataSize ,0
,totalSize ,0
,transient_lastDdlTime,1721295750
,NULL,NULL
# Storage Information,NULL,NULL
SerDe Library: ,org.apache.hadoop.hive.ql.io.orc.OrcSerde,NULL
InputFormat: ,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,NULL
OutputFormat: ,org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,NULL
Compressed: ,No ,NULL
Num Buckets: ,31 ,NULL
Bucket Columns: ,"[dl_created_day, dl_created_time]",NULL
Sort Columns: ,[] ,NULL
,NULL,NULL
# Constraints,NULL,NULL
,NULL,NULL
# Primary Key,NULL,NULL
Table: ,raw_zone_sit.jxtemp2,NULL
Constraint Name: ,pk_211785244_1721295718303_0,NULL
Column Name: ,uuid ,NULL

hdfs dfs -ls -R /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12

drwxr-xr-x - root hadoop 0 2024-07-18 17:42 /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12/delta_0000001_0000001_0000
-rw-r--r-- 3 root hadoop 11495 2024-07-18 17:42 /warehouse/tablespace/managed/hive/raw_zone_sit.db/jxtemp2/dl_created_yr=2023/dl_created_mth=12/delta_0000001_0000001_0000/bucket_00023_0


Thanks.

Master Collaborator

@AhXian The metadata is still not updated; I see the following:

,numFiles ,0
,numRows ,0
,rawDataSize ,0
,totalSize ,0

Let's try the following commands one after the other:

MSCK REPAIR TABLE raw_zone_sit.jxtemp SYNC PARTITIONS;

ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12')
COMPUTE STATISTICS;

ANALYZE TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12')
COMPUTE STATISTICS FOR COLUMNS;

See if the describe command still reflects 0 files.

Also, could you collect the output of the following command from the backend metastore database?

select * from NEXT_WRITE_ID where NWI_TABLE='jxtemp';

Explorer

Hi @smruti, thanks for your response. Yes, I have actually run the three commands you mention; they work in the SIT environment but not in the Prod environment.

Unfortunately, we have already copied all the HDFS files out to a backup folder, and we dropped and re-created the database for encryption purposes. As a result, I am unable to select any data from NEXT_WRITE_ID because the table is no longer there.

Anyhow, I still have the HDFS bucket files. My plan is to re-create the table and partition, put the bucket files back in, and run the three commands, but that is not working. May I know what direction I should take? This method works in SIT, but I am not sure why it does not work in Prod.

Master Collaborator

@AhXian Copying files into and out of an ACID table location is not supported. In this case we can use a workaround (a fuller sketch follows the steps):

1. Create an external table with the same columns as the original ACID table.

2. Upload the data file into the external table location.

3. MSCK REPAIR TABLE <external table name>;

4. See if you can read the data from the external table now.

5. If [4] works, insert the data into the managed table by using:

insert into table managed_table_name partition(xx,yy) select * from external_table_name;
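
Put together, the workaround might look like the following. The external table name, column list, and staging location are illustrative assumptions, not values from this thread:

-- 1. External table mirroring the ACID table's columns (schema shortened here)
CREATE EXTERNAL TABLE raw_zone_sit.jxtemp_ext (uuid string, amount decimal(20,6))
PARTITIONED BY (dl_created_yr varchar(4), dl_created_mth varchar(2))
STORED AS ORC
LOCATION '/tmp/jxtemp_ext';

-- 2. Upload the data file under a matching partition directory (shell):
--    hdfs dfs -mkdir -p /tmp/jxtemp_ext/dl_created_yr=2023/dl_created_mth=12
--    hdfs dfs -put bucket_00023_0 /tmp/jxtemp_ext/dl_created_yr=2023/dl_created_mth=12/

-- 3. Register the partitions that now exist on disk
MSCK REPAIR TABLE raw_zone_sit.jxtemp_ext;

-- 4. Verify the data is readable
SELECT count(*) FROM raw_zone_sit.jxtemp_ext
WHERE dl_created_yr='2023' AND dl_created_mth='12';

-- 5. Write the rows back into the managed ACID table so Hive assigns fresh write IDs
INSERT INTO TABLE raw_zone_sit.jxtemp PARTITION (dl_created_yr='2023', dl_created_mth='12')
SELECT uuid, amount FROM raw_zone_sit.jxtemp_ext
WHERE dl_created_yr='2023' AND dl_created_mth='12';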

Explorer

Hi @smruti, thanks for your response.
Yes, steps 1 through 4 are exactly what I did last time, except that at step [3] I also ran one more command, ANALYZE.
At [4], I was able to select the data in the SIT environment.
The weird thing is that I did the same in production, but it does not work at [4].

Master Collaborator

In Hive ACID tables, the base/delta files are associated with a write ID. My assumption is that the write ID has changed or was removed in the production HMS, so Hive does not recognize the data files. So we will have to follow the workaround I mentioned in my last update.
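
For context, the delta directory name encodes that write ID: delta_0000001_0000001_0000 in your listing corresponds to write ID 1. In an environment where the backend metastore database is still intact, it could be compared against what HMS has allocated; a sketch, assuming the standard Hive 3 metastore schema (where NEXT_WRITE_ID has columns NWI_DATABASE, NWI_TABLE, NWI_NEXT):

-- The directory delta_0000001_0000001_0000 was written with write ID 1;
-- for readers to accept it, the table's next write ID must already be past 1.
SELECT NWI_DATABASE, NWI_TABLE, NWI_NEXT
FROM NEXT_WRITE_ID
WHERE NWI_DATABASE = 'raw_zone_sit' AND NWI_TABLE = 'jxtemp2';

If that row is missing (or reset to 1), Hive treats write ID 1 as never allocated and filters the delta out, which matches the empty SELECT you see.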

Explorer

Hi @smruti, the workaround you mean is the five steps above, right?