About smruti

smruti · ‎09-14-2022

@RamuAnnamalai This looks similar to https://issues.apache.org/jira/browse/IMPALA-10042. Please check the value of "Maximum Cached File Handles" under the Impala configuration in the CM UI. Set it to zero (0) and see whether the issue reappears. How do you write to the table? Is there a chance the data is getting corrupted during the insert?

smruti · ‎09-14-2022

@Asim- Unless your final table has to be a Hive managed (ACID) table, you could incrementally update the Hive table directly using Sqoop, e.g.:

    sqoop import --connect jdbc:oracle:thin:@xx.xx.xx.xx:1521:ORCL --table EMPLOYEE --username user1 --password welcome1 --incremental lastmodified --merge-key employee_id --check-column emp_timestamp --target-dir /usr/hive/warehouse/external/empdata/

Otherwise, the approach you are trying is actually the one Cloudera recommends.

smruti · ‎09-05-2022

@HanzalaShaikh You may consider DLM replication. This is explained here and here. Set hive.repl.rootdir to the location where you want to store the backup, and use the REPL DUMP command to dump your data and metadata, e.g.:

    REPL DUMP db1 WITH('hive.repl.rootdir'='s3a://blah/');

Refer to the Cloudera documentation for more details and examples.

smruti · ‎08-31-2022

@mohammad_shamim Did you have Hive HA configured in the CDH cluster? If so, you need to make sure that an equal number of HS2 instances are created in the CDP cluster, because HA cannot be attained without them. Also, make sure that no HiveServer2 instance is created under the "Hive" service in CDP. It should only be present under the Hive on Tez service.

smruti · ‎08-18-2022

@ssuja I am afraid this is not achievable using Ranger. If you already have a data directory owned by a specific user, say user1, you may create a policy in Ranger that gives hive and other users access to that directory path (URI), and keep the physical path owned by user1 itself. See if this is something you can work with. I should also mention that creating an external Hive table without a LOCATION clause will create a directory owned by hive, because impersonation is disabled in Hive.

smruti · ‎08-12-2022

Hi @ssuja, there is a Hive property that would help you achieve what you are aiming for. Look for hive.server2.enable.doAs under the Hive on Tez configuration and enable it. However, there is a catch: this property needs to be disabled if you are using Ranger for authorization. If you are not using Ranger and are using Storage Based Authorization (which is not recommended in CDP), then you could definitely enable it. Refer to the doc here.

smruti · ‎08-05-2022

@xinghx The only difference between CDP 7.1.1 and 7.1.7 is HIVE-24920. In your test case, the CREATE TABLE statement creates an external table with the "TRANSLATED_TO_EXTERNAL" table property set to "TRUE". Your second query, which tries to change the table to a managed (ACID) table, does not really work, so it has no impact apart from adding a table property. Coming to the RENAME query, I notice it does not change the location in CDP 7.1.1 either. Please refer to the attachment. In CDP 7.1.7 (SP1) it does change the location if we have "TRANSLATED_TO_EXTERNAL" = "TRUE". If we set it to false, we get the same behavior as 7.1.1:

    alter table alter_test set tblproperties("TRANSLATED_TO_EXTERNAL"="FALSE");

I hope this helps.

smruti · ‎08-03-2022

@xinghx This is expected behavior in later versions of CDP. Please refer to this Release note. If yours is a managed table in the default warehouse location, the HDFS path will be renamed the way you expect. However, if you plan to rename an external table, you will also need to change the location accordingly:

    ALTER TABLE <tableName> RENAME TO <newTableName>;
    ALTER TABLE <newTableName> SET LOCATION "hdfs://<location>";

smruti · ‎08-01-2022

@Imran_chaush If you are on CDP and using Ranger for authorization, you may check the audit log to see which users tried to access that specific database and table. Otherwise, you will have to read the raw log file to see which queries were run on a specific table, and then find the users who submitted those queries, e.g.:

    grep -E 'Compiling.*<table name>' /var/log/hive/hadoop-cmf-hive_on_tez-HIVESERVER2-node1.log.out

Column 5 is your session ID, and you may grep for the session ID again to find the user associated with it.
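The two grep steps above could be wrapped roughly as follows. This is only a sketch: the function names sessions_for_table and lines_for_session are made up for illustration, and it assumes the session ID really is the fifth whitespace-separated field of each matching line, as described above. Adjust the field number if your log layout differs.

```shell
# Sketch only: helper names are hypothetical, not part of Hive or CDP.

# Print the distinct session IDs of queries compiled against a table.
# Assumes the session ID is field 5 of each "Compiling" log line.
sessions_for_table() {
  log_file="$1"
  table="$2"
  grep -E "Compiling.*${table}" "$log_file" | awk '{print $5}' | sort -u
}

# Print every log line that mentions a given session ID, so the
# submitting user can be read off from the surrounding entries.
lines_for_session() {
  log_file="$1"
  session_id="$2"
  grep -F -- "$session_id" "$log_file"
}

# Example usage (the log path is the one from the reply above):
#   log=/var/log/hive/hadoop-cmf-hive_on_tez-HIVESERVER2-node1.log.out
#   for s in $(sessions_for_table "$log" employee); do
#     lines_for_session "$log" "$s"
#   done
```

The table name is interpolated into a regular expression, so names containing regex metacharacters would need escaping.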

smruti · ‎08-01-2022

@Caliber The following command should work:

    for hql in a.hql b.hql; do beeline -n hive -p password --showheader=false --silent=true -f "$hql"; done
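If the list of files varies, the same loop could be put in a small function that takes the file names as arguments. This is a rough sketch, not a tested production script: run_hql_files and the BEELINE_CMD variable are names invented here (BEELINE_CMD is not a beeline feature, just a way to point at a different binary), and the beeline options are the same ones used above.

```shell
# Sketch only: run_hql_files and BEELINE_CMD are hypothetical names.

# Run each HQL file given as an argument through beeline, in order.
# Missing files are skipped with a warning; the loop stops at the
# first file for which beeline returns a non-zero exit status.
run_hql_files() {
  for hql in "$@"; do
    if [ ! -f "$hql" ]; then
      echo "skipping missing file: $hql" >&2
      continue
    fi
    "${BEELINE_CMD:-beeline}" -n hive -p password \
      --showheader=false --silent=true -f "$hql" || return 1
  done
}

# Example usage:
#   run_hql_files a.hql b.hql
```

Quoting "$hql" keeps file names with spaces intact, and stopping on the first failure avoids running later scripts against a half-finished state.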


Member Since	‎10-28-2020 05:19 AM
Last Visited	‎11-01-2024 04:13 AM
Posts	551
Kudos received	43

Cloudera Community

Re: ANALYZE command not write data into hive metas...

Re: HBase stores base64 data when data is inserted...

Re: Deleting hive service on CDP Private Base and ...

Re: Not Able to run import command. it fails with ...

Re: Any alternate for org.apache.hive:hive-jdbc ma...

Re: has an Invalid parquet version number:017

Re: Incrementally ETL to Apache Hive

Re: Backup and Disaster recovery alternative optio...

Re: CDP upgrade from CDH

Re: How to give access to users to create and own ...

Re: How to give access to users to create and own ...

Re: After running rename in Hive, the table's location does not change accordingly

Re: After running rename in Hive, the table's location does not change accordingly

Re: How Many User using Hive DB and Table

Re: Beeline file name parameterize