Member since: 02-08-2019
Posts: 28
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 3133 | 06-13-2019 12:19 AM |
09-24-2019
01:46 AM
1 Kudo
@eMazarakis, later releases do not support the asterisk either; it is treated as a literal. The expressions that are available can be found here, in the chapter 'To drop or alter multiple partitions'. Previously I was referring to the intention behind "part_col='201801*'": it suggests that the desired outcome of the expression is to remove all data for January 2018 in one operation. However, since that is not possible in CDH 5.9, I was proposing a different partition strategy if multiple partitions have to be dropped frequently and the size of the data allows it. For example, if only one analytic query is executed on the data after ingestion, the days have to be dropped one by one, which is 32 operations. With the same data partitioned by month instead, the number of operations could be reduced to 2.
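Here is a rough sketch of what that could look like on a later release; the table names logs and logs_by_month and the columns part_col/month_col are made up for illustration, and the BETWEEN form of DROP PARTITION assumes Impala 2.8 / CDH 5.10 or newer.

# Drop every daily partition of January 2018 in a single statement (later releases only):
impala-shell -d db_name -q "ALTER TABLE logs DROP IF EXISTS PARTITION (part_col BETWEEN '20180101' AND '20180131')"

# With a table partitioned by month instead, the same cleanup touches a single partition:
impala-shell -d db_name -q "ALTER TABLE logs_by_month DROP IF EXISTS PARTITION (month_col = '201801')"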
09-20-2019
10:11 AM
"So it looks like column specific is only on a table without partitions (non-incremental)" - @hores, that's incorrect: non-incremental COMPUTE STATS works on partitioned tables and is generally the preferred method for collecting stats on them. We've generally tried to steer people away from incremental stats because of the size issues on large tables. Column-specific incremental stats would also be error-prone to use correctly and complex to implement: what happens if you compute incremental stats with different subsets of the columns? You can end up with different column subsets on different partitions, and then you have to somehow reconcile it all each time.
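To make the distinction concrete, a small sketch; the table name sales and the column list are hypothetical, and the column-list form of COMPUTE STATS assumes a reasonably recent Impala release.

# Non-incremental stats on a partitioned table, limited to the columns that matter for queries:
impala-shell -d db_name -q "COMPUTE STATS sales (customer_id, amount)"

# Incremental stats take no column list; per-partition stats can grow large in the catalog on big tables:
impala-shell -d db_name -q "COMPUTE INCREMENTAL STATS sales"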
08-14-2019
05:27 PM
For number 2: for ANY changes made outside of Impala you will need INVALIDATE METADATA, or, if only new data was added, a REFRESH will do. Work is underway to improve this: https://issues.apache.org/jira/browse/IMPALA-3124 Cheers, Eric
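As a quick sketch (db_name and tablename are placeholders):

# The table was created or its schema was changed outside of Impala, e.g. in Hive:
impala-shell -d db_name -q "INVALIDATE METADATA tablename"

# Only new data files were added to an existing table, so the cheaper REFRESH is enough:
impala-shell -d db_name -q "REFRESH tablename"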
08-09-2019
07:09 PM
1 Kudo
Hi, "HiveServer2 Enable Impersonation is setting to TRUE" is probably the reason. When Impersonation is true, it means Hive will impersonate as the end user who runs the query to submit jobs. Your ACL output showed that the directory is owned by "hive:hive" and as @Tomas79 found out, you have sticky bit set, so if hive needs to impersonate as the end user, the end user who runs the query will not be able to delete the path as he/she is not the owner. If impersonation is OFF, then HS2 will run query as "hive" user (the user that runs HS2 process), then you should not see such issue. I assume you have no sentry? As sentry will require Impersonation to be OFF on HS2 side, so that all queries will be running under "hive" user. To test the theory, try to remove the sticky bit on this path and drop again in Hive. Cheers Eric
06-13-2019
12:19 AM
@Consult I found the solution. The sqoop command creates a YARN application of type MAPREDUCE. If we only kill the local process from the Unix shell, that YARN application keeps running in the background. So from Cloudera Manager we go to YARN --> Applications and kill the YARN application there.
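The same thing can be done from the command line instead of Cloudera Manager; a minimal sketch, with a made-up application id:

# List the running MAPREDUCE applications to find the one sqoop submitted:
yarn application -list -appTypes MAPREDUCE

# Kill it by its application id:
yarn application -kill application_1234567890123_0042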
03-15-2019
04:56 AM
Dear @AnisurRehman, you can import data from an RDBMS to HDFS with Sqoop. If you then want to work with this table through impala-shell, you only need to run the following command from a machine where Impala is installed: impala-shell -d db_name -q "INVALIDATE METADATA tablename"; You have to run INVALIDATE METADATA because the table is new to the Impala daemons' metadata. If you later append new data files to the existing table, a refresh is enough: impala-shell -d db_name -q "REFRESH tablename"; REFRESH is sufficient because you do not need to reload the whole metadata for the table, only the block locations of the new data files. After that you can query the table through impala-shell and the Impala query editor.
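Putting the whole flow together as a sketch; the JDBC URL, credentials, database and table names are placeholders, not taken from this thread.

# 1. Import the RDBMS table into the Hive warehouse with Sqoop:
sqoop import --connect jdbc:mysql://dbhost/source_db --username user -P \
  --table tablename --hive-import --hive-database db_name

# 2. Make the new table visible to Impala:
impala-shell -d db_name -q "INVALIDATE METADATA tablename"

# 3. After appending new data files later, the lighter REFRESH is enough:
impala-shell -d db_name -q "REFRESH tablename"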