Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1856 | 11-29-2023 01:16 PM |
| | 2351 | 10-27-2023 04:29 PM |
| | 1883 | 07-07-2023 10:20 AM |
| | 3880 | 03-21-2023 08:35 AM |
| | 1343 | 01-25-2023 08:50 PM |
02-03-2021
04:53 PM
One thing I noticed today, in case it may help with this issue: today I tried the Sqoop from MSQL -> HBase again on a new table with compression set and pre-split, in Cloudera 5.15.1 and Cloudera 6.2.1 environments. The HBase configuration (and the HDFS configuration, for that matter) is almost identical in both.

In the Cloudera 6.2.1 (i.e. HBase 2.1.2) environment I see the flush to the HStoreFile happen fairly quickly (only about 32,000 new entries), and the logs mention 'Adding a new HDFS file' of size 321 KB. In the Cloudera 5.15.1 (i.e. HBase 1.2.x) environment the flush to the HStoreFile takes longer: about 700,000 entries are being flushed and the 'Adding a new HDFS file' is of size 6.5 MB.

The memstore flush size is set to 128 MB in both environments and the region servers have 24 GB available, so I think it's hitting the 0.4 heap factor for memstores and then flushing in both cases. Also, only a few tables have heavy writes and most of the other tables are fairly idle, so I don't think they would take up much memstore space. In the Cloudera 6.2.1 environment each server holds about 240 regions; in the Cloudera 5.15.1 environment each server holds about 120 regions.

My thinking is that if I can get the Cloudera 6.2.1/HBase 2.1.2 memstore flush happening with a similar size and number of entries as in the Cloudera 5.15.1 environment, the performance issue for large writes would be solved. I'm just not sure how to make that happen. I also noticed that minor compactions take a similar amount of time in both environments, so I think that's not an issue. Richard
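For anyone comparing these settings side by side, the flush behaviour described above is governed by a couple of hbase-site.xml properties. A minimal sketch, assuming HBase 2.x property names (older releases also accepted hbase.regionserver.global.memstore.upperLimit for the heap fraction); the values shown are the defaults referenced in the post, not a recommendation:

```xml
<!-- hbase-site.xml: memstore flush tuning (a sketch, not a prescription) -->
<property>
  <!-- Per-region memstore flush threshold; 134217728 bytes = 128 MB default -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
<property>
  <!-- Fraction of region server heap that all memstores combined may use
       before forced flushes kick in; 0.4 is the default "heap factor" -->
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value>
</property>
```

With many more regions per server (240 vs. 120), the same global heap limit is divided across more memstores, which is consistent with the smaller, more frequent flushes observed in the 6.2.1 environment.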
01-21-2021
09:11 PM
Hi Igor,

You can define what users can and cannot do in Atlas by defining authorization policies in Ranger. Details on how to do that can be found here: https://docs.cloudera.com/runtime/7.2.6/atlas-securing/topics/atlas-configure-ranger-authorization.html

What you refer to as bookmarks can potentially be done via Saved Searches (see here), depending on what you want to achieve. As for the popularity score, this could be made a metadata attribute that users can update; there is no out-of-the-box automation in Atlas to derive this score.

Hope this helps, Alex
01-15-2021
10:11 AM
Two things to check:
1. Does your NiFi service user account have permissions on the table and the HDFS location where it's trying to do the insert?
2. Your Hive SQL statement looks a bit off to me:

```sql
insert into transformed_db.tbl_airbnb_listing_transformed
select a.*, 20210113 partition_date_id
from staging_db.etbl_raw_airbnb_listing a
```

Is 20210113 a column name? Are you missing a comma between that and partition_date_id? Is your source staging table partitioned? If you are trying to select only a specific date, then the syntax to do that is different.
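If the target table is partitioned and the intent is to load one day's data into it, a static partition insert is the usual shape. A sketch, assuming partition_date_id is the partition column of the target table:

```sql
-- Static partition insert: the partition value is supplied in the
-- PARTITION clause, so it must not also appear in the SELECT list.
INSERT INTO TABLE transformed_db.tbl_airbnb_listing_transformed
PARTITION (partition_date_id = 20210113)
SELECT a.*
FROM staging_db.etbl_raw_airbnb_listing a;
```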
12-18-2020
11:56 AM
Settings look fine. _HOST gets replaced by the actual FQDN of the host at runtime. One thing to check: make sure reverse DNS lookup works on all hosts.
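A quick way to spot-check this on each node (hostname and IP below are hypothetical; the reverse lookup should return the same FQDN the forward lookup started from):

```bash
# Forward lookup: FQDN -> IP
host node1.example.com
# Reverse lookup of the returned IP: should come back as the same FQDN
host 10.0.0.11
```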
12-18-2020
10:35 AM
Out-of-the-box Hue can't properly parse this format. There are some potential solutions in this thread: https://stackoverflow.com/questions/13628658/hive-load-csv-with-commas-in-quoted-fields and it depends on what you are comfortable with: pre-processing the file to reformat the input or to use a different SerDe in Hive. Hope that helps, Alex
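For the SerDe route, Hive ships with an OpenCSV SerDe that understands quoted fields. A minimal sketch (table name, columns, and location are hypothetical):

```sql
CREATE EXTERNAL TABLE csv_import (
  id STRING,
  description STRING  -- may contain commas inside quoted values
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION '/data/csv_import';
```

Note that this SerDe reads every column as STRING, so you may need casts downstream.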
12-18-2020
12:45 AM
Hello @Anks2411 Thanks for sharing the Cause. To your query: yes, the HBase Balancer should be enabled & "balance_switch" should be set to "true". Once you have no further queries, kindly mark the Post as Solved as well. - Smarak
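For reference, the switch can be set from the HBase shell (a sketch; balance_switch returns the previous state):

```
# From the HBase shell:
balance_switch true    # enables the balancer; prints the previous state
balancer               # optionally trigger a balancing run right away
```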
12-15-2020
09:32 PM
Thanks @aakulov. Appreciate it. Currently we do not have a subscription for the HDP/HDF cluster.
12-15-2020
10:05 AM
1 Kudo
If you just execute SET hive.auto.convert.join=true; in your Hive session, it will apply for the duration of that session. Keep in mind, though, that this setting has defaulted to true since Hive 0.11.0. Regards, Alex
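To see what the setting does, a sketch (table names are hypothetical; the size threshold property and its default are the stock Hive ones):

```sql
SET hive.auto.convert.join=true;
-- Tables smaller than this threshold (25 MB by default, in bytes) are
-- broadcast and joined as map joins automatically, avoiding a shuffle.
SET hive.mapjoin.smalltable.filesize=25000000;

-- If dim_region is small enough, Hive converts this common join
-- into a map join with no hint required.
SELECT f.order_id, d.region_name
FROM fact_orders f
JOIN dim_region d ON f.region_id = d.region_id;
```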
12-15-2020
09:56 AM
I was able to reproduce this error, and it looks like the problem is the identical column name in your tableA and tableB. Namely, DateColumn is referenced in the subquery; Hive interprets this as a reference to the parent query, which is not allowed (per the limitation listed here). Essentially, Hive is confused about what you mean because of the overloaded column name. To solve this, you can explicitly qualify the columns with their table names:

```sql
UPDATE tableA
SET tableA.ColA = "Value"
WHERE year(tableA.DateColumn) >= (
  select (max(year(tableB.DateColumn)) - 1)
  from tableB
);
```

Let me know if this works. Regards, Alex
12-09-2020
09:53 AM
Hi, thanks for the answer, that really helped. Now I'm trying to configure the NiFi processor so I can put my file on the HDFS VM. There is no cluster, just a virtual machine on which I have Hadoop. I need the Hadoop configuration resources and also Kerberos. Many thanks.
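For reference, these are the PutHDFS processor properties involved; the paths and principal below are hypothetical examples, and the two Kerberos fields can be left empty if the single-VM Hadoop install is not kerberized:

```
Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Kerberos Principal             : nifi@EXAMPLE.COM
Kerberos Keytab                : /etc/security/keytabs/nifi.service.keytab
Directory                      : /user/nifi/incoming
```

The configuration resources must point at the core-site.xml and hdfs-site.xml copied from (or reachable on) the Hadoop VM, so the processor knows the NameNode address.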