Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1856 | 11-29-2023 01:16 PM |
| | 2351 | 10-27-2023 04:29 PM |
| | 1883 | 07-07-2023 10:20 AM |
| | 3880 | 03-21-2023 08:35 AM |
| | 1343 | 01-25-2023 08:50 PM |
02-03-2021
04:53 PM
One thing I noticed today, in case it may help with this issue: today I tried the Sqoop from MSQL -> HBase again on a new table with compression set and pre-split, in Cloudera 5.15.1 and Cloudera 6.2.1 environments. The HBase configuration (and the HDFS configuration, for that matter) is almost identical in both.

In the Cloudera 6.2.1 (i.e. HBase 2.1.2) environment I see the flush to the HStoreFile happen fairly quickly (only about 32,000 new entries), and the logs mention 'Adding a new HDFS file' of size 321 KB. In the Cloudera 5.15.1 (i.e. HBase 1.2.x) environment the flush to the HStoreFile takes longer: about 700,000 entries are being flushed and the 'Adding a new HDFS file' is of size 6.5 MB.

The memstore flush size is set to 128 MB in both environments and the region servers have 24 GB available, so I think it's hitting the 0.4 heap factor for memstores and then flushing in both cases. Also, only a few tables have heavy writes and most of the other tables are fairly idle, so I don't think they would take up much memstore space. In the Cloudera 6.2.1 environment each server holds about 240 regions; in the Cloudera 5.15.1 environment each server holds about 120 regions.

My thinking is that if I can get the Cloudera 6.2.1/HBase 2.1.2 memstore flush happening with a similar size and number of entries as in the Cloudera 5.15.1 environment, the performance issue for large writes would be solved. I'm just not sure how to make that happen. I also noticed that minor compactions take a similar amount of time in both environments, so I think that's not an issue. Richard
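For anyone comparing these settings side by side, the flush behaviour described above is governed by a couple of hbase-site.xml properties. A minimal sketch, assuming HBase 2.x property names (older releases also accepted hbase.regionserver.global.memstore.upperLimit for the heap fraction); the values shown are the defaults referenced in the post, not a recommendation:

```xml
<!-- hbase-site.xml: memstore flush tuning (a sketch, not a prescription) -->
<property>
  <!-- Per-region memstore flush threshold; 134217728 bytes = 128 MB default -->
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
</property>
<property>
  <!-- Fraction of region server heap that all memstores combined may use
       before forced flushes kick in; 0.4 is the default "heap factor" -->
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.4</value>
</property>
```

With many more regions per server (240 vs. 120), the same global heap limit is divided across more memstores, which is consistent with the smaller, more frequent flushes observed in the 6.2.1 environment.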
01-21-2021
09:11 PM
Hi Igor,

You can define what users can and cannot do in Atlas by defining authorization policies in Ranger. Details on how to do that can be found here: https://docs.cloudera.com/runtime/7.2.6/atlas-securing/topics/atlas-configure-ranger-authorization.html

What you refer to as bookmarks can potentially be done via Saved Searches (see here), depending on what you want to achieve. As for the popularity score, this could be made a metadata attribute that users can update; there is no out-of-the-box automation in Atlas to derive this score.

Hope this helps, Alex
01-15-2021
10:11 AM
Two things to check:
1. Does your NiFi service user account have permissions on the table and the HDFS location where it's trying to do the insert?
2. Your Hive SQL statement looks a bit off to me:

```sql
insert into transformed_db.tbl_airbnb_listing_transformed
select a.*, 20210113 partition_date_id
from staging_db.etbl_raw_airbnb_listing a
```

Is 20210113 a column name? Are you missing a comma between that and partition_date_id? Is your source staging table partitioned? If you are trying to select only a specific date, then the syntax to do that is different.
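If the target table is partitioned and the intent is to load one day's data into it, a static partition insert is the usual shape. A sketch, assuming partition_date_id is the partition column of the target table:

```sql
-- Static partition insert: the partition value is supplied in the
-- PARTITION clause, so it must not also appear in the SELECT list.
INSERT INTO TABLE transformed_db.tbl_airbnb_listing_transformed
PARTITION (partition_date_id = 20210113)
SELECT a.*
FROM staging_db.etbl_raw_airbnb_listing a;
```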
12-18-2020
11:56 AM
Settings look fine. _HOST gets replaced by the actual FQDN of the host at runtime. One thing to check: make sure reverse DNS lookup works on all hosts.
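A quick way to spot-check this on each node (hostname and IP below are hypothetical; the reverse lookup should return the same FQDN the forward lookup started from):

```bash
# Forward lookup: FQDN -> IP
host node1.example.com
# Reverse lookup of the returned IP: should come back as the same FQDN
host 10.0.0.11
```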
12-18-2020
10:35 AM
Out-of-the-box Hue can't properly parse this format. There are some potential solutions in this thread: https://stackoverflow.com/questions/13628658/hive-load-csv-with-commas-in-quoted-fields and it depends on what you are comfortable with: pre-processing the file to reformat the input or to use a different SerDe in Hive. Hope that helps, Alex
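For the SerDe route, Hive ships with an OpenCSV SerDe that understands quoted fields. A minimal sketch (table name, columns, and location are hypothetical):

```sql
CREATE EXTERNAL TABLE csv_import (
  id STRING,
  description STRING  -- may contain commas inside quoted values
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\""
)
STORED AS TEXTFILE
LOCATION '/data/csv_import';
```

Note that this SerDe reads every column as STRING, so you may need casts downstream.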
12-18-2020
12:45 AM
Hello @Anks2411 Thanks for sharing the Cause. To your query: yes, the HBase Balancer should be enabled & "balance_switch" should be set to "true". Once you have no further queries, kindly mark the Post as Solved as well. - Smarak
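For reference, the switch can be set from the HBase shell (a sketch; balance_switch returns the previous state):

```
# From the HBase shell:
balance_switch true    # enables the balancer; prints the previous state
balancer               # optionally trigger a balancing run right away
```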
12-15-2020
09:32 PM
Thanks @aakulov. Appreciate it. Currently we do not have a subscription for the HDP/HDF cluster.
12-15-2020
10:05 AM
1 Kudo
If you just execute SET hive.auto.convert.join=true; in your Hive session, it will apply for the duration of that session. Keep in mind, though, that this setting has defaulted to true since Hive 0.11.0. Regards, Alex
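To see what the setting does, a sketch (table names are hypothetical; the size threshold property and its default are the stock Hive ones):

```sql
SET hive.auto.convert.join=true;
-- Tables smaller than this threshold (25 MB by default, in bytes) are
-- broadcast and joined as map joins automatically, avoiding a shuffle.
SET hive.mapjoin.smalltable.filesize=25000000;

-- If dim_region is small enough, Hive converts this common join
-- into a map join with no hint required.
SELECT f.order_id, d.region_name
FROM fact_orders f
JOIN dim_region d ON f.region_id = d.region_id;
```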
12-15-2020
09:56 AM
I was able to reproduce this error, and it looks like the problem is the identical column name in your tableA and tableB. Namely, DateColumn is referenced in the subquery; Hive interprets this as a reference to the parent query, which is not allowed (per the limitation listed here). Essentially, Hive is confused about what you mean because of the overloaded column name. To solve this, you can explicitly qualify the columns with their table names:

```sql
UPDATE tableA
SET tableA.ColA = "Value"
WHERE year(tableA.DateColumn) >= (
  select (max(year(tableB.DateColumn)) - 1)
  from tableB
);
```

Let me know if this works. Regards, Alex
12-09-2020
09:53 AM
Hi, thanks for the answer, that really helped. Now I'm trying to configure the NiFi processor so I can put my file on the HDFS VM. There is no cluster, just a virtual machine on which I have Hadoop. I need the Hadoop configuration resources and also Kerberos. Many thanks.
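For reference, these are the PutHDFS processor properties involved; the paths and principal below are hypothetical examples, and the two Kerberos fields can be left empty if the single-VM Hadoop install is not kerberized:

```
Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Kerberos Principal             : nifi@EXAMPLE.COM
Kerberos Keytab                : /etc/security/keytabs/nifi.service.keytab
Directory                      : /user/nifi/incoming
```

The configuration resources must point at the core-site.xml and hdfs-site.xml copied from (or reachable on) the Hadoop VM, so the processor knows the NameNode address.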