Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2413 | 11-29-2023 01:16 PM |
| | 2955 | 10-27-2023 04:29 PM |
| | 2415 | 07-07-2023 10:20 AM |
| | 4926 | 03-21-2023 08:35 AM |
| | 1671 | 01-25-2023 08:50 PM |
05-25-2021 10:12 AM
Hello, I haven't used Flume myself, but the Flume documentation does mention a serializer.delimiter parameter. It would help to know the source of the data (e.g. a file on HDFS) and the destination (e.g. Hive). Also, be aware that Flume is no longer a supported component in Cloudera Data Platform. If you are just starting to learn it, I would recommend saving yourself some time and exploring NiFi, Kafka, and Flink (good starter blog post). Regards, Alex
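For reference, here is a minimal sketch of where that parameter sits in a Flume Hive sink configuration, per the Flume documentation; the agent, sink, channel, and table names are all hypothetical:

```properties
# Hypothetical agent/sink/channel names; adjust to your topology.
agent.sinks.hiveSink.type = hive
agent.sinks.hiveSink.channel = memChannel
agent.sinks.hiveSink.hive.metastore = thrift://metastore-host:9083
agent.sinks.hiveSink.hive.database = default
agent.sinks.hiveSink.hive.table = events
agent.sinks.hiveSink.serializer = DELIMITED
# serializer.delimiter is the field separator in the incoming data
# (default ","); wrap special characters in double quotes.
agent.sinks.hiveSink.serializer.delimiter = "\t"
agent.sinks.hiveSink.serializer.fieldnames = id,name,value
```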
12-16-2020 10:57 PM
You'll need to look through the Region Server log files to find the root cause of the problem. The error message you shared is not enough information to go on.
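As a starting point, something like the following can surface errors quickly; the log directory and file naming vary by installation (Cloudera Manager deployments use a different convention), so treat these paths as assumptions:

```bash
# Assumes a typical packaged layout under /var/log/hbase/.
grep -iE "ERROR|FATAL" /var/log/hbase/*regionserver*.log | tail -n 50
```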
12-15-2020 10:05 AM
1 Kudo
If you just execute SET hive.auto.convert.join=true; in your Hive session, it will apply for the duration of that session. Keep in mind, though, that this setting has defaulted to true since Hive 0.11.0. Regards, Alex
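For example, in Beeline or the Hive CLI you can check the current value and then override it for the session:

```sql
SET hive.auto.convert.join;       -- prints the current value
SET hive.auto.convert.join=true;  -- applies only for the rest of this session
```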
12-15-2020 09:16 AM
1 Kudo
Hi Bhushan, The best way to approach this is to reach out to your account team, as they will have a better idea of your environment and its nuances. At a high level, an in-place upgrade from HDP/HDF 3 to CDP will be available in early 2021. Regards, Alex
12-08-2020 01:53 PM
These operations fail as the cloudbreak user because it is a service user intended only for accessing the cluster's machines and performing admin tasks on them; it does not have access to the data (no Kerberos principal and no IDBroker mapping). Instead, you can SSH to your cluster's EC2 machines with your username and workload password. That way you will have a working Kerberos principal. Another thing to check is that your user has an IDBroker mapping for access to S3 resources, and potentially to DynamoDB resources as well, since S3Guard relies on DynamoDB. Hope this helps, Alex
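Roughly, the flow looks like this; the hostname, user name, and bucket below are placeholders:

```bash
# SSH with your workload user name and workload password, not cloudbreak.
ssh my-workload-user@ec2-xx-xx-xx-xx.compute.amazonaws.com

# Authenticate as yourself; enter the same workload password at the prompt.
kinit my-workload-user

# With a Kerberos ticket and an IDBroker mapping in place, S3 access works.
hdfs dfs -ls s3a://my-bucket/my-data/
```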
12-08-2020 01:23 PM
1 Kudo
I haven't been able to try this with distcp, but a similar thing happens with hdfs dfs commands. What I found is that if the target folder already exists (e.g. created with hdfs dfs -mkdir /e/f/), copying into it gives you all of your CSVs as separate files. If /e/f/ does not exist ahead of time, Hadoop will create it for you and rename your source CSV to "f". Hope that makes sense and helps.
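To illustrate with hdfs dfs -cp, reusing the example paths from above:

```bash
# Case 1: target directory exists, so each CSV keeps its own name.
hdfs dfs -mkdir -p /e/f/
hdfs dfs -cp /a/b/*.csv /e/f/

# Case 2: /e/f does not exist, so the single source file is copied
# and renamed to "f" under /e/.
hdfs dfs -cp /a/b/one.csv /e/f
```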
11-20-2020 10:57 AM
You can provide the frequency inside coordinator.xml using cron-like syntax, which lets you specify the day of week. See here for details: https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_oozie_cron.html
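A minimal sketch, with a hypothetical app name, dates, and workflow path, that runs every Monday at 06:00 UTC:

```xml
<coordinator-app name="weekly-report" frequency="0 6 * * MON"
                 start="2020-11-23T06:00Z" end="2021-11-23T06:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
    </workflow>
  </action>
</coordinator-app>
```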
11-13-2020 01:00 PM
With your original approach, each query can filter out whole partitions of the table based on its WHERE clause (that is, if your table is partitioned and at least some of the columns in the clause match the partition columns). However, if your WHERE clauses are mostly different/unique, then you will be scanning a big portion of the table for every one of your 100+ queries. With the suggested approach there is only one scan of the table, but more processing happens for each row. The best way to see which performs better is simply to test both and go with the winner.
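To make the partition-pruning point concrete, here is a hypothetical schema:

```sql
-- If the table is partitioned on the column the WHERE clause filters
-- on, Hive skips every non-matching partition entirely.
CREATE TABLE my_table (id BIGINT, payload STRING)
PARTITIONED BY (event_date STRING);

-- Scans only the 2020-11-13 partition:
SELECT COUNT(*) FROM my_table WHERE event_date = '2020-11-13';
```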
11-12-2020 10:25 AM
Do your WHERE conditions rely on different columns in MyTable, or all on the same columns with different filter criteria? If it's the latter, then the answer is partitioning your Hive table on those key columns. Also, if MyTable is not too big, it would be most efficient to run your 100 queries in memory with something like Spark SQL rather than Hive.
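A rough PySpark sketch of the in-memory idea; the table and column names are hypothetical:

```python
from pyspark.sql import SparkSession

# Read the Hive table once, cache it, then run the many filter
# variants against the in-memory copy.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table("my_table").cache()

count_a = df.filter("key_col = 'a'").count()
count_b = df.filter("key_col = 'b' AND other_col > 100").count()
```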
11-03-2020 09:57 AM
1 Kudo
The error likely indicates that some AWS resources were not reachable from the CDP control plane. Double-check your security policy settings and any proxy settings, and reach out to support, as they will be able to assist better by looking at your particular environment setup. Regarding the logs: if the CM instance was stood up in your Data Lake, you can search the logs by clicking "Command logs" or "Service logs" in the Data Lake tab of your environment.