Member since: 02-27-2020
Posts: 173
Kudos Received: 42
Solutions: 48

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1092 | 11-29-2023 01:16 PM |
| | 1175 | 10-27-2023 04:29 PM |
| | 1157 | 07-07-2023 10:20 AM |
| | 2518 | 03-21-2023 08:35 AM |
| | 922 | 01-25-2023 08:50 PM |
12-16-2020
10:57 PM
You'll need to look through the RegionServer log files to find the root cause of the problem; the error message you shared is not enough information to go on.
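If it helps, something like this can surface recent errors quickly (a minimal sketch; the log path below is a typical default and may differ in your installation):

```bash
# Scan the RegionServer logs for errors and show the most recent hits
# (adjust the path to wherever your install writes HBase logs)
grep -iE "ERROR|FATAL" /var/log/hbase/*regionserver*.log | tail -n 50
```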
12-16-2020
10:52 PM
Can you show the first couple of lines of your file exactly as they appear in the file? You can open the CSV in a plain text editor of your choice and paste the output in a comment here. When you are on the upload screen in Hue, note that under the Extras section there are additional parameters you might need to adjust to match your file's formatting.
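For example, from a terminal (yourfile.csv is a placeholder for your actual file name):

```bash
# Print the first five lines exactly as they are stored on disk
head -n 5 yourfile.csv
```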
12-15-2020
10:05 AM
1 Kudo
If you just execute SET hive.auto.convert.join=true; in your Hive session, it will apply for the duration of that session. Keep in mind, though, that this setting defaults to true since Hive 0.11.0. Regards, Alex
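For reference, a minimal sketch of the session-level usage:

```sql
-- Applies only for the duration of the current Hive session
SET hive.auto.convert.join=true;

-- Print the current value to verify
SET hive.auto.convert.join;
```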
12-15-2020
09:56 AM
I was able to reproduce this error, and it looks like the problem is the identical column name in your tableA and tableB. Namely, DateColumn is referenced in the subquery, and Hive interprets this as a reference to the parent query, which is not allowed (per the limitation listed here). Essentially, Hive cannot tell which table you mean because the column name is ambiguous. To solve this, explicitly qualify the columns with their table names:

UPDATE tableA
SET tableA.ColA = "Value"
WHERE year(tableA.DateColumn) >= (
    select (max(year(tableB.DateColumn))-1)
    from tableB
)

Let me know if this works. Regards, Alex
12-15-2020
09:16 AM
1 Kudo
Hi Bhushan, The best way to approach this is to reach out to your account team, as they will have a better idea of your environment and its nuances. At a high level, an in-place upgrade from HDP/HDF 3 to CDP will be available in early 2021. Regards, Alex
12-08-2020
01:53 PM
The reason these operations fail as the cloudbreak user is that cloudbreak is a service user meant only for accessing the cluster's machines and performing admin tasks on them; it does not have access to the data (no Kerberos principal and no IDBroker mapping). Instead, SSH to your cluster's EC2 machines with your own username and workload password, so you will have a working Kerberos principal. Another thing to check is that your user has an IDBroker mapping to access S3 resources, and potentially DynamoDB resources as well, since S3Guard relies on DynamoDB. Hope this helps, Alex
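A minimal sketch (the user name and host below are placeholders for your own):

```bash
# Log in with your workload credentials instead of the cloudbreak service user
ssh your-workload-user@<cluster-node-hostname>

# Verify that a Kerberos ticket/principal is available;
# if nothing is listed, kinit with your workload password
klist
```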
12-08-2020
01:28 PM
1 Kudo
This could be a good start: https://community.cloudera.com/t5/Support-Questions/Using-NiFi-to-load-data-from-localFS-to-HDFS/td-p/212124
12-08-2020
01:23 PM
1 Kudo
I haven't been able to try this with distcp, but a similar thing happens with hdfs dfs commands. What I found is that if your target folder already exists (e.g. created with hdfs dfs -mkdir /e/f/), then copying into that folder gives you all of your CSVs as separate files. If you don't have /e/f/ created ahead of time, then Hadoop treats it as the destination file name and your source CSV ends up as a single file called "f". Hope that makes sense and helps.
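A minimal sketch of the two behaviors (file names are placeholders):

```bash
# Case 1: the target directory exists, so each CSV keeps its own name
hdfs dfs -mkdir -p /e/f
hdfs dfs -put one.csv two.csv /e/f/
# Result: /e/f/one.csv and /e/f/two.csv

# Case 2: the target path does not exist, so the single source file
# is written at that path, i.e. it ends up as a file named "g" under /e
hdfs dfs -put one.csv /e/g
```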
11-20-2020
10:57 AM
There is a way to provide the frequency inside coordinator.xml with cron-like syntax, which lets you specify the day-of-week. See here for details: https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_oozie_cron.html
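A minimal sketch of what that can look like (names, paths, and times are placeholders; check the linked doc for the exact syntax your Oozie version supports):

```xml
<!-- Runs every Monday at 02:00 UTC -->
<coordinator-app name="weekly-job" frequency="0 2 * * MON"
                 start="2020-11-23T02:00Z" end="2021-12-31T02:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```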
11-20-2020
09:47 AM
You also want to check the structure of your raw data. Specifically, look for any instances where there are extra delimiters (e.g. a string field that includes commas as part of the string).
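One quick way to spot such rows (a minimal sketch; yourfile.csv is a placeholder, and this simple check assumes fields are not quoted):

```bash
# Flag rows whose comma-separated field count differs from the header row
awk -F',' 'NR==1 { n = NF } NF != n { print "line " NR ": " NF " fields" }' yourfile.csv
```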