Member since: 09-15-2020
Posts: 4
Kudos Received: 1
Solutions: 0
12-09-2020
08:13 AM
1 Kudo
Those two things together did the trick: I added a proper IAM role to the IDBroker mapping and logged in with my workload user. Thanks for the helpful insight!
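For reference, the working sequence now looks roughly like this (the bucket name is just a placeholder):

```
# On a cluster node, authenticate as the CDP workload user
# (not the local cloudbreak machine account), then retry the S3Guard CLI.
kinit my-workload-username          # prompts for the workload password
hadoop s3guard bucket-info s3a://example-bucket/
```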
12-08-2020
10:35 AM
I'm trying to learn more about S3Guard and was attempting to follow along with some of the CLI examples in the CDP documentation. Any command I try results in a warning:

WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties

followed by an error:

java.lang.IllegalStateException: Authentication with IDBroker failed. Please ensure you have a Kerberos token by using kinit.

When I try running kinit I see:

Client 'cloudbreak@[FQDN]' not found in Kerberos database while getting initial credentials

where "FQDN" corresponds to the VM I have SSHed into. I've tried a couple of machines in both my Data Hub and Data Lake clusters. Does anyone have insight on how to properly interact with my environment's S3Guard setup?
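For the record, a typical attempt looks like this (bucket name is a placeholder; every S3Guard subcommand I try fails the same way):

```
hadoop s3guard bucket-info s3a://example-bucket/
```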
Labels:
- Apache Hadoop
- Kerberos
12-08-2020
09:12 AM
Yes, that was exactly it. Since our data is created by a separate pipeline that predates our CDP usage, the S3Guard table was out of sync with what was actually in the bucket. Disabling it let me get things up and running again while I learn more about S3Guard. I'd missed that re:Invent announcement, so thanks for the help on multiple fronts!
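In case it helps anyone else, the "disabling" part was just a per-bucket S3A override so the stale S3Guard table is no longer consulted (bucket name is a placeholder; the same property can also be set in core-site.xml):

```
# Point the bucket at the null metadata store, i.e. run without S3Guard:
hadoop fs \
  -Dfs.s3a.bucket.example-bucket.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore \
  -ls s3a://example-bucket/path/
```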
12-07-2020
11:36 AM
I have an external table pointing to partitioned parquet data in an AWS S3 bucket. I realized our write-out process was creating too many files within a partition, so I tweaked our code and overwrote the parquet data in that S3 location to make it more compact. I then dropped the table and re-ran the `CREATE EXTERNAL TABLE` and `ALTER TABLE ... RECOVER PARTITIONS` statements. The issue I'm running into now is that the table seems to be pointing to both the old and new parquet data: if I run a `SHOW FILES IN` command I see both old and new files listed for the table. This leads to errors when I try to access data in the table, as it seems to be trying to read from a file that no longer exists in S3. Is there a cache or something similar that needs to be cleared in these types of situations?
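For concreteness, this is roughly the sequence I ran (database, table, column, and bucket names are placeholders):

```
impala-shell -q "DROP TABLE IF EXISTS mydb.events"
impala-shell -q "CREATE EXTERNAL TABLE mydb.events (id BIGINT, payload STRING)
                 PARTITIONED BY (dt STRING) STORED AS PARQUET
                 LOCATION 's3a://example-bucket/events/'"
impala-shell -q "ALTER TABLE mydb.events RECOVER PARTITIONS"
# This still lists both the old and the newly written parquet files:
impala-shell -q "SHOW FILES IN mydb.events"
```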
Labels:
- Apache Impala