Member since: 09-15-2020
Posts: 4
Kudos Received: 1
Solutions: 0
12-09-2020
08:13 AM
1 Kudo
Those two things together did the trick: I added a proper IAM role to the IDBroker mapping and logged in with my workload user. Thanks for the helpful insight!
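For reference, the working sequence now looks roughly like this (the bucket name is just a placeholder):

```
# On a cluster node, authenticate as the CDP workload user
# (not the local cloudbreak machine account), then retry the S3Guard CLI.
kinit my-workload-username          # prompts for the workload password
hadoop s3guard bucket-info s3a://example-bucket/
```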
12-08-2020
10:35 AM
I'm trying to learn more about S3Guard and was attempting to follow along with some of the CLI examples in the CDP documentation. Any command I try results in a warning:

WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties

followed by an error:

java.lang.IllegalStateException: Authentication with IDBroker failed. Please ensure you have a Kerberos token by using kinit.

When I try running kinit I see:

Client 'cloudbreak@[FQDN]' not found in Kerberos database while getting initial credentials

where "FQDN" corresponds to the VM I have SSHed into. I've tried a couple of machines in both my Data Hub and Data Lake clusters. Does anyone have insight on how to properly interact with my environment's S3Guard setup?
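For the record, a typical attempt looks like this (bucket name is a placeholder; every S3Guard subcommand I try fails the same way):

```
hadoop s3guard bucket-info s3a://example-bucket/
```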
Labels:
- Apache Hadoop
- Kerberos
12-08-2020
09:12 AM
Yes, that was exactly it. Since our data is created by a separate pipeline that predates our CDP usage, the S3Guard table was out of sync with what was actually in the bucket. Disabling it let me get things up and running again while I learn more about S3Guard. I'd missed that re:Invent announcement, so thanks for the help on multiple fronts!
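In case it helps anyone else, the "disabling" part was just a per-bucket S3A override so the stale S3Guard table is no longer consulted (bucket name is a placeholder; the same property can also be set in core-site.xml):

```
# Point the bucket at the null metadata store, i.e. run without S3Guard:
hadoop fs \
  -Dfs.s3a.bucket.example-bucket.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore \
  -ls s3a://example-bucket/path/
```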
12-07-2020
11:36 AM
I have an external table pointing to partitioned parquet data in an AWS S3 bucket. I realized our write-out process was creating too many files within a partition, so I tweaked our code and overwrote the parquet data in that S3 location to make it more compact. I then dropped the table and re-ran the `CREATE EXTERNAL TABLE` and `ALTER TABLE ... RECOVER PARTITIONS` statements. The issue I'm running into now is that the table seems to be pointing to both the old and new parquet data: if I run a `SHOW FILES IN` command I see both old and new files listed for the table. This leads to errors when I try to access data in the table, as it seems to be trying to read from a file that no longer exists in S3. Is there a cache or something similar that needs to be cleared in these types of situations?
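For concreteness, this is roughly the sequence I ran (database, table, column, and bucket names are placeholders):

```
impala-shell -q "DROP TABLE IF EXISTS mydb.events"
impala-shell -q "CREATE EXTERNAL TABLE mydb.events (id BIGINT, payload STRING)
                 PARTITIONED BY (dt STRING) STORED AS PARQUET
                 LOCATION 's3a://example-bucket/events/'"
impala-shell -q "ALTER TABLE mydb.events RECOVER PARTITIONS"
# This still lists both the old and the newly written parquet files:
impala-shell -q "SHOW FILES IN mydb.events"
```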
Labels:
- Apache Impala