Member since: 09-20-2023
Posts: 22
Kudos Received: 6
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 276 | 10-16-2024 12:10 PM |
| | 852 | 10-30-2023 07:48 AM |
12-04-2024
04:52 AM
1 Kudo
Good to know it's not an Azure-specific issue then. Thanks @DanielR
12-03-2024
08:52 AM
I'm running several ETL flows in CDF/NiFi. The flows all involve writing some data to Iceberg tables, and they mostly run on a schedule that ranges from hourly to weekly (i.e., infrequent writes). Each data flow deployment has a few independent DAGs that all end with the PutIceberg processor, so multiple separate ETL processes run at different intervals within the same NiFi deployment.

The problem occurs pretty rarely (~once a month) and only for some deployments. Occasionally, the PutIceberg processor will error with "Failed to acquire a SAS token" (see the full error log below). This keeps happening every time the processor runs unless I restart the flow, after which it runs fine again. It seems to happen more often when the processor runs at a daily interval.

My flows are using NiFi runtime 1.27.0.2.3.14.0-14 and I'm on CDP Public Cloud on Azure.

```
org.apache.hadoop.fs.azurebfs.contracts.exceptions.SASTokenProviderException: Failed to acquire a SAS token for create-file on [my-data-warehouse-bucket-and-table]/metadata/bb545710-14ea-4b07-b0f5-668978be4e8d-m1.avro due to org.apache.hadoop.security.AccessControlException: org.apache.ranger.raz.intg.RangerRazException: <!doctype html><html lang="en"><head><title>HTTP Status 401 ??? Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 ??? Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> org.apache.hadoop.security.authentication.util.SignerException: Invalid signature</p><p><b>Description</b> The request has not been applied to the target resource because it lacks valid authentication credentials for that resource.</p><hr class="line" /><h3>Apache Tomcat/8.5.96</h3></body></html>; HttpStatus: 401
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1233)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1199)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.createPath(AbfsClient.java:396)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.conditionalCreateOverwriteFile(AzureBlobFileSystemStore.java:625)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:568)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:335)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1177)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1157)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1046)
	at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
	... 19 common frames omitted
```
10-17-2024
04:56 AM
1 Kudo
Correction: 'Check the checkbox to Allow users to Run ML Runtimes'
10-16-2024
12:10 PM
Resolved. I had ML Runtimes Addons disabled. I went into CML > Site Administration > Settings and, under Feature Flags, unchecked the checkbox next to Allow users to Run ML Runtimes Addons. Then I started a new session with Spark enabled.
10-16-2024
07:20 AM
I have a CML project using a JupyterLab Runtime with Python 3.10 and I want to start a Spark cluster against my CDP Data Lake. I'm using the predefined Spark Data Lake Connection in CML, which looks like this:

```
import cml.data_v1 as cmldata

# Sample in-code customization of spark configurations
#from pyspark import SparkContext
#SparkContext.setSystemProperty('spark.executor.cores', '1')
#SparkContext.setSystemProperty('spark.executor.memory', '2g')

CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
```

When I execute this I get the error:

```
IllegalArgumentException: The value of property spark.app.name must not be null
```

I'm using the predefined spark-defaults.conf, which looks like this:

```
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
```

Is there something else I need to configure in the CML session or at the data lake level?
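One workaround I've been considering (untested) is setting the missing property explicitly before the session is built, in case the predefined connection only trips over the absent spark.app.name; the app name value below is arbitrary:

```
import cml.data_v1 as cmldata
from pyspark import SparkContext

# Hypothetical workaround: supply spark.app.name ourselves before
# get_spark_session() constructs the SparkSession.
SparkContext.setSystemProperty('spark.app.name', 'cml-datalake-session')

conn = cmldata.get_connection("hiaa-dl")
spark = conn.get_spark_session()
spark.sql("show databases").show()
```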
10-01-2024
04:46 AM
1 Kudo
Thanks, but the reason I'm trying to establish an ODBC connection is that I'm using R.
09-26-2024
08:26 AM
I want to establish an ODBC connection to my Impala data warehouse in a CML project. I'm running a CML session and have configured my odbc.ini and odbcinst.ini files, but I still need to install the driver. Everything I see here https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-2.html describes installing it via the installation wizard. Since CML runs a Linux pod, I tried downloading the .deb file for the ODBC driver and then executing `dpkg -i clouderaimpalaodbc_2.7.2.1011-2_amd64.deb`, but it has to be run as root. If I run `su` it asks for a password, and I'm not sure how this password gets generated by the system. I tried my workload password and portal password but neither worked. Is there a more principled way of connecting via ODBC in CML, or is there a way to run as root in CML?
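One workaround I'm considering (untested) is unpacking the package into my home directory instead of installing it system-wide, since dpkg can extract a .deb without root:

```
# Sketch: unpack the .deb locally instead of installing as root
dpkg -x clouderaimpalaodbc_2.7.2.1011-2_amd64.deb ~/impala-odbc

# Then point the Driver= entry in odbcinst.ini / odbc.ini at the extracted
# library; the path below is a guess and may differ by package layout:
#   ~/impala-odbc/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
```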
09-26-2024
08:02 AM
I finally had success using HTTP transport. My .odbc.ini looks something like this:

```
[ODBC]
; Specify any global ODBC configuration here such as ODBC tracing.

[ODBC Data Sources]
Impala = Cloudera ODBC Driver for Impala

[Impala]
Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib
Description=Cloudera Impala ODBC Driver DSN
Host=[datahub-name]-master0.hiaa-cdp.uvmh-kdle.a4.cloudera.site
Port=443
Schema=default
AuthMech=3
UseSASL=1
SSL=1
TransportMode=http
httpPath=[datahub-name]/cdp-proxy-api/impala
```
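If unixODBC's isql utility happens to be available in the session image, a quick way to sanity-check the DSN before wiring it into R might be:

```
# Hypothetical check, assuming unixODBC is installed in the runtime image;
# replace the placeholders with your workload credentials.
isql -v Impala [workload-username] [workload-password]
```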
07-16-2024
05:46 AM
I have two hive tables and I want to manually create a lineage relationship between them using the Atlas API. I'm trying to run this POST request:

```
curl --location 'https://[url]/atlas/api/atlas/v2/entity' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data-raw '{
  "entity": {
    "typeName": "hive_process",
    "qualifiedName": "my_etl_process@cluster",
    "name": "my_etl_process",
    "description": "ETL process from input_table to output_table",
    "attributes": {
      "inputs": [
        { "guid": "9768381b-8783-49c3-850d-39bf1f14b73f" }
      ],
      "outputs": [
        { "guid": "998de7ba-2254-418d-b405-656eba428643" }
      ]
    }
  }
}'
```

I'm getting this 404 response:

```
{
  "errorCode": "ATLAS-404-00-007",
  "errorMessage": "Invalid instance creation/updation parameters passed : hive_process.qualifiedName: mandatory attribute value missing in type Referenceable"
}
```

Any suggestions for how to manually create lineage using the API would be appreciated.
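One variation I'm considering, based on the error text, is moving qualifiedName and name inside the attributes map, since the message suggests the Referenceable attributes are expected there rather than at the top level of the entity. Untested sketch:

```
curl --location 'https://[url]/atlas/api/atlas/v2/entity' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data-raw '{
  "entity": {
    "typeName": "hive_process",
    "attributes": {
      "qualifiedName": "my_etl_process@cluster",
      "name": "my_etl_process",
      "description": "ETL process from input_table to output_table",
      "inputs": [
        { "guid": "9768381b-8783-49c3-850d-39bf1f14b73f" }
      ],
      "outputs": [
        { "guid": "998de7ba-2254-418d-b405-656eba428643" }
      ]
    }
  }
}'
```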
04-15-2024
06:17 AM
When I run this command via the CDP CLI, it just returns a JSON response like this:

```
{
  "archiveName": "[MY_FLOW_NAME].tar.gz"
}
```

I want to get the actual flow definition.