Member since: 09-20-2023
Posts: 23
Kudos Received: 7
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 313 | 10-16-2024 12:10 PM
 | 877 | 10-30-2023 07:48 AM
12-10-2024
04:33 AM
1 Kudo
Do you know how to keep it from expiring, or how to renew the token from within NiFi?
12-04-2024
04:52 AM
1 Kudo
Good to know it's not an Azure-specific issue, then. Thanks @DanielR
12-03-2024
08:52 AM
I'm running several ETL flows in CDF/NiFi. The flows all involve writing some data to Iceberg tables, and they mostly run on a schedule that ranges from hourly to weekly (i.e., infrequent writes). Each data flow deployment has a few independent DAGs that all end with the PutIceberg processor, so there are multiple separate ETL processes running at different intervals in the same NiFi deployment.

The problem occurs pretty rarely (~once a month) and only for some deployments. Occasionally the PutIceberg processor errors with 'Failed to acquire a SAS token' (see the full error log below). This keeps happening every time the processor runs until I restart the flow, after which it runs fine again. It seems to happen more often when the processor runs at a daily interval. My flows use NiFi runtime 1.27.0.2.3.14.0-14 and I'm on CDP Public Cloud on Azure.

```
org.apache.hadoop.fs.azurebfs.contracts.exceptions.SASTokenProviderException: Failed to acquire a SAS token for create-file on [my-data-warehouse-bucket-and-table]/metadata/bb545710-14ea-4b07-b0f5-668978be4e8d-m1.avro due to org.apache.hadoop.security.AccessControlException: org.apache.ranger.raz.intg.RangerRazException: <!doctype html><html lang="en"><head><title>HTTP Status 401 – Unauthorized</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 401 – Unauthorized</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> org.apache.hadoop.security.authentication.util.SignerException: Invalid signature</p><p><b>Description</b> The request has not been applied to the target resource because it lacks valid authentication credentials for that resource.</p><hr class="line" /><h3>Apache Tomcat/8.5.96</h3></body></html>; HttpStatus: 401
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1233)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.appendSASTokenToQuery(AbfsClient.java:1199)
	at org.apache.hadoop.fs.azurebfs.services.AbfsClient.createPath(AbfsClient.java:396)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.conditionalCreateOverwriteFile(AzureBlobFileSystemStore.java:625)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:568)
	at org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:335)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1177)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1157)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1046)
	at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
	... 19 common frames omitted
```
10-17-2024
04:56 AM
1 Kudo
Correction: 'Check the checkbox to Allow users to Run ML Runtimes'
10-16-2024
12:10 PM
Resolved. I had ML Runtimes Addons disabled. I went into CML > Site Administration > Settings and, under Feature Flags, unchecked the checkbox next to Allow users to Run ML Runtimes Addons. Then I started a new session with Spark enabled.
10-16-2024
07:20 AM
I have a CML project using a JupyterLab Runtime with Python 3.10 and I want to start a Spark cluster against my CDP Data Lake. I'm using the predefined Spark Data Lake Connection in CML, which looks like this:

```
import cml.data_v1 as cmldata

# Sample in-code customization of spark configurations
#from pyspark import SparkContext
#SparkContext.setSystemProperty('spark.executor.cores', '1')
#SparkContext.setSystemProperty('spark.executor.memory', '2g')

CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()

# Sample usage to run query through spark
EXAMPLE_SQL_QUERY = "show databases"
spark.sql(EXAMPLE_SQL_QUERY).show()
```

When I execute this I get the error:

```
IllegalArgumentException: The value of property spark.app.name must not be null
```

I'm using the predefined spark-defaults.conf, which looks like this:

```
spark.executor.memory=1g
spark.executor.cores=1
spark.yarn.access.hadoopFileSystems=abfs://[container]@[storage-account].dfs.core.windows.net
```

Is there something else I need to configure in the CML session or at the data lake level?
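In the meantime, here is a minimal sketch of a workaround I may try: setting spark.app.name explicitly before requesting the session, following the same SparkContext.setSystemProperty pattern from the commented-out customization above. The app name value is just a placeholder, and I haven't confirmed that this addresses the underlying cause.

```python
# Sketch only: set spark.app.name explicitly before the session is created,
# mirroring the SparkContext.setSystemProperty customization pattern from
# the predefined connection snippet. The app name is a placeholder.
import cml.data_v1 as cmldata
from pyspark import SparkContext

SparkContext.setSystemProperty('spark.app.name', 'my-cml-session')

CONNECTION_NAME = "hiaa-dl"
conn = cmldata.get_connection(CONNECTION_NAME)
spark = conn.get_spark_session()
spark.sql("show databases").show()
```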
10-01-2024
04:46 AM
1 Kudo
Thanks, but the reason I'm trying to establish an ODBC connection is that I'm using R.
09-26-2024
08:26 AM
I want to establish an ODBC connection to my Impala data warehouse in a CML project. I'm running a CML session and have configured my odbc.ini and odbcinst.ini files, but I still need to install the driver. Everything I see here https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-2.html describes installing it via the installation wizard. Since CML runs in a Linux pod, I tried downloading the .deb file for the ODBC driver and then executing `dpkg -i clouderaimpalaodbc_2.7.2.1011-2_amd64.deb`, but it has to be run as root. If I run `su` it asks for a password, and I'm not sure how that password gets generated by the system; I tried my workload password and portal password but neither worked. Is there a more principled way of connecting via ODBC in CML, or is there a way to run as root in CML?
09-26-2024
08:02 AM
I finally had success using HTTP transport. My .odbc.ini looks something like this:

```
[ODBC]
; Specify any global ODBC configuration here such as ODBC tracing.

[ODBC Data Sources]
Impala = Cloudera ODBC Driver for Impala

[Impala]
Driver=/opt/cloudera/impalaodbc/lib/universal/libclouderaimpalaodbc.dylib
Description=Cloudera Impala ODBC Driver DSN
Host=[datahub-name]-master0.hiaa-cdp.uvmh-kdle.a4.cloudera.site
Port=443
Schema=default
AuthMech=3
UseSASL=1
SSL=1
TransportMode=http
httpPath=[datahub-name]/cdp-proxy-api/impala
```
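For completeness, a minimal sketch of how a DSN like this gets used; it's written in Python with pyodbc purely to illustrate the connection string, since in my project the same DSN is referenced from R instead. The UID/PWD values are placeholders for my workload credentials.

```python
# Sketch only: connect through the [Impala] DSN defined in ~/.odbc.ini.
# UID/PWD are placeholders for the CDP workload username and password.
import pyodbc

conn = pyodbc.connect("DSN=Impala;UID=my-workload-user;PWD=my-workload-password", autocommit=True)
cursor = conn.cursor()
cursor.execute("SHOW DATABASES")
for row in cursor.fetchall():
    print(row)
conn.close()
```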
07-16-2024
05:46 AM
I have two Hive tables and I want to manually create a lineage relationship between them using the Atlas API. I'm trying to run this POST request:

```
curl --location 'https://[url]/atlas/api/atlas/v2/entity' \
--header 'Content-Type: application/json' \
--header 'Authorization: ••••••' \
--data-raw '{
  "entity": {
    "typeName": "hive_process",
    "qualifiedName": "my_etl_process@cluster",
    "name": "my_etl_process",
    "description": "ETL process from input_table to output_table",
    "attributes": {
      "inputs": [
        { "guid": "9768381b-8783-49c3-850d-39bf1f14b73f" }
      ],
      "outputs": [
        { "guid": "998de7ba-2254-418d-b405-656eba428643" }
      ]
    }
  }
}'
```

I'm getting this 404 response:

```
{
  "errorCode": "ATLAS-404-00-007",
  "errorMessage": "Invalid instance creation/updation parameters passed : hive_process.qualifiedName: mandatory attribute value missing in type Referenceable"
}
```

Any suggestions for how to manually create lineage using the API would be appreciated.
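For reference, here is a hedged sketch of the same request with qualifiedName and name moved inside attributes, which is where the error message suggests Atlas expects them for the Referenceable type; the URL and credentials are placeholders and the GUIDs are the ones from above.

```python
# Hedged sketch: same entity, but with qualifiedName/name placed under
# "attributes" (the 404 reports hive_process.qualifiedName missing from the
# Referenceable attributes). URL and credentials are placeholders.
import requests

payload = {
    "entity": {
        "typeName": "hive_process",
        "attributes": {
            "qualifiedName": "my_etl_process@cluster",
            "name": "my_etl_process",
            "description": "ETL process from input_table to output_table",
            "inputs": [{"guid": "9768381b-8783-49c3-850d-39bf1f14b73f"}],
            "outputs": [{"guid": "998de7ba-2254-418d-b405-656eba428643"}],
        }
    }
}

resp = requests.post(
    "https://[url]/atlas/api/atlas/v2/entity",
    json=payload,
    auth=("atlas-user", "workload-password"),  # placeholder credentials
)
print(resp.status_code)
print(resp.text)
```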