About zzeng

zzeng · ‎09-10-2024

In CDP Public Cloud CDW Impala, you can only use HTTP+SSL to access, So you have to Edit the config file to specify ODBC Driver C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini [Driver] AllowHostNameCNMismatch = 0 CheckCertRevocation = 0 TransportMode = http AuthMech=3 https://community.cloudera.com/t5/Community-Articles/How-to-Connect-to-CDW-Impala-VW-Using-the-Power-BI-Desktop/ta-p/393013#toc-hId-1805728480

zzeng · ‎09-08-2024

With the Hive (newer than Hive 2.2), you can use Merge INTO MERGE INTO target_table AS target USING source_table AS source ON target.id = source.id WHEN MATCHED THEN UPDATE SET target.name = source.name, target.age = source.age WHEN NOT MATCHED THEN INSERT (id, name, age) VALUES (source.id, source.name, source.age);

zzeng · ‎09-03-2024

Summary Last week I posted an article [How to Connect to Impala Using the Power BI Desktop + Cloudera ODBC Impala Driver with Kerberos Authentication], So far (2024 Sep), CDP Public Cloud CDW supports Basic Authentication (HTTP), today I will share how to Connect to CDW Impala VW Using the Power BI Desktop + Cloudera ODBC Impala Driver with Basic Authentication. Pre-requisites Power BI Desktop Edition https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop Impala in CDP Public Cloud CDW Impala ODBC Connector 2.7.0 for Cloudera Enterprise https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html How-To-do in Power BI Desktop Step1 : Install the [Impala ODBC Connector 2.7.0 for Cloudera Enterprise] Step 2: Copy the ODBC folder to Power BI Desktop folder Assume your Power BI Desktop is in [ C:\Program Files\Microsoft Power BI Desktop\ ], then please copy the ODBC Driver [ C:\Program Files\Cloudera ODBC Driver for Impala ] to [C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala] Step 3: Edit the config file to specify ODBC Driver C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Impala ODBC Driver.ini [Simba Impala ODBC Driver] # Oringinally PowerBI will use its embedded driver, we can change it to Cloudera version # Driver=Simba Impala ODBC Driver\ImpalaODBC_sb64.dll Driver=Cloudera ODBC Driver for Impala\lib\ClouderaImpalaODBC64.dll Step 4: Edit the config file to specify ODBC Driver parameter C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini [Driver] AllowHostNameCNMismatch = 0 CheckCertRevocation = 0 TransportMode = http AuthMech=3 * Cloudera CDW Impala VW doesn't need [httpPath] parameter, while Cloudera Datahub Impala cluster need [httpPath=cliservice]. Please be careful. Then save these two files, and restart your Power BI Desktop. How-To-do in Power BI Service (On-premise Data Gateway) Step 1: Edit the config file to specify Driver C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Impala ODBC Driver.ini [Simba Impala ODBC Driver] # Driver=Simba Impala ODBC Driver\ImpalaODBC_sb64.dll Driver=Cloudera ODBC Driver for Impala\lib\ClouderaImpalaODBC64.dll Step 2: Edit the Simba .ini file to specify Driver parameter C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc [Driver] AllowHostNameCNMismatch = 0 CheckCertRevocation = 0 TransportMode = http AuthMech=3 Reference: https://community.fabric.microsoft.com/t5/Desktop/Power-BI-Impala-connector-SSL-certificate-error/m-p/2344481#M845491

zzeng · ‎08-29-2024

Pre-requisites Power BI Desktop Edition Impala in CDP Private Cloud Base Impala ODBC Connector 2.7.0 for Cloudera Enterprise CDP Public Cloud Datahub + Kerberos + Power BI Desktop (in the future) Process Step 1: Install the [Impala ODBC Connector 2.7.0 for Cloudera Enterprise] Step 2: Copy the ODBC folder to Power BI Desktop folder Assume your Power BI Desktop is in [ C:\Program Files\Microsoft Power BI Desktop\ ], then please copy the ODBC Driver [ C:\Program Files\Cloudera ODBC Driver for Impala ] to [C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala] Step 3: Edit the config file [ C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Impala ODBC Driver.ini ] [Simba Impala ODBC Driver] # Oringinally PowerBI will use its embedded driver, we can change it to Cloudera version # Driver=Simba Impala ODBC Driver\ImpalaODBC_sb64.dll Driver=Cloudera ODBC Driver for Impala\lib\ClouderaImpalaODBC64.dll # If you don't use SSL SSL=0 Step 4: Run Power BI Desktop If you want to use Windows AD Kerberos, then please don't install MIT Kerberos, and make sure you are using the Windows AD domain account to log in and run the Power BI Desktop application. * Reference : In the [Cloudera-ODBC-Connector-for-Impala-Install-Guide.pdf], it mentioned that, ============== Configuring Kerberos Authentication for Windows Active Directory The Cloudera ODBC Connector for Apache Impala supports Active Directory Kerberos on Windows. There are two prerequisites for using Active Directory Kerberos on Windows: MIT Kerberos is not installed on the client Windows machine. The MIT Kerberos Hadoop realm has been configured to trust the Active Directory realm, according to Apache's documentation, so that users in the Active Directory realm can access services in the MIT Kerberos Hadoop realm. ============== Step 5: Test Connection using ODBC Open the [ODBC Data Source (64-bit)] application, and add a new DSN for testing. I didn't use SSL, so I left the SSL un-checked. For debug, you can set the log level to DEBUG. As we will be using Kerberos to connect to Impala, please ensure that the AD server and DNS server are correctly configured. Step 6: Fetch data in Power BI Desktop In Power BI Desktop, use the [Get data -> more], Input [Impala] to search for the connector, Then, input the server info: After that, you can use Windows Authentication (as Kerberos), input the DOMAIN\account password Then you can see the window below: At last, you can see the data after loading:

zzeng · ‎05-19-2024

Purpose: Run SELECT to ingest data from Oracle 19c, and save the data into Azure ADLS Gen2 object storage, in Parquet format. Steps Step 1 Prepare the environment Make sure the Oracle 19c environment works well. Prepare an Oracle table: CREATE TABLE demo_sample ( column1 NUMBER, column2 NUMBER, column3 NUMBER, column4 VARCHAR2(10), column5 VARCHAR2(10), column6 VARCHAR2(10), column7 VARCHAR2(10), column8 VARCHAR2(10), column9 VARCHAR2(10), column10 VARCHAR2(10), column11 VARCHAR2(10), column12 VARCHAR2(10), CONSTRAINT pk_demo_sample PRIMARY KEY (column1, column2, column3, column4, column5, column6, column7, column8, column9) ); Prepare 20000 records data: import cx_Oracle import random # Oracleデータベース接続情報 dsn = cx_Oracle.makedsn("<your Oracle database>", 1521, service_name="PDB1") connection = cx_Oracle.connect(user="<your user name>", password="<your password>", dsn=dsn) # データ挿入関数 def insert_data(): cursor = connection.cursor() sql = """ INSERT INTO demo_sample ( column1, column2, column3, column4, column5, column6, column7, column8, column9, column10, column11, column12 ) VALUES ( :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12 ) """ batch_size = 10000 data = [] for i in range(20000): # 2万件 record = ( random.randint(1, 1000), random.randint(1, 1000), random.randint(1, 1000), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)), ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)) ) data.append(record) if len(data) == batch_size: cursor.executemany(sql, data) connection.commit() data = [] if data: cursor.executemany(sql, data) connection.commit() cursor.close() # メイン処理 try: insert_data() finally: connection.close() Step 2 Processor: ExecuteSQLRecord This ExecuteSQLRecord uses two Service, Database Connection Pooling Service: DBCPConnectionPool Processor, named EC2-DBCPConnectionPool. ParquetRecordSetWriter, named ParquetRecordSetWriter. Step 3: Create DBCPConnectionPool Download the Oracle JDBC Driver from here https://www.oracle.com/jp/database/technologies/appdev/jdbc-downloads.html Save the jdbc driver here (or anywhere your nifi can access): /Users/zzeng/Downloads/tools/Oracle_JDBC/ojdbc8-full/ojdbc8.jar DBCPConnectionPool Properties: Database Connection URL: The JDBC Driver URI. eg. jdbc:oracle:thin:@//ec2-54-222-333-444.compute-1.amazonaws.com:1521/PDB1 Database Driver Class Name: oracle.jdbc.driver.OracleDriver Database Driver Location(s) : /Users/zzeng/Downloads/tools/Oracle_JDBC/ojdbc8-full/ojdbc8.jar Database User: my Oracle access user name, eg zzeng Password: Password, will be automatically encrypted by NiFi Step 4: Create ParquetRecordSetWriter service We can use default settings here. Step 5: UpdateAttribute to set the file name in Azure Add a value : Key: azure.filename Value : ${uuid:append('.ext')} Step 6: Use PutAzureDataLakeStorage to save data into Azure Step 7: Create ADLSCredentialsControllerService service for PutAzureDataLakeStorage so that we can save data into Azure Storage Account Name: the value in your Azure account SAS Token: The value in your Azure account Step 8: Enable the 3 services Step 9: Have a try Choose `Run Once` And you will find the files are there

zzeng · ‎04-05-2024

Share my memo on setting up the .cde/config.yaml: user:(my user in CDP, not the email address) vcluster-endpoint: (find it in the Adminitration -> Virtual Cluster details -> JOBS API URL)

zzeng · ‎03-25-2024

Purpose Detect updates to S3 files and insert the updated files into Aurora PostgreSQL with NiFi Data flow The finished dataflow Process 1.) Download the dataflow file (JSON file Import_S3_To_Aurora_PostgreSQL.json) 2.) Create a new processor group. When creating this processor group, choose the following JSON file to upload. Step 1: Choose processor group Step 2 : Step 3 : Finish upload. 3.) Install JDBC Driver wget https://jdbc.postgresql.org/download/postgresql-42.7.3.jar mkdir /tmp/nifi mv postgresql-42.7.3.jar /tmp/nifi/ 4.) Set parameters in NiFi Set ListS3 parameters S3 Access Key set The values input was protected as access key/values are sensitive values. Only "Sensitive value set" be shown. 4.2) Start the controllver service AWSCredentialsProviderControllerService (for saving AWS sensitive values) 4.3) Start the CSVReader Controller service 4.4) Start the JDBC Connection pool（DBCPConnectionPool-postgreSQL）service 5. ) Set bucket value 5.1) Fix the Bucket,Prefix values 5.2) Fix the PostgreSQL table name for INSERT PutDatabaseRecord setting: 6.) Start the processors 7.) Check the provenance:

zzeng · ‎03-14-2024

Purpose: Access CDW Iceberg table with Snowflake Architecture Image source: Configure an external volume for Iceberg tables Initiate Configure an External Volume in Snowflake Step 1: Configure access permissions for the S3 bucket Create a policy in AWS IAM. I created a policy called zzeng-Snowflake-ext: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:PutObject", "s3:GetObject", "s3:GetObjectVersion", "s3:DeleteObject", "s3:DeleteObjectVersion" ], "Resource": "arn:aws:s3:::<my-bucket>/data/zzeng/*" }, { "Effect": "Allow", "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ], "Resource": "arn:aws:s3:::<my-bucket>", "Condition": { "StringLike": { "s3:prefix": [ "data/zzeng/*" ] } } } ] } Set it in the AWS console Step 2: Create an IAM role in AWS zzeng-Snowflake-ext-role: Step 3: Grant privileges required for SSE-KMS encryption to the IAM role (optional) (Ref: Grant privileges required for SSE-KMS encryption to the IAM role (optional)) Skip. Step 4: Create an external volume in Snowflake SQL in Snowflake: CREATE OR REPLACE EXTERNAL VOLUME extIcebergVolC STORAGE_LOCATIONS = ( ( NAME = 'zzeng-iceberg-se-s3-ap-northeast-1' STORAGE_PROVIDER = 'S3' STORAGE_BASE_URL = 's3://<my-bucket>/data/zzeng/' STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<my-AWS-id*****>:role/zzeng-Snowflake-ext-role' ) ); Step 5: Retrieve the AWS IAM user for your Snowflake account DESC EXTERNAL VOLUME extIcebergVolC; Result: zzeng#COMPUTE_WH@ZZENG.PUBLIC>DESC EXTERNAL VOLUME extIcebergVolC; +-------------------+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+ | parent_property | property | property_type | property_value | property_default | |-------------------+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------| | | ALLOW_WRITES | Boolean | true | true | | STORAGE_LOCATIONS | STORAGE_LOCATION_1 | String | {"NAME":"zzeng-iceberg-se-s3-ap-northeast-1","STORAGE_PROVIDER":"S3","STORAGE_BASE_URL":"s3://<my-bucket-id>/data/zzeng/","STORAGE_ALLOWED_LOCATIONS":["s3://<my-bucket-id>/data/zzeng/*"],"STORAGE_REGION":"us-east-2","PRIVILEGES_VERIFIED":true,"STORAGE_AWS_ROLE_ARN":"<STORAGE_AWS_ROLE_ARN>","STORAGE_AWS_IAM_USER_ARN":"<STORAGE_AWS_ROLE_ARN>","STORAGE_AWS_EXTERNAL_ID":"<a long string for STORAGE_AWS_EXTERNAL_ID>","ENCRYPTION_TYPE":"NONE","ENCRYPTION_KMS_KEY_ID":""} | | | STORAGE_LOCATIONS | ACTIVE | String | zzeng-iceberg-se-s3-ap-northeast-1 | | +-------------------+--------------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------+ 3 Row(s) produced. Time Elapsed: 0.949s zzeng#COMPUTE_WH@ZZENG.PUBLIC> Take a memo for STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID. Step 6: Grant the IAM user permissions to access bucket objects Modify the trustship in IAM, use the value in Step 5: { "Version": "2012-10-17", "Statement": [ { "Sid": "", "Effect": "Allow", "Principal": { "AWS": "<snowflake_user_arn>" }, "Action": "sts:AssumeRole", "Condition": { "StringEquals": { "sts:ExternalId": "<snowflake_external_id>" } } } ] } Create an Iceberg Table in CDW Create folder: s3a://${my-test-bucket}/data/${user_id}/airlines/airlines use my own value as the ${my-test-bucket} and ${user_id}: CREATE DATABASE ${user_id}_airlines_ice; drop table if exists ${user_id}_airlines_ice.airlines; CREATE EXTERNAL TABLE ${user_id}_airlines_ice.airlines (code string, description string) STORED BY ICEBERG STORED AS PARQUET LOCATION 's3a://${cdp_env_bucket}/data/${user_id}/airlines/airlines' tblproperties("format-version"="2",'external.table.purge'='true'); INSERT INTO ${user_id}_airlines_ice.airlines SELECT * FROM ${user_id}_airlines_csv.airlines_csv; select * from ${user_id}_airlines_ice.airlines; select count(*) from ${user_id}_airlines_ice.airlines; Check my DDL: Create an Iceberg Table in Snowflake, pointing to the exist table in CDW SQL: CREATE OR REPLACE ICEBERG TABLE airlines CATALOG='zzengIcebergCatalogInt' EXTERNAL_VOLUME='extIcebergVolC' BASE_LOCATION='airlines/airlines' METADATA_FILE_PATH='metadata/00001-7e28a998-42f9-4466-8884-32d450af5c85.metadata.json' ; Check the value In Snowflake::snowflake: zzeng#COMPUTE_WH@ZZENG.PUBLIC>select count(*) from AIRLINES; +----------+ | COUNT(*) | |----------| | 1491 | +----------+ 1 Row(s) produced. Time Elapsed: 0.393s zzeng#COMPUTE_WH@ZZENG.PUBLIC>select * from AIRLINES limit 3; +------+--------------------+ | CODE | DESCRIPTION | |------+--------------------| | 02Q | Titan Airways | | 04Q | Tradewind Aviation | | 05Q | "Comlux Aviation | +------+--------------------+ 3 Row(s) produced. Time Elapsed: 4.705s zzeng#COMPUTE_WH@ZZENG.PUBLIC> In CDW Hive HUE: select count(*) from ${user_id}_airlines_ice.airlines; Modify Data INSERT DATA Snowflake can’t insert into an external Iceberg table Manage an Iceberg table Iceberg tables It is mentioned that You can use INSERT and UPDATE statements to modify an Iceberg table that uses Snowflake as the catalog, I got an Error in Snowflake when inserting it into the Iceberg table: zzeng#COMPUTE_WH@ZZENG.PUBLIC>INSERT INTO airlines (code, description) VALUES ('A1', 'Airline 1 Description'), ('A2', 'Airline 2 Description'), ('A3', 'Airline 3 Description'), ('A4', 'Airline 4 Description'), ('A5', 'Airline 5 Description'), ('A6', 'Airline 6 Description'), ('A7', 'Airline 7 Description'), ('A8', 'Airline 8 Description'), ('A9', 'Airline 9 Description'), ('A10', 'Airline 10 Description'), ('A11', 'Airline 11 Description'), ('A12', 'Airline 12 Description'), ('A13', 'Airline 13 Description'), ('A14', 'Airline 14 Description'), ('A15', 'Airline 15 Description'), ('A16', 'Airline 16 Description'), ('A17', 'Airline 17 Description'), ('A18', 'Airline 18 Description'), ('A19', 'Airline 19 Description'), ('A20', 'Airline 20 Description'); 091357 (42601): SQL Compilation error: Iceberg table AIRLINES with an external catalog integration is a read-only table and cannot be modified zzeng#COMPUTE_WH@ZZENG.PUBLIC> Hive Insert INSERT INTO `zzeng_airlines_ice`.`airlines` (code, description) VALUES ('A1', 'Airline 1 Description'), ('A2', 'Airline 2 Description'), ('A3', 'Airline 3 Description'), ('A4', 'Airline 4 Description'), ('A5', 'Airline 5 Description'), ('A6', 'Airline 6 Description'), ('A7', 'Airline 7 Description'), ('A8', 'Airline 8 Description'), ('A9', 'Airline 9 Description'), ('A10', 'Airline 10 Description'), ('A11', 'Airline 11 Description'), ('A12', 'Airline 12 Description'), ('A13', 'Airline 13 Description'), ('A14', 'Airline 14 Description'), ('A15', 'Airline 15 Description'), ('A16', 'Airline 16 Description'), ('A17', 'Airline 17 Description'), ('A18', 'Airline 18 Description'), ('A19', 'Airline 19 Description'), ('A20', 'Airline 20 Description'); Refresh the metadata for an Iceberg table created from files in object storage Issue: Hive changed the Iceberg table, but after that, even when I ran the Snowflake query, the data was not updated: zzeng#COMPUTE_WH@ZZENG.PUBLIC>select count(*) from AIRLINES; +----------+ | COUNT(*) | |----------| | 1491 | +----------+ 1 Row(s) produced. Time Elapsed: 0.678s zzeng#COMPUTE_WH@ZZENG.PUBLIC> In Snowflake, The metadata files do not identify the most recent snapshot of an Iceberg table. Ref URL: Iceberg tables I have to alter my Snowflake Iceberg table definition manually: Then refresh the metadata: ALTER ICEBERG TABLE AIRLINES REFRESH 'metadata/00002-c33ae888-1af3-4f64-830b-eac9e0a95983.metadata.json'; Result: zzeng#COMPUTE_WH@ZZENG.PUBLIC>ALTER ICEBERG TABLE AIRLINES REFRESH 'metadata/00002-c33ae888-1af3-4f64-830b-eac9e0a95983.metadata.json'; +----------------------------------+ | status | |----------------------------------| | Statement executed successfully. | +----------------------------------+ 1 Row(s) produced. Time Elapsed: 10.199s zzeng#COMPUTE_WH@ZZENG.PUBLIC>select count(*) from AIRLINES; +----------+ | COUNT(*) | |----------| | 1511 | +----------+ 1 Row(s) produced. Time Elapsed: 0.204s zzeng#COMPUTE_WH@ZZENG.PUBLIC> Considerations and limitations Iceberg tables The following is an excerpt from Snowflake documents, where considerations and limitations apply to Iceberg tables, and are subject to change: Iceberg Versions 1 and 2 of the Apache Iceberg specification are supported, excluding the following features: Row-level deletes (either position deletes or equality deletes). Using the history.expire.min-snapshots-to-keep table property to specify the default minimum number of snapshots to keep. For more information, see Metadata and snapshots. Iceberg partitioning with the bucket transform function impacts performance for queries that use conditional clauses to filter results. Iceberg tables created from files in object storage aren’t supported if the following conditions are true: The table contains a partition spec that defines an identity transform. The source column of the partition spec does not exist in a Parquet file. For Iceberg tables that are not managed by Snowflake, time travel to any snapshot generated after table creation is supported as long as you periodically refresh the table before the snapshot expires. File formats Support is limited to Apache Parquet files. Parquet files that use the unsigned integer logical type are not supported. External volumes You must access the cloud storage locations in external volumes using direct credentials. Storage integrations are not supported. The trust relationship must be configured separately for each external volume that you create. Metadata files The metadata files do not identify the most recent snapshot of an Iceberg table. You cannot modify the location of the data files or snapshot using the ALTER ICEBERG TABLE command. To modify either of these settings, you must recreate the table (using the CREATE OR REPLACE ICEBERG TABLE syntax).

zzeng · ‎03-05-2024

Goal : Use CLI to create / import a dataflow in CDF-PC Then, run Teraform to create/import Dataflow into CDF-PC. Process : Install CDP CLI (I am using MacOS, so I install cdpcli in this way. pip3 install cdpcli Then confirm it's installed correctly [zzeng@zeng-mbp ~]$ ll ~/Library/Python/3.9/bin/ | grep cdp -rwxr-xr-x@ 1 zzeng staff 250 Mar 4 12:22 cdp -rwxr-xr-x@ 1 zzeng staff 250 Mar 4 12:22 cdp_completer [zzeng@zeng-mbp ~]$ export PATH="$HOME/Library/Python/3.9/bin:$PATH" [zzeng@zeng-mbp ~]$ cdp --version 0.9.107 [zzeng@zeng-mbp ~]$ Configure CLI: (Reference Link) from [Management Console] ->[User] -> [Profile] Then I can find your API key here. If it does not exist, create one. Then, configure the API key in my MacOS (Reference): cdp configure After this config, check the latest status: [zzeng@zeng-mbp ~]$ cdp iam get-user { "user": { "userId": ***** "status": "ACTIVE", "workloadPasswordDetails": { "isPasswordSet": true } } } Use CLI to create dataflow: cdp df import-flow-definition \ --name "zzeng2-fetch_from_S3_folder" \ --file "/<<PATH_TO_UPDATE>>/fetch_from_S3_folder.json" \ --comments "Initial Version" Example: $ cdp df import-flow-definition --name "zzeng-fetch_from_S3_folder" --description "Description for this flow" --file "/Users/zzeng/Library/CloudStorage/OneDrive-Personal/38_CLDR_Docs/50_demo/FetchFromS3Folder/fetch_from_S3_folder.json" --comments "Initial Version" { "crn": "crn:cdp:df:us-west-1:******:flow:zzeng-fetch_from_S3_folder", "name": "zzeng-fetch_from_S3_folder", "versionCount": 1, "createdTimestamp": 1709632435790, "description": "Description for this flow", "modifiedTimestamp": 1709632435790, "versions": [ { "crn": "crn:cdp:df:us-west-1:******:flow:zzeng-fetch_from_S3_folder/v.1", "bucketIdentifier": "https://s3.us-west-2.amazonaws.com/*****.cloudera.com/******", "author": "Zhen Zeng", "version": 1, "timestamp": 1709632435792, "deploymentCount": 0, "comments": "Initial Version", "draftCount": 0, "tags": [] } ] } Deploy the flow to an environment: We can use a wizard to do this or use the wizard to generate the command. Result: $ cdp df create-deployment \ > --service-crn crn:cdp:df:us-west-1:******:service:***** \ > --flow-version-crn "crn:cdp:df:us-west-1:*****:flow:zzeng-fetch_from_S3_folder/v.1" \ > --deployment-name "zzeng-deploy-01" \ > --project-crn "crn:cdp:df:us-west-1:****:project:*****" \ > --cfm-nifi-version 1.24.0.2.3.13.0-9 \ > --auto-start-flow \ > --cluster-size-name EXTRA_SMALL \ > --static-node-count 1 \ > --no-auto-scaling-enabled { "deploymentCrn": "crn:cdp:df:us-west-1:******:deployment:*****/*****" } Reference: Update Deployment mc cli client setup cdf deploy flow cli

zzeng · ‎01-15-2024

Update: In the latest NiFi, now we can directly connect "GenerateTableFetch" and and "ExecuteSQL" with a connection. NiFi is evolving 🙂

Online	Online
Last Visited	‎02-25-2025 06:06 AM

Member Since	‎01-15-2019 07:24 PM
Last Visited	‎02-25-2025 06:06 AM
Posts	55
Kudos received	36

Cloudera Community

Re: Setup a 3-5 node CDP cluster in AWS

Re: Failed to configure urlScheme property for Sol...

Re: Load Iceberg Table on PowerBI Desktop

Re: update data in Hive using join

How to Connect to CDW Impala VW Using the Power BI...

How to Connect to Impala Using the Power BI Deskto...

How to ingest data from Oracle into Azure ADLS Gen...

Re: Working with Iceberg in CDE Spark Sessions

NiFi: How to detect updates to S3 files and insert...

How to connect to CDW Iceberg table in Snowflake

Automation : Using CLI to create / import a datafl...

Re: How to fetch rows from a table in parallel wh...