Member since
01-15-2019
60
Posts
37
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2975 | 07-20-2021 01:05 AM
 | 16556 | 11-28-2019 06:59 AM
09-09-2025
11:36 AM
Hi @DianaTorres, thanks for your reply. I resolved this issue by modifying the networking configuration. Best regards, Shubham Rai.
08-19-2025
01:05 AM
Several keys needed to be added. This is an example of the properties we used in Kafka Connect in Data Hub:

1. producer.override.sasl.jaas.config:
   org.apache.kafka.common.security.plain.PlainLoginModule required username="<your-workload-name>" password="<password>";
2. producer.override.security.protocol: SASL_SSL
3. producer.override.sasl.mechanism: PLAIN
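As a rough sketch, these producer overrides would typically be submitted as part of a connector's config through the Kafka Connect REST API (PUT /connectors/&lt;name&gt;/config). The connector class, topic, and credentials below are illustrative placeholders, not from the original post:

```python
import json

# Hypothetical connector config carrying the producer overrides above.
# Connector class, topic, and credentials are placeholders.
connector_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "topic": "my-topic",
    "producer.override.security.protocol": "SASL_SSL",
    "producer.override.sasl.mechanism": "PLAIN",
    "producer.override.sasl.jaas.config": (
        'org.apache.kafka.common.security.plain.PlainLoginModule required '
        'username="<your-workload-name>" password="<password>";'
    ),
}

# JSON body for PUT /connectors/<name>/config on the Connect REST API.
payload = json.dumps(connector_config, indent=2)
print(payload)
```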
06-04-2025
06:27 AM
Excellent article @zzeng 👍 !!
03-18-2025
01:14 AM
Hi @APentyala, thanks for pointing this out. The Impala driver also works well here; both the Impala and Hive drivers can be used. I will replace the images so that they match the descriptions 👍🏻
09-10-2024
05:34 PM
1 Kudo
In CDP Public Cloud CDW Impala, you can only access over HTTP + SSL, so you have to edit the config file to specify the ODBC driver parameters:

C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini

[Driver]
AllowHostNameCNMismatch = 0
CheckCertRevocation = 0
TransportMode = http
AuthMech = 3

https://community.cloudera.com/t5/Community-Articles/How-to-Connect-to-CDW-Impala-VW-Using-the-Power-BI-Desktop/ta-p/393013#toc-hId-1805728480
09-08-2024
10:36 PM
With Hive (newer than Hive 2.2, on a transactional/ACID target table), you can use MERGE INTO:

MERGE INTO target_table AS target
USING source_table AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET
    target.name = source.name,
    target.age = source.age
WHEN NOT MATCHED THEN
  INSERT (id, name, age)
  VALUES (source.id, source.name, source.age);
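The statement above is Hive SQL. As a self-contained illustration of the same upsert semantics (matched rows updated, unmatched rows inserted), here is a sketch using SQLite's INSERT ... ON CONFLICT from Python; the table and column names mirror the example and are not any real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target_table (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("INSERT INTO target_table VALUES (1, 'old-name', 30)")

# Source rows: id=1 should update the existing row, id=2 should be inserted.
source_rows = [(1, "new-name", 31), (2, "fresh", 20)]

# SQLite's upsert plays the role of Hive's MERGE INTO here.
conn.executemany(
    """
    INSERT INTO target_table (id, name, age) VALUES (?, ?, ?)
    ON CONFLICT(id) DO UPDATE SET name = excluded.name, age = excluded.age
    """,
    source_rows,
)

rows = conn.execute("SELECT id, name, age FROM target_table ORDER BY id").fetchall()
print(rows)  # [(1, 'new-name', 31), (2, 'fresh', 20)]
```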
09-03-2024
06:27 PM
Summary
Last week I posted an article [How to Connect to Impala Using the Power BI Desktop + Cloudera ODBC Impala Driver with Kerberos Authentication]. So far (Sep 2024), CDP Public Cloud CDW supports Basic Authentication (HTTP), so today I will share how to connect to a CDW Impala VW using the Power BI Desktop + Cloudera ODBC Impala Driver with Basic Authentication.

Pre-requisites
- Power BI Desktop Edition: https://www.microsoft.com/en-us/power-platform/products/power-bi/desktop
- Impala in CDP Public Cloud CDW
- Impala ODBC Connector 2.7.0 for Cloudera Enterprise: https://www.cloudera.com/downloads/connectors/impala/odbc/2-7-0.html

How to do it in Power BI Desktop
Step 1: Install the [Impala ODBC Connector 2.7.0 for Cloudera Enterprise].
Step 2: Copy the ODBC folder to the Power BI Desktop folder. Assuming your Power BI Desktop is in [C:\Program Files\Microsoft Power BI Desktop\], copy the ODBC driver [C:\Program Files\Cloudera ODBC Driver for Impala] to [C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala].
Step 3: Edit the config file to specify the ODBC driver:
C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Impala ODBC Driver.ini

[Simba Impala ODBC Driver]
# Originally Power BI uses its embedded driver; we change it to the Cloudera version
# Driver=Simba Impala ODBC Driver\ImpalaODBC_sb64.dll
Driver=Cloudera ODBC Driver for Impala\lib\ClouderaImpalaODBC64.dll

Step 4: Edit the config file to specify the ODBC driver parameters:
C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini

[Driver]
AllowHostNameCNMismatch = 0
CheckCertRevocation = 0
TransportMode = http
AuthMech = 3

* A Cloudera CDW Impala VW doesn't need the [httpPath] parameter, while a Cloudera Data Hub Impala cluster needs [httpPath=cliservice]. Please be careful.

Then save these two files and restart your Power BI Desktop.

How to do it in Power BI Service (On-premises Data Gateway)
Step 1: Edit the config file to specify the driver:
C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Simba Impala ODBC Driver.ini

[Simba Impala ODBC Driver]
# Driver=Simba Impala ODBC Driver\ImpalaODBC_sb64.dll
Driver=Cloudera ODBC Driver for Impala\lib\ClouderaImpalaODBC64.dll

Step 2: Edit the .ini file to specify the driver parameters:
C:\Program Files\Microsoft Power BI Desktop\bin\ODBC Drivers\Cloudera ODBC Driver for Impala\lib\cloudera.impalaodbc.ini

[Driver]
AllowHostNameCNMismatch = 0
CheckCertRevocation = 0
TransportMode = http
AuthMech = 3

Reference: https://community.fabric.microsoft.com/t5/Desktop/Power-BI-Impala-connector-SSL-certificate-error/m-p/2344481#M845491
05-19-2024
12:43 AM
2 Kudos
Purpose:
Run SELECT to ingest data from Oracle 19c, and save the data into Azure ADLS Gen2 object storage, in Parquet format.
Steps
Step 1: Prepare the environment
Make sure the Oracle 19c environment works well.
Prepare an Oracle table:
CREATE TABLE demo_sample (
column1 NUMBER,
column2 NUMBER,
column3 NUMBER,
column4 VARCHAR2(10),
column5 VARCHAR2(10),
column6 VARCHAR2(10),
column7 VARCHAR2(10),
column8 VARCHAR2(10),
column9 VARCHAR2(10),
column10 VARCHAR2(10),
column11 VARCHAR2(10),
column12 VARCHAR2(10),
CONSTRAINT pk_demo_sample PRIMARY KEY (column1, column2, column3, column4, column5, column6, column7, column8, column9)
);
Prepare 20,000 records of data:
import cx_Oracle
import random

# Oracle database connection info
dsn = cx_Oracle.makedsn("<your Oracle database>", 1521, service_name="PDB1")
connection = cx_Oracle.connect(user="<your user name>", password="<your password>", dsn=dsn)

# Data insertion function
def insert_data():
    cursor = connection.cursor()
    sql = """
    INSERT INTO demo_sample (
        column1, column2, column3, column4, column5, column6,
        column7, column8, column9, column10, column11, column12
    ) VALUES (
        :1, :2, :3, :4, :5, :6, :7, :8, :9, :10, :11, :12
    )
    """
    batch_size = 10000
    data = []
    for i in range(20000):  # 20,000 records
        record = (
            random.randint(1, 1000),
            random.randint(1, 1000),
            random.randint(1, 1000),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10)),
            ''.join(random.choices('ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=10))
        )
        data.append(record)
        if len(data) == batch_size:
            cursor.executemany(sql, data)
            connection.commit()
            data = []
    if data:
        cursor.executemany(sql, data)
        connection.commit()
    cursor.close()

# Main
try:
    insert_data()
finally:
    connection.close()
Step 2: Processor ExecuteSQLRecord
This ExecuteSQLRecord uses two controller services:
- Database Connection Pooling Service: a DBCPConnectionPool, named EC2-DBCPConnectionPool.
- Record Writer: a ParquetRecordSetWriter, named ParquetRecordSetWriter.
Step 3: Create DBCPConnectionPool
Download the Oracle JDBC Driver from here https://www.oracle.com/jp/database/technologies/appdev/jdbc-downloads.html
Save the jdbc driver here (or anywhere your nifi can access):
/Users/zzeng/Downloads/tools/Oracle_JDBC/ojdbc8-full/ojdbc8.jar
DBCPConnectionPool properties:
- Database Connection URL: the JDBC URI, e.g. jdbc:oracle:thin:@//ec2-54-222-333-444.compute-1.amazonaws.com:1521/PDB1
- Database Driver Class Name: oracle.jdbc.driver.OracleDriver
- Database Driver Location(s): /Users/zzeng/Downloads/tools/Oracle_JDBC/ojdbc8-full/ojdbc8.jar
- Database User: my Oracle user name, e.g. zzeng
- Password: the password, which will be automatically encrypted by NiFi
Step 4: Create ParquetRecordSetWriter service
We can use default settings here.
Step 5: UpdateAttribute to set the file name in Azure
Add a property:
Key: azure.filename, Value: ${uuid:append('.ext')}
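For reference, the NiFi expression ${uuid:append('.ext')} produces a fresh random UUID with a literal .ext suffix appended; a rough plain-Python equivalent (the .ext suffix is just the placeholder extension from the expression):

```python
import uuid

# Equivalent of NiFi's ${uuid:append('.ext')}: a fresh UUID plus a suffix.
azure_filename = f"{uuid.uuid4()}.ext"
print(azure_filename)
```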
Step 6: Use PutAzureDataLakeStorage to save data into Azure
Step 7: Create ADLSCredentialsControllerService service for PutAzureDataLakeStorage so that we can save data into Azure
Storage Account Name: the value in your Azure account
SAS Token: The value in your Azure account
Step 8: Enable the 3 services
Step 9: Have a try
Choose `Run Once`
And you will find the files there.
04-05-2024
02:21 AM
Share my memo on setting up .cde/config.yaml:

user: (my user in CDP, not the email address)
vcluster-endpoint: (find it in Administration -> Virtual Cluster details -> JOBS API URL)
03-25-2024
02:43 AM
1 Kudo
Purpose
Detect updates to S3 files and insert the updated files into Aurora PostgreSQL with a NiFi data flow.

Process
1.) Download the dataflow file (JSON file Import_S3_To_Aurora_PostgreSQL.json).
2.) Create a new processor group. When creating this processor group, choose the above JSON file to upload.
    Step 1: Choose processor group. Step 2: Select the JSON file. Step 3: Finish upload.
3.) Install the JDBC driver:

wget https://jdbc.postgresql.org/download/postgresql-42.7.3.jar
mkdir /tmp/nifi
mv postgresql-42.7.3.jar /tmp/nifi/

4.) Set parameters in NiFi
4.1) Set the ListS3 parameters (S3 access key). The input values are protected since access keys/secrets are sensitive; only "Sensitive value set" is shown.
4.2) Start the controller service AWSCredentialsProviderControllerService (for saving AWS sensitive values).
4.3) Start the CSVReader controller service.
4.4) Start the JDBC connection pool (DBCPConnectionPool-postgreSQL) service.
5.) Set the bucket value
5.1) Fix the Bucket and Prefix values.
5.2) Fix the PostgreSQL table name for INSERT (PutDatabaseRecord setting).
6.) Start the processors.
7.) Check the provenance.
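Conceptually, the CSVReader + PutDatabaseRecord pair parses each CSV record from the S3 file and inserts it as a table row. A minimal stand-alone sketch of that behavior, using SQLite in place of Aurora PostgreSQL and an invented two-column schema:

```python
import csv
import io
import sqlite3

# Stand-in for the S3 object's content; the schema is invented for illustration.
csv_data = "id,name\n1,alice\n2,bob\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")

# CSVReader role: parse records from the file content.
reader = csv.DictReader(io.StringIO(csv_data))

# PutDatabaseRecord role: insert each parsed record into the table.
conn.executemany(
    "INSERT INTO target (id, name) VALUES (:id, :name)",
    list(reader),
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count)  # 2
```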