Member since
09-14-2017
120
Posts
11
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3133 | 06-17-2021 06:55 AM
 | 1922 | 01-13-2021 01:56 PM
 | 17196 | 11-02-2017 06:35 AM
 | 18991 | 10-04-2017 02:43 PM
 | 34408 | 09-14-2017 06:40 PM
05-05-2023
01:25 PM
I have read the descriptions of these two NiFi processors and am unable to understand the difference; the descriptions seem almost the same for both. It would be nice if the NiFi docs noted the difference. The main difference seems to be: QueryDatabaseTable converts the query result to Avro format, while QueryDatabaseTableRecord converts the query result to the format specified by the record writer, which is why it has a separate Record Writer field in its properties.
Labels:
- Apache NiFi
01-31-2023
09:32 AM
1 Kudo
I am curious how Cloudera managed to get hold of a CentOS 8.6 version out of the blue, when apparently the final CentOS release, per https://en.wikipedia.org/wiki/CentOS, was 8.5.2111 (16 November 2021; 14 months ago).
04-06-2022
01:00 PM
Found some new info in the latest Cloudera CDP guides for Impala. It would be nice if Hive and Impala were made more similar in their SQL standard syntax, but unfortunately they are not the same: https://docs.cloudera.com/runtime/7.2.14/impala-sql-reference/topics/impala-identifiers.html

Impala identifiers: provides information about using identifiers as the names of databases, tables, or columns when creating the objects. The following rules apply to identifiers in Impala: the minimum length of an identifier is 1 character; the maximum length is currently 128 characters, enforced by the Metastore database.
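The length rules quoted above can be sanity-checked with a small sketch. The 1- and 128-character bounds come from the Impala docs quoted above; the helper function name is hypothetical, and real validation would also cover allowed characters and reserved words, which are not checked here:

```python
# Hypothetical helper illustrating only the identifier length rules quoted
# from the Impala docs: minimum 1 character, maximum 128 characters
# (the 128-character limit is enforced by the Hive Metastore database).
MIN_LEN, MAX_LEN = 1, 128

def is_valid_length(identifier: str) -> bool:
    return MIN_LEN <= len(identifier) <= MAX_LEN

print(is_valid_length("table1"))    # True: within 1..128 characters
print(is_valid_length(""))          # False: below the 1-character minimum
print(is_valid_length("x" * 129))   # False: over the 128-character maximum
```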
02-09-2022
08:57 AM
Hello Experts, I want to run a simple Hive INSERT SQL statement in NiFi periodically, for example:

insert overwrite Table1 select * from Table2;

All SQL values are fixed and hardcoded, and don't need to change dynamically in the flow. As a newbie I was thinking I could write the flow as:

ReplaceText -> success -> PutHiveQL
Search Value: (?s)(^.*$)
Replacement Value: insert overwrite Table1 select * from Table2;

But I get an error on ReplaceText, which probably needs an incoming flowfile that I don't have, since the SQL is fixed and hardcoded: "Upstream Connections is invalid because Processor requires an upstream connection but currently has none."

The other option I could try, though I am not sure it will work, is:

GenerateFlowFile -> success -> PutHiveQL
GenerateFlowFile Custom Text: insert overwrite Table1 select * from Table2;
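As a rough illustration of what the ReplaceText search value `(?s)(^.*$)` does, its regex behavior can be sketched with Python's `re` module (this is an analogy, not NiFi's actual implementation): the `(?s)` flag makes `.` match newlines, so the pattern matches the entire flowfile content and the replacement value swaps in the fixed SQL regardless of what the incoming flowfile contained.

```python
import re

# (?s) makes '.' match newlines, so ^.*$ covers the whole content in one match.
flowfile_content = "whatever the incoming\nflowfile contained"
replacement = "insert overwrite Table1 select * from Table2;"

result = re.sub(r"(?s)^.*$", replacement, flowfile_content, count=1)
print(result)  # prints only the fixed SQL statement
```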
Labels:
- Apache NiFi
01-24-2022
08:33 AM
Thanks @MattWho, I actually found a way to filter/search process groups by name using the Summary option in the top-right menu. This is very useful for finding all the ETL pipelines: once we give them proper names, entering part of a name shows all matching Process Groups. Being a newbie, I am trying to compare the StreamSets UI to the NiFi UI so I can work the same way. StreamSets provides an initial list of all ETL pipelines to filter by name, etc. I guess if, just after NiFi login, we saw two links, Summary and Canvas, users could intuitively click through to the summary screen, review all their Process Groups, and then click on the specific PG they want to work with. That would make it similar to other ETL tools like StreamSets.
01-20-2022
11:57 AM
Hello, is it still best practice to create, say, 100 process groups for 100 dataflow/ETL pipelines, each of which has multiple processors? Won't 100 process groups be difficult to see on a single canvas? Or is there a better way to easily see and search the 100 ETL pipelines, using some filter like name, user ID, or date, to narrow down the list?
01-01-2022
05:14 PM
Hello @GangWar, yes, I can do kinit -kt for all the user IDs, including yarn, livy, and my own user ID, from the same server.
12-29-2021
09:24 AM
Hello, I am getting an error after upgrading CDH 5.16 to CDP 7.1.7. The logs show it is unable to connect to the KMS endpoint. First I start a Spark session in sparkmagic, then I run the example pyspark code below:

Starting Spark application

ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session?
23 | application_1639802810085_6070 | pyspark | idle | Link | Link | ✔

SparkSession available as 'spark'.

############### sample pyspark code ###############
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').getOrCreate()

# load data from .csv file in HDFS
# tips = spark.read.csv("/user/hive/warehouse/tips/", header=True, inferSchema=True)
# OR load data from table in Hive metastore
tips = spark.table('db1.table1')

from pyspark.sql.functions import col, lit, mean

# query using DataFrame API
#tips \

# query using SQL
spark.sql("select name from db1.table1").show(3)

spark.stop()

An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:255)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:252)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.getDelegationToken(LoadBalancingKMSClientProvider.java:252)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1916)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1029)
... 67 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://kmshostxyz.com:16000/kms/v1/?op=GETDELEGATIONTOKEN&renewer=yarn%2Fyarnhost%40KERBEROSREALM
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:237)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
... 68 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:365)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
... 78 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
... 79 more
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 381, in show
print(self._jdf.showString(n, 20, vertical))
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)

There is also an old related thread but no resolution: https://community.cloudera.com/t5/Support-Questions/Not-able-to-access-the-files-in-HDFS-encryption-zone-from/m-p/332970#M231320
12-29-2021
07:39 AM
Hi, did you find a solution to this?
12-21-2021
09:51 AM
UPDATE: One possible workaround to suppress these quotes from displaying in SELECT * is to create a view like the one below in Impala:

CREATE VIEW db1.view1 AS
SELECT replace(table1.quotedcol1, '"', '') quotedcol1,
       replace(table1.quotedcol2, '"', '') quotedcol2
FROM db1.table1;
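For reference, the transformation those Impala replace(col, '"', '') calls perform is simply removing every double quote from the value; the same logic in Python (the function name here is just for illustration):

```python
# Equivalent of Impala's replace(col, '"', ''): strip all double quotes.
def strip_quotes(value: str) -> str:
    return value.replace('"', '')

print(strip_quotes('"hello"'))         # hello
print(strip_quotes('say "hi" twice'))  # say hi twice
```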