Member since
09-14-2017
120
Posts
11
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3133 | 06-17-2021 06:55 AM
 | 1922 | 01-13-2021 01:56 PM
 | 17196 | 11-02-2017 06:35 AM
 | 18991 | 10-04-2017 02:43 PM
 | 34408 | 09-14-2017 06:40 PM
05-05-2023
01:25 PM
I have read the descriptions of these two NiFi processors and am unable to understand the difference; the descriptions seem almost the same for both. It would be nice if the NiFi docs noted the difference. The main difference seems to be: QueryDatabaseTable converts the query result to Avro format, while QueryDatabaseTableRecord converts the query result to the format specified by the record writer, which is why it has a separate Record Writer field in its properties.
Labels:
- Apache NiFi
01-31-2023
09:32 AM
1 Kudo
I am curious how Cloudera managed to get hold of a CentOS 8.6 version out of the blue, when apparently the final CentOS release, per https://en.wikipedia.org/wiki/CentOS, was 8.5.2111 (16 November 2021; 14 months ago).
04-06-2022
01:00 PM
Found some new info in the latest Cloudera CDP guides for Impala. It would be nice if Hive and Impala were made more similar in their SQL standard syntax, but unfortunately they are not the same: https://docs.cloudera.com/runtime/7.2.14/impala-sql-reference/topics/impala-identifiers.html

Impala identifiers: provides information about using identifiers as the names of databases, tables, or columns when creating the objects. The following rules apply to identifiers in Impala: the minimum length of an identifier is 1 character; the maximum length is currently 128 characters, enforced by the Metastore database.
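The length rules quoted above can be sanity-checked with a small sketch. The 1- and 128-character bounds come from the Impala docs quoted above; the helper function name is hypothetical, and real validation would also cover allowed characters and reserved words, which are not checked here:

```python
# Hypothetical helper illustrating only the identifier length rules quoted
# from the Impala docs: minimum 1 character, maximum 128 characters
# (the 128-character limit is enforced by the Hive Metastore database).
MIN_LEN, MAX_LEN = 1, 128

def is_valid_length(identifier: str) -> bool:
    return MIN_LEN <= len(identifier) <= MAX_LEN

print(is_valid_length("table1"))    # True: within 1..128 characters
print(is_valid_length(""))          # False: below the 1-character minimum
print(is_valid_length("x" * 129))   # False: over the 128-character maximum
```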
02-09-2022
08:57 AM
Hello Experts, I want to run a simple Hive INSERT SQL statement in NiFi periodically, for example:

insert overwrite Table1 select * from Table2;

All SQL values are fixed and hardcoded, and don't need to change dynamically in the flow. As a newbie I was thinking I could write the flow as:

ReplaceText -> success -> PutHiveQL
Search Value: (?s)(^.*$)
Replacement Value: insert overwrite Table1 select * from Table2;

But I get an error on ReplaceText, which probably needs an incoming flowfile that I don't have, since the SQL is fixed and hardcoded: "Upstream Connections is invalid because Processor requires an upstream connection but currently has none."

The other option I could try, though I am not sure it will work, is:

GenerateFlowFile -> success -> PutHiveQL
GenerateFlowFile Custom Text: insert overwrite Table1 select * from Table2;
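As a rough illustration of what the ReplaceText search value `(?s)(^.*$)` does, its regex behavior can be sketched with Python's `re` module (this is an analogy, not NiFi's actual implementation): the `(?s)` flag makes `.` match newlines, so the pattern matches the entire flowfile content and the replacement value swaps in the fixed SQL regardless of what the incoming flowfile contained.

```python
import re

# (?s) makes '.' match newlines, so ^.*$ covers the whole content in one match.
flowfile_content = "whatever the incoming\nflowfile contained"
replacement = "insert overwrite Table1 select * from Table2;"

result = re.sub(r"(?s)^.*$", replacement, flowfile_content, count=1)
print(result)  # prints only the fixed SQL statement
```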
Labels:
- Apache NiFi
01-24-2022
08:33 AM
Thanks @MattWho, I actually found a way to filter/search process groups by name using the Summary option in the top-right menu. This is very useful for finding all the ETL pipelines: once we give them proper names, entering part of a name shows all matching Process Groups. Being a newbie, I am trying to compare the StreamSets UI to the NiFi UI so I can work the same way. StreamSets provides an initial list of all ETL pipelines to filter by name, etc. I guess if, just after NiFi login, we saw two links, Summary and Canvas, users could intuitively click through to the summary screen, review all their Process Groups, and then click on the specific PG they want to work with. That would make it similar to other ETL tools like StreamSets.
01-20-2022
11:57 AM
Hello, is it still best practice to create, say, 100 process groups for 100 dataflow/ETL pipelines, each of which has multiple processors? Won't 100 process groups be difficult to see on a single canvas? Or is there a better way to easily see and search the 100 ETL pipelines, using some filter like name, user ID, or date, to narrow down the list?
01-01-2022
05:14 PM
Hello @GangWar, yes, I can do kinit -kt for all the user IDs, including yarn, livy, and my own user ID, from the same server.
12-29-2021
09:24 AM
Hello, I am getting an error after upgrading CDH 5.16 to CDP 7.1.7. The logs show it is unable to connect to the KMS endpoint. First I start a Spark session in sparkmagic, then I run the example pyspark code below:

Starting Spark application

ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session?
23 | application_1639802810085_6070 | pyspark | idle | Link | Link | ✔

SparkSession available as 'spark'.

############### sample pyspark code ###############
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').getOrCreate()

# load data from .csv file in HDFS
# tips = spark.read.csv("/user/hive/warehouse/tips/", header=True, inferSchema=True)
# OR load data from table in Hive metastore
tips = spark.table('db1.table1')

from pyspark.sql.functions import col, lit, mean

# query using DataFrame API
#tips \

# query using SQL
spark.sql("select name from db1.table1").show(3)

spark.stop()

An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:255)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$1.call(LoadBalancingKMSClientProvider.java:252)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:175)
at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.getDelegationToken(LoadBalancingKMSClientProvider.java:252)
Caused by: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1916)
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1029)
... 67 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Error while authenticating with endpoint: http://kmshostxyz.com:16000/kms/v1/?op=GETDELEGATIONTOKEN&renewer=yarn%2Fyarnhost%40KERBEROSREALM
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.wrapExceptionWithMessage(KerberosAuthenticator.java:237)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
... 68 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.doSpnegoSequence(KerberosAuthenticator.java:365)
at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:205)
... 78 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
... 79 more
Traceback (most recent call last):
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 381, in show
print(self._jdf.showString(n, 20, vertical))
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p0.15945976/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o85.showString.
: java.io.IOException: java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.crypto.key.kms.KMSClientProvider.getDelegationToken(KMSClientProvider.java:1051)

There is also an old related thread but no resolution: https://community.cloudera.com/t5/Support-Questions/Not-able-to-access-the-files-in-HDFS-encryption-zone-from/m-p/332970#M231320
12-29-2021
07:39 AM
Hi, did you find a solution to this?
12-21-2021
09:51 AM
UPDATE: One possible workaround to suppress these quotes from displaying in SELECT * is to create a view like the one below in Impala:

CREATE VIEW db1.view1 AS
SELECT replace(table1.quotedcol1, '"', '') quotedcol1,
       replace(table1.quotedcol2, '"', '') quotedcol2
FROM db1.table1;
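For reference, the transformation those Impala replace(col, '"', '') calls perform is simply removing every double quote from the value; the same logic in Python (the function name here is just for illustration):

```python
# Equivalent of Impala's replace(col, '"', ''): strip all double quotes.
def strip_quotes(value: str) -> str:
    return value.replace('"', '')

print(strip_quotes('"hello"'))         # hello
print(strip_quotes('say "hi" twice'))  # say hi twice
```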