Member since 11-04-2015
260 Posts
44 Kudos Received
33 Solutions
05-02-2022
08:36 AM
Hi @gfragkos, thanks for checking. Let's step back, then. Is the Impala service TLS/SSL enabled at all? Can you verify that with the openssl tools, for example:
echo | openssl s_client -connect cdp-tdh-de3-master0.cdp-tdh.u5te-1stu.cloudera.site:21050 -CAfile /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem
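If openssl is not at hand on the client machine, roughly the same check can be sketched in Python with the standard `ssl` module. This is only an illustration, not a replacement for the command above: the function name `check_tls` and the timeout value are my own choices, and unlike `openssl s_client` this version also verifies the host name and stops on any certificate error.

```python
import socket
import ssl


def check_tls(host: str, port: int, ca_file: str, timeout: float = 5.0) -> str:
    """Attempt a TLS handshake and return the negotiated protocol version.

    Raises ssl.SSLError (for example on a certificate verification failure)
    or another OSError if the port is unreachable or does not speak TLS.
    """
    # Trust only the given CA bundle, similar to `openssl s_client -CAfile`.
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling it with the host, port 21050 and the CA file from the command above should return a protocol version string if the service is TLS-enabled and the CA file matches; an exception points to where the handshake fails.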
04-28-2022
08:24 AM
Hello Gozde @gfragkos,
Have you checked whether the connection works with the given sslTrustStore file from a Java-based client (for example, beeline)?
As far as I can see, your application uses unixODBC to connect to a CDP / Impala service. However, the shared connection details show that the truststore is a Java keystore (JKS) file. Since "nanodbc.cpp" is not a Java-based application, it probably cannot recognize that as a valid truststore file. Please try a PEM-format truststore file instead.
Please also review the Impala ODBC Driver documentation: https://downloads.cloudera.com/connectors/impala_odbc_2.6.14.1016/Cloudera-ODBC-Connector-for-Impala-Install-Guide.pdf
Thanks,
Miklos
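To quickly tell which kind of file a truststore really is, its first bytes are usually enough. The sketch below is a heuristic only: the function name `detect_truststore_format` is made up for this example, and it relies on the JKS magic number (0xFEEDFEED) and on PEM files starting with a `-----BEGIN` line, so a PEM file with leading text in front of that line would be reported as unknown.

```python
def detect_truststore_format(path: str) -> str:
    """Guess the container format of a truststore file from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    if head[:4] == b"\xfe\xed\xfe\xed":
        return "JKS"  # Java keystore magic number
    if head.lstrip().startswith(b"-----BEGIN "):
        return "PEM"  # Base64 text with BEGIN/END armor lines
    if head[:1] == b"\x30":
        return "DER or PKCS#12"  # binary ASN.1 SEQUENCE
    return "unknown"
```

If this reports "JKS", the certificates have to be exported into a PEM file before a non-Java client can read them.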
04-27-2022
12:54 AM
Hi @jarededrake, that's a good track. The issue currently seems to be that the cluster has Kerberos enabled, which needs some extra configuration. In the workflow editor, in the upper-right corner of the Spark action, you will find a cogwheel icon for advanced settings. There, on the Credentials tab, enable the "hcat" and "hbase" credentials to let the Spark client obtain delegation tokens (DTs) for the Hive (Hive metastore) and HBase services, in case the Spark application wants to use those services. (Spark does not know this in advance, so it obtains those DTs.)
You can also disable this behavior if you are sure that the Spark application will not connect to Hive (using Spark SQL) or HBase. Just add the following to the Spark action option list:
--conf spark.security.credentials.hadoopfs.enabled=false --conf spark.security.credentials.hbase.enabled=false --conf spark.security.credentials.hive.enabled=false
However, it's easier to just enable these credentials on the settings page.
For similar Kerberos-related issues in other actions, please see the following guide: https://gethue.com/hadoop-tutorial-oozie-workflow-credentials-with-a-hive-action-with-kerberos/
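If the option list is generated by a script rather than typed into the editor, the three properties quoted above can be kept in one place and rendered into the option string. A small sketch; the helper name `render_spark_conf` and the constant name are only illustrative:

```python
from typing import Mapping


def render_spark_conf(properties: Mapping[str, str]) -> str:
    """Render Spark properties as a single `--conf key=value ...` string."""
    return " ".join(f"--conf {key}={value}" for key, value in properties.items())


# The three properties quoted in the post above.
DISABLE_TOKEN_PROVIDERS = {
    "spark.security.credentials.hadoopfs.enabled": "false",
    "spark.security.credentials.hbase.enabled": "false",
    "spark.security.credentials.hive.enabled": "false",
}

print(render_spark_conf(DISABLE_TOKEN_PROVIDERS))
```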
04-26-2022
05:09 AM
Hi @jarededrake, sorry for the delay, I was away for a couple of days. You should use your thin jar (application only, without the dependencies) from the target directory ("SparkTutorial-1.0-SNAPSHOT.jar"). The NoClassDefFoundError for SparkConf suggests that you've tried a Java action. When running a Spark application, it is strongly recommended to use a Spark action in the Oozie workflow editor, to make sure that the environment is set up properly for the application.
04-14-2022
09:16 AM
So is it "/tmp/kbr5cc_dffe" or "krb5cc_cldr"? And where do you see "KRB5CCNAME=/tmp/kbr5cc_dffe"? The "krb5cc_cldr" name is used for all services (I'm not sure about all of them, but every one I quickly checked had it), so we can say it's hardcoded. In any case, it is "private" to the process itself: it holds the Kerberos ticket cache, which only that process uses (and renews if needed).
04-14-2022
09:12 AM
I see. Have you verified that the built jar contains this package structure and these class names? Can you also show where the jar is uploaded and how it is referenced in the Oozie workflow? Thanks, Miklos
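A jar is a zip archive, so one way to do that check is to list its entries and look for the class file. A minimal sketch, assuming the class is referenced by its fully qualified name; the helper name `jar_contains_class` is made up for this example:

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Return True if the jar holds a .class entry for the fully qualified name."""
    # com.example.Main is stored in the archive as com/example/Main.class
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

The match is case-sensitive and exact, so a single missing letter in the package name makes the lookup return False.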
04-14-2022
07:42 AM
Hi, I'm doing well, thank you, hope you're good too. That property usually points to a relative path, which exists in the process directory: KRB5CCNAME='krb5cc_cldr'. If that's not the case, I would look into whether the root user's (or maybe the "cloudera-scm" user's) .bashrc file has overridden the KRB5CCNAME environment variable by any chance.
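To look for such an override, the contents of the .bashrc file can be scanned for lines that assign KRB5CCNAME. A rough sketch; the function name `find_krb5ccname_overrides` is made up here, and it only catches plain `KRB5CCNAME=...` and `export KRB5CCNAME=...` lines, not assignments hidden in sourced files or set by other means:

```python
import re

# Matches lines such as `export KRB5CCNAME=/tmp/cache` or `KRB5CCNAME="..."`.
# Commented-out lines do not match because `#` precedes the variable name.
_ASSIGNMENT = re.compile(r"^\s*(?:export\s+)?KRB5CCNAME=")


def find_krb5ccname_overrides(rc_text: str) -> list:
    """Return (line_number, line) pairs for lines that assign KRB5CCNAME."""
    hits = []
    for number, line in enumerate(rc_text.splitlines(), start=1):
        if _ASSIGNMENT.match(line):
            hits.append((number, line.strip()))
    return hits
```

Reading the file with `open(path).read()` and passing the text in keeps the function easy to try on any shell startup file.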
04-14-2022
01:45 AM
Hi @yagoaparecidoti, in general, the "supervisor.conf" in the process directory (actually the whole process directory) is prepared by the Cloudera Manager (CM) server before starting a process. The CM server sends the whole package of information, including the config files, to the CM agent, which extracts it into a new process directory. The supervisor.conf file contains all the environment and command related information that the Supervisor daemon needs to start the process. Some default values might be taken from the cluster or from the service type. Do you have any specific questions about it?
04-13-2022
02:36 AM
1 Kudo
Hi @Seaport, the "RegexSerDe" is in the contrib package, which is not officially supported. You can use it in some parts of the platform, but the different components may not give you full support for it. I would recommend preprocessing the datafiles into a commonly consumable format (CSV) before ingesting them into the cluster. Alternatively, you can ingest the data into a table that has only a single (string) column, and then do the processing/validation/formatting/transforming while inserting it into a proper final table with the columns you need. During the insert you can still use "regex" or "substring" type functions / UDFs to extract the fields you need from the fixed-width datafiles (from the table with a single column). I hope this helps, Best regards, Miklos
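For the preprocessing route, cutting a fixed-width record into fields is a matter of slicing by column offsets. The sketch below assumes a purely hypothetical layout (the field names and offsets in `LAYOUT` are invented for illustration and must be replaced with the real file specification):

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, start offset, end offset), end exclusive.
LAYOUT: List[Tuple[str, int, int]] = [
    ("id", 0, 5),
    ("name", 5, 15),
    ("amount", 15, 23),
]


def parse_fixed_width(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Cut one fixed-width record into named, whitespace-trimmed fields."""
    return {name: line[start:end].strip() for name, start, end in layout}
```

The resulting dictionaries can be written out with the standard `csv` module. The same slicing done in SQL on the single-column table would use a substring function per field; note that Python offsets here are 0-based, while SQL substring positions are typically 1-based.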
04-13-2022
02:03 AM
Hi @jarededrake, the "ClassNotFoundException: Class Hortonwork.SparkTutorial.Main not found" suggests that the package name of the Java program's main class might have a typo in your workflow definition: "Hortonwork" should be "Hortonworks". Can you check that?