Member since: 11-04-2015
Posts: 260
Kudos Received: 44
Solutions: 33
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2667 | 05-16-2024 03:10 AM |
|  | 1570 | 01-17-2024 01:07 AM |
|  | 1577 | 12-11-2023 02:10 AM |
|  | 2315 | 10-11-2023 08:42 AM |
|  | 1610 | 09-07-2023 01:08 AM |
03-30-2022
04:30 AM
Hello @Jared , The "ClassNotFoundException" means the JVM responsible for running the code has not found one of the required Java classes - which the code relies on. It's great that you have added those jars to your IntelliJ development environment, however that does not mean it will be available during runtime. One way would be to package all the dependencies in your jar, creating a so called "fat jar", however that is not recommended as with that your application will not be able to benefit from the future bugfixes which will be deployed in the cluster as the cluster is upgraded/patched. That would also have a risk that your application will fail after upgrades due to different class conflicts. The best way is to set up the running environment to have the needed classes. Hue / Java editor actually creates a one-time Oozie workflow with a single Java action in it, however it does not really give you flexibilty around customizing all the parts of this workflow and the running environment including what other jars you need to be shipped with the code. Since your code relies on SparkConf I assume it is actually a Spark based application. It would be a better option to create an Oozie workflow (you can also start from Hue > Apps > Scheduler > change Documents dropdown to Actions) with a Spark action. That will set up all the classpath needed for running Spark apps. That way you do not need to reference any Spark related jars, just the jar with your custom code. Hope this helps. Best regards Miklos
03-28-2022
02:17 AM
Hello @Sayed016 , In general the "java.io.IOException: Filesystem closed" message appears when the same or a different thread in the same JVM has called the "FileSystem.close()" method (see the JavaDoc) and something later tries to access the HDFS filesystem. In this case "EventLoggingListener.stop()" tries to access HDFS to flush the Spark event logs. FileSystem.close() should not be called by any custom code: there is a single shared FileSystem instance in any given JVM, and closing it can cause failures for still-running frameworks such as Spark. This suggests that the Spark application calls FileSystem.close() somewhere in its code. Please review the code and remove those calls. Hope that helps. Best regards, Miklos
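For illustration, a minimal sketch of the pattern to avoid; the application name and path below are made up.

```scala
// Sketch of the anti-pattern in a Spark application on a Hadoop cluster.
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fs-close-example").getOrCreate()

// FileSystem.get() returns the JVM-wide cached, shared instance.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val exists = fs.exists(new Path("/tmp/some/dir"))   // hypothetical path

// Do NOT do this: closing the shared instance makes later HDFS access fail,
// e.g. EventLoggingListener.stop() flushing the Spark event logs throws
// "java.io.IOException: Filesystem closed".
// fs.close()

spark.stop()   // let Spark manage the FileSystem lifecycle instead
```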
03-25-2022
01:27 AM
Hi Rama, yes, you can configure that in the "Hue Service Advanced Configuration Snippet (Safety Valve) for hue_safety_valve.ini". OPSAPS-41615 is still open; in the future you can ask about its status through any of your account team contacts. If you don't know who those contacts are, please ask/clarify that through the already open support case. Best regards, Miklos
03-24-2022
01:29 AM
1 Kudo
Hello @ram76 , You can configure Hue to use the XFF header:

[desktop]
use_x_forwarded_host=true

See the hue.ini reference: https://github.com/cloudera/hue/blob/master/desktop/conf.dist/hue.ini If not already done, besides using an external load balancer (like F5, so end users only need to remember a single Hue login URL), please consider adding the "Hue Load Balancer" role in CM > Hue service (which sets up an Apache httpd) to serve the static content. See the following for more: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/hue_use_add_lb.html#hue_use_add_lb Hope this helps. Best regards, Miklos
03-22-2022
02:22 AM
Hello @mhsyed , thanks for reporting this. I see a similar description on our Partner page too: https://www.cloudera.com/downloads/partner/intel.html It seems the link is broken because the "Intel-bigdata" GitHub organization no longer has the "mkl-wrappers-parcel-repo" repository: https://github.com/orgs/Intel-bigdata/repositories I have involved our respective teams to get in touch with Intel to fix this. Unfortunately I cannot offer any other workaround in the meantime; we ask for your patience. Best regards, Miklos Szurap, Customer Operations Engineer, Cloudera
03-10-2022
12:42 AM
Hi @M129 , the error message is not very descriptive. Can you please check the HiveMetaStore logs for the complete error message and the reason for the failure? Thanks, Miklos
02-24-2022
06:13 AM
One more item to add for a complete picture: SparkSQL does not directly support Hive ACID tables. For that, in CDP you can use the Hive Warehouse Connector (HWC); please see: https://docs.cloudera.com/cdp-private-cloud-base/7.1.3/integrating-hive-and-bi/topics/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
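As a rough illustration, here is a minimal sketch of reading a Hive ACID table through HWC. The session builder and method names follow the HWC API as I recall it, and the database/table names are made up, so please verify the details against the linked documentation.

```scala
// Sketch: reading a Hive ACID table via the Hive Warehouse Connector (HWC).
// Assumes the HWC jar and the related spark.sql.hive.hiveserver2.jdbc.url /
// spark.datasource.hive.warehouse.* settings are already configured for the session.
import com.hortonworks.hwc.HiveWarehouseSession
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hwc-example").getOrCreate()
val hive = HiveWarehouseSession.session(spark).build()

// The query is executed through HiveServer2 rather than by reading ACID files directly.
val df = hive.executeQuery("SELECT * FROM my_acid_db.my_acid_table")  // hypothetical table
df.show()
```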
02-23-2022
01:01 AM
1 Kudo
Hi @Rajeshhadoop , The Spark DataSource API has the following syntax:

val jdbcDF = spark.read.format("jdbc").option("url", "jdbc:...")...load()

Please see: https://spark.apache.org/docs/2.4.0/sql-data-sources-jdbc.html

The problem with this approach when used against Hive or Impala is that the reads may run on multiple executors, which could overwhelm and essentially DDoS the Hive / Impala service. As the documentation states, this is not a supported way of connecting from Spark to Hive/Impala.

However, you can still connect to Hive and Impala through a plain JDBC connection using "java.sql.DriverManager" or "java.sql.Connection". That, in contrast, runs on a single thread on the Spark driver side and creates a single connection to a HiveServer2 / Impala daemon instance. The throughput between the Spark driver and Hive/Impala is of course limited with this approach, so please use it only for simple queries or for submitting DDL/DML statements. Please see https://www.cloudera.com/downloads/connectors/hive/jdbc.html and https://www.cloudera.com/downloads/connectors/impala/jdbc.html for the JDBC drivers and for examples. A driver-side sketch follows below.

Independently of the above, you can still access Hive tables' data through SparkSQL with

val df = spark.sql("select ... from ...")

which is the recommended way of accessing and manipulating Hive table data from Spark, as it is parallelized across the Spark executors. See the docs: https://spark.apache.org/docs/2.4.0/sql-data-sources-hive-tables.html

I hope this clarifies it. Best regards, Miklos
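For illustration, a minimal driver-side JDBC sketch along those lines. It assumes the Apache Hive JDBC driver is on the driver classpath; the host name, port, credentials, and table are placeholders, and the Cloudera-packaged drivers may use a different driver class and URL format, so check the downloaded driver's documentation.

```scala
// Sketch: a single, driver-side JDBC connection to HiveServer2 from a Spark application.
import java.sql.DriverManager

val url = "jdbc:hive2://hiveserver2-host.example.com:10000/default"  // hypothetical host/port
val conn = DriverManager.getConnection(url, "username", "password")
try {
  val stmt = conn.createStatement()
  // Single-threaded, driver-side calls: suitable for DDL/DML or small result sets only.
  stmt.execute("CREATE TABLE IF NOT EXISTS tmp_example (id INT)")
  val rs = stmt.executeQuery("SELECT COUNT(*) FROM tmp_example")
  while (rs.next()) println(s"row count: ${rs.getLong(1)}")
} finally {
  conn.close()
}
```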
02-22-2022
04:57 AM
1 Kudo
Hi @Jmeks , Please check again how the table was created and how those partitions were created ("describe formatted <table> partition <partspec>"), as the same still works for me even on HDP 3.1.0:

create table mdatetest (col1 string) partitioned by (`date` date) location '/tmp/md';
alter table mdatetest add partition (`date`="2022-02-22");
show partitions mdatetest;
+------------------+
| partition |
+------------------+
| date=2022-02-22 |
+------------------+
alter table mdatetest drop partition (`date`="2022-02-22");
02-21-2022
07:50 AM
Hi @Jmeks , Can you clarify which CDH/HDP/CDP version you have? What is the datatype of that "date" partitioning column? The mentioned syntax works for both string and date datatypes in CDH 6.x.