Member since
07-30-2018
60
Posts
14
Kudos Received
5
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1353 | 06-20-2019 10:14 AM |
| | 13512 | 06-11-2019 07:04 AM |
| | 1384 | 03-05-2019 07:25 AM |
| | 3172 | 01-03-2019 10:42 AM |
| | 8004 | 12-04-2018 11:59 PM |
02-26-2020
05:10 AM
Hi, I understand that you have a Spark Java application that takes 2 hours to process 4 MB of data, and you would like to improve its performance. I recommend checking the documents below, which cover performance tuning at both the code and the configuration level. https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/ https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/ Thanks, Jerry
06-20-2019
10:14 AM
Hi, Yes, we can force a job to run in a particular queue, based on the submitting user, by using a placement policy. We can define a secondary group for each user; whenever that user submits a job, it lands on the queue matching their secondary group. Reference link: https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/ Thanks, Jerry
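For illustration, a placement policy based on the user's secondary group is expressed in the Fair Scheduler allocation file (fair-scheduler.xml); the sketch below assumes queues matching the secondary-group names already exist:

```xml
<allocations>
  <queuePlacementPolicy>
    <!-- honour an explicitly requested queue first -->
    <rule name="specified" create="false"/>
    <!-- otherwise place the job in the queue named after the
         user's secondary group, if such a queue exists -->
    <rule name="secondaryGroupExistingQueue" create="false"/>
    <!-- fall back to the default queue -->
    <rule name="default" queue="root.default"/>
  </queuePlacementPolicy>
</allocations>
```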
06-11-2019
07:19 AM
2 Kudos
Hi Jerry, It started working after I changed the property "offsets.topic.replication.factor" from 3 to 1. Thanks for your support. Aamir.
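For reference, the change above is a single line in the Kafka broker configuration (server.properties); lowering it to 1 is only appropriate on a cluster with fewer than 3 brokers, such as a single-broker development setup:

```
# server.properties
# The internal __consumer_offsets topic cannot be replicated 3 ways
# on a single-broker cluster, so lower the factor to 1.
offsets.topic.replication.factor=1
```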
06-03-2019
01:46 PM
From the spark or pyspark shell, use the commands below to access Hive database objects (note that spark.sql takes a single statement with no trailing semicolon):

```python
spark.sql("show databases").show()
spark.sql("select * from databasename.tablename").show()
# or
spark.read.table("databasename.tablename")
```

You can run any query inside spark.sql, and it will return the results as a DataFrame.
05-28-2019
03:50 AM
Hi, Can you try to execute a sample Spark application and let us know the result?

```shell
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode client \
  --keytab <location>/<filename>.keytab --principal <principal name> \
  /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1
```

Thanks, Jerry
05-14-2019
10:47 AM
Hi, The 'major.minor version 52.0' error usually means the jar was compiled with JDK 1.8, but you are trying to run it in a JDK 1.7 (or older) runtime. The reported number is the version the class file requires, not the version you are using. To solve this, make sure the JDK used to compile and the JRE used to run point to the same (or a newer) version.
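As a quick way to check which Java version a class file was built for, you can read the version fields from its header; the class-file layout (magic number, then minor and major version) comes from the JVM specification, while the function name below is just an illustrative sketch:

```python
import struct

# Class-file major versions map to JDK releases:
# 50 = Java 6, 51 = Java 7, 52 = Java 8, 55 = Java 11, 61 = Java 17.
JDK_BY_MAJOR = {50: "6", 51: "7", 52: "8", 55: "11", 61: "17"}

def class_file_version(path):
    """Return (major, minor) class-file version from a .class file header."""
    with open(path, "rb") as f:
        # 4-byte magic, 2-byte minor version, 2-byte major version, big-endian
        magic, minor, major = struct.unpack(">IHH", f.read(8))
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file: %s" % path)
    return major, minor
```

A 'major.minor version 52.0' error therefore means the class requires at least Java 8 to run; classes inside a jar can be inspected the same way after extracting them.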
04-29-2019
11:58 AM
Thanks, agreed. I also found the bug details. The URL you shared, https://spark.apache.org/docs/1.6.0/#downloading, says it is compatible with 2.6+ and 3.1+, which is totally misleading, since 3.6 is technically "3.1+" yet is not supported. I have started working on upgrading my app to Spark 2. Any suggestions on a Spark 1.6 to Spark 2 migration guide for a Cloudera cluster?
04-17-2019
06:00 PM
1 Kudo
In its default configuration, metadata is cached until an "INVALIDATE METADATA" command evicts the table from the cache, or until the catalog service is restarted. In 5.16 and 6.1+ there are some non-default options that will evict metadata after a particular timeout; at some point these will become the defaults. Table stats are collected and stored in the Hive metastore when you run a "COMPUTE STATS" command. They are then just part of the table metadata.
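Concretely, the two commands mentioned above look like this in impala-shell (the database and table names are placeholders):

```sql
-- Evict cached metadata for one table; running INVALIDATE METADATA
-- with no table name refreshes the whole catalog.
INVALIDATE METADATA databasename.tablename;

-- Gather table and column statistics and store them in the Hive metastore.
COMPUTE STATS databasename.tablename;
```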
03-31-2019
12:42 AM
1 Kudo
Hi, I assume that you are working on a managed table rather than an external table? This could be caused by a lack of permissions for the user who ran the DROP command to remove the underlying HDFS path. Check the HMS server log to see if you can find any error messages.
03-12-2019
09:26 AM
Hi Naveen, If you have a limited number of ports available, you can assign a fixed port to each application:

```shell
--conf "spark.driver.port=4050" --conf "spark.executor.port=51001" --conf "spark.ui.port=4005"
```

Hope it helps. Thanks, Jerry