Member since: 07-30-2018
Posts: 60
Kudos Received: 14
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1289 | 06-20-2019 10:14 AM
 | 12658 | 06-11-2019 07:04 AM
 | 1327 | 03-05-2019 07:25 AM
 | 3058 | 01-03-2019 10:42 AM
 | 7833 | 12-04-2018 11:59 PM
02-26-2020
05:10 AM
Hi, I understand that you have a Spark Java application that takes 2 hours to process 4 MB of data, and you would like to improve its performance. I recommend checking the documents below, which cover performance tuning at both the code and the configuration level.
https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-1/
https://blog.cloudera.com/how-to-tune-your-apache-spark-jobs-part-2/
Thanks Jerry
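As a starting point from those posts, resource settings are often the first thing to tune. The invocation below is a minimal sketch only; the class name, jar path, and all values are placeholders, not recommendations from the original answer.

```
# minimal sketch of a tuned spark-submit; class, jar, and values are placeholders
spark-submit \
  --master yarn \
  --deploy-mode client \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 4g \
  --conf spark.sql.shuffle.partitions=8 \
  --class com.example.MyApp /path/to/myapp.jar
```

For a 4 MB input, an oversized shuffle partition count (the default is 200) is a common source of overhead, which is why spark.sql.shuffle.partitions is shown here.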
06-20-2019
10:14 AM
Hi, yes, we can enforce that a job runs in a particular queue per user by using a placement policy. We can define a secondary group for each user; whenever the user submits a job, it will land in that secondary group's queue.
Reference link: https://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/
Thanks Jerry
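For illustration, such a policy lives in the Fair Scheduler allocation file. The fragment below is a minimal sketch of a fair-scheduler.xml placement policy; the rule names are standard Fair Scheduler rules, but the surrounding file layout is assumed.

```
<!-- minimal sketch of a fair-scheduler.xml placement policy fragment -->
<queuePlacementPolicy>
  <!-- place the job in a queue named after one of the user's secondary groups, if such a queue exists -->
  <rule name="secondaryGroupExistingQueue" create="false"/>
  <!-- otherwise fall back to the default queue -->
  <rule name="default"/>
</queuePlacementPolicy>
```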
06-11-2019
07:04 AM
Hi, please use the steps below if your cluster is not secured (no Kerberos).

Create the topic:
```
kafka-topics --create --zookeeper <zk host>:2181 --replication-factor 3 --partitions 1 --topic Testmessage
```
Run a Kafka console producer:
```
kafka-console-producer --broker-list <broker hostname>:9092 --topic Testmessage
```
Run a Kafka console consumer:
```
kafka-console-consumer --new-consumer --topic Testmessage --from-beginning --bootstrap-server <broker hostname>:9092
```
Thanks Jerry
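If useful, the topic can be verified before producing; this check is an addition to the original answer, reusing the same placeholder host:

```
# confirm the topic exists and inspect its partitions and replicas
kafka-topics --describe --zookeeper <zk host>:2181 --topic Testmessage
```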
06-10-2019
04:33 AM
Hi, from the logs, it seems the client is unable to find the broker: "Connection to node -1 (localhost/127.0.0.1:9092) could not be established". Make sure you have a broker running on the node and listening on port 9092. Also, try using the fully qualified hostname or IP instead of localhost.
Thanks Jerry
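For example, the checks below may help confirm the broker is reachable; broker1.example.com and the topic name are placeholders, and netstat availability depends on your OS:

```
# on the broker host: confirm something is listening on port 9092
netstat -tlnp | grep 9092
# from the client: retry with the fully qualified hostname instead of localhost
kafka-console-producer --broker-list broker1.example.com:9092 --topic test
```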
05-28-2019
03:56 AM
2 Kudos
Hi Harish, you can create a Hive context and access the Hive table through it. Example program:
```
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
sample = hive_context.table("default.<tablename>")
sample.show()
```
Reference link: https://stackoverflow.com/questions/36051091/query-hive-table-in-pyspark
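On Spark 2.x and later, SparkSession is the usual entry point instead of HiveContext; the equivalent below is an addition, assuming Spark 2+ built with Hive support:

```
from pyspark.sql import SparkSession

# SparkSession with Hive support replaces HiveContext in Spark 2.x+
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.table("default.<tablename>").show()
```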
05-28-2019
03:50 AM
Hi, can you try to execute a sample Spark application and let us know the results?
```
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --keytab <location>/<filename>.keytab --principal <principal name> /opt/cloudera/parcels/CDH/jars/spark-examples*.jar 1 1
```
Thanks Jerry
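Since the command passes a keytab and principal, it can help to confirm the Kerberos ticket first; this check is an addition, reusing the same placeholders:

```
# obtain a ticket from the keytab, then verify it
kinit -kt <location>/<filename>.keytab <principal name>
klist
```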
04-29-2019
11:51 AM
1 Kudo
Hi, based on the error message you have shared:
```
... TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'
```
This error corresponds to bug SPARK-19019 [1], a compatibility issue between Spark and Python 3.6. Spark 1.6 requires Python 2.6+, per the documentation [2][3].
[1] https://issues.apache.org/jira/browse/SPARK-19019
[2] https://spark.apache.org/docs/1.6.0/#downloading
[3] https://www.cloudera.com/documentation/enterprise/5-14-x/topics/spark_python.html#spark_python__section_ark_lkn_25
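As a workaround, pointing PySpark at a Python 2.7 interpreter avoids the Python 3.6 incompatibility; the interpreter path below is an assumption and may differ on your system:

```
# point both the executors and the driver at Python 2.7 (path is an assumption)
export PYSPARK_PYTHON=/usr/bin/python2.7
export PYSPARK_DRIVER_PYTHON=/usr/bin/python2.7
```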
04-22-2019
01:45 AM
Hi, the error shows that the application is running on an unsupported Java version. Could you let us know your Kafka, CDH, and system Java versions? 'Unsupported major.minor version 52.0' occurs when a class compiled for Java 8 (class file version 52) is run on a JVM older than 1.8.
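To check which JVM the application is using, the command below prints the runtime version; it is a generic check, not from the original post:

```
# class file version 52 requires Java 8 or later
java -version
```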
04-16-2019
08:32 AM
2 Kudos
Hi, an Impala query is usually faster on the second run than on the first attempt of the same query. This is because of the OS page cache, which keeps recently read files in memory and reuses them; it is an OS-level feature, not specific to Impala. For further performance improvement, Impala can use "HDFS caching", which pins selected HDFS data in memory and helps speed up query results further.
Reference link below: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_perf_hdfs_caching.html
Thanks Jerry
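As a sketch of how HDFS caching is enabled, the commands below create a cache pool and mark a table as cached; the pool name, table name, and admin privileges are assumptions:

```
# create an HDFS cache pool (typically run as the HDFS superuser; names are placeholders)
hdfs cacheadmin -addPool impala_pool
# mark the table so its data is pinned in the HDFS cache (placeholder names)
impala-shell -q "ALTER TABLE my_table SET CACHED IN 'impala_pool'"
```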
03-26-2019
07:53 AM
Hi, if it is an external table, dropping the partition does not delete the underlying data, so we need to remove the directory manually:
```
ALTER TABLE table_name DROP [IF EXISTS] PARTITION <>;
hadoop fs -rm -r <partition file path>
```
Thanks Jerry
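For illustration, here is a concrete version with hypothetical names; the table, partition column, and warehouse path below are all placeholders:

```
# hypothetical example: table 'sales' partitioned by dt, data under the default warehouse path
hive -e "ALTER TABLE sales DROP IF EXISTS PARTITION (dt='2019-03-01');"
hadoop fs -rm -r /user/hive/warehouse/sales/dt=2019-03-01
```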