Member since: 07-05-2017
Posts: 74
Kudos Received: 3
Solutions: 5
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7132 | 08-13-2019 03:54 PM
 | 7132 | 05-10-2019 07:11 PM
 | 12094 | 01-09-2019 12:17 AM
 | 9571 | 01-08-2019 01:54 PM
 | 23207 | 05-24-2018 02:43 PM
01-09-2019
12:17 AM
@Pavel Stejskal Using the HiveWarehouseConnector together with HiveServer2 Interactive (LLAP, for managed tables) is mandatory, and the reasons are explained in the HDP 3 documentation. If you're not using it, then the properties are certainly not OK. If the namespace part is not configured to point to the HiveServer2 Interactive znode (I think that's what you meant), then that is not correct either. To read a table into a DataFrame, you have to use the HiveWarehouseSession API, i.e.: val df = hive.executeQuery("select * from web_sales") I'd suggest reading through this entire article. BR.
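For completeness, a minimal sketch of how that hive session is obtained with the HWC API (the table name is the illustrative one from above; spark is assumed to be an existing SparkSession):

```scala
import com.hortonworks.hwc.HiveWarehouseSession

// Build a HiveWarehouseSession on top of the existing SparkSession
val hive = HiveWarehouseSession.session(spark).build()

// Reads go through HiveServer2 Interactive / LLAP, not the Spark catalog
val df = hive.executeQuery("select * from web_sales")
df.show(10)
```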
01-08-2019
07:48 PM
Spark-submit looks fine. This issue will take more than a forum thread to resolve; it would require code and log analysis, I'd say. Meanwhile, I can only suggest passing "-Dsun.security.krb5.debug=true" to the extraJavaOptions, and it would also help if you set the following in the log4j.properties file: "log4j.logger.org.apache.spark.deploy.yarn.Client=DEBUG". Then restart the application, hoping it will print more pointers. Also, if your KDC is an MIT KDC, double-check that your principal does not have a 'Maximum Renewal Time' of 00:00:00, as explained here. Another property to try, depending on your application's use case, is: --conf mapreduce.job.complete.cancel.delegation.tokens=false
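A minimal sketch of how those options could be passed on the command line (the class and jar names are placeholders, not from the original question):

```
# Kerberos debug output on both driver and executors (sketch only)
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --conf mapreduce.job.complete.cancel.delegation.tokens=false \
  --class your.main.Class your-app.jar
```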
01-08-2019
01:54 PM
1 Kudo
Hi @Nikhil Raina, In simple words, speculative execution means that Hadoop as a whole doesn't try to fix slow tasks, because the cause (misconfiguration, hardware issues, etc.) is hard to detect. Instead, for each task that is running slower than expected, it launches a parallel backup task on a faster node. These backup tasks are called speculative tasks. The feature can be enabled or disabled, since its benefit depends on the use case; it is up to the Hadoop admin to decide whether it is worthwhile, because speculative execution has an impact on cluster throughput and resource usage. You can find it in MapReduce or Spark, for example. Hope it helps, David
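For illustration, a minimal sketch of how the feature is toggled in both frameworks (the property names are the standard MapReduce/Spark ones; the values shown are just examples):

```
# MapReduce: speculative attempts can be toggled per job
mapreduce.map.speculative=true
mapreduce.reduce.speculative=false

# Spark: enable speculation and tune when backup tasks are launched
spark-submit \
  --conf spark.speculation=true \
  --conf spark.speculation.quantile=0.75 \
  ...
```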
01-07-2019
08:26 PM
Hi @Michael Bronson, Is it deleting everything else but the .inprogress files? The following is already present and fixed in HDP 2.6.4: https://issues.apache.org/jira/browse/SPARK-8617 One of the proposed changes there was to use the load time as lastUpdated for .inprogress files while keeping the modification time for completed files. The first change prevents deletion of in-progress job files; the second ensures that the lastUpdated time of completed jobs won't change in the event of a HistoryServer reboot. A few things to check (see the sketch after this list):
- Double-check the timestamps of the .inprogress files.
- Check that they do not correspond to actually running applications (streaming apps, for example).
- Check the permissions on these files, and perhaps try to manually delete one of the lingering .inprogress files while logged in as the spark user, to see if it lets you remove it.
- Restart the SHS and check its log for errors while it tries to remove these .inprogress files, i.e. messages produced by code like:
case t: Exception => logError("Exception in cleaning logs", t)
logError(s"IOException in cleaning ${attempt.logPath}", t)
logInfo(s"No permission to delete ${attempt.logPath}, ignoring.")
Regards, David
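A minimal command-line sketch of those checks, assuming the default HDP event log directory /spark2-history (substitute your spark.history.fs.logDirectory value):

```
# List in-progress event logs with their timestamps and permissions
# (the directory below is an assumption; check spark.history.fs.logDirectory)
hdfs dfs -ls /spark2-history | grep ".inprogress"

# Try removing one lingering file as the spark user
# (<application_id> is a placeholder)
sudo -u spark hdfs dfs -rm /spark2-history/<application_id>.inprogress
```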
01-03-2019
06:57 PM
Can you share the (masked) spark-submit command and the full "delegation token has expired" stack trace? Also, what is the use case of your app?
12-27-2018
06:58 PM
Hi Mani, you might also want to increase the number of executors then, and you may be able to lower the memory size. Try with: spark-submit --master yarn --deploy-mode client --driver-memory 5g --num-executors 6 --executor-memory 8g --class myclass myjar.jar param1 param2 param3 param4 param5 Tuning this requires a lot of other information, such as input data size, application use case, data source details, available cluster resources, etc. Keep tuning --num-executors, --executor-memory, and --executor-cores (5 cores is usually a good number); see the sketch below.
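The same command broken out per flag, with the executor cores pinned explicitly (the class, jar, and parameter names are the asker's placeholders):

```
spark-submit \
  --master yarn \
  --deploy-mode client \
  --driver-memory 5g \
  --num-executors 6 \
  --executor-memory 8g \
  --executor-cores 5 \
  --class myclass myjar.jar param1 param2 param3 param4 param5
```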
12-26-2018
06:58 AM
Hi Mani, use --executor-memory 10g instead of 6g, and remove the overhead config property.
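A sketch of that change (presumably the overhead property in question is spark.yarn.executor.memoryOverhead; only the changed parts are shown):

```
# Bump executor memory; the rest of the original command stays as-is
spark-submit ... --executor-memory 10g ...
# and drop any "--conf spark.yarn.executor.memoryOverhead=..." argument
```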
12-24-2018
06:44 PM
Sure, can you share your spark-submit command with the arguments as well? Please mask any sensitive information.
12-23-2018
07:02 PM
Hi @Aakriti Batra, The problem seems to be in the JAAS file passed to the executor; it would help to see its content, but I'd rather suggest you read this whole article instead: https://community.hortonworks.com/articles/56704/secure-kafka-java-producer-with-kerberos.html
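For reference, a minimal sketch of what the KafkaClient section of such a JAAS file typically looks like (the keytab path and principal below are placeholders):

```
// Kerberos login for the Kafka producer; replace keyTab and principal
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   storeKey=true
   useTicketCache=false
   keyTab="/path/to/client.keytab"
   principal="user@EXAMPLE.COM"
   serviceName="kafka";
};
```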
12-23-2018
06:56 PM
Hi @Ali, You might want to add "--keytab /path/to/the/headless-keytab", "--principal principalNameAsPerTheKeytab", and "--conf spark.hadoop.fs.hdfs.impl.disable.cache=true" to the spark-submit command.
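Put together, a sketch of how those flags attach to an existing command (the keytab path and principal are the placeholders from above):

```
spark-submit \
  --keytab /path/to/the/headless-keytab \
  --principal principalNameAsPerTheKeytab \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  ...  # rest of the original submit arguments
```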