Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1308 | 09-11-2019 10:19 AM |
| | 8028 | 11-26-2018 07:04 PM |
| | 1832 | 11-14-2018 12:10 PM |
| | 3759 | 11-14-2018 12:09 PM |
| | 2535 | 11-12-2018 01:19 PM |
09-11-2018
10:17 PM
@Jon Page Try setting these before running the spark-submit command:

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

/opt/anaconda2/bin/python should be the location of your Python 2.7 interpreter (this should be the same path on all cluster nodes). HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
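For illustration, a full submission could look like the sketch below; the script name my_job.py and the yarn client settings are placeholders for your own job:

# Point both the driver and the executors at the Anaconda 2.7 interpreter
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

# spark-submit picks the exported variables up from the environment
spark-submit --master yarn --deploy-mode client my_job.py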
09-11-2018
02:35 PM
@Harshad M Good to hear you found the issue with the record. Please remember to login and mark the answer as accepted if it helped you in any way. Thanks!
09-11-2018
12:18 PM
@Daniel Müller Could you share the explain extended output for the above query? From the logical/physical plan details you can see whether the filter pushdown includes the limit. If this is Spark with LLAP integration, that was not supported prior to HDP 3.0. Starting with HDP 3.0 we added the HWC (Hive Warehouse Connector) for Spark, which works as expected. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
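For example, assuming the query runs through the Spark SQL CLI, you could capture the plan with something like the following (my_table and the predicate are placeholders for your own query):

# Print the parsed, analyzed, optimized and physical plans for the query
spark-sql -e "EXPLAIN EXTENDED SELECT * FROM my_table WHERE id = 1 LIMIT 10"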
09-10-2018
01:16 PM
@Harshad M Perhaps the issue is data related. I see that showing 10 rows works fine, which suggests that when it has to go over all the rows it fails at some point because the data may not be properly formatted. Could you check whether the underlying data has any additional commas or other formatting problems?
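If the data is comma-delimited (an assumption on my part), a quick sanity check is to compare every line's field count against the header's; the file path below is a placeholder:

# Report any line whose comma-separated field count differs from line 1's
awk -F',' 'NR==1 {n=NF} NF!=n {print "line " NR ": " NF " fields"}' /path/to/data.csv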
09-07-2018
12:54 PM
1 Kudo
@Michael Bronson Check whether the driver is doing full garbage collections, or whether there could be a network issue between executor and driver. You can check the GC pause times in the Spark UI, and you can also have the GC logs printed as part of the driver and executor output:

--conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails"
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails"

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
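To show where those flags go, a full submit command might look like this sketch; my_job.py and the yarn master setting are placeholders:

# GC details will appear in the driver's stdout and each executor's stderr
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails" \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails" \
  my_job.py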
09-07-2018
12:49 PM
@Michael Bronson In YARN mode, executors run inside YARN containers. Spark launches an Application Master that is responsible for negotiating the containers with YARN. That said, only nodes running a NodeManager are eligible to run executors. First question: the executor logs you are looking for are part of the YARN application logs for the container running on the specific node (yarn logs -applicationId <appId>). Second question: the executor will log a notification if a heartbeat fails to reach the driver because of a network problem or timeout, so this should be in the executor log that is part of the application logs. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
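In case it helps, here is one way to pull those logs from the command line; redirecting to a file is just one option:

# Find the application id if you don't have it already
yarn application -list -appStates ALL

# Aggregate all container logs for the application into one file
yarn logs -applicationId <appId> > app_logs.txt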
08-30-2018
02:40 PM
@Sudharsan Ganeshkumar Perhaps a snapshot, or a copy of a file pointing to the same blocks. Please remember to login and accept the answer if you think it has addressed your question.
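If you want to rule snapshots out, the commands below should help; the /data path is only an example:

# List the directories that have snapshots enabled
hdfs lsSnapshottableDir

# List the snapshots taken under one of those directories
hdfs dfs -ls /data/.snapshot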
08-30-2018
02:03 PM
1 Kudo
@Sudharsan Ganeshkumar Actually, any file stored in HDFS is split into blocks (chunks of data), and each block is replicated 3 times by default. When you delete a file, you remove the metadata stored in the NameNode that points to the blocks. Blocks are deleted when there is no reference to them left in the NameNode metadata. This is important to mention since you could have snapshots, or files in Trash folders, still referencing the blocks; if that happens, those blocks won't be deleted until the snapshots or the files under the Trash folders are also removed. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
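As a quick illustration of checking those references, something like this should work; the paths and username are placeholders:

# See whether deleted files are still held in the user's Trash
hdfs dfs -ls /user/<username>/.Trash/Current

# Force the Trash checkpoints to be cleared immediately
hdfs dfs -expunge

# Inspect the blocks and replication for a given path
hdfs fsck /some/path -files -blocks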
08-30-2018
01:33 PM
@vishal dutt Please remember to login and accept the answer if you think it has addressed your question.
08-30-2018
01:20 PM
@heta desai Based on this JIRA https://issues.apache.org/jira/browse/HIVE-13290 and the Hive docs https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints , this is supported from Hive 2.1.0 onwards only. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
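For reference, a primary key declaration looks like the sketch below; the connection string and table name are placeholders, and the DISABLE NOVALIDATE clause is needed because Hive parses but does not enforce the constraint:

# Create a table with an informational primary key (Hive 2.1.0+)
beeline -u "jdbc:hive2://localhost:10000/default" -e "
  CREATE TABLE customers (
    id INT,
    name STRING,
    PRIMARY KEY (id) DISABLE NOVALIDATE
  )"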