Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1308 | 09-11-2019 10:19 AM |
| | 8028 | 11-26-2018 07:04 PM |
| | 1832 | 11-14-2018 12:10 PM |
| | 3759 | 11-14-2018 12:09 PM |
| | 2535 | 11-12-2018 01:19 PM |
09-11-2018
10:17 PM
@Jon Page Try setting these before running the spark-submit command:

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

/opt/anaconda2/bin/python should be the location of your Python 2.7 interpreter (this should be the same path on all cluster nodes). HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
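For illustration, a full submission could look like the sketch below; the script name my_job.py and the yarn client settings are placeholders for your own job:

# Point both the driver and the executors at the Anaconda 2.7 interpreter
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_PYTHON=/opt/anaconda2/bin/python

# spark-submit picks the exported variables up from the environment
spark-submit --master yarn --deploy-mode client my_job.py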
09-11-2018
02:35 PM
@Harshad M Good to hear you found the issue with the record. Please remember to login and mark the answer as accepted if it helped you in any way. Thanks!
09-11-2018
12:18 PM
@Daniel Müller Could you share the explain extended output for the above query? From the logical/physical plan details you can see whether the filter pushdown includes the limit. If this is Spark with LLAP integration, that was not supported prior to HDP 3.0. Starting with HDP 3.0 we added the HWC (Hive Warehouse Connector) for Spark, which works as expected. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
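For example, assuming the query runs through the Spark SQL CLI, you could capture the plan with something like the following (my_table and the predicate are placeholders for your own query):

# Print the parsed, analyzed, optimized and physical plans for the query
spark-sql -e "EXPLAIN EXTENDED SELECT * FROM my_table WHERE id = 1 LIMIT 10"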
09-10-2018
01:16 PM
@Harshad M Perhaps the issue is data related. I see that showing 10 rows works fine, which suggests that when it has to go over all the rows it fails at some point because the data may not be properly formatted. Could you check whether the underlying data has any additional commas or other formatting problems?
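If the data is comma-delimited (an assumption on my part), a quick sanity check is to compare every line's field count against the header's; the file path below is a placeholder:

# Report any line whose comma-separated field count differs from line 1's
awk -F',' 'NR==1 {n=NF} NF!=n {print "line " NR ": " NF " fields"}' /path/to/data.csv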
09-07-2018
12:54 PM
1 Kudo
@Michael Bronson Check whether the driver is doing full garbage collections, or whether there could be a network issue between executor and driver. You can check the GC pause times in the Spark UI, and you can also have the GC logs printed as part of the driver and executor output:

--conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails"
--conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails"

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
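To show where those flags go, a full submit command might look like this sketch; my_job.py and the yarn master setting are placeholders:

# GC details will appear in the driver's stdout and each executor's stderr
spark-submit \
  --master yarn \
  --conf "spark.driver.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails" \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails" \
  my_job.py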
09-07-2018
12:49 PM
@Michael Bronson In YARN mode, executors run inside YARN containers. Spark launches an Application Master that is responsible for negotiating the containers with YARN. That said, only nodes running a NodeManager are eligible to run executors. First question: the executor logs you are looking for are part of the YARN application logs for the container running on the specific node (yarn logs -applicationId <appId>). Second question: the executor will log a notification if a heartbeat fails to reach the driver because of a network problem or timeout, so this should be in the executor log that is part of the application logs. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
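In case it helps, here is one way to pull those logs from the command line; redirecting to a file is just one option:

# Find the application id if you don't have it already
yarn application -list -appStates ALL

# Aggregate all container logs for the application into one file
yarn logs -applicationId <appId> > app_logs.txt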
08-30-2018
02:40 PM
@Sudharsan Ganeshkumar Perhaps a snapshot, or a copy of a file pointing to the same blocks. Please remember to login and accept the answer if you think it has addressed your question.
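If you want to rule snapshots out, the commands below should help; the /data path is only an example:

# List the directories that have snapshots enabled
hdfs lsSnapshottableDir

# List the snapshots taken under one of those directories
hdfs dfs -ls /data/.snapshot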
08-30-2018
02:03 PM
1 Kudo
@Sudharsan Ganeshkumar Actually, any file stored in HDFS is split into blocks (chunks of data), and each block is replicated 3 times by default. When you delete a file, you remove the metadata stored in the NameNode that points to the blocks. Blocks are deleted when there is no reference to them left in the NameNode metadata. This is important to mention since you could have snapshots, or files in Trash folders, still referencing the blocks; if that happens, those blocks won't be deleted until the snapshots or the files under the Trash folders are also removed. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
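As a quick illustration of checking those references, something like this should work; the paths and username are placeholders:

# See whether deleted files are still held in the user's Trash
hdfs dfs -ls /user/<username>/.Trash/Current

# Force the Trash checkpoints to be cleared immediately
hdfs dfs -expunge

# Inspect the blocks and replication for a given path
hdfs fsck /some/path -files -blocks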
08-30-2018
01:33 PM
@vishal dutt Please remember to login and accept the answer if you think it has addressed your question.
08-30-2018
01:20 PM
@heta desai Based on this JIRA https://issues.apache.org/jira/browse/HIVE-13290 and the Hive docs https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints , this is supported from Hive 2.1.0 onwards only. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
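For reference, a primary key declaration looks like the sketch below; the connection string and table name are placeholders, and the DISABLE NOVALIDATE clause is needed because Hive parses but does not enforce the constraint:

# Create a table with an informational primary key (Hive 2.1.0+)
beeline -u "jdbc:hive2://localhost:10000/default" -e "
  CREATE TABLE customers (
    id INT,
    name STRING,
    PRIMARY KEY (id) DISABLE NOVALIDATE
  )"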