Member since: 09-15-2016
Posts: 19
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5663 | 03-08-2017 06:03 PM
03-09-2017
01:40 AM
Hi @X Long, how about up-voting some answers? The guys tried to help you and could not have known the issue was something trivial with your IDE. Give and take. Thanks!
11-22-2016
07:24 PM
1 Kudo
Sorry for the late response. I'm not sure of a way to broadcast the data without collecting to the driver first. Because we assume the broadcasted table is small, the collect and broadcast to and from the driver should be fairly quick. You would have about the same network traffic even if you could somehow skip the collect, since we need a full copy of the smaller table on each worker anyway.

The join(broadcast(right), ...) call gives Spark a hint to do a broadcast join. This overrides spark.sql.autoBroadcastJoinThreshold, which is 10 MB by default. Don't try to broadcast anything larger than 2 GB: that is the limit for a single block in Spark, and you will get an OOM or overflow exception, because the data structure backing a block is capped at 2 GB.

The autoBroadcastJoinThreshold currently applies only to Hive tables that have had statistics run on them, so the broadcast hint is what you use for DataFrames that aren't in Hive, or ones where statistics haven't been run. The general Spark Core broadcast function will still work too; in fact, under the hood, the DataFrame API calls the same collect and broadcast that you would with the general API. The concept of partitions is still there, so after you do a broadcast join, you're free to run mapPartitions on it.
11-08-2016
10:07 PM
It is very clear, thanks
09-29-2016
04:36 PM
@X Long please accept the answer if it solved your problem. Thank you
09-13-2017
06:22 PM
@X Long I was facing the same kind of issue and resolved it with the following steps:

1) In Ambari -> Hive -> Configs -> Advanced -> Custom hive-site -> Add Property..., add the following properties based on your HBase configuration (you can look the values up under Ambari -> HBase -> Configs):

hbase.zookeeper.quorum=xyz (find this property value in HBase)
zookeeper.znode.parent=/hbase-unsecure (find this property value in HBase)
phoenix.schema.mapSystemTablesToNamespace=true
phoenix.schema.isNamespaceMappingEnabled=true

2) Copy these jars to /usr/hdp/current/hive-server2/auxlib:

/usr/hdp/2.5.6.0-40/phoenix/phoenix-4.7.0.2.5.6.0-40-hive.jar
/usr/hdp/2.5.6.0-40/phoenix/phoenix-hive-4.7.0.2.5.6.0-40-sources.jar

If those jars don't work for you, try getting phoenix-hive-4.7.0.2.5.3.0-37.jar instead and copy it to /usr/hdp/current/hive-server2/auxlib.

3) Add this property to custom hive-env:

HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-server2/auxlib/

4) Add the following properties to custom hbase-site.xml:

phoenix.schema.mapSystemTablesToNamespace=true
phoenix.schema.isNamespaceMappingEnabled=true

5) Also run the following commands:

jar uf /usr/hdp/current/hive-server2/auxlib/phoenix-4.7.0.2.5.6.0-40-client.jar /etc/hive/conf/hive-site.xml
jar uf /usr/hdp/current/hive-server2/auxlib/phoenix-4.7.0.2.5.6.0-40-client.jar /etc/hbase/conf/hbase-site.xml

I hope this solution works for you 🙂
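For reference, the properties from step 1 would look like this if written directly as hive-site.xml entries (a sketch only; xyz stays a placeholder for your own ZooKeeper quorum hosts):

```xml
<!-- Custom hive-site additions; xyz is a placeholder for your ZooKeeper quorum -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>xyz</value>
</property>
<property>
  <name>zookeeper.znode.parent</name>
  <value>/hbase-unsecure</value>
</property>
<property>
  <name>phoenix.schema.mapSystemTablesToNamespace</name>
  <value>true</value>
</property>
<property>
  <name>phoenix.schema.isNamespaceMappingEnabled</name>
  <value>true</value>
</property>
```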