Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1788 | 09-11-2019 10:19 AM |
| | 9426 | 11-26-2018 07:04 PM |
| | 2560 | 11-14-2018 12:10 PM |
| | 5562 | 11-14-2018 12:09 PM |
| | 3244 | 11-12-2018 01:19 PM |
07-19-2018
01:31 PM
@Bin Ye I recently presented image recognition using Spark at a meetup in Santiago, Chile. I've made the code and presentation, along with everything necessary to run it, public on GitHub. Feel free to review it. I used NiFi to pull messages with images from Twitter and send them to a Kafka topic. From there I used Spark Streaming to pull the messages from the topic and performed the image analysis. Finally, I stored the results in an HBase table. https://github.com/felixalbani/future-of-data-santiago-e1-spark-nifi HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
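For a rough idea of the Spark Streaming side of that pipeline, here is a minimal sketch using the Spark 2.x DStream Kafka API. The broker address and topic name ("tweets") are assumptions for illustration; the linked repo has the real code.

```python
# Sketch only: consume tweet messages from a Kafka topic with Spark Streaming.
# Broker and topic are placeholders, not the repo's actual values.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="image-recognition-sketch")
ssc = StreamingContext(sc, batchDuration=10)

stream = KafkaUtils.createDirectStream(
    ssc, ["tweets"], {"metadata.broker.list": "localhost:9092"})

# Each record is a (key, value) pair; the value carries the message payload,
# which the real code runs through the image analysis and writes to HBase.
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()
```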
07-19-2018
01:16 PM
@David Pocivalnik You can search for artifacts and versions on http://repo.hortonworks.com/ For example, I was able to find the hbase-client Maven dependency:

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>2.0.0.3.0.0.0-1634</version>
</dependency>

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-19-2018
12:59 PM
@Muhammad Umar What Python version are you using? One of the imports seems to point to Python 3. If that is the case, you will need to export a few environment variables for this to run correctly. Check: https://community.hortonworks.com/questions/138351/how-to-specify-python-version-to-use-with-pyspark.html When running with master yarn in client deploy mode, the executors will run on any of the cluster worker nodes. This means you need to make sure that all the Python libraries you are using, along with the desired Python version, are installed on all cluster worker nodes in advance. Finally, it would be good to have both the driver log (which is printed to the stdout of spark-submit) and the complete output of yarn logs -applicationId <appId> for further diagnosis. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
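As a sketch, these are the environment variables usually exported before spark-submit to pin the Python version. The interpreter path is an assumption; point it at the Python 3 install that actually exists on every node.

```shell
# Make PySpark use Python 3 on both the driver and the executors.
# /usr/bin/python3 is an assumed path; adjust for your cluster nodes.
export PYSPARK_PYTHON=/usr/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
```

Both variables must point at an interpreter present on every worker node, or executors will fail at startup.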
07-19-2018
12:49 PM
@Melchicédec NDUWAYO You will probably need to escape the quotes in the strings (or try using single quotes instead) so that they won't break the JavaScript. If you agree, let's discuss this in a different thread, as it seems to be specific to running code with strings now, and the initial question has been addressed.
07-18-2018
01:19 PM
@forest lin spark.driver.extraClassPath is not the same as the one I shared for cluster mode. Could you confirm the code is running in client mode? And then try the exact settings I provided for cluster mode? Please let me know how it goes!
07-18-2018
01:17 PM
@Deb This looks to be related to Spark encoding Parquet differently than Hive does. Have you tried reading a different, non-Parquet table? Try adding the following configuration for the Parquet table: .config("spark.sql.parquet.writeLegacyFormat", "true") If that does not work, please open a new thread on this issue and we can follow up there. Thanks!
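In context, that setting goes on the SparkSession builder before the table is written. A minimal sketch (the app name is a placeholder; this is a config fragment, not the asker's code):

```python
from pyspark.sql import SparkSession

# Sketch: write Parquet in the legacy format so Hive's Parquet reader
# understands files produced by Spark.
spark = SparkSession \
    .builder \
    .appName("parquet_compat") \
    .config("spark.sql.parquet.writeLegacyFormat", "true") \
    .getOrCreate()
```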
07-17-2018
10:46 PM
@n c Please review this HCC link: https://community.hortonworks.com/questions/57866/how-to-move-hive-and-associated-components-from-on.html Definitely the most important piece is to take a good database backup while the Hive Metastore is down. Then, as outlined above, move the MySQL database (or whichever database is used) first. Then you can move the other components using Ambari. And yes, WebHCat is part of Hive! HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
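As a sketch of the backup step, assuming a MySQL-backed Metastore: the database name, user, and output file below are placeholders, and in practice credentials should come from a defaults file rather than the command line.

```shell
# Dump the Metastore database while the Hive Metastore service is stopped.
# "hive" (user and database) and the file name are placeholder values.
BACKUP_FILE=hive_metastore_backup.sql
mysqldump -u hive hive > "$BACKUP_FILE" 2>/dev/null \
  || echo "dump skipped (no mysqldump or no local DB on this host)"
```

Verify the dump is non-empty and restorable before moving the database to the new host.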
07-17-2018
01:09 PM
@forest lin The above suggestion was for --deploy-mode client, and I see you used --deploy-mode cluster instead. If you want to run in cluster mode, you need to make these changes:

cp /etc/hbase/conf/hbase-site.xml /etc/spark/conf
cp /etc/hbase/conf/hbase-site.xml /etc/spark2/conf
export SPARK_CLASSPATH="/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.2.0-205-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/lib/hbase-client.jar:/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.2.0-205.jar:/usr/hdp/current/phoenix-client/lib/hbase-common.jar:/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar:/usr/hdp/current/phoenix-client/lib/phoenix-core-4.7.0.2.6.2.0-205.jar"
spark-submit \
--class com.test.SmokeTest \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 4 \
--num-executors 2 \
--conf "spark.executor.extraClassPath=phoenix-4.7.0.2.6.2.0-205-spark2.jar:phoenix-client.jar:hbase-client.jar:phoenix-spark2-4.7.0.2.6.2.0-205.jar:hbase-common.jar:hbase-protocol.jar:phoenix-core-4.7.0.2.6.2.0-205.jar" \
--conf "spark.driver.extraClassPath=phoenix-4.7.0.2.6.2.0-205-spark2.jar:phoenix-client.jar:hbase-client.jar:phoenix-spark2-4.7.0.2.6.2.0-205.jar:hbase-common.jar:hbase-protocol.jar:phoenix-core-4.7.0.2.6.2.0-205.jar" \
--jars /usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.2.0-205-spark2.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/hbase-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark2-4.7.0.2.6.2.0-205.jar,/usr/hdp/current/phoenix-client/lib/hbase-common.jar,/usr/hdp/current/phoenix-client/lib/hbase-protocol.jar,/usr/hdp/current/phoenix-client/lib/phoenix-core-4.7.0.2.6.2.0-205.jar \
--files /etc/hbase/conf/hbase-site.xml \
--verbose \
/tmp/test-1.0-SNAPSHOT.jar

HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
07-17-2018
12:51 PM
@Debananda Sahoo In Spark 2 you should leverage SparkSession instead of SparkContext. To read a JDBC datasource, just use the following code:

from pyspark.sql import SparkSession
from pyspark.sql import Row

spark = SparkSession \
    .builder \
    .appName("data_import") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .enableHiveSupport() \
    .getOrCreate()

jdbcDF2 = spark.read \
    .jdbc("jdbc:sqlserver://10.24.40.29;database=CORE;username=user1;password=Passw0rd", "test")

More information and examples at this link: https://spark.apache.org/docs/2.1.0/sql-programming-guide.html#jdbc-to-other-databases Please let me know if that works for you. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.