Member since: 09-25-2015
Posts: 230
Kudos Received: 276
Solutions: 39
12-15-2015
02:56 AM
1 Kudo
Great article Wes!
12-11-2015
11:45 PM
Awesome!
11-24-2015
05:09 AM
1 Kudo
@Kuldeep Kulkarni an Ambari restart is not needed; I tested it and the updated configs appear immediately. You are right about the HDFS restart, which is also mentioned as step #2.
11-24-2015
12:49 AM
8 Kudos
Spark SQL comes with a nice feature called "JDBC to Other Databases", but in practice it is a JDBC federation feature: it can be used to create data frames from JDBC databases using Scala/Python, and it also works directly with the Spark SQL Thrift Server, allowing us to query external JDBC tables seamlessly, like other Hive/Spark tables. The example below uses sandbox 2.3.2 and the Spark 1.5.1 TP (https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/). This feature works with spark-submit, spark-shell, Zeppelin, the spark-sql client, and the Spark SQL Thrift Server. This post shows two examples: #1 using the Spark SQL Thrift Server, #2 using spark-shell.

Example #1 using the Spark SQL Thrift Server

1- Run the Spark SQL Thrift Server with the mysql jdbc driver:

sudo -u spark /usr/hdp/2.3.2.1-12/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10010 --jars "/usr/share/java/mysql-connector-java.jar"

2- Open beeline and connect to the Spark SQL Thrift Server:

beeline -u "jdbc:hive2://localhost:10010/default" -n admin

3- Create a JDBC federated table pointing to the existing mysql database, using beeline:

CREATE TABLE mysql_federated_sample
USING org.apache.spark.sql.jdbc
OPTIONS (
  driver "com.mysql.jdbc.Driver",
  url "jdbc:mysql://localhost/hive?user=hive&password=hive",
  dbtable "TBLS"
);

4- Query it like any other table:

describe mysql_federated_sample;
select * from mysql_federated_sample;
select count(1) from mysql_federated_sample;
Example #2 using spark-shell, Scala code, and data frames

1- Open spark-shell with the mysql jdbc driver:

spark-shell --jars "/usr/share/java/mysql-connector-java.jar"

2- Create a data frame pointing to the mysql table:

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "com.mysql.jdbc.Driver",
    "url" -> "jdbc:mysql://localhost/hive?user=hive&password=hive",
    "dbtable" -> "TBLS"
  )
).load()

3- Query it:

jdbcDF.show
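Once loaded, the data frame can also be registered as a temporary table so it can be queried with SQL alongside regular Hive/Spark tables. A minimal sketch using the Spark 1.5 API (the table name mysql_tbls is my own choice for this example):

// Sketch: expose the JDBC-backed data frame to Spark SQL (Spark 1.5 API).
// "mysql_tbls" is an arbitrary name chosen for this example.
jdbcDF.registerTempTable("mysql_tbls")

// It can now be queried with SQL, and even joined against Hive tables:
sqlContext.sql("select count(*) from mysql_tbls").show()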
See other spark jdbc examples / troubleshooting here: https://community.hortonworks.com/questions/1942/spark-to-phoenix.html
11-23-2015
07:30 PM
2 Kudos
@Kuldeep Kulkarni another workaround:

1- Execute the commands below:

/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p AMBARI-PASSWORD delete localhost CLUSTER-NAME hdfs-site "dfs.namenode.rpc-address"
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p AMBARI-PASSWORD delete localhost CLUSTER-NAME hdfs-site "dfs.namenode.http-address"
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p AMBARI-PASSWORD delete localhost CLUSTER-NAME hdfs-site "dfs.namenode.https-address"

2- Restart HDFS
11-23-2015
04:36 PM
1 Kudo
I also found this jira: https://issues.apache.org/jira/browse/AMBARI-13946
11-23-2015
11:26 AM
1 Kudo
Thank you @Kuldeep Kulkarni, I have the same issue with a prospect. The same happens with hdfs mover.
11-11-2015
02:40 PM
1 Kudo
4- Execute the SQL, using the sql interpreter:

%sql
select geohash_encode(1.11,1.11,3) from sample_07 limit 10

It fails with the sql interpreter + Zeppelin:

java.lang.ClassNotFoundException: com.github.gbraccialli.hive.udf.UDFGeohashEncode
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
...
11-11-2015
02:40 PM
Second, the same with Zeppelin:

1- Restart the interpreter

2- Load the dependencies:

%dep
z.reset()
z.load("com.github.gbraccialli:HiveUtils:1.0-SNAPSHOT")

3- Execute the SQL, using the same Scala code:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
sqlContext.sql("""create temporary function geohash_encode as 'com.github.gbraccialli.hive.udf.UDFGeohashEncode'""");
sqlContext.sql("""select geohash_encode(1.11,1.11,3) from sample_07 limit 10""").collect().foreach(println);

It worked with Scala code + Zeppelin!
11-11-2015
02:39 PM
2- Run spark-shell with the dependency:

spark-shell --master yarn-client --packages "com.github.gbraccialli:HiveUtils:1.0-SNAPSHOT"

3- Run the Spark code:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
sqlContext.sql("""create temporary function geohash_encode as 'com.github.gbraccialli.hive.udf.UDFGeohashEncode'""");
sqlContext.sql("""select geohash_encode(1.11,1.11,3) from sample_07 limit 10""").collect().foreach(println);

spark-shell worked fine!
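As a side note, when the only problem is making a UDF jar visible to the interpreter, registering a plain Scala function directly on the SQLContext avoids the jar distribution issue entirely. A hedged sketch on the Spark 1.5 API; the function below is a placeholder of my own, not the real geohash implementation:

// Sketch of an alternative: register a native Scala UDF on the SQLContext
// (Spark 1.5 API). prefix3 is a placeholder function, not a geohash encoder.
sqlContext.udf.register("prefix3", (s: String) => s.take(3))
sqlContext.sql("select prefix3(description) from sample_07 limit 10").show()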