Member since: 09-25-2015
Posts: 230
Kudos Received: 276
Solutions: 39
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 24913 | 07-05-2016 01:19 PM
 | 8334 | 04-01-2016 02:16 PM
 | 2075 | 02-17-2016 11:54 AM
 | 5587 | 02-17-2016 11:50 AM
 | 12541 | 02-16-2016 02:08 AM
11-23-2015 04:36 PM
1 Kudo
I also found this JIRA: https://issues.apache.org/jira/browse/AMBARI-13946
11-23-2015 11:26 AM
1 Kudo
Thank you @Kuldeep Kulkarni. I have the same issue with a prospect. The same happens with hdfs mover.
11-23-2015 10:19 AM
1 Kudo
@Ali Bajwa same here with Sandbox 2.3.2 when I try to run the Spark SQL Thrift server. @vshukla can you help? I think we also need to update our blog with instructions to solve this issue.
11-19-2015 02:43 PM
1 Kudo
@Neeraj Sabharwal and @Andrew Watson if you are using HDFS snapshots, it is important to back up the Hive metastore database as well: an HDFS snapshot alone won't be enough to recover table/partition definitions if a user accidentally executes DROP TABLE or DROP DATABASE. A minimal sketch of pairing the two is below.
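A minimal sketch of taking the HDFS snapshot and the metastore dump together, assuming the warehouse sits at /apps/hive/warehouse and the metastore runs on MySQL in a database named hive (all of these are assumptions, adjust to your cluster):

# enable and take a snapshot of the warehouse directory (path is an assumption)
hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
hdfs dfs -createSnapshot /apps/hive/warehouse before-change

# dump the metastore database at the same point in time (MySQL and db name assumed)
mysqldump -u hive -p hive > hive_metastore_backup.sql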
11-19-2015 12:25 AM
@Andrew Watson My understanding is that Hortonworks is going to support 1.5.1 in December, not 1.5.2; that would be the reason to use 1.5.1 instead of 1.5.2.
11-17-2015 05:36 PM
1 Kudo
Which components and folders should be backed up (only metadata, not data), and what would be the commands? (A sketch of possible commands follows the list.)
- Namenode
- Ambari database
- Hive Metastore database
- Oozie database
- Hue database
- Ranger database
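A minimal sketch of metadata-only backups, assuming all five databases run on MySQL under their common default names (ambari, hive, oozie, hue, ranger); every name and path here is an assumption to verify against your cluster:

# NameNode metadata: checkpoint and fetch the latest fsimage
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -fetchImage /backup/namenode
hdfs dfsadmin -safemode leave

# component databases (MySQL assumed; database names are common defaults)
mysqldump -u root -p ambari > /backup/ambari.sql
mysqldump -u root -p hive > /backup/hive_metastore.sql
mysqldump -u root -p oozie > /backup/oozie.sql
mysqldump -u root -p hue > /backup/hue.sql
mysqldump -u root -p ranger > /backup/ranger.sql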
11-16-2015 06:46 PM
@Neeraj should be the same: add the updated HDP repo, yum install the Spark package, and copy hive-site.xml. A rough outline is below.
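A rough outline of those three steps; the repo URL and package name depend on the technical-preview build, so both are left as placeholders (assumptions, not actual values):

# 1. add the updated repo file (URL depends on the TP build)
wget -nv <spark-tp-repo-url> -O /etc/yum.repos.d/hdp-spark-tp.repo

# 2. install the Spark package from that repo (exact name varies by build)
yum install -y <spark-package>

# 3. point Spark at the Hive metastore
cp /usr/hdp/current/hive-client/conf/hive-site.xml /usr/hdp/current/spark-client/conf/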
11-16-2015 05:35 PM
1 Kudo
I used the steps from our blog and it worked: https://hortonworks.com/hadoop-tutorial/apache-spark-1-5-1-technical-preview-with-hdp-2-3/
11-16-2015 05:23 PM
7 Kudos
@Vedant Jain The example below works with Sandbox 2.3.2. Note that I haven't changed the classpath; I only used the --jars option.

From the shell:

spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar
Inside spark-shell:

// option 1: read a table through the JDBC data source
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "driver" -> "org.apache.phoenix.jdbc.PhoenixDriver",
    "url" -> "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure",
    "dbtable" -> "TABLE1")).load()
jdbcDF.show
// option 2: read a custom query through JdbcRDD
import java.sql.{Connection, DriverManager, DatabaseMetaData, ResultSet}
import org.apache.spark.rdd.JdbcRDD

// helper that opens a JDBC connection from a driver class and connection string
def getConn(driverClass: => String, connStr: => String, user: => String, pass: => String): Connection = {
  var conn: Connection = null
  try {
    Class.forName(driverClass)
    conn = DriverManager.getConnection(connStr, user, pass)
  } catch { case e: Exception => e.printStackTrace }
  conn
}

// aggregate query, partitioned on id between 1 and 10 across 2 partitions
val myRDD = new JdbcRDD( sc, () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select sum(10) from TABLE1 where ? <= id and id <= ?",
  1, 10, 2)
myRDD.take(10)

// same pattern reading a column (redefining myRDD is fine in the REPL)
val myRDD = new JdbcRDD( sc, () => getConn("org.apache.phoenix.jdbc.PhoenixDriver", "jdbc:phoenix:localhost:2181:/hbase-unsecure", "", ""),
  "select col1 from TABLE1 where ? <= id and id <= ?",
  1, 10, 2)
myRDD.take(10)
Also note that the Phoenix team recommends using phoenix-spark instead of JDBC directly: http://phoenix.apache.org/phoenix_spark.html

Here is an example with the phoenix-spark package.

From the shell:

spark-shell --master yarn-client --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,/usr/hdp/current/phoenix-client/lib/phoenix-spark-4.4.0.2.3.2.0-2950.jar --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar"

Inside spark-shell:

import org.apache.phoenix.spark._

val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181:/hbase-unsecure")
)
df.show
And here is a sample project that can be built and executed through spark-submit: https://github.com/gbraccialli/SparkUtils

git clone https://github.com/gbraccialli/SparkUtils
cd SparkUtils/
mvn clean package
spark-submit --class com.github.gbraccialli.spark.PhoenixSparkSample target/SparkUtils-1.0.0-SNAPSHOT.jar

Also check @Randy Gelhausen's project, which uses phoenix-spark to automatically load data from Hive to Phoenix: https://github.com/randerzander/HiveToPhoenix (I copied my pom.xml from Randy's project.)
11-16-2015 12:39 PM
2 Kudos
@Laurence Da Luz Check these links (a small command-line example follows the list):
- https://spark.apache.org/docs/latest/tuning.html
- http://www.slideshare.net/SparkSummit/deep-dive-into-project-tungsten-josh-rosen
- http://www.slideshare.net/cfregly/advanced-apache-spark-meetup-project-tungsten-nov-12-2015
- http://www.slideshare.net/SparkSummit/building-debugging-and-tuning-spark-machine-leaning-pipelinesjoseph-bradley
- http://www.slideshare.net/SparkSummit/04-huang-duan-1
- http://www.slideshare.net/SparkSummit/making-sense-of-spark-performancekay-ousterhout
- http://www.slideshare.net/SparkSummit/data-storage-tips-for-optimal-spark-performancevida-ha-databricks
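To make the tuning guide concrete, here is a small sketch of passing two of its common recommendations (Kryo serialization and explicit executor memory) on the command line; the values are illustrative assumptions, not recommendations for any specific workload:

# switch to Kryo serialization and size executors explicitly (values are examples)
spark-shell --master yarn-client \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.executor.memory=2g \
  --conf spark.yarn.executor.memoryOverhead=512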