Member since 09-25-2015
24 Posts
9 Kudos Received
3 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1489 | 12-14-2015 05:59 PM |
| | 1651 | 12-13-2015 05:28 PM |
| | 3452 | 12-12-2015 10:04 PM |
12-14-2015
05:43 PM
I recommend launching the HDP 2.3 Sandbox directly on Azure, as mentioned in the blog. You'll get a CentOS VM with HDP services running on it. It is well tested and supported.
12-14-2015
05:31 PM
@Raghavendran Chellappa Tableau, like any other BI tool, can't connect directly to Spark Streaming. Spark Streaming only processes the data; you still need to persist it in HDFS or another store before Tableau or anything else can query it.

If you need interactive analysis with a very short SLA, you need a system that indexes the data. Pure row scans won't cut it. One option is to connect Spark Streaming to Solr, which indexes the data as it is inserted. You can then build a read-only dashboard using Banana, or build a custom app that runs user-defined queries against Solr.

So the flow is: Streaming Data -> Spark Streaming -> Solr -> Banana Dashboard (or a custom app if interactivity is desired)

For an example of streaming Tweets from Spark into Solr, look here: https://doc.lucidworks.com/lucidworks-hdpsearch/2....
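To make the Spark Streaming -> Solr hop more concrete, here is a minimal sketch of indexing one batch of records through Solr's JSON update handler over plain HTTP. It is not the Lucidworks connector from the linked example. The Solr URL, the `tweets` collection, the `text_t` field, and both function names are assumptions made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_solr_update_request(solr_url, collection, docs, commit_within_ms=1000):
    """Build (but do not send) an HTTP POST that indexes `docs` in Solr.

    solr_url and collection are placeholders; adjust them to your cluster.
    """
    # commitWithin asks Solr to make the documents searchable within N ms.
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{solr_url.rstrip('/')}/{collection}/update?{query}"
    body = json.dumps(list(docs)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_partition(records, solr_url="http://localhost:8983/solr", collection="tweets"):
    """Send one batch of records to Solr and return how many were sent."""
    # Hypothetical schema: each record is a dict with "id" and "text" keys.
    docs = [{"id": r["id"], "text_t": r["text"]} for r in records]
    if not docs:
        return 0  # nothing to index, so skip the HTTP call
    request = build_solr_update_request(solr_url, collection, docs)
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
    return len(docs)
```

In a PySpark streaming job, a function like `index_partition` would typically be called once per partition, for example from `DStream.foreachRDD` combined with `RDD.foreachPartition`, so that each executor sends its own batch rather than one request per record.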
12-13-2015
05:49 PM
Spark is meant for application development. Tez is a library used by tools such as Hive to speed up execution; it isn't suitable for end-user programming.
12-13-2015
05:28 PM
2 Kudos
@Cary Walker The HDP repo is hosted on GitHub. For 2.3.0 dependencies, see here: https://github.com/hortonworks/hadoop-release/blob... You can find the RPM in our public Maven repo. Search for "hadoop" here: http://repo.hortonworks.com/index.html
12-12-2015
10:16 PM
1 Kudo
Apache Phoenix is currently the only way to query HBase using SQL.
12-12-2015
10:04 PM
In addition to Vectors, you need to import the Spark Vector class explicitly, since Scala imports its built-in Vector type by default. Try this:
import org.apache.spark.mllib.linalg.{Vector, Vectors}
12-11-2015
05:18 AM
Which version of Spark and HDP are you using?
12-04-2015
06:19 PM
@bsaini Spark is best suited to iterative computations over large data sets, not to CPU-bound processes that reuse a small data set repeatedly.
12-04-2015
06:16 PM
1 Kudo
@Peter Coates
Why do you need Spark if the data is very small and can fit on a single node? There are other excellent Monte Carlo simulation packages, open source or otherwise, which can do this efficiently. Even Excel has an add-in for this.

Edit: If you need more horsepower for Monte Carlo simulations than one node can provide, you can look at MPI. MPICH is pretty good: https://www.mpich.org/ There's even a YARN adapter for MPICH: https://github.com/alibaba/mpich2-yarn
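As a rough illustration of how little machinery a small Monte Carlo run needs on one node, here is a minimal sketch using only the Python standard library. Estimating pi is a stand-in problem chosen for this example, not the simulation from the question, and the function name and sample count are assumptions.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling random points in the unit square."""
    rng = random.Random(seed)  # a seeded generator keeps runs reproducible
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        # Count the points that fall inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000, seed=42))
```

The input here fits in memory and the loop is purely CPU-bound, which is the situation where a cluster framework adds overhead without adding much benefit.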
12-04-2015
06:06 PM
As @Ali Bajwa wrote above, use the Zeppelin Service to install Zeppelin on HDP.