Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 2984 | 01-26-2018 04:02 AM
 | 6276 | 12-22-2017 09:18 AM
 | 3016 | 12-05-2017 06:13 AM
 | 3279 | 10-16-2017 07:55 AM
 | 9297 | 10-04-2017 08:08 PM
07-21-2017 01:35 PM
(Spark 2.2 at this point?) The services you mention are really just the history server services. Yes, you can run Spark 1 and Spark 2 as parallel installations. You probably want both gateway roles on all nodes from which you intend to submit both types of Spark jobs, but that's up to you. There isn't much else to manage: the only running service is the history server, and it's a simple creature that CM manages. All the scripts are named differently (spark2-submit vs spark-submit), so there should be no conflict.
07-21-2017 01:33 PM
It's as integrated and supported as anything else. If you mean replacing Spark 1: no, that's not possible, as it would break any apps using Spark 1. If you mean you just don't want a separate add-on, I get it, but it's only a delivery mechanism. C6 would not have both.
07-21-2017 05:50 AM
You can execute SQL statements in Pyspark, against the same metastore and the same data you access from Hive or Impala (see the sketch below).

I think one of the premises of the workbench is: edit code, not notebooks, because that makes it much more realistic to create code that's then used in production; the translation step is an obstacle. I personally think you should use your IDE for non-interactive software development and use the workbench for the interactive parts, all within one project. This was my take on it, for Scala: https://github.com/srowen/cdsw-simple-serving

I think Jupyter is harder to fit into this vision because it operates in terms of notebooks, not code, at heart; Zeppelin less so. So I think we're actually aligned, and the workbench is trying to do what you want. But that's the answer we provide: you can use Zeppelin, but you're on your own, and if that's a little tricky, well, that's part of the point.
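For example, a minimal Pyspark sketch of running SQL against the shared metastore (the database and table names here are placeholders for whatever exists in your environment):

```python
from pyspark.sql import SparkSession

# Hive support makes the shared metastore (the same one Hive and
# Impala query) visible to Spark SQL.
spark = (SparkSession.builder
         .appName("workbench-sql-example")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder database/table; any SQL you'd run in Hive works here.
df = spark.sql("SELECT some_col, COUNT(*) AS n "
               "FROM my_db.my_table GROUP BY some_col")
df.show()
```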
07-21-2017 02:34 AM
HUE is the supported and recommended tool for SQL (Impala, Hive); the HUE notebook is not supported. The workbench is the supported and recommended tool for Spark, Python, R, and Scala. Kerberos and security work. Zeppelin and Jupyter are not supported, and it's safe to say there are no plans to support them. What features are you looking for? HUE + workbench should cover everything you mention, and I don't know of a difference with Zeppelin in this respect. What's a blue elephant guy?
06-27-2017 05:04 AM
(Please start a new thread.) Yes, all scores are cumulative, added across all input. I'm not sure what your use case is, but what you're suggesting is how it works: submitting (user, item, 1) adds 1 to the total strength of that user-item interaction.
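As an illustration, a sketch of feeding input to Oryx with kafka-python; the broker address and topic name are assumptions, and I'm assuming the usual CSV input form of user,item,strength. The point is that the two identical records below accumulate to a total strength of 2 for the pair:

```python
from kafka import KafkaProducer  # pip install kafka-python

# Broker address and topic name are assumptions for illustration.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Sending the same (user, item, 1) twice accumulates to a
# total interaction strength of 2 for (u1, i1).
producer.send("OryxInput", b"u1,i1,1")
producer.send("OryxInput", b"u1,i1,1")
producer.flush()
```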
06-19-2017 07:47 AM
Spark deals with arbitrary data, so its notion of partitions is not tied to data that contains a key. However, it's almost surely true that one key-based partition of data in, say, Parquet will map to one (or more) partitions of a DataFrame that holds the data with that key.
06-19-2017 07:36 AM
If you mean partitions in the sense of Parquet/Avro partitioning by some key, it should be possible to preserve that this way. In the general case of things like text files, a file is already a partition.
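A minimal Pyspark sketch of preserving a key-based layout on write; the paths and the column name are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("preserve-partitions").getOrCreate()

# Hypothetical input path: Hive-style key-partitioned Parquet data.
df = spark.read.parquet("/data/events")

# Writing with partitionBy recreates the key-based directory layout
# (one subdirectory per value of the hypothetical event_date column).
df.write.partitionBy("event_date").parquet("/data/events-out")
```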
06-19-2017 06:55 AM
It should be pretty trivial to read the data in format X into a DataFrame or Dataset with Spark, repartition it to a smaller number of partitions, and write it back out in format X with Spark. The round trip ought not to change the data, though that's worth verifying. It should, however, always result in fewer and therefore larger files.
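A minimal sketch of that round trip in Pyspark; the paths and the target partition count are placeholders, and Parquet stands in for format X. coalesce is used because it avoids a full shuffle when only reducing the number of partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-files").getOrCreate()

# Read the many small files (Parquet here as an example of format X).
df = spark.read.parquet("/data/many-small-files")

# Fewer partitions on write means fewer, larger output files.
df.coalesce(16).write.parquet("/data/fewer-larger-files")
```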
06-12-2017 05:39 AM
Will it affect recommendations? Yes. However, it doesn't affect the results much until you turn the LSH value down a lot; for example, you may find that 0.1 still yields good recommendations. With 1.7M items, though, you should find it's already pretty fast. See http://oryx.io/docs/performance.html . Even with 250 latent features, at LSH = 0.3, you could probably serve ~100 qps on one modern server with ~15 ms latency.
05-22-2017 02:13 AM
From other sources, I see notes about the incompatibility. It sounds like the 0.10.2 release was fixed to be compatible across maintenance releases? So if the project used 0.10.2, I think that would work for you with all 0.10.x brokers.