Member since: 08-01-2018
Posts: 5
Kudos Received: 0
Solutions: 0
10-17-2018 09:01 AM
I have some questions regarding the profiler. How can the output of the PROFILER_GET function be used by other distributed processing applications? We would like to visualize features computed by the profiler alongside aggregated features computed by our models, so we need the profiler's output as input for PySpark, in which our models are implemented.

Another option would be to implement our pipeline entirely in PySpark, but then we lose access to the windowing features the profiler already provides. Since the profiler is built-in functionality in Metron, we are unsure about the downsides of not using it at all. What would be the possible downsides of implementing our pipeline entirely in PySpark? And even without PySpark, how can we visualize the profiler data and baselines in Zeppelin? We have more questions about the profiler, but these are the critical ones.
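For context, one way we have seen to inspect profile measurements outside the topology is the Stellar REPL, from which values could be exported for downstream use. A rough sketch of what we mean (the profile name 'stats_by_host' and the entity are made-up examples; $METRON_HOME and $ZOOKEEPER are placeholders for our cluster):

```shell
# Open the Stellar REPL against the cluster's ZooKeeper quorum
# (paths and host are assumptions about a typical Metron install).
$METRON_HOME/bin/stellar -z $ZOOKEEPER:2181

# Inside the REPL, PROFILER_GET fetches the measurements the profiler wrote,
# and PROFILE_FIXED builds a fixed lookback window ending now:
# [Stellar]>>> PROFILER_GET('stats_by_host', '10.0.0.1', PROFILE_FIXED(2, 'HOURS'))
```

Our open question is whether there is a supported way to read the same data directly from PySpark instead of going through the REPL.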
08-01-2018 02:54 PM
Thank you for your answer. We tried it: it worked with spark-submit on the command line, but not in Zeppelin (where we had configured the same Spark config flags).
We then tried setting the Spark submit options in Ambari, with no luck:
export SPARK_SUBMIT_OPTIONS="--conf spark.pyspark.virtualenv.enabled=true \
  --conf spark.pyspark.virtualenv.bin.path=/usr/local/bin/virtualenv \
  --conf spark.pyspark.virtualenv.python_version=3.6 \
  --conf spark.pyspark.virtualenv.requirements=/home/zeppelin/requirements.txt"
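Something else we considered but have not yet verified: setting the same keys as properties on the Zeppelin Spark interpreter itself (Interpreter settings in the Zeppelin UI) instead of via SPARK_SUBMIT_OPTIONS. The property names below simply mirror the --conf keys from our export; the paths are the ones from our setup:

```shell
# Untested sketch: equivalent settings as Zeppelin interpreter properties,
# entered in the interpreter edit form rather than zeppelin-env.sh.
spark.pyspark.virtualenv.enabled        true
spark.pyspark.virtualenv.bin.path       /usr/local/bin/virtualenv
spark.pyspark.virtualenv.python_version 3.6
spark.pyspark.virtualenv.requirements   /home/zeppelin/requirements.txt
```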
08-01-2018 10:17 AM
Currently, we have one big Data Science Python environment, and we would like to be able to create multiple environments for different tasks. We build the environment on one machine in the cluster and then distribute it manually to all the other nodes. Is there an easy way to ship an environment to all the nodes in a cluster? And how can we activate a different environment per notebook without having to create a separate interpreter (Spark / Livy) for each environment?
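One approach we are considering for the shipping part (a sketch we have not yet verified on our cluster): pack the environment into an archive and let Spark distribute it to every executor. The environment name, packages, and file names below are example values:

```shell
# Build and pack an example environment ('myenv' is a made-up name);
# packing requires the conda-pack package.
conda create -y -n myenv python=3.6 numpy pandas
conda pack -n myenv -o myenv.tar.gz

# Ship the archive with the job; YARN unpacks it next to each executor
# under the alias after '#', so the Python binary is found at that path.
PYSPARK_PYTHON=./environment/bin/python \
spark-submit \
  --archives myenv.tar.gz#environment \
  my_job.py
```

What we still don't know is how to select such an archive per notebook in Zeppelin without a dedicated interpreter per environment.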
Labels:
- Apache Spark