Member since
04-19-2020
09-28-2020
01:56 PM
I managed to integrate Airflow with Redis into Cloudera Manager. To run custom DAGs, upload them to the Airflow DAG folder on the nodes where the Airflow scheduler and workers are running.
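The upload step can be sketched as a small helper. This is only an illustration: `deploy_dag` and both of its arguments are hypothetical names, and the real DAG folder is whatever `dags_folder` is set to in the `[core]` section of airflow.cfg on each node.

```python
import shutil
from pathlib import Path


def deploy_dag(dag_file: str, dags_folder: str) -> Path:
    """Copy one DAG file into the given DAG folder and return the new path.

    Hypothetical helper: `dags_folder` is assumed to match the
    `dags_folder` setting in airflow.cfg on the target node.
    """
    src = Path(dag_file)
    if src.suffix != ".py":
        raise ValueError(f"expected a .py DAG file, got {src.name}")
    dest_dir = Path(dags_folder)
    # Create the folder if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the modification time and returns the destination path.
    return Path(shutil.copy2(src, dest_dir / src.name))
```

The copy has to be repeated on every node that runs a scheduler or a worker, since each of them reads DAG files from its own local folder.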
07-22-2020
06:17 PM
Problem solved. The issue was in topology.py, which uses python as its default interpreter. Despite all the environment variables pointing to python3, python still resolved to Python 2, so I ended up overriding topology.py to use the path to python3.
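One way to picture this kind of fix is the sketch below. It is not necessarily what was done here: `rewrite_shebang` is a hypothetical helper and the interpreter path in the usage note is an assumption. The function only rewrites the first line of a script's text so that it names an explicit interpreter.

```python
import shutil


def rewrite_shebang(script_text: str, interpreter: str) -> str:
    """Return script_text with its shebang line pointing at `interpreter`.

    Hypothetical helper: replaces an existing `#!` line, or adds one
    if the script has none.
    """
    lines = script_text.splitlines(keepends=True)
    shebang = f"#!{interpreter}\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang
    else:
        lines.insert(0, shebang)
    return "".join(lines)


# Diagnostic: which executable a bare `python` resolves to on the
# PATH of the current process (None if there is no such executable).
resolved_python = shutil.which("python")
```

For example, `rewrite_shebang(text, "/usr/bin/python3")` would pin a script to that interpreter, assuming python3 really lives at that path on the node.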
07-19-2020
09:25 PM
We run spark-submit from Airflow, which we added as a custom parcel to CDP 7.1. Airflow is built with Python 3, but the default Python version on CDP is Python 2. As a result, spark-submit produces this warning:
WARN net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py 10.228.86.42
ExitCodeException exitCode=1: File "/opt/cloudera/parcels/Airflow-1.10.10-python3.7.7_1.2.3/lib/python3.7/site.py", line 177
file=sys.stderr)
^
SyntaxError: invalid syntax
I added PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON, both pointing to python3, to spark-defaults as well as spark-env.sh. I also set spark.yarn.appMasterEnv.PYTHONHASHSEED = 0, but the problem remains. As soon as the Python version on the workers is changed to Python 3 (so that Python 3 is the only Python available), spark-submit starts working. Is there something I am missing? Thanks
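For reference, the settings described above can be sketched as a launcher that passes them per invocation. This is an assumption-laden illustration: `build_spark_submit`, the application file name, and the python3 path are hypothetical, and only the two PYSPARK variables and the PYTHONHASHSEED property come from the post.

```python
import os


def build_spark_submit(app: str, python3: str):
    """Build a spark-submit command and environment that point PySpark at python3.

    Hypothetical helper: `app` and `python3` are placeholders supplied
    by the caller.
    """
    env = dict(os.environ)
    # Interpreter for the executors and for the driver, respectively.
    env["PYSPARK_PYTHON"] = python3
    env["PYSPARK_DRIVER_PYTHON"] = python3
    cmd = [
        "spark-submit",
        "--conf", "spark.yarn.appMasterEnv.PYTHONHASHSEED=0",
        app,
    ]
    return cmd, env


# Usage (not executed here):
#   import subprocess
#   cmd, env = build_spark_submit("job.py", "/usr/bin/python3")
#   subprocess.run(cmd, env=env, check=True)
```

These settings only choose the interpreter for PySpark itself. The warning in the post comes from topology.py, which picks its interpreter separately.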
Labels: Apache Spark, Apache YARN
04-27-2020
08:42 PM
It is a bit of a late reply - haven't tried it myself yet but looks promising - https://blog.clairvoyantsoft.com/apache-airflow-csd-ac5b145d5e2d
04-20-2020
01:37 PM
Hi @StevenOD, I might have misunderstood the hosting details of the Management Console. As far as I understand, it is hosted at <https://console.cdp.cloudera.com>, which is a multi-tenant, Cloudera-managed cloud resource that we don't have much control over or visibility into. You are correct that my PoC is in the public cloud; however, all the resources are in a VPC covered by company policies, which gives us more control and visibility. Please correct me if I am wrong. My other question still stands: apart from provisioning through the Cloudera-managed public cloud, are there alternatives for creating a CDP environment in AWS? Thank you very much
04-19-2020
08:59 PM
Hi @StevenOD, I have a similar question to @muslihuddin. I am trying to do a quick PoC by spinning up a Cloudera CDP environment in AWS, following this doc: https://community.cloudera.com/t5/Community-Articles/How-to-create-a-CDP-environment-in-AWS-with-minimal/ta-p/282916. However, since the Management Console is only available in the public cloud, which is not an option for my organisation, I am wondering whether there is any other option for trialing CDP in AWS. Thank you