Repo Description

Demo scenario: Every year approximately 20% of airline flights are delayed or cancelled, resulting in significant costs to both travelers and airlines. As our example use case, we will build a supervised learning model that predicts airline delays from historical flight data and weather information. Currently there are three versions of this demo available: one with Python/Scikit-learn, one with Spark/Scala, and one with R/Scalding.

Author: Ofer Mendelevitch

Repo Info
GitHub repo URL: https://github.com/abajwa-hw/hdp-datascience-demo
GitHub account name: abajwa-hw
Repo name: hdp-datascience-demo
Comments

Hi Ali, thanks for a great demo. I'm trying to set it up on the HDP-2.3.2 sandbox because I couldn't download the 2.2 image: it's more than 8 GB, and the Dropbox download didn't work (I tried 3 times). So I'm installing and setting up all the software myself. step1_runasroot.sh ran without issues, but step2_runasdemo.sh fails at line 128, in the pydoop install step:

cp -f $PROJECT_DIR/setup/hadoop_utils_22.py $HOME_DIR/pydoop/pydoop/hadoop_utils.py

This line is the first place PROJECT_DIR is referenced; the directory was never created, so there is nothing to copy from. Where can I find hadoop_utils_22.py?

Thanks, @Predrag Minovic. I haven't tried this on 2.3 myself yet. $PROJECT_DIR refers to the location where the repo was cloned; it is defined at the start of the script. The file is available in the same GitHub repo (setup/hadoop_utils_22.py) and should have been downloaded when you ran git clone.
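If PROJECT_DIR does not point at the clone, the cp on line 128 fails with a fairly unhelpful message. Below is a minimal sketch, not part of the repo, of a guarded version of that copy step. The function name copy_hadoop_utils is hypothetical, and it takes PROJECT_DIR and HOME_DIR as arguments instead of relying on how step2_runasdemo.sh defines them. It also assumes pydoop's source tree lives under HOME_DIR/pydoop, as the original cp line implies. It checks that the source file and the destination directory exist before copying, so a wrong path produces a clear error.

```shell
#!/usr/bin/env bash
# Sketch only: a guarded version of the pydoop patch step from step2_runasdemo.sh.
# copy_hadoop_utils is a hypothetical helper, not part of hdp-datascience-demo.
# Usage: copy_hadoop_utils PROJECT_DIR HOME_DIR
copy_hadoop_utils() {
  local project_dir="$1"
  local home_dir="$2"

  # PROJECT_DIR should be the directory the repo was cloned into
  if [ -z "$project_dir" ]; then
    echo "error: PROJECT_DIR is empty; set it to the hdp-datascience-demo clone" >&2
    return 1
  fi

  local src="$project_dir/setup/hadoop_utils_22.py"
  # Assumption: pydoop's source was checked out under HOME_DIR/pydoop
  local dst_dir="$home_dir/pydoop/pydoop"

  if [ ! -f "$src" ]; then
    echo "error: $src not found; is PROJECT_DIR the directory where the repo was cloned?" >&2
    return 1
  fi
  if [ ! -d "$dst_dir" ]; then
    echo "error: $dst_dir not found; has the pydoop source been checked out?" >&2
    return 1
  fi

  # Same operation as the original line: overwrite pydoop's hadoop_utils.py
  # with the repo's hadoop_utils_22.py
  cp -f "$src" "$dst_dir/hadoop_utils.py"
}
```

For example, assuming the repo was cloned into $HOME/hdp-datascience-demo and HOME_DIR is $HOME (both assumptions), you would call `copy_hadoop_utils "$HOME/hdp-datascience-demo" "$HOME"`.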

Last update: 09-16-2022 07:47 AM