Master Guru

Working with Airbnb's Superset

Superset is a very cool open source analytics platform written in Python.

I installed this on a CentOS 7 edge node.

sudo yum upgrade python-setuptools
sudo yum install gcc libffi-devel python-devel python-pip \
    python-wheel openssl-devel libsasl2-devel openldap-devel
pip install virtualenv
virtualenv venv
. ./venv/bin/activate
pip install --upgrade setuptools pip
pip install mysqlclient 
pip install pyhive
pip install superset
fabmanager create-admin --app superset
2017-01-27 18:15:37,864:INFO:flask_appbuilder.security.sqla.manager:Created Permission View: menu access on Query Search
2017-01-27 18:15:37,885:INFO:flask_appbuilder.security.sqla.manager:Added Permission menu access on Query Search to role Admin
Recognized Database Authentications.
2017-01-27 18:15:37,907:INFO:flask_appbuilder.security.sqla.manager:Added user admin
Admin User admin created.
superset db upgrade
superset load_examples
superset init
superset runserver -p 8088
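
Once superset runserver reports that it is listening, it can help to confirm from the machine running your browser that the port is actually reachable before opening the page. The following is a minimal Python 3 sketch, not part of Superset; port_is_open is a hypothetical helper name, and the host is whatever you would use in place of yourservername.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts and failed lookups.
        return False
```

Calling port_is_open("yourservername", 8088) from the client machine and getting False while the server log shows it listening would suggest a firewall or network issue rather than a problem inside Superset.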

The main things you will need are Python and the build dependencies installed above.

Browse to http://yourservername:8088/ and start running queries and building charts and reports. It does a lot of what commercial reporting tools do, but it is fully open source. Superset + Zeppelin + CLI + ODBC + JDBC give me all the access to my Hadoop, Druid, SparkSQL and MariaDB data that I need.
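
Superset asks for a SQLAlchemy URI when you add a database, so a sketch of what those URIs look like may be useful. The example below only builds strings (Python 3, standard library only); sqlalchemy_uri is a hypothetical helper, and every host name, user and password is a placeholder. It assumes the hive:// dialect provided by the pyhive package and the mysql:// dialect backed by mysqlclient, both installed above, and the usual default ports (10000 for HiveServer2, 3306 for MySQL/MariaDB); check your own cluster's settings.

```python
from urllib.parse import quote_plus


def sqlalchemy_uri(dialect, host, port, database, user=None, password=None):
    """Build a URI of the form dialect://user:password@host:port/database."""
    auth = ""
    if user:
        auth = quote_plus(user)
        if password:
            # Escape characters such as '@' or '/' that would break URI parsing.
            auth += ":" + quote_plus(password)
        auth += "@"
    return "{}://{}{}:{}/{}".format(dialect, auth, host, port, database)


# Placeholder hosts and credentials (assumptions); replace with your own.
hive_uri = sqlalchemy_uri("hive", "hiveserver.example.com", 10000, "default",
                          user="hive")
mysql_uri = sqlalchemy_uri("mysql", "db.example.com", 3306, "superset",
                           user="superset", password="p@ss/word")

print(hive_uri)   # hive://hive@hiveserver.example.com:10000/default
print(mysql_uri)  # mysql://superset:p%40ss%2Fword@db.example.com:3306/superset
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding a database in the Superset UI.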

11855-superset-login.png

Log in as admin with the password you set when running fabmanager create-admin.

11856-superset-listtables.png

Browsing tables is easy in the web based platform.

11858-superset-queryeditor.png

The results of running a query. The editor's IntelliSense-style autocomplete suggests table names for you.

11857-superset-fancyreport.png

This built-in example report shows how powerful and professional the reports you build with this tool can be.

The SQL Lab is a great place to try out queries and examine data.

11859-superset-sqllab-results.png

SQL Lab lets you run queries and explore the data. You get quick access to your previous queries and run status.

11860-superset-sqllab.png

This is a simple report that was autogenerated for me by picking a query on one table.

11861-superset-examplechart.png

This is your home page, which shows the dashboards you have built and your recent activity. A very nice GitHub-style interface.

11862-superset1.png

Comments
New Contributor

Hi Timothy,

I am very interested in using Airbnb's Superset as the front-end visualization tool for HDP (Hive). I found your post, which mentions "Superset + Zeppelin + CLI + ODBC + JDBC give me all the access to my Hadoop, Druid, SparkSQL and MariaDB data that I need."

Could you tell us how you created the Superset data source for HDP's Hive? Superset asks for a SQLAlchemy URI, and I do not know what it should be for the Hive database. I am using the HDP sandbox (version 2.5), and it would be great to see HDP's Hive data in Superset's fabulous UI.

Thanks,

Steve

Contributor

Hi,

How do we connect to Hive using presto://? Can you please help me change the database from SQLite to Hive/MySQL?

Thanks,

Ram

Expert Contributor

@Timothy Spann

I see on the Superset GitHub page that it is tested only on Python 2.7, but I am using HDP 2.5.3, which installed Python 2.6, and it looks like I cannot upgrade to 2.7 because Hortonworks doesn't support it. Does it work well with Python 2.6?

Master Guru

That may run, but it is best to run it on a separate, unrelated node. It doesn't need to be on the Hadoop cluster.

For HDP 2.6 and later, it can be installed via Ambari as part of the cluster and will work fine with Druid.

I was just running it standalone for testing.

The best and easiest course of action is to upgrade to HDP 2.6 and install Druid and Superset through it.

Expert Contributor

@Timothy Spann Thanks a lot for the reply. I will try it, but I am having a hard time finding the dependencies. For example, I see python-wheel only for Python 2.7. Does it still work? Is there any other way to install Superset, such as from a tarball?

Master Guru

Superset is a really hard install. It is best to install it through Ambari. I installed it manually once, and it was painful; I didn't use it after that.

I am hoping that https://superset.incubator.apache.org/ will improve the install process.

Try this

https://superset.incubator.apache.org/installation.html#getting-started

Expert Contributor
@Timothy Spann

Yes, it seems to be so hard to find the dependencies. I am trying to run pip install Markdown with my Python 2.6, but it throws an error: "AttributeError: iter". When I looked it up online, it said Markdown is compatible only with Python 2.7 or higher. I cannot upgrade Python to 2.7 because I am trying to install on an HDP edge node, which uses 2.6, and I can't really upgrade the HDP cluster to 2.6 because it is in production, and we don't intend to upgrade for some time.

Master Guru

Install this on a new machine that has Python 2.7 on it. Don't install on an existing HDP node.

Install pip and all the packages needed.

Don't use an HDP edge node, just a generic blank server. As long as it has network access to Hive and Druid, it should be good for queries.

You could even install it on your laptop and connect from there to your server.
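
Before installing on a candidate machine, a quick check of its Python version can save some pain. This is a small sketch, not an official Superset check; meets_minimum is a hypothetical helper, and the 2.7 threshold is taken from the discussion above about what Superset was tested on at the time.

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Warn when the interpreter running this script is older than the assumed minimum.
if not meets_minimum(sys.version_info):
    print("Python %d.%d is older than 2.7" % tuple(sys.version_info[:2]))
```

Run it with the same interpreter you plan to create the virtualenv from, since a machine can have several Python versions installed.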

Expert Contributor
@Timothy Spann

Thank you so much for your reply. I followed your suggestion and installed it on my desktop VM. I started runserver too, and I see this:
superset runserver
Starting server with command: gunicorn -w 2 --timeout 60 -b 0.0.0.0:8088 --limit-request-line 0 --limit-request-field_size 0 superset:app
[2017-08-23 14:47:08 +0000] [50614] [INFO] Starting gunicorn 19.7.1
[2017-08-23 14:47:08 +0000] [50614] [INFO] Listening at: http://0.0.0.0:8088 (50614)
[2017-08-23 14:47:08 +0000] [50614] [INFO] Using worker: sync
[2017-08-23 14:47:08 +0000] [50619] [INFO] Booting worker with pid: 50619
[2017-08-23 14:47:08 +0000] [50621] [INFO] Booting worker with pid: 50621

Nothing else happens or shows up after the above. I then go to Chrome and type in the VM's IP with port 8088, but it says the page cannot be displayed. httpd is running too.

Contributor

Thanks for your information.

I think

virtualenv venv. ./venv/bin/activate

should be

virtualenv venv

. ./venv/bin/activate
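
To confirm that the activation step took effect, you can ask the interpreter whether it is running inside a virtual environment. This is a minimal sketch with a hypothetical helper name, in_virtualenv; it relies on sys.real_prefix (set by older virtualenv releases) and sys.base_prefix (set by Python 3's venv and newer virtualenv releases).

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to run inside a virtual environment."""
    # Older virtualenv releases add sys.real_prefix to the interpreter.
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv make sys.base_prefix differ from sys.prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


print(in_virtualenv())
```

If this prints False right after the activate line, the pip install commands that follow would go into the system Python rather than into venv.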