Created on 01-30-2017 05:15 AM - edited 09-16-2022 01:38 AM
Working with Airbnb's Superset
This is a very cool open source analytics platform built on Python.
I installed this on a CentOS 7 edge node.
sudo yum upgrade python-setuptools
sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel
pip install virtualenv
virtualenv venv
. ./venv/bin/activate
pip install --upgrade setuptools pip
pip install mysqlclient
pip install pyhive
pip install superset
fabmanager create-admin --app superset

2017-01-27 18:15:37,864:INFO:flask_appbuilder.security.sqla.manager:Created Permission View: menu access on Query Search
2017-01-27 18:15:37,885:INFO:flask_appbuilder.security.sqla.manager:Added Permission menu access on Query Search to role Admin
Recognized Database Authentications.
2017-01-27 18:15:37,907:INFO:flask_appbuilder.security.sqla.manager:Added user admin
Admin User admin created.

superset db upgrade
superset load_examples
superset init
superset runserver -p 8088
The main things you will need are Python
Browse to http://yourservername:8088/ and start running queries and building charts and reports. It does a lot of the things that commercial reporting tools do, but it is fully open source. Superset + Zeppelin + CLI + ODBC + JDBC give me all the access to my Hadoop, Druid, SparkSQL and MariaDB data that I need.
Log in as admin with the password you set when running fabmanager create-admin.
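Superset reaches each database through a SQLAlchemy URI. Here is a minimal sketch of how those URIs are put together, assuming the pyhive and mysqlclient packages installed above (pyhive provides the hive:// scheme and mysqlclient backs mysql://). The helper name build_uri, the host names, and the credentials are all placeholders of mine, not values from this setup.

```python
# Sketch only: host names, ports, users, and passwords are placeholders.
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2.7


def build_uri(scheme, host, port, database, user=None, password=None):
    """Assemble scheme://user:password@host:port/database.

    User and password are URL-escaped so characters such as '@'
    do not break the URI.
    """
    auth = ""
    if user:
        auth = quote_plus(user)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    return "{0}://{1}{2}:{3}/{4}".format(scheme, auth, host, port, database)


# HiveServer2 usually listens on port 10000; adjust for your cluster.
hive_uri = build_uri("hive", "hiveserver.example.com", 10000, "default", user="hive")

# MySQL/MariaDB on the usual port 3306, with a password that needs escaping.
mysql_uri = build_uri("mysql", "db.example.com", 3306, "reports",
                      user="report_user", password="p@ss")

print(hive_uri)
print(mysql_uri)
```

The resulting string is what you would paste into the SQLAlchemy URI field when adding a database in the Superset UI.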
Browsing tables is easy in the web based platform.
The results of running a query, along with the intellisense that suggests table names for you.
This is a built-in example report that shows how powerful and professional the reports you build with this tool can be.
The SQL Lab is a great place to try out queries and examine data.
SQL Lab lets you run queries and explore the data. You get quick access to your previous queries and run status.
This is a simple report that was autogenerated for me by picking a query on one table.
This is your home page, which shows the dashboards you have built and your recent activity. A very nice GitHub-style interface.
Created on 01-30-2017 09:50 PM
Hi Timothy,
I am very interested in using Airbnb's Superset as the front-end visualization tool for HDP (Hive). I found your post, which mentions "Superset + Zeppelin + CLI + ODBC + JDBC give me all the access to my Hadoop, Druid, SparkSQL and MariaDB data that I need.".
Could you tell us how you created the Superset data source for HDP's Hive? Superset asks for a SQLAlchemy URI, and I do not know what it should be for a Hive database. I am using the HDP sandbox (version 2.5); it would be great to see HDP's Hive data in Superset's fabulous UI.
Thanks,
Steve
Created on 05-04-2017 09:50 AM
Hi,
How do we connect to Hive using presto://? Can you please help me change the database from SQLite to Hive/MySQL?
Thanks,
Ram
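There are two separate things here, as far as I understand it. The SQLite database is where Superset keeps its own metadata (users, dashboards, saved queries), and that location is controlled by SQLALCHEMY_DATABASE_URI in a superset_config.py file on the PYTHONPATH. The databases you query are added later in the UI, each with its own SQLAlchemy URI. Below is a sketch of such a config file; the host, database name, and credentials are placeholders, and the presto:// line is only an example of the URI shape pyhive expects (host, port, catalog, schema), not a Superset setting.

```python
# superset_config.py
# Sketch only: every host, user, and password below is a placeholder.

# Where Superset stores its own metadata. Without this setting it
# falls back to a local SQLite file.
SQLALCHEMY_DATABASE_URI = "mysql://superset_user:superset_pass@db.example.com:3306/superset"

# Not read by Superset: an illustrative URI to paste into the UI when
# adding a database that is queried through Presto's hive catalog.
EXAMPLE_PRESTO_HIVE_URI = "presto://presto.example.com:8080/hive/default"
```

After pointing the metadata database somewhere new, superset db upgrade and superset init most likely need to be run again so the tables and roles exist there.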
Created on 08-22-2017 06:31 PM
I see on the Superset GitHub page that it is tested only on Python 2.7, but I am using HDP 2.5.3, which installed Python 2.6, and it looks like I cannot upgrade to 2.7 because Hortonworks doesn't support it. Does it work well with Python 2.6?
Created on 08-22-2017 06:53 PM
That may run, but it is best to run it on a separate, unrelated node. It doesn't need to be on the Hadoop cluster.
For HDP 2.6 and later, it can be installed via Ambari as part of the cluster and will work fine with Druid.
I was just running it standalone for testing.
The best and easiest course of action is to upgrade to HDP 2.6 and install Druid and Superset through Ambari.
Created on 08-22-2017 07:08 PM
@Timothy Spann Thanks a lot for the reply. I will try it, but I am having a hard time finding the dependencies. For example, I see python-wheel only for Python 2.7. Does it still work? Is there any other way to install Superset, such as from a tarball?
Created on 08-22-2017 08:16 PM
Superset is a really hard install. It is best to install it through Ambari. I installed it once and it was painful, and I didn't use it after that.
I am hoping that https://superset.incubator.apache.org/ will improve the install process.
Try this
https://superset.incubator.apache.org/installation.html#getting-started
Created on 08-23-2017 04:24 PM
Yes, it seems to be so hard to find the dependencies. I am trying to do pip install Markdown with my Python 2.6, but it is throwing an error: "AttributeError: iter". When I looked it up online, it said Markdown is compatible only with Python 2.7 or higher. I cannot upgrade Python to 2.7 because I am trying to install on an HDP edge node, which uses 2.6, and I can't really upgrade the HDP cluster to 2.6 because it is in production and we don't intend to upgrade for some time.
Created on 08-23-2017 05:31 PM
Install this on a new machine that has Python 2.7 on it. Don't install on an existing HDP node.
Install pip and all the packages needed.
Don't use an HDP edge node, just a generic blank server. As long as it has network access to Hive and Druid, it should be good for queries.
You could even install it on your laptop and connect from there to your server.
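Before installing anything on the new machine, it can help to confirm that it really can reach the services it will query. A minimal sketch using only the Python standard library follows; the function name can_reach is mine, and it only checks that a TCP port accepts connections, not that Hive or Druid is healthy behind it.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, socket.timeout):
        # Covers refused connections, timeouts, and failed name lookups.
        return False
    conn.close()
    return True
```

For example, can_reach("hiveserver.example.com", 10000) would test a HiveServer2 endpoint (the host name is a placeholder; 10000 is the usual HiveServer2 port). The same call against the Superset host on port 8088 is a quick way to tell a firewall problem apart from a Superset problem.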
Created on 08-23-2017 09:56 PM
Thank you so much for your reply. I followed your suggestion and installed it on my desktop VM. I started runserver too, and I see this:
superset runserver
Starting server with command:
gunicorn -w 2 --timeout 60 -b 0.0.0.0:8088 --limit-request-line 0 --limit-request-field_size 0 superset:app
[2017-08-23 14:47:08 +0000] [50614] [INFO] Starting gunicorn 19.7.1
[2017-08-23 14:47:08 +0000] [50614] [INFO] Listening at: http://0.0.0.0:8088 (50614)
[2017-08-23 14:47:08 +0000] [50614] [INFO] Using worker: sync
[2017-08-23 14:47:08 +0000] [50619] [INFO] Booting worker with pid: 50619
[2017-08-23 14:47:08 +0000] [50621] [INFO] Booting worker with pid: 50621
Nothing else happens or shows up after the above. I then go to Chrome and enter the VM's IP address with port 8088, but it says the page cannot be displayed. httpd is running too.
Created on 10-26-2017 06:47 AM
Thanks for the information.
I think
virtualenv venv. ./venv/bin/activate
should be two separate commands:
virtualenv venv
. ./venv/bin/activate