08-05-2015 10:26 AM
I am trying to connect Microsoft's PowerBI Desktop to HDFS, but I am unable to do so. I keep getting error 400. I also tried to connect using Web instead of HDFS, but I get the same error.
By default the HDFS data source sets the port to 50070; the secure port is 50470, and I am currently looking for a way to change that. I have also updated my JAAS settings to forward a Kerberos ticket, but that didn't have any effect.
Has anyone tried this or made it work on a secure cluster? Any ideas are greatly appreciated.
08-13-2015 07:30 PM
08-14-2015 07:07 AM
OK, I figured out the issue: the entire URL is needed, including the full path of the files in HDFS.
Port 8020 is for back-end HDFS communication; you cannot connect to that port directly.
The URL to connect to HDFS is listed below; please keep in mind these are the default ports and can be changed.
However, my issue is that when I create the connection from PowerBI I have only one option, which is to input the server name. There is no description or help for what goes in this field, e.g. the namenode where the HDFS web interface is running. After inputting this info I get the error below.
DataSource.Error: HDFS cannot connect to server 'namenode01.test.com'. Unable to connect to the remote server.
Instead of putting only the server name, I placed "http://<namenode>:50470/webhdfs/v1/<directory>" in the server field, replacing <namenode> with the name of the server and <directory> with the HDFS path I want to get the data from.
The issue now is that PowerBI does not support the Parquet or sequence file formats, /cry. Only text or open formats currently seem to work, which is not unexpected.
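For anyone assembling the server-field value by hand, here is a minimal sketch of how that URL is put together. The helper name `webhdfs_url` and the example host and path are my own placeholders, not anything PowerBI provides; the ports are the defaults mentioned above and may differ on your cluster.

```python
from urllib.parse import quote


def webhdfs_url(namenode: str, hdfs_path: str, port: int = 50070,
                scheme: str = "http") -> str:
    """Build the WebHDFS URL for an HDFS path (hypothetical helper)."""
    # WebHDFS exposes the file system under the fixed /webhdfs/v1 prefix.
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    clean_path = quote(hdfs_path.strip("/"))
    return f"{scheme}://{namenode}:{port}/webhdfs/v1/{clean_path}"


# Placeholder host and path, for illustration only.
print(webhdfs_url("namenode01.test.com", "/user/cloudera/words.txt"))
```

Pass `port=50470` (or whatever your cluster uses) when the default does not apply.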
07-20-2016 03:54 AM
Hi Rusty M,
I am trying to do the same thing as you did. I am able to see all the files and directories in my HDFS when I connect Power BI to HDFS, but I cannot actually pull the data from those files. Power BI sees these files as binary files, and the queries only import parameters like data executed, folder path, etc.; they DO NOT seem to import the data in the files. I have tried to import an xlsx file and a text file into Power BI, both of which were stored in HDFS.
Can you please guide me on how to import an Excel xlsx file stored in HDFS into Power BI?
Please see the attached image below. To connect to HDFS from Power BI, I am using the IP address of the Cloudera VM (I assume this is the correct way to connect to HDFS from Power BI). Here are the complete URLs for connecting Power BI to HDFS.
http://<ip address of the VM>:50070/webhdfs/v1/user/cloudera/output_join11
http://<ip address of the VM>:50070/webhdfs/v1/user/cloudera/words.txt
Is it some kind of driver issue, or is it something that I am missing? Please answer considering the fact that I am just a beginner trying to learn Cloudera and visualization tools.
Thanks in advance.
07-20-2016 04:30 AM - edited 07-20-2016 04:31 AM
Hi Rusty M,
Please read this post after my last post as a continuation of the last one.
I saved a CSV file in HDFS and tried to import it into Power BI. When I do that, Power BI treats it as a Binary file, and when I try to open it, Power BI pops up an error message as shown below.
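One way to check whether the file itself is readable, independent of Power BI, is to fetch it over WebHDFS and parse it as CSV. This is only a sketch under assumptions: the function names are hypothetical, the cluster is assumed to be unsecured (no Kerberos), and the file is assumed to be UTF-8. Note that for `op=OPEN` the namenode replies with a redirect to a datanode, so the client machine also has to be able to resolve and reach the datanode's hostname.

```python
import csv
import io
from urllib.request import urlopen


def open_url(file_url: str) -> str:
    """Append the WebHDFS OPEN operation to a file URL."""
    return file_url + "?op=OPEN"


def parse_csv_bytes(raw: bytes, encoding: str = "utf-8") -> list:
    """Decode raw file bytes and split them into CSV rows."""
    return list(csv.reader(io.StringIO(raw.decode(encoding))))


def read_csv_from_webhdfs(file_url: str, timeout_s: float = 10.0) -> list:
    """Fetch a file via WebHDFS and parse it as CSV (hypothetical helper)."""
    # urlopen follows the namenode's redirect to the datanode automatically.
    with urlopen(open_url(file_url), timeout=timeout_s) as resp:
        return parse_csv_bytes(resp.read())


# Example call (not run here), using the URL pattern from the post above:
# read_csv_from_webhdfs("http://<ip address of the VM>:50070/webhdfs/v1/user/cloudera/words.txt")
```

If this works from the same machine that runs Power BI, the data is reachable and the remaining problem is likely on the Power BI side.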
10-22-2016 04:34 PM - edited 10-22-2016 04:35 PM
Hello, do you have the solution? I have the same problem and I need to solve it.
I'd appreciate it very much.
11-02-2016 04:57 AM
Yes, I made it work.
Please use this as server name:
http://(ip address of your cloudera VM):50070/webhdfs/v1/user/cloudera/(the name of the directory which you want to import)
11-28-2016 10:04 AM
I'm getting a timeout error with PowerBI; when I use curl from my Linux desktop, everything works fine.
I'm just using the 'free' PowerBI download. Does anyone know if this version is fully functional?
Has anyone been able to read Parquet-formatted data into PowerBI? (That's our bigger question.)
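For the timeout, it may help to run the same kind of request curl makes, but from the Windows machine where PowerBI runs, to separate a network problem from a PowerBI problem. Below is a rough sketch with hypothetical function names and a placeholder host. It assumes an unsecured cluster; a Kerberos-secured one would reject this unauthenticated request.

```python
import json
from urllib.request import urlopen


def liststatus_url(base_url: str) -> str:
    """Append the WebHDFS LISTSTATUS operation to a directory URL."""
    return base_url.rstrip("/") + "?op=LISTSTATUS"


def entry_names(response_body: str) -> list:
    """Extract (name, type) pairs from a LISTSTATUS JSON response."""
    statuses = json.loads(response_body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"]) for s in statuses]


def list_directory(base_url: str, timeout_s: float = 10.0) -> list:
    """List a directory; raises an exception on timeout or HTTP error."""
    with urlopen(liststatus_url(base_url), timeout=timeout_s) as resp:
        return entry_names(resp.read().decode("utf-8"))


# Example call (placeholder host, not run here):
# list_directory("http://namenode01.test.com:50070/webhdfs/v1/user/cloudera")
```

If this also times out on the Windows machine while curl works from Linux, a firewall or name-resolution difference between the two machines would be the first thing to look at.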