Connect to HDFS from Microsoft PowerBI with Kerberos

Explorer

I am trying to connect Microsoft's PowerBI Desktop to HDFS, but I am unable to do so. I keep getting error 400. I also tried connecting with the Web data source instead of HDFS, but I get the same error.

 

By default, the HDFS data source sets the port to 50070. The secure port is 50470, and I am currently looking for a way to change it. I have also updated my JAAS settings to forward a Kerberos ticket, but that had no effect.

 

Has anyone tried this or gotten it to work on a secure cluster? Any ideas are greatly appreciated.

 

Thanks,

1 ACCEPTED SOLUTION

Explorer

OK, I figured out the issue: the entire URL, including the full path of the files in HDFS, is needed.

 

Port 8020 is for back-end HDFS communication; you cannot connect to that port directly.

 

The URLs to connect to HDFS are listed below. Please keep in mind that these are the default ports and can be changed.

 

  • Non-Secure: http://<namenode>:50070/webhdfs/v1/<directory>
  • Secure: https://<namenode>:50470/webhdfs/v1/<directory>
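
For anyone who wants to build these URLs in a script before pasting them into PowerBI, here is a minimal Python sketch. The function name `build_webhdfs_url` and its parameters are made up for illustration. It assumes the default ports listed above and that the secure port serves HTTPS (the Hadoop 2.x default for 50470); adjust both if your cluster is configured differently.

```python
from urllib.parse import quote

# Default ports named in this thread; clusters can override them.
DEFAULT_HTTP_PORT = 50070
DEFAULT_HTTPS_PORT = 50470


def build_webhdfs_url(namenode, hdfs_path, secure=False, port=None, op=None):
    """Return the WebHDFS URL for hdfs_path on the given namenode host."""
    # Assumption: the secure endpoint is reached over HTTPS.
    scheme = "https" if secure else "http"
    if port is None:
        port = DEFAULT_HTTPS_PORT if secure else DEFAULT_HTTP_PORT
    # Percent-encode the path but keep the "/" separators intact.
    path = quote("/" + hdfs_path.strip("/"), safe="/")
    url = f"{scheme}://{namenode}:{port}/webhdfs/v1{path}"
    if op is not None:
        # Optional WebHDFS operation, e.g. LISTSTATUS or OPEN.
        url += f"?op={op}"
    return url


print(build_webhdfs_url("namenode01.test.com", "/user/cloudera/sensordata"))
print(build_webhdfs_url("namenode01.test.com", "/data", secure=True, op="LISTSTATUS"))
```

The URL without `op` is the form that goes into the PowerBI server field; the `op` parameter is only useful when testing the same endpoint with curl or a script.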

 

However, my issue is that when I create the connection from PowerBI, I only have one option: a server field, with no description or help about what goes in it (the server name, or the namenode where the HDFS web interface is running). After entering this information, I get the error below.

 

DataSource.Error: HDFS cannot connect to server 'namenode01.test.com'. Unable to connect to the remote server.
Details:
    DataSourceKind=Hdfs
    DataSourcePath=http://namenode01.test.com:50070/webhdfs/v1
    Url=http://namenode01.test.com:50070/webhdfs/v1/

 

Instead of entering only the server name, I put "http://<namenode>:50470/webhdfs/v1/<directory>" in the server field, replacing <namenode> with the name of the server and <directory> with the HDFS path of the data I want to get.

 

The issue now is that PowerBI does not support the Parquet or SequenceFile formats (/cry). Only text or open formats currently seem to work, which is not unexpected.

 

Thanks,

Rusty


11 REPLIES

50070 and 50470 refer to the Namenode's HTTP ports, not for sending RPC calls to browse HDFS.

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_ports_cdh5.html

Have you tried getting the tool to connect to port 8020 on the Namenode host instead of 50070/50470?
Regards,
Gautam Gopalakrishnan

Explorer

Hi Rusty M,

 

I am trying to do the same thing as you did. I am able to see all the files and directories in my HDFS when I connect Power BI to HDFS, but I cannot actually pull the data from those files. Power BI sees these files as binary files, and the query only imports attributes such as data executed, folder path, etc.; it DOES NOT seem to import the data in the files. I have tried to import an xlsx file and a text file, both stored in HDFS, into Power BI.

 

Can you please guide me on how to import an Excel xlsx file stored in HDFS into Power BI?

 

Please see the attached image below. To connect to HDFS from Power BI, I am using the IP address of the Cloudera VM (I assume this is the correct way to connect to HDFS from Power BI). Here are the complete URLs I use to connect Power BI to HDFS:

 

 

http://<ip address of the VM>:50070/webhdfs/v1/user/cloudera/output_join11

 

http://<ip address of the VM>:50070/webhdfs/v1/user/cloudera/words.txt

 

Is it some kind of driver issue, or is there something I am missing? Please keep in mind when answering that I am just a beginner trying to learn Cloudera and visualization tools.

 

Thanks in advance.

Untitled1.png

 

Explorer

 

Hi Rusty M,

 

Please read this post as a continuation of my last one.

 

I saved a CSV file in HDFS and tried to import it into Power BI. When I do that, Power BI treats it as a binary file, and when I try to open it, Power BI pops up an error message, as shown below.

 

Please guide.

 

Untitled2.png

 

 


Untitled3.png
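
As background on the "binary" behaviour: WebHDFS hands file contents back as raw bytes, so whichever client reads them has to decode and parse them itself. The sketch below shows that step in Python, not in Power BI. The helper name `rows_from_bytes` is made up, and it assumes a UTF-8, comma-separated file. The corresponding Power BI step (turning the Binary value into a table) is not covered here.

```python
import csv
import io


def rows_from_bytes(raw, encoding="utf-8", delimiter=","):
    """Decode raw file bytes and parse them as delimited text rows."""
    # Assumption: the file is plain text in the given encoding.
    text = raw.decode(encoding)
    # newline="" lets the csv module handle \r\n and \n line endings itself.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Hand-written sample bytes standing in for what an op=OPEN request returns.
sample = b"id,temp\r\n1,20.5\r\n2,21.0\r\n"
for row in rows_from_bytes(sample):
    print(row)
```

If the same bytes fail to parse here, the file itself (encoding, delimiter, or format such as xlsx or Parquet) is the likely problem rather than the connection.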

 

New Contributor

Hello, do you have the solution? I have the same problem and I need to solve it.

I'd appreciate it very much.

Thank you

Explorer

Hi,

Yes, I made it work.

 

Please use this as server name:

 

http://(ip address of your cloudera VM):50070/webhdfs/v1/user/cloudera/(the name of the directory which you want to import)

 

example:

 

http://127.0.0.1:50070/webhdfs/v1/user/cloudera/sensordata
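
To check the same directory outside Power BI, the WebHDFS `LISTSTATUS` operation (the directory URL with `?op=LISTSTATUS` appended) returns a JSON listing. Below is a minimal Python sketch for reading such a reply. The helper name `list_entries` is made up, and the sample JSON is hand-written in the documented reply shape, not output from a real cluster.

```python
import json


def list_entries(liststatus_body):
    """Return (name, type, length) tuples from a WebHDFS LISTSTATUS reply."""
    reply = json.loads(liststatus_body)
    statuses = reply["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


# Hand-written sample in the LISTSTATUS reply shape (not real cluster output).
sample = """
{"FileStatuses": {"FileStatus": [
  {"pathSuffix": "part-00000", "type": "FILE", "length": 1024},
  {"pathSuffix": "_logs", "type": "DIRECTORY", "length": 0}
]}}
"""

for name, kind, size in list_entries(sample):
    print(name, kind, size)
```

If the listing looks right here but Power BI still shows nothing, the URL and the cluster are probably fine and the problem is on the Power BI side.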

Explorer

Hello,

I'm getting a timeout error with PowerBI, but when I use curl from my Linux desktop, everything works fine.

I'm just using the 'free' PowerBI download. Does anyone know if this version is fully functional?

Has anyone been able to read Parquet-formatted data into PowerBI? (That is our bigger question.)

Thanks,

Craig
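
One possible cause of a timeout like this, which nothing in this thread confirms: when a file is read over WebHDFS, the namenode answers with an HTTP 307 redirect to a datanode, so the client machine must also be able to resolve and reach that datanode's host and port. The Python sketch below pulls the target out of a redirect `Location` header so it can be checked from the Windows machine. The helper name `redirect_target` and the sample header are hypothetical.

```python
from urllib.parse import urlsplit


def redirect_target(location):
    """Return (host, port) named in a redirect Location header.

    port is None when the header does not specify one.
    """
    parts = urlsplit(location)
    return parts.hostname, parts.port


# Hypothetical Location header of the kind a namenode sends for op=OPEN.
sample_location = (
    "http://datanode01.test.com:50075/webhdfs/v1/user/cloudera/words.txt"
    "?op=OPEN&offset=0"
)

host, port = redirect_target(sample_location)
print(host, port)
```

The header itself can be seen with `curl -i` against the file URL with `?op=OPEN`. If the host it names does not resolve, or the port is blocked, from the machine running PowerBI, that would explain curl working on one desktop and PowerBI timing out on another.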

New Contributor

Hi Rizi, 

 

I am unable to open the given URL:

 

"http://127.0.0.1:50070/webhdfs/v1/user/cloudera/sensordata"

Contributor

Hey guys,

I am facing the same issue. Could someone please help?

 

 

 

Thanks

HadoopHelp