Zeppelin + pyspark - Data Frames with same names being Overwritten

Hi All,

I have a Hadoop cluster (HDP 2.5.3) with Zeppelin (0.6.2). I ran a few notebooks and initially everything was fine, but once more than one user started running notebooks in parallel we hit a potential issue.

User1:
Start pyspark
Assign a dataframe "df" by loading a JSON file (file1.json)

User2:
Start pyspark
Assign a dataframe "df" by loading a JSON file (file2.json)

User1:
Run df.show(). This displays the data loaded from file2.json (User2's data) instead of file1.json.
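
To make the sequence above concrete, here is a minimal plain-Python sketch of what appears to be happening, assuming both users' paragraphs run in one Python interpreter process with a single global namespace. It is a simulation only, not Zeppelin or Spark code: `shared_namespace`, `run_paragraph` and the strings standing in for dataframes are hypothetical names made up for illustration.

```python
# Simulation only: one dict stands in for the global namespace of a
# single Python interpreter process shared by every notebook user.
shared_namespace = {}


def run_paragraph(code, namespace):
    """Execute one notebook paragraph against the given namespace."""
    exec(code, namespace)


# User1 assigns "df" (a string stands in for the dataframe from file1.json).
run_paragraph('df = "rows from file1.json"', shared_namespace)

# User2 assigns the same name from a different file.
run_paragraph('df = "rows from file2.json"', shared_namespace)

# User1 now reads "df" and gets User2's data, because the name was rebound.
run_paragraph("seen_by_user1 = df", shared_namespace)
print(shared_namespace["seen_by_user1"])
```

If that assumption holds, nothing is corrupted on disk; the name "df" simply points at whatever was assigned last.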

Can you please confirm whether Zeppelin 0.6.2 supports multi-tenancy? If it does not, is there a workaround for this? Please advise.

Thanks,

Amalan Jagan

2 REPLIES

Re: Zeppelin + pyspark - Data Frames with same names being Overwritten

Amalan Jagan

Zeppelin does support multi-tenancy: you can tune the scope of your interpreter session. This of course uses extra resources, but it also ensures data integrity and security.

Check out this documentation. You likely want to change the interpreter settings.
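
As a rough illustration of what scoping the interpreter session is meant to achieve, here is a plain-Python sketch in which each user gets a private namespace. It is a simulation under that assumption, not Zeppelin configuration or API: `namespaces`, `run_paragraph` and the user names are hypothetical.

```python
# Simulation only: each user gets a private namespace, which is roughly
# what a per-user interpreter scope is meant to provide.
from collections import defaultdict

namespaces = defaultdict(dict)  # hypothetical: user name -> private globals


def run_paragraph(user, code):
    """Execute a paragraph in the namespace belonging to `user`."""
    exec(code, namespaces[user])


run_paragraph("user1", 'df = "rows from file1.json"')
run_paragraph("user2", 'df = "rows from file2.json"')

# Each user still sees their own "df".
print(namespaces["user1"]["df"])
print(namespaces["user2"]["df"])
```

The trade-off mentioned above shows up here too: every user holds a separate set of objects, so memory use grows with the number of active users.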

Re: Zeppelin + pyspark - Data Frames with same names being Overwritten

Thanks for your response, Matt. But I believe the interpreter initialisation option (per user/per note) is NOT available in 0.6.2.

Regards,
Amalan Jagan