Zeppelin + pyspark - Data Frames with same names being Overwritten

Explorer

Hi All,

I have a Hadoop cluster (HDP 2.5.3) with Zeppelin (0.6.2). We ran a few notebooks and initially everything was fine, but once more than one user started running notebooks in parallel, we hit a potential issue.

User1:

Start pyspark
Assign a dataframe "df" by loading a json file (file1.json)

User2:
Start pyspark
Assign a dataframe "df" by loading a json file (file2.json)


User1:
Run df.show(). This displays the data loaded from file2.json (User2's file) instead of file1.json.
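
The sequence above is what one would expect if both users' paragraphs run in a single shared Python process, where "df" is one global name. Below is a minimal pure-Python sketch of that behaviour. It is a simplified model, not Zeppelin code: `shared_ns` and `run_paragraph` are hypothetical names, and plain strings stand in for the DataFrames.

```python
# Simplified model (assumption): one interpreter process = one global namespace.
shared_ns = {}  # hypothetical stand-in for the shared interpreter's globals


def run_paragraph(code, namespace):
    """Execute a notebook paragraph against the given namespace."""
    exec(code, namespace)


# User1 assigns df, then User2 assigns df in the same namespace.
run_paragraph('df = "rows from file1.json"', shared_ns)
run_paragraph('df = "rows from file2.json"', shared_ns)

# User1 now reads df and gets whatever was assigned last.
run_paragraph("result = df", shared_ns)
print(shared_ns["result"])
```

In this model the last assignment wins, regardless of which user made it.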

Can you please confirm whether Zeppelin 0.6.2 supports multi-tenancy? If it does not, is there a workaround? Please advise.

Thanks,

Amalan Jagan

2 REPLIES
Re: Zeppelin + pyspark - Data Frames with same names being Overwritten

Expert Contributor

Amalan Jagan

Zeppelin does support multi-tenancy: you can tune the scope of your interpreter session. This of course uses extra resources, but it also ensures data integrity and security.
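
To illustrate the idea of scoping, here is a simplified pure-Python model in which each session gets its own namespace, so the same variable name no longer collides. This is only a sketch under stated assumptions, not Zeppelin's implementation: `SessionPool` and its `run` method are hypothetical names, and strings stand in for DataFrames.

```python
class SessionPool:
    """Hypothetical model: one namespace per session key (e.g. per user or per note)."""

    def __init__(self):
        self._namespaces = {}

    def run(self, session_key, code):
        # Namespaces are created lazily. Each extra namespace costs memory,
        # which mirrors the resource trade-off of separate sessions.
        ns = self._namespaces.setdefault(session_key, {})
        exec(code, ns)
        return ns


pool = SessionPool()
pool.run("user1", 'df = "rows from file1.json"')
pool.run("user2", 'df = "rows from file2.json"')

# Each user reads back the df assigned in their own session.
user1_df = pool.run("user1", "pass")["df"]
user2_df = pool.run("user2", "pass")["df"]
print(user1_df)
print(user2_df)
```

Here both users can use the name "df" without seeing each other's data, at the cost of keeping one namespace per session.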

Check out this documentation. You likely want to change the interpreter settings.

Re: Zeppelin + pyspark - Data Frames with same names being Overwritten

Explorer

Thanks for your response, Matt. But I believe the interpreter initialisation option (per user/per note) is NOT available in 0.6.2.

Regards,
Amalan Jagan
