How does Zeppelin store query results?

I am designing an HDFS query system based on Spark that includes a paging function, and Zeppelin seems like a good reference for me.

Here is my question. Spark and Spark SQL query results persist even after I refresh or reopen the notebook, so the results must be saved somewhere.

Where is this result data saved? If it is saved in a database, what happens when the result is very large? Wouldn't that cause database performance problems?

Re: How does Zeppelin store query results?

@Junfeng Chen,

Zeppelin notebook results are stored in JSON format: in HDFS from HDP 2.6 onward, and on the native filesystem in earlier versions.

Because the note is stored in HDFS, size will not be a problem even if the result is huge. You can check the output here:

Native FS path: /usr/hdp/current/zeppelin-server/notebook/{notebook-id}/note.json
HDFS path: /user/zeppelin/notebook/{notebook-id}/note.json

Look for the "results" key in note.json.
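To see what Zeppelin has persisted, a short Python script can walk the paragraphs in note.json. This is a sketch assuming a Zeppelin 0.7-style layout (a "paragraphs" list whose entries carry a "results" object with a "msg" list of "type"/"data" pairs); the helper `paragraph_results` and the sample note are hypothetical, and other Zeppelin versions may lay the file out differently. For the HDFS copy, you would first fetch the file (for example with `hdfs dfs -get`) and pass its text in.

```python
import json

def paragraph_results(note_json_text):
    """Yield (paragraph_id, result_type, data) for every stored output.

    Assumes a Zeppelin 0.7-style note.json: each paragraph may carry a
    "results" object whose "msg" list holds {"type", "data"} entries.
    """
    note = json.loads(note_json_text)
    for paragraph in note.get("paragraphs", []):
        results = paragraph.get("results") or {}  # paragraphs that never ran have no results
        for msg in results.get("msg", []):
            yield paragraph.get("id"), msg.get("type"), msg.get("data")

# Hand-written note in the assumed layout (not real Zeppelin output).
sample = json.dumps({
    "id": "2ABCDEFGH",
    "paragraphs": [
        {"id": "p1", "text": "%sql select * from t",
         "results": {"code": "SUCCESS",
                     "msg": [{"type": "TABLE", "data": "name\tage\nann\t31\nbob\t27\n"}]}},
        {"id": "p2", "text": "%md hello"},
    ],
})

for pid, rtype, data in paragraph_results(sample):
    print(pid, rtype, repr(data))
```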

Re: How does Zeppelin store query results?

@Aditya Sirna Thanks, Aditya.

So what about paging? Since the whole result is saved in HDFS in JSON format, if I need only part of it, do I have to load the whole JSON file and slice out a page in memory using a given page size and page number? In practice, will Zeppelin run out of memory if the result is very large?
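The naive paging approach described in the question can be sketched in Python. It assumes the same hypothetical note.json layout as a Zeppelin 0.7-style note, and that TABLE data uses Zeppelin's %table text format (tab-separated columns, one row per line, header on the first line); `page_of_table` and the sample note are illustrative only.

```python
import json

def page_of_table(note_json_text, paragraph_id, page_size, page_number):
    """Return (header, rows) for one page of a stored TABLE result.

    Naive approach: the whole note.json is parsed into memory first,
    and only then is one page sliced out. page_number is 0-based.
    """
    note = json.loads(note_json_text)  # the entire file is materialized here
    for paragraph in note.get("paragraphs", []):
        if paragraph.get("id") != paragraph_id:
            continue
        for msg in (paragraph.get("results") or {}).get("msg", []):
            if msg.get("type") == "TABLE":
                # Assumed %table format: tab-separated, header on the first line.
                lines = msg["data"].splitlines()
                header = lines[0].split("\t")
                rows = [line.split("\t") for line in lines[1:]]
                start = page_number * page_size
                return header, rows[start:start + page_size]
    raise KeyError(f"no TABLE result for paragraph {paragraph_id!r}")

# Hypothetical note holding a 5-row result.
table = "n\tsquare\n" + "".join(f"{i}\t{i * i}\n" for i in range(5))
sample = json.dumps({"paragraphs": [
    {"id": "p1", "results": {"msg": [{"type": "TABLE", "data": table}]}},
]})

header, rows = page_of_table(sample, "p1", page_size=2, page_number=1)
print(header, rows)  # ['n', 'square'] [['2', '4'], ['3', '9']]
```

Because `json.loads` builds the entire note before slicing, memory use grows with the full stored result rather than with the page size, which is exactly the out-of-memory concern raised here.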

Re: How does Zeppelin store query results?

@Junfeng Chen,

There are interpreter-level properties that limit this. For example, the Spark interpreter has zeppelin.spark.maxResult, whose default value is 1000, so even if a query returns more than 1000 rows, only 1000 rows are fetched. If you need more rows, you can increase the limit.
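The effect of a row cap such as zeppelin.spark.maxResult can be pictured with a small Python sketch. `fetch_limited` is a hypothetical helper, not Zeppelin's implementation; it only shows why a capped result keeps the amount of data collected (and later written to note.json) bounded no matter how many rows the query matches.

```python
from itertools import islice

_MISSING = object()  # sentinel so a legitimate None row is not mistaken for "no more rows"

def fetch_limited(rows, max_result=1000):
    """Return (at most max_result rows, whether more rows existed).

    Rows beyond the cap are never collected. Peeking one row past the
    cap to detect truncation is this sketch's choice, not Zeppelin's code.
    """
    it = iter(rows)
    kept = list(islice(it, max_result))
    truncated = next(it, _MISSING) is not _MISSING
    return kept, truncated

kept, truncated = fetch_limited(range(2500))
print(len(kept), truncated)  # 1000 True
```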

Depending on your output size, you may also need to tweak these properties: zeppelin.interpreter.output.limit, zeppelin.websocket.max.text.message.size, ZEPPELIN_MEM, and ZEPPELIN_INTP_MEM. Refer to this link for more information on all the properties:

https://zeppelin.apache.org/docs/0.7.2/install/configuration.html
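As a rough way to judge whether a result will run into the two byte-based limits, a Python sketch can compare the size of the table text against them. `check_output_size` is hypothetical; it treats zeppelin.interpreter.output.limit and zeppelin.websocket.max.text.message.size as plain byte thresholds and ignores JSON framing and metadata, so it is only an approximation. The limit values in the example are illustrative; take the real defaults for your version from the configuration page. ZEPPELIN_MEM and ZEPPELIN_INTP_MEM are JVM heap settings and are not modeled here.

```python
def check_output_size(table_text, output_limit_bytes, websocket_limit_bytes):
    """Report whether a result's text fits under two size limits.

    output_limit_bytes stands in for zeppelin.interpreter.output.limit and
    websocket_limit_bytes for zeppelin.websocket.max.text.message.size.
    Measuring the UTF-8 length of the raw text is a simplification: the
    real websocket message also carries JSON framing and metadata.
    """
    size = len(table_text.encode("utf-8"))
    return {
        "size_bytes": size,
        "within_output_limit": size <= output_limit_bytes,
        "within_websocket_limit": size <= websocket_limit_bytes,
    }

small = "id\tvalue\n" + "".join(f"{i}\tx\n" for i in range(1_000))
large = "id\tvalue\n" + "".join(f"{i}\tx\n" for i in range(100_000))
for name, text in (("small", small), ("large", large)):
    # Example limit values only; check the defaults for your Zeppelin version.
    print(name, check_output_size(text, output_limit_bytes=102_400,
                                  websocket_limit_bytes=1_024_000))
```

A result that stays under a row cap can still exceed these byte limits when rows are wide, which is why the row limit and the size-related properties usually need to be tuned together.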

Re: How does Zeppelin store query results?

@Aditya Sirna

So by default, up to 1000 rows of results are stored in HDFS for each query?

If I increase the limit, will there be negative effects, such as slow HTTP transfer or failures when receiving results?

Re: How does Zeppelin store query results?

The 1000 limit is for Spark. You can set common.max_count at a global level. You should not see negative effects from increasing the limit, but if your data is very large, you may need to tweak the parameters mentioned above accordingly.
