What setting controls the maximum number of rows that can be exported from Zeppelin as CSV?


Hello @Raju

The key thing to know when handling DataFrames in Zeppelin is that a query's result set is imported into the notebook itself. That table is also read into memory whenever the notebook is opened.

To prevent a query from bringing back too much data and crashing the server (this can happen very quickly), each interpreter sets a limit on the number of rows that are brought back to Zeppelin (the default for most interpreters is about 1000).

=> This does not mean that queries returning more than a thousand rows will not be executed, just that only the first 1000 rows are actually shown in Zeppelin.

You can adjust the number of rows interpreter by interpreter; look for the "maxResult" properties.
Go to the interpreter configuration page (upper right-hand corner).


For example, for Spark:

zeppelin.spark.maxResult 

or for Livy:

zeppelin.livy.spark.sql.maxResult

For the JDBC interpreter (there's always an exception to the rule 🙂):

common.max_count

When you use Zeppelin's export-to-CSV feature, you simply export what has been pushed back to Zeppelin. If the maximum number of rows is set to a thousand, you'll never have more than a thousand rows in your CSV.

=> The actual number of rows in your result set may be larger; it simply hasn't been fully read back into Zeppelin.

This feature is great when working with small result sets. It can be deceiving, however, because the results are truncated without warning once the maximum number of rows has been reached.
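One way to tell whether the table shown in the notebook has been cut off is to compare the true row count with the interpreter's limit. Here is a minimal sketch, assuming a Spark 2.x interpreter. The data generated with spark.range is only a stand-in for your own query, and displayLimit is an assumed value that you would set by hand to match zeppelin.spark.maxResult; it is not read from the Zeppelin configuration.

```scala
import org.apache.spark.sql.SparkSession

// In Zeppelin, getOrCreate() should hand back the session the interpreter
// already created; run standalone, it starts a local one.
val spark = SparkSession.builder()
  .appName("row-count-check")
  .master("local[*]")
  .getOrCreate()

// Stand-in for your real query result (hypothetical data, 2500 rows).
val df = spark.range(0, 2500).toDF("id")

// Assumption: set this by hand to your zeppelin.spark.maxResult value.
val displayLimit = 1000L

// count() is computed by Spark, so it is not affected by the display limit.
val totalRows = df.count()

if (totalRows > displayLimit) {
  println(s"Result has $totalRows rows; a CSV exported from the notebook table would hold at most $displayLimit.")
} else {
  println(s"Result has $totalRows rows; the notebook table shows all of them.")
}
```

If the first message is printed, the CSV downloaded from the notebook is incomplete and a programmatic export is the safer route.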

If you're looking to export an entire table or a large subset, you should probably do it programmatically, for example by saving the table to a file system such as HDFS.

For Spark (2.0):

dataframe.coalesce(1)
 .write
 .option("header", "true")
 .csv("/path/to/sample_file.csv") 

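// Note: coalesce(1) collapses the result into a single partition so that Spark writes one part file; otherwise you'll have one file per partition.
// The path passed to csv() is created as a directory that contains the part file.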
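After a programmatic export it can be worth reading the output back to confirm that no rows were dropped. This is a rough sketch, assuming Spark 2.x. The output path /tmp/sample_export_csv and the generated data are hypothetical placeholders, and where that path actually lands (HDFS or local disk) depends on your cluster's default file system.

```scala
import org.apache.spark.sql.SparkSession

// In Zeppelin, getOrCreate() should hand back the existing session;
// run standalone, it starts a local one.
val spark = SparkSession.builder()
  .appName("csv-export-check")
  .master("local[*]")
  .getOrCreate()

// Stand-in for the DataFrame you want to export (hypothetical data).
val dataframe = spark.range(0, 2500).toDF("id")

// Hypothetical output location; Spark creates it as a directory.
val outputPath = "/tmp/sample_export_csv"

dataframe.coalesce(1)          // single partition, so a single part file
  .write
  .mode("overwrite")           // replace the output of a previous run
  .option("header", "true")
  .csv(outputPath)

// Read the export back and compare row counts with the source.
val exportedRows = spark.read
  .option("header", "true")
  .csv(outputPath)
  .count()

val sourceRows = dataframe.count()
println(s"source rows = $sourceRows, exported rows = $exportedRows")
```

The two counts should match, regardless of the maxResult setting of the interpreter.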

FYI: notebooks are simply JSON objects organized by paragraph. Just open up any notebook file to get a sense of the structure.

In HDP they are saved in:

/usr/hdp/current/zeppelin-server/notebook/[id of notebook]/note.json
