
what setting controls the max # of rows can be exported from Zeppelin as CSV


Re: what setting controls the max # of rows can be exported from Zeppelin as CSV

Cloudera Employee

Hello @Raju

The key thing to know about handling dataframes in Zeppelin is that the result set is imported into the notebook when a query is run. That table is then also read into memory every time the notebook is opened.

To prevent a query from bringing back too much data and crashing the server (this can happen very quickly), each interpreter sets a limit on the number of rows brought back to Zeppelin (the default for most interpreters is ~1000).

=> This does not mean that queries returning more than a thousand rows will not be executed, just that only the first 1000 rows are actually shown in Zeppelin.
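For example, in a Spark paragraph (a minimal sketch; "large_table" is a hypothetical table name, and z.show is Zeppelin's built-in display helper):

%spark
// The query itself runs over the full table; only the display
// (and therefore what the CSV export button can see) is capped
// at zeppelin.spark.maxResult rows.
val df = spark.sql("SELECT * FROM large_table")
z.show(df)           // renders at most maxResult rows
println(df.count())  // the true row count, unaffected by the limit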

You can adjust the number of rows interpreter by interpreter; look for the "maxResult" properties.
Go to the interpreter configuration page (upper right-hand corner).

[Screenshot: Zeppelin interpreter configuration page]

Ex: for Spark

zeppelin.spark.maxResult 

or for Livy

zeppelin.livy.spark.sql.maxResult

For the JDBC interpreter (there's always an exception to the rule)

 common.max_count
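For reference, these typically ship with the same default (a sketch of out-of-the-box values; check your own interpreter settings page, as installations may differ):

zeppelin.spark.maxResult           1000
zeppelin.livy.spark.sql.maxResult  1000
common.max_count                   1000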

When using Zeppelin's dataframe export-to-CSV feature, you simply export what has been pushed back to Zeppelin. If the max number of rows is set to a thousand, then you'll never have more than a thousand rows in your CSV.

=> The actual number of rows in your result set may be larger; it simply hasn't been fully read back into Zeppelin.

This feature is great when working with small result sets. It can, however, be deceiving, since results are silently truncated once the max number of rows is reached.

If you're looking to export an entire table or a large subset, you should do it programmatically, for example by saving the table to a file system such as HDFS.

For Spark (2.0):

// coalesce(1) brings all partitions together so the output directory
// contains a single CSV part file; otherwise you get one file per partition
dataframe.coalesce(1)
  .write
  .option("header", "true")
  .csv("/path/to/sample_file.csv")
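For a complete job outside Zeppelin, a sketch along these lines (assumes Hive support and a table named my_table; names and paths are illustrative):

import org.apache.spark.sql.SparkSession

// Inside a Zeppelin paragraph the "spark" session already exists;
// building one explicitly is only needed in a standalone job
val spark = SparkSession.builder()
  .appName("ExportTableToCsv")
  .enableHiveSupport()
  .getOrCreate()

val df = spark.sql("SELECT * FROM my_table")

// Spark treats the .csv() path as an output directory of part files;
// coalesce(1) leaves exactly one part file inside it
df.coalesce(1)
  .write
  .option("header", "true")
  .csv("hdfs:///tmp/my_table_csv")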

FYI: notebooks are simply JSON objects organized by paragraph. Just open up any notebook file to get a sense of the structure.

In HDP they are saved in:

/usr/hdp/current/zeppelin-server/notebook/[id of notebook]/note.json
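A simplified sketch of what's inside (fields trimmed to the essentials; real notes carry extra metadata such as timestamps and display settings, and the id shown is illustrative):

{
  "id": "2CXXXXXXX",
  "name": "My Notebook",
  "paragraphs": [
    {
      "text": "%spark\nval df = spark.sql(\"SELECT * FROM my_table\")",
      "results": { ... }
    }
  ]
}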