
Unable to see all output in pyspark paragraphs


Running Zeppelin 0.7.0.2.6.1.0-129 under HDP 2.6.1, we are unable to see all REPL output in pyspark paragraphs. Typically, only the value of the last expression appears in the paragraph output.

%pyspark

a=1
b=2
a
b
a
1
%pyspark

1+1
sc.version
u'1.6.3'

According to the Apache Zeppelin JIRA and release notes, this was addressed in 0.7.0 under ZEPPELIN-1197, but it appears to have reverted to the original behavior (or not been fixed) in the 0.7.0 version released in HDP 2.6.1.

Can anyone confirm the same behavior? Does anyone know whether this is patchable in 2.6.1 and whether the same behavior persists in 2.6.2 or later?

For reference, the intended output here would be more like this:

%pyspark

1+1
sc.version
2
u'1.6.3'

I have been able to replicate this behavior in the %pyspark and %livy.pyspark interpreters in Zeppelin 0.7.0.2.6.1.0-129 (HDP Sandbox) and Zeppelin 0.7.0.2.6.1.0-129 (HDP 2.6.1).


Re: Unable to see all output in pyspark paragraphs


Per discussions with Hortonworks support, and after reading more detailed notes on the pull request associated with ZEPPELIN-1197, it looks like this was only addressed to the extent that the last item in the stdout will be automatically formatted and presented in the Zeppelin output window. Explicitly printing to the console appears to be the workaround for now.
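For anyone landing here, a minimal sketch of that workaround follows. The `show` helper is a hypothetical convenience function, not part of Zeppelin or PySpark; it simply prints the repr of each value so that every intermediate result reaches stdout. The example uses plain Python values so it runs anywhere; in a real %pyspark paragraph you would pass `sc.version` or similar.

```python
from __future__ import print_function  # same print() behavior on Python 2 and 3


def show(*values):
    """Print the repr of each value on its own line, as a REPL would echo it."""
    for value in values:
        print(repr(value))


a = 1
b = 2
show(a, b)    # prints 1, then 2
show(1 + 1)   # prints 2
# In a %pyspark paragraph (assumption: `sc` is the SparkContext Zeppelin provides):
# show(1 + 1, sc.version)
```

Calling `show` once per value, or once with several values, gives the same line-per-value output, so it can replace the bare expressions in the paragraphs above.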

I'd love to see this addressed with a switch at the interpreter level to allow for all stdout to be dumped to the Zeppelin output window. This would allow us to define an alternate copy of the Livy interpreter, for example, for use when debugging or developing, that could be swapped out easily for a more "runtime" version of the notebook or paragraph.
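Until something like that exists, the "echo everything" behavior can be approximated in user code. The sketch below is only an illustration of the idea, not how Zeppelin or Livy work internally: `run_and_echo` is a hypothetical helper that parses a block of source with the standard `ast` module, executes statements normally, and prints the repr of every top-level expression whose value is not None. It assumes a Python 3.8+ interpreter.

```python
import ast


def run_and_echo(source, namespace=None):
    """Run `source`, printing the repr of each top-level expression result."""
    namespace = {} if namespace is None else namespace
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.Expr):
            # A bare expression: evaluate it and echo the value, REPL-style.
            code = compile(ast.Expression(node.value), "<paragraph>", "eval")
            result = eval(code, namespace)
            if result is not None:
                print(repr(result))
        else:
            # Any other statement (assignment, import, def, ...): just execute it.
            module = ast.Module(body=[node], type_ignores=[])
            exec(compile(module, "<paragraph>", "exec"), namespace)
    return namespace


run_and_echo("a = 1\nb = 2\na\nb\n1 + 1\n")  # prints 1, 2, 2 on separate lines
```

Inside a notebook paragraph you would likely pass `globals()` as the namespace so that names such as `sc` are visible to the evaluated code; that part is an assumption and has not been tested against the HDP builds mentioned above.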
