Archives of Support Questions (Read Only)

This is an archived, read-only board kept for historical reference. Information and links may no longer be available or relevant. To ask a new question, please post a new topic on the appropriate active board.

Do the Spark REPLs have a way to list current variables?


I'm specifically using pyspark and I'm wondering if there is something similar to Pig's "aliases" command that shows all currently available variables. If there is something like that in pyspark, I'm just missing it and I hope someone straightens me out! 😉 I'm not using spark-shell much, but knowing how to do this in that REPL would be useful, too.

1 ACCEPTED SOLUTION

New Member

pyspark:

  • dir(): the list of in scope variables
  • globals(): the dictionary of global variables
  • locals(): the dictionary of local variables

spark-shell:

  • $intp.allDefinedNames: the list of all defined variables


4 REPLIES

New Member

The pyspark shell is just Python, so `dir()` shows all existing Python variables (though it also lists imports and a number of other names you may not be looking for).

Super Collaborator

$intp.definedTerms.foreach(println)

That will print all the variables defined in the current spark-shell session.


New Member

One caveat: if you keep using `globals()` in the REPL, the returned dictionary ends up containing references to itself, and you will eventually hit one of the following errors:

RuntimeError: maximum recursion depth exceeded while getting the repr of a list

or

RuntimeError: dictionary changed size during iteration