Support Questions


Do the Spark REPLs have a way to list current variables?


I'm specifically using pyspark and I'm wondering if there is something similar to Pig's "aliases" command that shows all currently available variables. If there is something like that in pyspark, I'm just missing it and I hope someone straightens me out! 😉 I'm not using spark-shell much, but knowing how to do this in that REPL would be useful, too.

1 ACCEPTED SOLUTION

New Contributor

pyspark:

  • `dir()`: the list of in-scope variables
  • `globals()`: the dictionary of global variables
  • `locals()`: the dictionary of local variables

spark-shell:

  • `$intp.allDefinedNames`: the list of all defined variables
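A quick sanity check of the three pyspark calls (this is plain Python, so it behaves the same outside Spark; `rdd_name` is just a throwaway example variable):

```python
# Define a throwaway variable, then inspect the namespace.
rdd_name = "example"

in_scope = dir()          # names visible in the current scope
global_vars = globals()   # dict mapping global names to their values
local_vars = locals()     # dict of local names; at the top level of a
                          # shell this is the same namespace as globals()

print("rdd_name" in in_scope)    # the new variable shows up in dir()
print(global_vars["rdd_name"])   # and can be read back from globals()
```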


4 REPLIES

Explorer

The pyspark shell is just Python, so `dir()` should show all existing Python variables (although it also lists all imports and a number of other names you may not be looking for).
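To cut down that noise, one option is a small filter over `globals()` that skips underscore-prefixed names and imported modules. The helper name `user_vars` is my own for illustration, not part of pyspark:

```python
import types

def user_vars(namespace):
    """Return names that look user-defined: no leading underscore, not a module."""
    return sorted(
        name for name, value in namespace.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    )

count = 42
label = "demo"
print(user_vars(globals()))  # includes 'count' and 'label', hides 'types' and dunders
```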

Super Collaborator

`$intp.definedTerms.foreach(println)`

That will print all the defined variables.


New Contributor

If you keep using `globals()` at the prompt, you will eventually hit an error: the dictionary keeps adding itself to itself, until you get one of the following:

RuntimeError: maximum recursion depth exceeded while getting the repr of a list

or

RuntimeError: dictionary changed size during iteration
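A workaround is to avoid handing the live `globals()` dict back to the REPL: take a snapshot of the names, or a shallow copy, so nothing self-referential is kept around and iteration is unaffected by new assignments. A minimal sketch:

```python
# Snapshot just the names: a plain list, nothing self-referential.
names = sorted(globals().keys())

# Or take a shallow copy if you also want the values; iterating the
# copy is safe even if new variables are defined in the meantime.
snapshot = dict(globals())
for name in snapshot:
    pass  # inspect snapshot[name] here without a RuntimeError
```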