
How to get datatype for specific field name from schema attribute of pyspark dataframe (from parquet files)?

Expert Contributor

I have a folder of parquet files that I'm reading into a pyspark session. How can I inspect/parse the individual schema field types and other info (e.g. to compare schemas between dataframes and see exact type differences)?

I can see the parquet schema and specific field names with something like...

from pyspark.sql import SparkSession
from pyspark.sql.functions import *

sparkSession = SparkSession.builder.appName("data_debugging").getOrCreate()
df = sparkSession.read.option("header", "true").parquet("hdfs://")
df.schema  # or df.printSchema()

So I can see the schema


but I'm not sure how to get the values for specific fields, e.g. something like...

df.schema.getData("SOME_FIELD_001")  # made-up method; would return a dict of field info

Does anyone know how to do something like this?