PySpark: ValueError: can not infer schema from empty dataset

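import logging

from pyspark import SparkContext
from pyspark.sql import SQLContext

logger = logging.getLogger(__name__)

def get_schema(jsonobj):
    sc = SparkContext()
    sqlContext = SQLContext(sc)
    try:
        # Schema is inferred from the rows in jsonobj
        df = sqlContext.createDataFrame(jsonobj)
    except IOError:
        logger.exception(jsonobj)
    schema = df.printSchema()
    sc.stop()
    return schema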

For some datasets, the code above throws "ValueError: can not infer schema from empty dataset". What does this error mean, and how do I fix it?

    df = sqlContext.createDataFrame(jsonobj)
  File "/remote/vgrnd104/guntaka/anaconda3/lib/python3.6/site-packages/pyspark/sql/context.py", line 302, in createDataFrame
    return self.sparkSession.createDataFrame(data, schema, samplingRatio, verifySchema)
  File "/remote/vgrnd104/guntaka/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 691, in createDataFrame
    rdd, schema = self._createFromLocal(map(prepare, data), schema)
  File "/remote/vgrnd104/guntaka/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 410, in _createFromLocal
    struct = self._inferSchemaFromList(data, names=schema)
  File "/remote/vgrnd104/guntaka/anaconda3/lib/python3.6/site-packages/pyspark/sql/session.py", line 337, in _inferSchemaFromList
    raise ValueError("can not infer schema from empty dataset")
ValueError: can not infer schema from empty dataset
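
Going by the traceback, the error is raised in _inferSchemaFromList, which suggests that jsonobj contained no rows for those datasets, so createDataFrame had nothing to infer column names and types from. Note also that the except IOError clause in the question would not catch a ValueError. Below is a minimal sketch of one way to guard against this, assuming jsonobj is a local list of rows (not an RDD). The helper name create_dataframe_safely and the fallback_schema parameter are hypothetical, introduced here only for illustration; the only Spark call used is createDataFrame(data, schema).

```python
def create_dataframe_safely(session, records, fallback_schema=None):
    """Build a DataFrame without triggering schema inference on empty input.

    session         -- anything exposing createDataFrame(data, schema),
                       e.g. a SQLContext or SparkSession (assumption)
    records         -- a local iterable of rows (assumption: not an RDD)
    fallback_schema -- optional explicit schema used when records is empty
    """
    rows = list(records)
    if rows:
        # At least one row exists, so Spark can infer the schema from the data.
        return session.createDataFrame(rows)
    if fallback_schema is not None:
        # With an explicit schema no inference is needed, so an empty list is fine.
        return session.createDataFrame(rows, fallback_schema)
    # Nothing to infer from and no schema supplied: let the caller decide.
    return None


# Possible usage with the sqlContext from the question (requires pyspark;
# the "name" column is a made-up example):
#   from pyspark.sql.types import StructType, StructField, StringType
#   fallback = StructType([StructField("name", StringType(), True)])
#   df = create_dataframe_safely(sqlContext, jsonobj, fallback)
#   schema = df.schema if df is not None else None
```

If a schema object is what you need back, df.schema holds the StructType, whereas df.printSchema() only prints the tree and returns None.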