Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Who agreed with this topic

Spark-sql fails to use "SELECT" on Aliases on Parquet files (as defined in Avro schema)

New Contributor

We are using Spark-sql and Parquet data-format. Avro is used as the schema format. We are trying to use “aliases” on field names and are running into issues while trying to use alias-name in SELECT.

 

Sample schema, where each field has both a name and a alias:

 

{ "namespace": "com.test.profile",
  "type": "record",
  "name": "profile",
  "fields": [
    {"name": "ID", "type": "string"},
    {"name": “F1", "type": ["null","int"], "default": "null", "aliases": [“F1_ALIAS"]},
    {"name": “F2", "type": ["null","int"], "default": "null", "aliases": [“F2_ALIAS"]}
  ]
}

 
Code for SELECT:

 

val profile = sqlContext.read.parquet(“/user/test/parquet_files/*”)
profile.registerTempTable(“profile")
val features = sqlContext.sql("“SELECT F1_ALIAS from profile”)

 

It will throw the following exception:

 

org.apache.spark.sql.AnalysisException: cannot resolve ‘`F1_ALIAS`' given input columns: [ID, F1, F2]

 

Any suggestions for this use case? 

 

On a side note, what characters are allowed in aliases? e.g. is "!" allowed?

 

Thank you in advance!

Who agreed with this topic