Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Spark-sql fails to use "SELECT" on Aliases on Parquet files (as defined in Avro schema)

We are using Spark SQL with the Parquet data format, and Avro as the schema format. We are trying to use "aliases" on field names and are running into issues when we use an alias name in a SELECT.

 

Sample schema, where F1 and F2 each have both a name and an alias:

{ "namespace": "com.test.profile",
  "type": "record",
  "name": "profile",
  "fields": [
    {"name": "ID", "type": "string"},
    {"name": "F1", "type": ["null","int"], "default": null, "aliases": ["F1_ALIAS"]},
    {"name": "F2", "type": ["null","int"], "default": null, "aliases": ["F2_ALIAS"]}
  ]
}

 
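Code for SELECT:

// Load the Parquet files and expose them to Spark SQL as a temp table
val profile = sqlContext.read.parquet("/user/test/parquet_files/*")
profile.registerTempTable("profile")
// Select by the alias declared in the Avro schema
val features = sqlContext.sql("SELECT F1_ALIAS FROM profile")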

 

This throws the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`F1_ALIAS`' given input columns: [ID, F1, F2]

 

Any suggestions for this use case? 

 

On a side note, what characters are allowed in aliases? e.g. is "!" allowed?

 

Thank you in advance!
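
For the SELECT on F1_ALIAS above, one possible workaround is sketched below. It rests on an assumption: the input columns listed in the exception ([ID, F1, F2]) suggest that Spark SQL only sees the column names stored in the Parquet files and does not apply the Avro aliases. If so, the aliases can be re-applied by hand with a projection. The `AliasProjection` object, its methods, and the `profile_aliased` table name are hypothetical helpers made up for illustration, not part of Spark or Avro. The name check uses the pattern the Avro specification gives for record field names; that aliases are held to the same rule is also an assumption here.

```scala
// Hypothetical helper (not a Spark or Avro API): builds a SELECT list that
// exposes each Parquet column under the alias declared in the Avro schema.
object AliasProjection {
  // Pattern the Avro specification gives for record field names.
  // Assumption: aliases must follow the same rule.
  private val AvroName = "[A-Za-z_][A-Za-z0-9_]*"

  def isValidName(name: String): Boolean = name.matches(AvroName)

  // aliases: physical column name -> alias, copied by hand from the schema
  def selectList(columns: Seq[String], aliases: Map[String, String]): String =
    columns.map { c =>
      aliases.get(c) match {
        case Some(a) =>
          require(isValidName(a), s"invalid alias: $a")
          s"`$c` AS `$a`"
        case None => s"`$c`"
      }
    }.mkString(", ")
}

val aliases = Map("F1" -> "F1_ALIAS", "F2" -> "F2_ALIAS")
val projection = AliasProjection.selectList(Seq("ID", "F1", "F2"), aliases)

// In spark-shell, with `profile` registered as in the post, the projection
// could be used like this (left as comments so the sketch runs without Spark):
// val aliased = sqlContext.sql(s"SELECT $projection FROM profile")
// aliased.registerTempTable("profile_aliased")
// val features = sqlContext.sql("SELECT F1_ALIAS FROM profile_aliased")
```

Under the same assumption about names, an alias containing "!" would be rejected by `isValidName`, so the sketch fails fast instead of generating SQL with an unusual identifier.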
