We are using Spark SQL with the Parquet data format, and Avro as the schema format. We are trying to use "aliases" on field names and are running into issues when referring to the alias name in a SELECT.
Sample schema, where each field has both a name and an alias:
{ "namespace": "com.test.profile",
"type": "record",
"name": "profile",
"fields": [
{"name": "ID", "type": "string"},
{"name": “F1", "type": ["null","int"], "default": "null", "aliases": [“F1_ALIAS"]},
{"name": “F2", "type": ["null","int"], "default": "null", "aliases": [“F2_ALIAS"]}
]
}
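In case it helps reproduce the issue, here is a simplified sketch of how Parquet data with this schema could be produced from Avro files; the spark-avro package and the paths shown are illustrative, not our exact pipeline:
// Illustrative only: read Avro files that use the schema above and
// rewrite them as Parquet, using the Databricks spark-avro package.
import com.databricks.spark.avro._

val avroDf = sqlContext.read.avro("/user/test/avro_files/*")
avroDf.write.parquet("/user/test/parquet_files/")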
Code for SELECT:
val profile = sqlContext.read.parquet("/user/test/parquet_files/*")
profile.registerTempTable("profile")
val features = sqlContext.sql("SELECT F1_ALIAS FROM profile")
It throws the following exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`F1_ALIAS`' given input columns: [ID, F1, F2]
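For what it's worth, selecting by the original field name resolves fine, so the problem appears to be specific to resolving the Avro alias (variable names below are just for illustration):
// Querying by the original column name works; only the alias fails to resolve.
val byOriginalName = sqlContext.sql("SELECT F1 FROM profile")
byOriginalName.show()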
Any suggestions for this use case?
On a side note, which characters are allowed in aliases? For example, is "!" allowed?
Thank you in advance!