Support Questions

msun · ‎10-05-2016

We are using Spark SQL with Parquet as the data format and Avro as the schema format. We are trying to use "aliases" on field names, and we run into issues when we reference an alias name in a SELECT.

Sample schema, where each field has both a name and an alias:

{ "namespace": "com.test.profile",
  "type": "record",
  "name": "profile",
  "fields": [
    {"name": "ID", "type": "string"},
    {"name": "F1", "type": ["null","int"], "default": null, "aliases": ["F1_ALIAS"]},
    {"name": "F2", "type": ["null","int"], "default": null, "aliases": ["F2_ALIAS"]}
  ]
}

Code for SELECT:

val profile = sqlContext.read.parquet("/user/test/parquet_files/*")
profile.registerTempTable("profile")
val features = sqlContext.sql("SELECT F1_ALIAS FROM profile")

It will throw the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`F1_ALIAS`' given input columns: [ID, F1, F2]
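
The error lists only the stored column names (ID, F1, F2), which suggests Spark resolves columns against the names in the Parquet files and does not see the Avro aliases. A possible stopgap, sketched here as an assumption rather than a confirmed fix, is to keep the alias-to-name mapping on our side and generate a SELECT that renames the stored column. `AliasSelect`, `aliasToColumn`, and `selectByAlias` are hypothetical names made up for illustration; they are not part of Spark or Avro.

```scala
// Hypothetical helper, not part of Spark or Avro.
object AliasSelect {
  // Alias -> stored column name, copied by hand from the Avro schema above.
  val aliasToColumn: Map[String, String] = Map(
    "F1_ALIAS" -> "F1",
    "F2_ALIAS" -> "F2"
  )

  // Builds a SELECT that reads each stored column and exposes it under its alias.
  // Names that are not known aliases are passed through unchanged.
  def selectByAlias(names: Seq[String], table: String): String = {
    val columns = names.map { name =>
      val column = aliasToColumn.getOrElse(name, name)
      if (column == name) s"`$column`" else s"`$column` AS `$name`"
    }
    s"SELECT ${columns.mkString(", ")} FROM $table"
  }
}

// Usage with the temp table registered above (needs a live sqlContext):
// val features = sqlContext.sql(AliasSelect.selectByAlias(Seq("F1_ALIAS"), "profile"))
```

This only renames columns at query time, so it does not make Spark itself aware of the aliases.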

Any suggestions for this use case?

On a side note, which characters are allowed in aliases? For example, is "!" allowed?
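
For reference, the Avro specification restricts names to a leading letter or underscore followed by letters, digits, or underscores. Assuming field aliases follow the same rule as field names (an assumption worth confirming against the Avro library version in use), a quick check could look like the sketch below. `AvroNames` and `isValidName` are hypothetical names for illustration only.

```scala
// Hypothetical helper, not part of the Avro library.
object AvroNames {
  // Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
  private val NamePattern = "[A-Za-z_][A-Za-z0-9_]*".r

  // True when the whole string matches the name rule.
  def isValidName(name: String): Boolean =
    NamePattern.pattern.matcher(name).matches()
}
```

Under that rule, "F1_ALIAS" passes the check and a name containing "!" does not.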

Thank you in advance!

Cloudera Community

Spark-sql fails to use "SELECT" on Aliases on Parquet files (as defined in Avro schema)