Member since
10-05-2016
1
Post
0
Kudos Received
0
Solutions
10-14-2016
10:25 AM
Regarding the questions asked:
> What characters are allowed in aliases? e.g. is "!" allowed?
In regards to Avro aliases, it follows the name rules[1], essentially:
1) Must start with [A-Za-z_]
2) Subsequently contain only [A-Za-z0-9_]
> Is there anyway to get the alias information once it's loaded to Dataframe? There's a "printSchema"
> API that lets you print the schema names, but there's not a counterpart for printing the aliases. Is
> it possible to get the mapping from name to aliases from DF?
Per our CDH 5.7 documentation[2], the spark-avro library strips all doc, aliases and other fields[3] when they are loaded into Spark.
To work around this issue, we recommend to use the original name rather than the aliased name of the field in the table, as the Avro aliases are stripped during loading into Spark.
References:
[1] http://avro.apache.org/docs/1.8.1/spec.html#names
[2] https://www.cloudera.com/documentation/enterprise/5-7-x/topics/spark_avro.html
[3] Avro to Spark SQL Conversion:
The spark-avro library supports conversion for all Avro data types:
boolean -> BooleanType
int -> IntegerType
long -> LongType
float -> FloatType
double -> DoubleType
bytes -> BinaryType
string -> StringType
record -> StructType
enum -> StringType
array -> ArrayType
map -> MapType
fixed -> BinaryType
The spark-avro library supports the following union types:
union(int, long) -> LongType
union(float, double) -> DoubleType
union(any, null) -> any
The spark-avro library does not support complex union types.
All doc, aliases, and other fields are stripped when they are loaded into Spark.
... View more