I am trying to ingest data from Oracle into S3 in Avro format and create Hive tables on top of that location using the auto-generated AVSC schema file. However, I found data type mismatches between Oracle and Hive: by default, DATE columns are converted to bigint (epoch), while NUMBER, DECIMAL, and VARCHAR columns are converted to string.
Per our requirements, we don't want DATE columns stored as epoch values, so during ingestion with Sqoop we map those columns to string. Columns of type NUMBER are mapped to integer, and DECIMAL to double.
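For reference, this is roughly how such a mapping can be expressed with Sqoop's `--map-column-java` option. This is only a sketch: the connection string, table, and column names (`ORDERS`, `ORDER_DATE`, `ORDER_ID`, `AMOUNT`) are placeholders, not from the actual job.

```shell
# Hypothetical Sqoop import; table/column names are placeholders.
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username myuser \
  --password-file /user/me/.pwd \
  --table ORDERS \
  --as-avrodatafile \
  --target-dir s3a://my-bucket/orders/ \
  --map-column-java ORDER_DATE=String,ORDER_ID=Integer,AMOUNT=Double
```

With `--as-avrodatafile`, `--map-column-java` overrides the Java (and hence Avro) type chosen for each column, which is what forces the DATE column to string instead of epoch millis.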
As a result, in Hive the DATE columns are now string and the DECIMAL columns are double. My questions:
1. Does storing date columns as string affect Spark processing time?
2. Can these string columns still be used in comparisons and in date-based aggregations, e.g., computing the difference between two dates?
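To illustrate what does and does not work on string-typed dates, here is a small plain-Python sketch (the same logic applies to string comparison in Hive/Spark SQL); the sample values and the assumed `yyyy-MM-dd` format are illustrative:

```python
from datetime import datetime

# Assumption: dates are stored as zero-padded ISO strings (yyyy-MM-dd).
a, b = "2023-01-15", "2023-11-03"

# Zero-padded ISO date strings sort lexicographically in chronological
# order, so equality checks and range filters still behave correctly.
iso_comparison_ok = a < b  # True, matches date order

# But date arithmetic (e.g., the gap in days between two dates)
# requires parsing the strings into real date/timestamp values first.
fmt = "%Y-%m-%d"
diff_days = (datetime.strptime(b, fmt) - datetime.strptime(a, fmt)).days
print(iso_comparison_ok, diff_days)
```

The takeaway: simple comparisons survive the string representation only if the format sorts correctly (e.g., `yyyy-MM-dd`, not `dd/MM/yyyy`), while interval arithmetic and date functions always need a cast or parse step.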
3. Would casting all date columns to timestamp in a later stage using Spark solve these issues? The final requirement is that all date columns end up as TIMESTAMP in Hive.
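The cast in question 3 could be sketched in Spark SQL roughly as below; the table name `src`, column name `order_date`, and the `yyyy-MM-dd` pattern are assumptions for illustration, not from the actual schema:

```sql
-- Hypothetical Spark SQL; src and order_date are placeholder names.
SELECT to_timestamp(order_date, 'yyyy-MM-dd') AS order_ts
FROM src;
```

Rows whose strings do not match the supplied pattern come back as NULL from `to_timestamp`, so it is worth checking for NULLs after the cast to catch malformed dates.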