Member since: 07-21-2016
Posts: 4
Kudos Received: 0
Solutions: 0
03-08-2017 09:32 PM
Thanks for the response. Converting from microseconds to seconds does return a date, but not the right date.

In Spark SQL:

select from_unixtime(startdateofweek/1000000) as ts from TEST where location = 2290 and week = 19 and year = 2015

returns:

1985-05-03 20:00:00
1985-05-03 20:00:00

In Postgres, running the query:

select startdateofweek as ts from TEST_PARQUET where location = 2290 and week = 19 and year = 2015

returns:

2015-05-04 00:00:00
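The roughly 30-year gap between the two results (1985-05-03 vs. 2015-05-04) matches the difference between the Unix epoch (1970-01-01) and the PostgreSQL epoch (2000-01-01), which is 946,684,800 seconds. Assuming the stored longs are microseconds since the PostgreSQL epoch (an inference from the magnitudes shown here, not something confirmed above), adding that offset before formatting should line the two results up. A minimal sketch for spark-shell, where spark is the active SparkSession and TEST is already registered:

// Sketch only: assumes the long values are microseconds since the PostgreSQL
// epoch (2000-01-01); 946684800L is the 1970-01-01 -> 2000-01-01 offset in seconds.
val pgEpochOffsetSeconds = 946684800L

spark.sql(s"""
  select from_unixtime(startdateofweek / 1000000 + $pgEpochOffsetSeconds) as ts
  from TEST
  where location = 2290 and week = 19 and year = 2015
""").show(false)

Note that from_unixtime formats in the session time zone, which is why the earlier result shows 20:00 rather than midnight; in UTC the corrected value works out to 2015-05-04 00:00:00.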
03-08-2017 09:14 PM
Hello,

I have a table in HAWQ that is stored as Parquet. The problem is that reading the Parquet file in Spark returns the timestamp columns as long. The table DDL is:

CREATE TABLE test
(
  location integer,
  year integer,
  week integer,
  startdateofweek timestamp without time zone,
  enddateofweek timestamp without time zone
)
WITH (APPENDONLY=true, ORIENTATION=parquet, COMPRESSTYPE=snappy, OIDS=FALSE)
DISTRIBUTED RANDOMLY;

Note the columns startdateofweek and enddateofweek, which are created as timestamp type.

Next I read this table in Spark using the Parquet file location rather than connecting to HAWQ over a JDBC connection:

val path = "/hawq_default/16385/312220/479532/*"
val df = spark.read.parquet(path)
df.printSchema
df.show(2)

This returns the timestamp columns as long rather than timestamp:

 |-- location: integer (nullable = true)
 |-- year: integer (nullable = true)
 |-- week: integer (nullable = true)
 |-- startdateofweek: long (nullable = true)
 |-- enddateofweek: long (nullable = true)

+--------+----+----+---------------+---------------+
|location|year|week|startdateofweek| enddateofweek |
+--------+----+----+---------------+---------------+
|    2290|2015|  19|484012800000000|484531200000000|
|    2290|2015|  19|484012800000000|484531200000000|
+--------+----+----+---------------+---------------+

I have tried many different things to convert this long to a timestamp, but nothing works. Some of the things I have tried are:

1) Specifying the schema for the Dataset using a case class. The columns still come back as long.

case class myLog(location: Integer, year: Integer, week: Integer, startdateofweek: java.sql.Timestamp, enddateofweek: java.sql.Timestamp)

val path = "/hawq_default/16385/312220/479532/*"
val peopleDS = spark.read.parquet(path).as[myLog]
peopleDS.show(2)

2) Using the from_unixtime function. This returns bogus dates:

select from_unixtime(startdateofweek/1000) as ts from TEST where location = 2290 and week = 19 and year = 2015

17307-10-03 20:00:00
17307-10-03 20:00:00

Any help with this issue is much appreciated. Thanks much.
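For reference, a minimal Scala sketch of one way to convert the long columns at the DataFrame level after reading the Parquet file, under the assumption (inferred from the value magnitudes above, not confirmed by the table definition) that they hold microseconds since the PostgreSQL epoch of 2000-01-01. In Spark, casting a numeric value to timestamp interprets it as seconds since the Unix epoch, so the values are scaled to seconds and shifted first. Runnable in spark-shell, where spark is the active SparkSession:

import org.apache.spark.sql.functions.col

// Sketch only: assumes the long values are microseconds since 2000-01-01
// (the PostgreSQL epoch); 946684800L is the 1970-01-01 -> 2000-01-01 offset in seconds.
val pgEpochOffsetSeconds = 946684800L

val path = "/hawq_default/16385/312220/479532/*"
val df = spark.read.parquet(path)

// Microseconds -> seconds, shift to the Unix epoch, then cast to timestamp.
val withTimestamps = df
  .withColumn("startdateofweek",
    (col("startdateofweek") / 1000000L + pgEpochOffsetSeconds).cast("timestamp"))
  .withColumn("enddateofweek",
    (col("enddateofweek") / 1000000L + pgEpochOffsetSeconds).cast("timestamp"))

withTimestamps.printSchema()
withTimestamps.show(2, truncate = false)

With the sample value above, 484012800000000 microseconds works out to 2015-05-04 00:00:00 UTC, matching the Postgres result.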
Labels:
- Apache Spark