Member since: 07-18-2017
Posts: 15
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1177 | 08-05-2018 06:56 AM |
08-05-2018
06:56 AM
Solved it. I noticed that the numbers written to PostgreSQL were accurate only when I read the parquet data with the second option below:

parquet("/user-data/xyz/input/TABLE/*") // WRONG numbers in PostgreSQL
parquet("/user-data/xyz/input/TABLE/evnt_month=*") // correct numbers in PostgreSQL

If someone is aware of this problem, please comment.
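A minimal sketch of the working read/write path, assuming Spark 2.x with the Scala API; the JDBC URL, target table, and credentials are placeholders, not the original job's values:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParquetToPostgres").getOrCreate()

// Reading with a bare wildcard was the variant that produced WRONG
// numbers downstream in PostgreSQL:
//   val df = spark.read.parquet("/user-data/xyz/input/TABLE/*")

// Spelling out the partition directory pattern gave correct numbers:
val df = spark.read.parquet("/user-data/xyz/input/TABLE/evnt_month=*")

// Write the result to PostgreSQL over JDBC. Connection details are
// placeholders; the PostgreSQL JDBC driver must be on the classpath.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "public.target_table")
  .option("user", "dbuser")
  .option("password", "dbpass")
  .mode("append")
  .save()
```

One plausible explanation is that the bare /* glob interferes with Spark's partition discovery for the evnt_month= directories, but the original post does not confirm the root cause.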
11-19-2017
07:31 PM
It's probably a Spark config issue. Can you share the detailed log? The information you've shared isn't enough to identify the root cause.
11-15-2017
05:31 AM
@Matt Burgess, this issue was resolved by downloading the HDF version of NiFi 1.2.0. Thanks.
11-09-2017
03:27 PM
You can use MergeContent with a Merge Format of "Avro" and a maximum bin size equal to (some multiple of) your HDFS block size, then PutHDFS to place the Avro file(s) into your location above (/user/test/csvData/AVRO). Then you should be able to query it from Hive. Alternatively, if you can configure your Hive server according to these requirements, create your table backed by ORC instead of Avro, and set TBLPROPERTIES("transactional"="true") (see link for more info), then you could use PutHiveStreaming to send your Avro files to Hive.
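As a concrete illustration of the PutHiveStreaming requirements above, here is a minimal sketch of a transactional, ORC-backed table; the table and column names are hypothetical, and note that Hive Streaming also requires the table to be bucketed:

```sql
-- Hypothetical table; Hive Streaming requires ORC storage, bucketing,
-- and transactional=true.
CREATE TABLE csv_data_target (
  col1 STRING,
  col2 STRING
)
CLUSTERED BY (col1) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
```

The Hive server itself must also run with ACID transactions enabled (e.g., hive.txn.manager set to DbTxnManager), per the requirements mentioned above.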
11-06-2017
01:35 PM
@Team Spark Your TEMP_TAB table has 3 columns, but your insert query produces 4 (* expands to the 3 columns from TEMP_TAB, and substr(mytime,0,10) adds one more). The query below will work for your case:

FROM TEMP_TAB
INSERT OVERWRITE TABLE main_TAB PARTITION (mytime)
SELECT id, age, substr(mytime,0,10) AS mytime;

Note that with this insert you lose part of the mytime value: the substring drops the time portion on the way from TEMP_TAB to main_TAB. For example, TEMP_TAB has 2017-10-12 12:20:23, but main_TAB will only have 2017-10-12; the 12:20:23 time is lost. If you don't want to lose that data, create main_TAB with 4 columns, using dt as the partition column:

CREATE TABLE IF NOT EXISTS main_TAB (id int, mytime STRING, age int)
PARTITIONED BY (dt string)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

then run the insert as below (note the partition column is now dt, not mytime):

FROM TEMP_TAB
INSERT OVERWRITE TABLE main_TAB PARTITION (dt)
SELECT *, substr(mytime,0,10) AS dt;

In this case the partition column is dt and you are not losing any TEMP_TAB data.
10-04-2017
03:54 AM
Sqoop2 has been deprecated since CDH 5.9.x and will be removed in CDH 6. I strongly advise that you discontinue using Sqoop2 and switch to Sqoop1 instead, as Sqoop1 is more stable and has long-term support.
07-19-2017
12:42 AM
Has anyone else encountered this issue?