Member since: 07-18-2017
Posts: 15
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1177 | 08-05-2018 06:56 AM |
08-05-2018
06:56 AM
Solved it. I noticed that the numbers written to PostgreSQL were accurate only when I read the parquet data with the second option below:

parquet("/user-data/xyz/input/TABLE/*") // WRONG numbers in PostgreSQL
parquet("/user-data/xyz/input/TABLE/evnt_month=*") // correct numbers in PostgreSQL

If someone is aware of this problem, please comment.
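A minimal sketch of the working read/write path, assuming Spark 2.x with the Scala API; the JDBC URL, target table, and credentials are placeholders, not the original job's values:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ParquetToPostgres").getOrCreate()

// Reading with a bare wildcard was the variant that produced WRONG
// numbers downstream in PostgreSQL:
//   val df = spark.read.parquet("/user-data/xyz/input/TABLE/*")

// Spelling out the partition directory pattern gave correct numbers:
val df = spark.read.parquet("/user-data/xyz/input/TABLE/evnt_month=*")

// Write the result to PostgreSQL over JDBC. Connection details are
// placeholders; the PostgreSQL JDBC driver must be on the classpath.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "public.target_table")
  .option("user", "dbuser")
  .option("password", "dbpass")
  .mode("append")
  .save()
```

One plausible explanation is that the bare /* glob interferes with Spark's partition discovery for the evnt_month= directories, but the original post does not confirm the root cause.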
11-19-2017
07:31 PM
It's probably a Spark config issue. Can you share the detailed log? The information you've shared isn't enough to identify the root cause.
11-15-2017
05:31 AM
@Matt Burgess, this issue was resolved by downloading the HDF version of NiFi 1.2.0. Thanks.
11-09-2017
03:27 PM
You can use MergeContent with a Merge Format of "Avro" and a maximum bin size equal to (some multiple of) your HDFS block size, then PutHDFS to place the Avro file(s) into your location above (/user/test/csvData/AVRO). Then you should be able to query it from Hive. Alternatively, if you can configure your Hive server according to these requirements, create your table backed by ORC instead of Avro, and set TBLPROPERTIES("transactional"="true") (see link for more info), then you could use PutHiveStreaming to send your Avro files to Hive.
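As a concrete illustration of the PutHiveStreaming requirements above, here is a minimal sketch of a transactional, ORC-backed table; the table and column names are hypothetical, and note that Hive Streaming also requires the table to be bucketed:

```sql
-- Hypothetical table; Hive Streaming requires ORC storage, bucketing,
-- and transactional=true.
CREATE TABLE csv_data_target (
  col1 STRING,
  col2 STRING
)
CLUSTERED BY (col1) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ("transactional"="true");
```

The Hive server itself must also run with ACID transactions enabled (e.g., hive.txn.manager set to DbTxnManager), per the requirements mentioned above.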
11-06-2017
01:35 PM
@Team Spark Your TEMP_TAB table has 3 columns, but your insert query produces 4 (* expands to the 3 columns from TEMP_TAB, and substr(mytime,0,10) adds one more). The query below will work for your case:

FROM TEMP_TAB
INSERT OVERWRITE TABLE main_TAB PARTITION (mytime)
SELECT id, age, substr(mytime,0,10) AS mytime;

Note that with this insert you lose part of the mytime value: the substring drops the time portion on the way from TEMP_TAB to main_TAB. For example, TEMP_TAB has 2017-10-12 12:20:23, but main_TAB will only have 2017-10-12; the 12:20:23 time is lost. If you don't want to lose that data, create main_TAB with 4 columns, using dt as the partition column:

CREATE TABLE IF NOT EXISTS main_TAB (id int, mytime STRING, age int)
PARTITIONED BY (dt string)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="ZLIB");

then run the insert as below (note the partition column is now dt, not mytime):

FROM TEMP_TAB
INSERT OVERWRITE TABLE main_TAB PARTITION (dt)
SELECT *, substr(mytime,0,10) AS dt;

In this case the partition column is dt and you are not losing any TEMP_TAB data.
10-04-2017
03:54 AM
Sqoop2 has been deprecated since CDH 5.9.x and will be removed in CDH 6. I strongly advise that you discontinue using Sqoop2 and switch to Sqoop1 instead, as Sqoop1 is more stable and has long-term support.
07-19-2017
12:42 AM
Has anyone else encountered this issue?