Member since: 01-21-2018
Posts: 9
Kudos Received: 2
Solutions: 0
08-05-2018
07:05 AM
@Taylor Wilson: I am not sure if I understand what you are looking for. How is your reply related to the question being asked?
08-01-2018
05:10 PM
We are using the Hortonworks ODBC driver to connect to Hive from a C# application. Here is sample code that fires a query on Hive.
using System;
using System.Data.Odbc;

class HiveOdbcSample
{
    static void Main(string[] args)
    {
        // DSN configured for the Hortonworks Hive ODBC driver.
        string connectionString = "DSN=AzureHDP";
        string query = "SELECT * FROM schema_5218.demandquantity";
        int queryTimeout = 100; // seconds

        try
        {
            using (var connection = new OdbcConnection(connectionString))
            {
                Console.WriteLine("Opening Hive connection.");
                connection.Open();

                using (var cmd = connection.CreateCommand())
                {
                    cmd.CommandTimeout = queryTimeout;
                    cmd.CommandText = query;
                    Console.WriteLine("Executing Hive query.");
                    cmd.ExecuteNonQuery();
                    Console.WriteLine("Hive query executed successfully.");
                    Console.ReadKey(false);
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine($"Encountered error during Hive query execution. [{e.Message}]");
            Console.ReadKey(false);
            throw;
        }
    }
}
The issue is that if the network connection drops after the query has been fired on Hive (before the query returns), the resulting exception crashes the application. The reason is that an AccessViolationException is thrown, and such exceptions are not delivered to the application's catch block. It therefore becomes an unhandled exception and the application crashes.

Is this due to a bug in the ODBC driver? Is there any workaround, or is a fix awaited? There are certain settings that can be used to catch such exceptions, but Microsoft does not recommend them, so I am not very keen to use them. Those settings are the [HandleProcessCorruptedStateExceptions] and [SecurityCritical] attributes and legacyCorruptedStateExceptionsPolicy.

Attaching a screenshot of the actual application code. You can see that the exception is thrown when the using clause closes the connection, and control does not reach the catch block.

Hortonworks Driver Version: 2.01.10.1014
.NET Version: 4.7
C# Version: 7.3
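For reference, here is a minimal sketch of the workaround I am reluctant to use, based on the corrupted-state-exception attributes from System.Runtime.ExceptionServices and System.Security (the class and method names are hypothetical, and this applies to .NET Framework only):

using System;
using System.Data.Odbc;
using System.Runtime.ExceptionServices;
using System.Security;

class CorruptedStateWorkaround
{
    // Opts this method in to catching corrupted-state exceptions such as
    // AccessViolationException. Microsoft advises against relying on this.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    static void RunQuery(string connectionString, string query)
    {
        try
        {
            using (var connection = new OdbcConnection(connectionString))
            {
                connection.Open();
                using (var cmd = connection.CreateCommand())
                {
                    cmd.CommandText = query;
                    cmd.ExecuteNonQuery();
                }
            }
        }
        catch (AccessViolationException e)
        {
            // Reached only because of the attributes above.
            Console.WriteLine($"Caught corrupted-state exception: [{e.Message}]");
        }
    }
}

Alternatively, legacyCorruptedStateExceptionsPolicy can be enabled process-wide under the <runtime> element in app.config, with the same caveats.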
07-20-2018
06:38 PM
Give this a try. On the Linux box from where you start the Hive CLI, create a user with the same name as the <ADLS Service Principal> name, then start the Hive CLI from this new user account.
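A minimal sketch of those steps, assuming a standard Linux box; the service principal name below is a placeholder to be replaced with the actual value:

# Replace the placeholder with the actual ADLS Service Principal name.
SP_NAME="<ADLS Service Principal>"
sudo useradd "$SP_NAME"    # create a local user with the matching name
sudo su - "$SP_NAME"       # switch to that user
hive                       # start the Hive CLI as the new user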
07-03-2018
01:24 PM
That works, thank you.
07-03-2018
08:05 AM
I am exporting Hive table data to CSV files in HDFS using queries like this:

FROM Table T1
INSERT OVERWRITE DIRECTORY '<HDFS Directory>'
SELECT *;

Hive is writing many small CSV files (1-2 MB) to the destination directory. Is there a way to control the number of files or the size of the CSV files?

Notes:
1) These CSV files are not used for creating tables out of them, so I cannot replace the query with INSERT INTO TABLE ...
2) I have already tried setting these values, to no avail (a sketch of how I applied them follows below):
hive.merge.mapfiles=true;
hive.merge.mapredfiles
hive.merge.smallfiles.avgsize
hive.merge.size.per.task
mapred.max.split.size
mapred.min.split.size

I have many tables in Hive of varying sizes. Some are very large and some are small. For large tables I am fine with many files being generated, as long as each file is larger than 16 MB. I don't want to explicitly set the number of mappers, because that would hamper query performance for large tables. TIA
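For reference, a sketch of how these settings were applied in the session before the export; the specific values are illustrative assumptions chosen around the 16 MB target, not the exact values from my session:

-- Illustrative values only; tune per workload.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=16000000;   -- ~16 MB average triggers a merge pass
SET hive.merge.size.per.task=256000000;       -- target size of merged files
SET mapred.max.split.size=256000000;
SET mapred.min.split.size=16000000;

FROM Table T1
INSERT OVERWRITE DIRECTORY '<HDFS Directory>'
SELECT *;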
Tags: hdfs-blocks, Hive
Labels: Apache Hadoop, Apache Hive
04-21-2018
04:02 PM
1 Kudo
@Shu Thank you so much. The first approach worked.

INSERT INTO Target_table (col_1, col_2, col_3)
SELECT col_1, col_2, int(null) col_3
FROM Source_table;
04-21-2018
02:42 PM
1 Kudo
I have two tables in Hive.
CREATE TABLE Target_table(
col_1 timestamp,
col_2 int,
col_3 int) CLUSTERED BY (col_1) INTO 50 BUCKETS STORED AS ORC
TBLPROPERTIES('transactional'='true')
CREATE TABLE Source_table(
col_1 timestamp,
col_2 int)
I am trying to execute this query
INSERT INTO Target_table (col_1, col_2)
SELECT col_1, col_2 FROM Source_table;
Query runs successfully in Beeline.
The same query fails when executed via the Hortonworks ODBC Driver with the error:

ERROR [HY000] [Hortonworks][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'Targer': Table insclause-0 has 3 columns, but query has 2 columns.
Looks like Hive is completely ignoring the column list in the Insert clause.
Other Details
Cluster: Azure HDInsight Cluster
Hortonworks Data Platform: HDP-2.6.2.25
OS: Windows 10
Language: C#
Any help is appreciated.
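For completeness, the rewrite that eventually resolved this (see the 04-21 04:02 PM reply above) lists all three target columns and supplies a typed NULL for the missing one:

INSERT INTO Target_table (col_1, col_2, col_3)
SELECT col_1, col_2, int(null) col_3
FROM Source_table;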
Tags: Data Processing, odbc
01-21-2018
04:30 AM
I am trying to deploy an HDP cluster on Azure which uses Azure Data Lake Storage. I followed the instructions here for the Ambari and cluster setup, and the instructions here for ADLS access. When I try to start Hive I get the following error:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: The ownership on the staging directory adl://home/tmp/hive/root/_tez_session_dir/c96088dd-444e-42e3-9293-251656b01b17 is not as expected. It is owned by <ADLS Service Principal>. The directory must be owned by the submitter root or by root
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I am able to access ADLS via HDFS commands.
Permissions on the /tmp folder:
> hdfs dfs -ls /tmp
Found 1 items
drwxrwxrwt+ - <ADLS Service Principal> 0 2018-01-20 21:29 /tmp/hive
TIA
Tags: adls