Member since
11-02-2015
2
Posts
1
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
9579 | 04-18-2016 02:34 AM | |
2562 | 03-17-2016 06:36 AM |
04-18-2016 02:34 AM
1 Kudo
Regarding the "WARNINGS: Parquet files should not be split into multiple hdfs-blocks" issue: what is the HDFS block size set to for the application that is inserting the Parquet data into HDFS? If your application uses the default, you can find it in hdfs-site.xml under the dfs.blocksize property. If that value is smaller than the size of the Parquet files, each file will be split across multiple HDFS blocks. Impala may then have to read one or more blocks remotely to reassemble a Parquet row group, which hurts performance. This is explained pretty well here: http://ingest.tips/2015/01/31/parquet-row-group-size/

Please note that dfs.blocksize can be set per application, so you should not need to modify the global setting. Try raising it to at least your largest Parquet file size when inserting into HDFS.
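To make the sizing concrete, here is a rough sketch of the arithmetic in Python. The function names and the 1 MiB rounding step are my own assumptions for illustration, not anything Impala or HDFS prescribes; the only point is that a file stays in a single block when the block size is at least the file size.

```python
MIB = 1024 * 1024


def blocks_spanned(file_size: int, block_size: int) -> int:
    """Number of HDFS blocks a file of file_size bytes occupies."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division in pure integer arithmetic; an empty file counts as one block here.
    return max(1, -(-file_size // block_size))


def suggested_block_size(largest_file_size: int, align: int = MIB) -> int:
    """Round the largest file size up to a multiple of `align`.

    The 1 MiB alignment is an assumption made for this sketch; it simply
    yields a round number that is at least as large as the file.
    """
    return -(-largest_file_size // align) * align


# A 200 MiB Parquet file written with a 128 MiB block size is split in two.
print(blocks_spanned(200 * MIB, 128 * MIB))

# Sizing the block to the largest file keeps it in a single block.
size = suggested_block_size(200 * MIB + 1)
print(size, blocks_spanned(200 * MIB + 1, size))
```

The resulting value is what you would pass as dfs.blocksize for the writing application only, for example through its job configuration or a -D dfs.blocksize=<bytes> generic option where the client supports one, rather than changing hdfs-site.xml for the whole cluster.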
03-17-2016 06:36 AM
If you are managing groups via the underlying OS, you will need to manually add the group and the user-to-group mapping on at least the Sentry host and your client host. Adding them on all hosts is recommended, to avoid permission ambiguity when running jobs.
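One way to confirm the mapping is in place is to ask the OS on each host. Below is a minimal sketch using Python's standard pwd and grp modules; the function name is hypothetical, and it only reflects what the local OS account database reports, which may differ from what Hadoop sees if a non-default group mapping provider is configured.

```python
import grp
import pwd


def user_in_group(user: str, group: str) -> bool:
    """Return True if the local OS maps `user` into `group`."""
    try:
        user_entry = pwd.getpwnam(user)
        group_entry = grp.getgrnam(group)
    except KeyError:
        # Either the user or the group does not exist on this host.
        return False
    # Match on the user's primary group or on the group's explicit member list.
    return user_entry.pw_gid == group_entry.gr_gid or user in group_entry.gr_mem
```

Running this with the same user and group names on the Sentry host, the client host, and ideally every other node should give the same answer everywhere; a host that returns False is one where the group or mapping still needs to be added.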