About rajukp

rajukp · ‎04-08-2019

Hi All,

When I compare the float and double datatypes, I see that when data is ingested with n-digit precision in the fractional part (e.g. 2.57), the subsequent query output returns a value such as 2.559999942779541 for float, whereas it returns the exact 2-digit value for double, as shown in the example below. Is there a way to get the double behavior for float? The problem with float is that we need to round the output value explicitly when using it in different applications.

[hadoop-data.default.svc.cluster.local:21000] > create table parquet_table_name (x float, y float) STORED AS PARQUET;
[hadoop-data.default.svc.cluster.local:21000] > insert into TABLE test values(2.56,2.57);
Query: insert into TABLE test values(2.56,2.57)
Query submitted at: 2019-04-08 09:56:00 (Coordinator: https://hadoop-data-0:25000)
Query progress can be monitored at: https://hadoop-data-0:25000/query_plan?query_id=634a267b54e95cc1:2f111d6300000000
Modified 1 row(s) in 17.34s
[hadoop-data.default.svc.cluster.local:21000] >
[hadoop-data.default.svc.cluster.local:21000] >
[hadoop-data.default.svc.cluster.local:21000] > select * from test;
Query: select * from test
Query submitted at: 2019-04-08 09:56:25 (Coordinator: https://hadoop-data-0:25000)
Query progress can be monitored at: https://hadoop-data-0:25000/query_plan?query_id=cf4731ca88960669:972a651a00000000
+-------------------+------+
| x                 | y    |
+-------------------+------+
| 2.559999942779541 | 2.57 |
+-------------------+------+
Fetched 1 row(s) in 2.52s
[hadoop-data.default.svc.cluster.local:21000] >

Thanks,
Raju
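The long value in the x column is what IEEE 754 single precision produces for 2.56, independent of Impala. As a minimal sketch (the helper name as_float32 is hypothetical, and this only mimics a 4-byte float column rather than running anything in Impala), the same digits can be reproduced by round-tripping the literal through 32 bits:

```python
import struct


def as_float32(value: float) -> float:
    """Round-trip a 64-bit Python float through IEEE 754 single precision.

    Hypothetical helper: it imitates what happens when a literal is stored
    in a 4-byte float column and later widened to a double for display.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = as_float32(2.56)

# The nearest 32-bit float to 2.56, shown with double-precision digits.
print(repr(stored))      # 2.559999942779541

# Rounding on read recovers the two-digit value that was inserted.
print(round(stored, 2))  # 2.56
```

A 32-bit float cannot hold 2.56 exactly, so the stored value is the nearest representable one. Double has the same kind of error, but it is small enough that the shortest printed form is still 2.56.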

rajukp · ‎02-13-2019

Thanks so much for the help. I will try out sorting and validate the query performance.

rajukp · ‎02-12-2019

If I am using dictionary encoding for the column, do I still need to write the data in sorted order in the Parquet file?

rajukp · ‎02-12-2019

I am using parquet-cpp to write the Parquet file and then upload it to HDFS using WebHDFS. At the end I use the "LOAD DATA" command to load the Parquet file into Impala. Is there any option in parquet-cpp to sort the data?

rajukp · ‎02-11-2019

Hi All, I was looking at this blog post, https://blog.cloudera.com/blog/2017/12/faster-performance-for-selective-queries/, which shows that using "SORT BY" during table creation can improve Impala query performance. As mentioned in the blog, this works only if we use "INSERT" or "CREATE TABLE AS SELECT". Our use case is that we create the Parquet file externally, upload it to HDFS, and then use the Impala "LOAD DATA" command. Is there a way to use the "SORT BY" mechanism with this model of loading Parquet files? Thanks, Raju.
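To illustrate why row order matters for an externally written file, here is a minimal sketch. It assumes the reader skips a row group when its min/max statistics exclude the filter value. The function names, the group size of 4, and the sample keys are all hypothetical, and nothing here is parquet-cpp or Impala API. It only counts how many row groups an equality filter would have to scan.

```python
from typing import List, Tuple


def row_group_stats(values: List[int], group_size: int) -> List[Tuple[int, int]]:
    """Split values into consecutive row groups and return (min, max) per group."""
    stats = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        stats.append((min(group), max(group)))
    return stats


def groups_to_scan(stats: List[Tuple[int, int]], target: int) -> int:
    """Count row groups whose min/max range could contain the target value."""
    return sum(1 for low, high in stats if low <= target <= high)


# Hypothetical key column, in arrival order and in sorted order.
unsorted_keys = [7, 1, 9, 3, 8, 2, 6, 4, 5, 0, 11, 10]
sorted_keys = sorted(unsorted_keys)

# Filter "key = 5" with 4 rows per row group.
print(groups_to_scan(row_group_stats(unsorted_keys, 4), 5))  # 3
print(groups_to_scan(row_group_stats(sorted_keys, 4), 5))    # 1
```

Under that assumption, a writer outside Impala would get a similar effect by sorting the rows on the filter column in the application before handing them to the Parquet writer.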

Last Visited	‎04-16-2019 01:47 PM

Member Since	‎02-11-2019 12:39 PM
Posts	5

Cloudera Community

IMPALA float vs double fraction part usage

Re: Using SORT BY with externally loaded parquet f...

Re: Using SORT BY with externally loaded parquet f...

Re: Using SORT BY with externally loaded parquet f...

Using SORT BY with externally loaded parquet files...