Member since: 06-17-2019
Posts: 23
Kudos Received: 0
Solutions: 0
07-15-2019 03:35 AM

Hi,

The purpose of compression is to save space, not to speed up query time. Compression actually adds overhead, because the data must be decompressed before it can be read, so I would expect a query against compressed data to be slightly slower than one against uncompressed data. What you are seeing is completely normal to me.

Cheers,
Eric
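To make the trade-off concrete, here is a small Java sketch (my own illustration, not from the post; file names are hypothetical) that times reading the same payload from a plain file versus a gzip-compressed one. The compressed read pays extra CPU for decompression, though on I/O-bound systems the smaller file can win some of that back, which is why the difference is often only slight.

import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressionReadOverhead {
    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[64 * 1024 * 1024]; // 64 MB of zeros: compresses well

        // Write the payload once uncompressed and once gzip-compressed.
        try (OutputStream out = new FileOutputStream("plain.bin")) {
            out.write(payload);
        }
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream("data.gz"))) {
            out.write(payload);
        }

        // Time a full read of each file; the gzip read adds decompression CPU.
        long t0 = System.nanoTime();
        drain(new FileInputStream("plain.bin"));
        long plainMs = (System.nanoTime() - t0) / 1_000_000;

        t0 = System.nanoTime();
        drain(new GZIPInputStream(new FileInputStream("data.gz")));
        long gzipMs = (System.nanoTime() - t0) / 1_000_000;

        System.out.printf("plain read: %d ms, gzip read: %d ms%n", plainMs, gzipMs);
    }

    // Read a stream to exhaustion, discarding the bytes.
    private static void drain(InputStream in) throws IOException {
        try (InputStream s = in) {
            byte[] buf = new byte[8192];
            while (s.read(buf) != -1) { /* discard */ }
        }
    }
}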
07-03-2019 07:59 AM

Thanks @AcharkiMed. I tried that and there was no improvement; however, after enabling hyper-threading I was able to reduce the query time from 40 seconds to 25. I also tried HDFS caching, but even after defining a cache pool of 3 GB, only about 1 GB of data gets cached. Any idea why?

Query: show table stats tbl_parq_123
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+
| year | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+
| 1990 | -1 | 2 | 338.45MB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1990 |
| 1993 | -1 | 6 | 1.32GB | 0B | 1 | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1993 |
| 1994 | -1 | 6 | 1.32GB | 1010.95MB | 1 | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1994 |
| 1995 | -1 | 14 | 3.24GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1995 |
| 1996 | -1 | 14 | 3.30GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1996 |
| 1997 | -1 | 14 | 3.30GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1997 |
| 1998 | -1 | 27 | 6.60GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1998 |
| 1999 | -1 | 14 | 3.30GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1999 |
| 2000 | -1 | 14 | 3.30GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2000 |
| 2001 | -1 | 14 | 3.30GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2001 |
| 2002 | -1 | 23 | 5.48GB | NOT CACHED | NOT CACHED | PARQUET | false | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2002 |
| Total | -1 | 148 | 34.79GB | 1010.95MB | | | | |
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+
[root@quickstart ~]# hdfs cacheadmin -listPools
Found 1 result.
NAME            OWNER   GROUP  MODE       LIMIT       MAXTTL
three_gig_pool  impala  hdfs   rwxr-xr-x  3000000000  never

Thanks
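A note on the 1 GB ceiling (my assumption, not confirmed in this thread): HDFS centralized caching is also capped per DataNode by dfs.datanode.max.locked.memory and the OS locked-memory ulimit, so a 3 GB pool limit alone does not guarantee 3 GB actually gets cached. A minimal Java sketch for adding and inspecting cache directives, reusing the host, pool, and partition path shown above, could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveEntry;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;

public class CacheDirectiveCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        DistributedFileSystem dfs =
                (DistributedFileSystem) org.apache.hadoop.fs.FileSystem.get(conf);

        // Cache one partition directory into the existing pool.
        CacheDirectiveInfo directive = new CacheDirectiveInfo.Builder()
                .setPath(new Path("/user/hive/warehouse/tbl_parq_123/year=1994"))
                .setPool("three_gig_pool")
                .setReplication((short) 1)
                .build();
        dfs.addCacheDirective(directive);

        // List all directives; comparing bytesCached against bytesNeeded shows
        // whether the DataNode's locked-memory limit is the bottleneck.
        RemoteIterator<CacheDirectiveEntry> it =
                dfs.listCacheDirectives(new CacheDirectiveInfo.Builder().build());
        while (it.hasNext()) {
            CacheDirectiveEntry e = it.next();
            System.out.println(e.getInfo().getPath() + " -> "
                    + e.getStats().getBytesCached() + " / "
                    + e.getStats().getBytesNeeded() + " bytes cached");
        }
    }
}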
06-26-2019 08:26 AM

Hello,

I was finally able to insert values into the partitioned table after much reading into the concepts and some trial and error. However, loading from a file is still not working. Along the way I also learned that when we specify a partition value in the query, the partition key is actually treated as a column.

insert into tbl_raw_v12 partition(source_ip="192.168.1.10", destination_ip="172.16.8.177", year, event_date) select * from tbl_123;

I managed to insert 50k rows: tbl_raw_v12 contains 29 columns and tbl_123 contains 27, and the remaining two column values are supplied by the insert query at runtime. I will try with more rows tomorrow to check performance.

Thanks
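For reference, a minimal Java sketch of issuing that static-plus-dynamic partition insert over JDBC. The driver class and connection URL are my assumptions for an unsecured quickstart-style Impala endpoint, not something stated in the post:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionInsert {
    public static void main(String[] args) throws Exception {
        // Hive JDBC driver speaking to Impala's HiveServer2-compatible port (assumed 21050).
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://quickstart.cloudera:21050/default;auth=noSasl");
             Statement stmt = conn.createStatement()) {
            // source_ip and destination_ip are static partition values (constants);
            // year and event_date are dynamic: their values come from the SELECT.
            stmt.execute(
                "insert into tbl_raw_v12 "
              + "partition(source_ip='192.168.1.10', destination_ip='172.16.8.177', "
              + "year, event_date) "
              + "select * from tbl_123");
        }
    }
}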
06-24-2019 09:38 PM

I will be getting data from the network via SNMP or serial. Loading data from a file is very fast; I did it manually first to check, and it worked. However, when I try to do the same from my Java code I get various errors. My steps are:

1. I write one million records into a text file through Java.
2. I try to upload the generated text file into HDFS through my Java code.
3. I execute the LOAD query through code.

I am unable to get past step 2. I first tried ProcessBuilder() and Runtime.getRuntime().exec(), but they didn't work. Then I looked for an API to do the upload, but I get this error:

java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at HdfsWriter.run(impala_crt.java:935)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at impala_crt.writeFile(impala_crt.java:861)
at impala_crt.main(impala_crt.java:106)

Do you have any solution for this, or is there a better method?
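For what it's worth, "No FileSystem for scheme: file" usually means the FileSystem implementations were never registered, commonly because a shaded/uber jar merged the META-INF/services/org.apache.hadoop.fs.FileSystem entries incorrectly. A minimal sketch of step 2 with the implementations registered explicitly (the local and HDFS paths here are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUpload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");
        // Work around "No FileSystem for scheme" by registering the
        // implementations explicitly instead of relying on service discovery.
        conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

        try (FileSystem fs = FileSystem.get(conf)) {
            // Copy the locally generated text file into HDFS.
            fs.copyFromLocalFile(
                new Path("/tmp/records.txt"),                  // hypothetical local file
                new Path("/user/hive/warehouse/records.txt")); // hypothetical HDFS target
        }
    }
}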
06-20-2019 11:15 AM

I agree, that is super confusing.