Member since: 01-24-2014
Posts: 5
Kudos Received: 0
Solutions: 0
04-02-2018
08:48 AM
@Timothy Spann Thank you for your reply.

I tried all of those settings in testing, but the run time did not drop below 4 hours. After I increased the source table from 20 buckets to 30, the run time dropped by 30 minutes, so the query now takes 3 hours 30 minutes in total. The input data is not large, and I don't want to write many files that each hold very little data.

DDL:

# col_name              data_type       comment
reg_id                  bigint
product_family          string
product_score           double
product_sales           bigint

# Detailed Table Information
Database:               default
Owner:
CreateTime:             Tue Mar 27 10:58:41 EDT 2018
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://xxx.xx.xxxx
Table Type:             MANAGED_TABLE
Table Parameters:
    COLUMN_STATS_ACCURATE   true
    numFiles                30
    numRows                 32788685
    rawDataSize             3901853515
    totalSize               204127850
    transient_lastDdlTime   1522162963

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            30
Bucket Columns:         [reg_id]
Sort Columns:           []
Storage Desc Params:
    serialization.format    1

Time taken: 1.772 seconds, Fetched: 3

Resources:

Each Container:         8192 Memory, 1 VCores
Application Master:     4096 Memory, 1 VCores
Available Resources:    <memory:1851392, vCores:275>
Used Resources:         <memory:102400, vCores:13>

Source Data Sample:

reg_id  product_family          product_score             product_sales
2234    TOMMY HILFIGER HOME     0.176216039651281134      9171
4222    CHARTER CLUB            0.165046936631880472259   610
2234    AUTHENTIC COMFORT       0.17621603965128113       4901
4222    PHILOSOPHY              0.8252346831594022254     575
2234    WEATHERPROOF VINTAGE    0.1762160396512811317     3671

The table has only 4 columns and a few MB of ORC data per file, and far more resources are available than the query needs, yet it still runs for 3 hours 30 minutes.
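
For readers who want to sanity-check the figures quoted above, here is a minimal Python sketch of the arithmetic. It only uses numbers copied from the post. Two things are assumptions and not stated in the post: that the memory figures are in MB, and that the used memory consists of one Application Master plus equally sized task containers. The function and constant names are made up for illustration.

```python
# Figures copied from the table parameters and resource lines in the post.
NUM_FILES = 30
NUM_ROWS = 32_788_685
RAW_DATA_SIZE = 3_901_853_515   # bytes before ORC encoding (rawDataSize)
TOTAL_SIZE = 204_127_850        # bytes on disk (totalSize)

# Assumption: memory figures are in MB.
CONTAINER_MB = 8192
AM_MB = 4096
USED_MB = 102_400


def table_summary():
    """Average bucket-file size, rows per file, and raw-to-disk size ratio."""
    mib = 1024 * 1024
    return {
        "avg_file_mib": TOTAL_SIZE / NUM_FILES / mib,
        "rows_per_file": NUM_ROWS / NUM_FILES,
        "raw_to_disk_ratio": RAW_DATA_SIZE / TOTAL_SIZE,
    }


def running_task_containers():
    """Task containers implied by the used memory.

    Assumption: used memory = one Application Master + N equal containers.
    """
    return (USED_MB - AM_MB) / CONTAINER_MB


if __name__ == "__main__":
    for name, value in table_summary().items():
        print(f"{name}: {value:.2f}")
    print(f"task containers in use: {running_task_containers():.1f}")
```

If those assumptions hold, each of the 30 bucket files is roughly 6.5 MiB, and about 12 task containers were in use, which is consistent with the 13 vCores reported (12 containers plus the Application Master).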
01-27-2014
08:22 PM
Hi Brad,

Thanks for the confirmation. Could you please shed some light on the questions below?

1. Earlier, the coverage of the various technologies in the Hadoop ecosystem (including Hive, Pig, Sqoop, Oozie, Crunch, and Flume) was just 8%. Now I see them called out in many places, spread across more than one topic. Could you please advise what percentage of the exam questions they will account for?
2. I do not see HBase and Avro mentioned in the new topics. Does that mean no questions can be expected on them?

Thanks,
Amit Mittal