Member since: 12-30-2015
Posts: 68
Kudos Received: 16
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 565 | 06-14-2016 08:56 AM
 | 522 | 04-25-2016 11:59 PM
 | 724 | 03-25-2016 06:50 PM
02-13-2017
11:28 PM
@Jasper Thanks for your comments. Could you please also let me know whether this is the usual way Kafka consumers are run in a Hadoop cluster? If not, how are consumers and producers usually scheduled on a Hadoop cluster?
02-09-2017
02:05 AM
Question on scheduling a Kafka consumer client in a Hadoop cluster: I have coded a Kafka consumer client that reads messages from a topic and writes them to a local file. I want to schedule this consumer so that it runs continuously and reads from the topic as soon as messages are published. Can someone please explain the standard way of doing this in a Hadoop cluster? I have the following approach in mind, but I am not sure whether it is the usual one; please let me know your thoughts or suggestions. (The sample client writes to a file in the local filesystem, but that is just for testing. When I schedule it, I plan to write to an HDFS file and process it later; eventually I plan to write to HBase directly from the Kafka consumer.)

I am thinking of creating an Oozie workflow that calls the consumer client through a Java action and submitting that workflow as many times as the number of consumers I want. I will also change the consumer to write to an HDFS file instead of a local file (the HDFS filename will be suffixed with the partition number so that two consumers don't try to write to the same file). If I follow this approach, the Kafka clients run on YARN, right? Do I have to do anything specific for consumer rebalancing, or will that work as usual? I am just assigning topics to the consumer, not subscribing to specific partitions. And in general, do I have to code the Java client any differently to run it through Oozie? The entire Java client will be launched in a single mapper in my case, correct?
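For context, here is a minimal, hedged sketch of the kind of long-running consumer described above, using the 0.10-era Kafka consumer API and the HDFS FileSystem API. The broker address, topic name, and output path are placeholders, and the instance id passed as args[0] is just a hypothetical way to keep the per-consumer file names distinct; launched from an Oozie Java action of that era, the whole loop would run inside the single launcher task, which matches the expectation of one mapper per client.

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class TopicToHdfsConsumer {
    public static void main(String[] args) throws Exception {
        String instanceId = args[0];                       // hypothetical: passed in from the workflow

        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:6667");    // placeholder broker list
        props.put("group.id", "hdfs-writer");              // all instances share one group, so Kafka balances partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
        consumer.subscribe(Collections.singletonList("my_topic"));   // topic subscription, no manual partition assignment

        FileSystem fs = FileSystem.get(new Configuration());
        // One output file per consumer instance so two consumers never write to the same file.
        FSDataOutputStream out = fs.create(new Path("/tmp/kafka/my_topic_" + instanceId), true);

        while (true) {                                     // runs until the container is killed
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> rec : records) {
                out.writeBytes(rec.value() + "\n");
            }
            out.hsync();                                   // flush so readers can see the data
        }
    }
}
```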
01-17-2017
08:13 PM
Could not paste both explain plans in the previous comment. Here is the explain plan with hive.explain.user set to false. hive> set hive.explain.user=false;
hive> explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
DagId: A515595_20170117140547_4494cba3-581e-441c-8fb6-8175b74d89c2:3
Edges:
Map 1 <- Map 2 (BROADCAST_EDGE)
DagName:
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
filterExpr: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Filter Operator
predicate: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 jdate (type: date)
1 jdate (type: date)
outputColumnNames: _col0, _col1, _col6
input vertices:
1 Map 2
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
HybridGraceHashJoin: true
Filter Operator
predicate: (_col1 = _col6) (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: date)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: b
filterExpr: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator
predicate: jdate is not null (type: boolean)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: jdate (type: date)
sort order: +
Map-reduce partition columns: jdate (type: date)
Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
Time taken: 0.428 seconds, Fetched: 68 row(s)
01-17-2017
08:10 PM
Thanks for your comments! Here are the explain plan and the create table statements. The Hive version is 0.14. Also, regarding the third answer: if both tables are partitioned, is there any way to ensure that the bigger of the two tables undergoes partition pruning instead of the smaller one, or is that the default behavior? What does hive.explain.user=false do? I have attached the explain plan with this setting both enabled and disabled. > show create table big_part;
OK
CREATE TABLE `big_part`(
`id` int)
PARTITIONED BY (
`jdate` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://littleredns/apps/hive/warehouse/big_part'
TBLPROPERTIES (
'transient_lastDdlTime'='1484615054')
Time taken: 1.749 seconds, Fetched: 14 row(s)
hive> show create table small_np;
OK
CREATE TABLE `small_np`(
`id2` int,
`jdate` date)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://littleredns/apps/hive/warehouse/small_np'
TBLPROPERTIES (
'transient_lastDdlTime'='1484615162')
Time taken: 0.16 seconds, Fetched: 13 row(s)
hive> set hive.optimize.ppd=true;
hive> set hive.tez.dynamic.partition.pruning=true;
hive> explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
OK
Plan not optimized by CBO.
Vertex dependency in root stage
Map 1 <- Map 2 (BROADCAST_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 1 vectorized
File Output Operator [FS_21]
compressed:false
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Select Operator [OP_20]
outputColumnNames:["_col0","_col1"]
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Filter Operator [FIL_19]
predicate:(_col1 = _col6) (type: boolean)
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Map Join Operator [MAPJOIN_18]
| condition map:[{"":"Inner Join 0 to 1"}]
| HybridGraceHashJoin:true
| keys:{"Map 2":"jdate (type: date)","Map 1":"jdate (type: date)"}
| outputColumnNames:["_col0","_col1","_col6"]
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
|<-Map 2 [BROADCAST_EDGE]
| Reduce Output Operator [RS_4]
| key expressions:jdate (type: date)
| Map-reduce partition columns:jdate (type: date)
| sort order:+
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
| Filter Operator [FIL_14]
| predicate:jdate is not null (type: boolean)
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
| TableScan [TS_1]
| alias:b
| Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
|<-Filter Operator [FIL_17]
predicate:jdate is not null (type: boolean)
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
TableScan [TS_0]
alias:a
Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: PARTIAL
Time taken: 1.459 seconds, Fetched: 45 row(s)
01-17-2017
03:40 AM
Hi, could someone please help me understand the questions below on Hive partition pruning and explain plans?
1. How do I check whether partition pruning occurs by looking at the explain plan? I thought I would see a "Dynamic Partitioning Event Operator" in the plan, but in my sample query below I am not seeing any such operator, even though hive.tez.dynamic.partition.pruning is enabled. Since the table does not have much data, the query goes for a map join; does that have anything to do with partition pruning not happening? explain select a.* from big_part a, small_np b where a.jdate = b.jdate;
big_part is partitioned on jdate and small_np is a non-partitioned table. Even adding an explicit filter on jdate, such as jdate = "2017-01-01", does not show this operator in the explain plan. The tables are plain text format. I tried enabling and disabling hive.optimize.ppd, but that only added or removed a Filter Operator higher up in the plan; nothing else changed. Does hive.optimize.ppd have any effect on partition pruning?
2. Is it correct to expect dynamic partition pruning on the big_part table in the above query?
3. If both tables in the join are partitioned, can we expect dynamic partition pruning on both tables?
4. Will dynamic partition pruning occur with outer joins too (full and left outer, assuming the inner table's conditions are in the ON clause and the outer table's conditions are in the WHERE clause)?
5. What exactly does hive.optimize.ppd do in the case of text files? Just push the filter predicates down to the table read where possible? Thank you!
01-04-2017
07:33 PM
Thanks for your detailed reply. I created a Hive table with the column mapped to bigint in Hive, and even then the column is displayed as NULL. Only the values that were loaded into HBase as strings are displayed in Hive (whether the Hive column is string or bigint). Can you please let me know why? Is it always better to write values as strings when I plan to read them through Hive or the HBase shell, or is there something I am missing here? CREATE EXTERNAL TABLE `hbase_table_4`(
`key` string COMMENT 'from deserializer',
`value` bigint COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
'hbase.columns.mapping'='offset:value',
'serialization.format'='1')
TBLPROPERTIES (
'hbase.table.name'='kafka_conn',
'transient_lastDdlTime'='1483506197')
hive> select * from hbase_table_4;
OK
test_0909 NULL
test_0910 NULL
test_0911 111
test_0919 5
test_0920 5
hbase(main):004:0> scan 'kafka_conn', {VERSIONS => 10}
ROW COLUMN+CELL
test_0909 column=offset:value, timestamp=1483558076087, value=\x00\x00\x00\x00\x00\x00\x00\x0F
test_0910 column=offset:value, timestamp=1483498353863, value=\x00\x00\x00\x00\x00\x00\x00\x0A
test_0911 column=offset:value, timestamp=1483504038021, value=111
test_0919 column=offset:value, timestamp=1483505296398, value=5
test_0920 column=offset:value, timestamp=1483505356278, value=5
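For reference, here is a minimal hedged sketch of the two write encodings involved, using the same Put.add call as in the code in question (row key, family, and qualifier names are placeholders taken from the scan output above): Bytes.toBytes(long) stores a raw 8-byte big-endian long, which the shell prints as \x00..\x0F and a string-serialized Hive mapping cannot parse, while storing the decimal digits as a string is readable by both the shell and Hive. Some Hive versions also allow the HBase column mapping to declare binary storage (e.g. a #b suffix in hbase.columns.mapping), but that is version-dependent, so treat this only as an illustration of the difference.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class OffsetEncodingSketch {
    // Placeholder family/qualifier matching the scan output above.
    private static final byte[] CF = Bytes.toBytes("offset");
    private static final byte[] Q  = Bytes.toBytes("value");

    // (a) Binary encoding: an 8-byte big-endian long.
    //     The shell shows \x00..\x0F and a string-mapped Hive column returns NULL.
    public static Put binaryPut(String rowKey, long newOffset) {
        Put p = new Put(Bytes.toBytes(rowKey));
        p.add(CF, Q, Bytes.toBytes(newOffset));
        return p;
    }

    // (b) String encoding: the decimal digits, e.g. "15".
    //     Readable in the shell, and Hive can read it whether the column is
    //     declared string or bigint (for bigint it parses the digits).
    public static Put stringPut(String rowKey, long newOffset) {
        Put p = new Put(Bytes.toBytes(rowKey));
        p.add(CF, Q, Bytes.toBytes(Long.toString(newOffset)));
        return p;
    }
}
```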
01-04-2017
04:43 AM
I am using the HBase Put API to write a long value to HBase with the code below:
p.add(Bytes.toBytes(this.hcol_fam_n), Bytes.toBytes(this.hcol_qual_n), Bytes.toBytes(this.newoffset));
When I run a scan in the HBase shell, the values written this way are displayed as hex-escaped binary, e.g. value=\x00\x00\x00\x00\x00\x00\x00\x07, whereas values written with the shell's put command are displayed as readable integers. Now, when I create a Hive external table on top of this HBase table and declare the column as string, I am unable to read the values loaded through the Java code; they are displayed as NULL. Only the rows loaded through the HBase shell's put command are returned when queried through Hive.
1. Any idea how to make Hive display this column for all the records?
2. I don't understand why the shell displays hex-escaped binary for the numbers inserted through Java but readable ASCII for records inserted through the shell's put command. Any reason?
3. How do I make the HBase shell display the values in readable ASCII with the scan or get command?
4. Even in Java code, I am unable to convert the result of a Get to a string using Bytes.toString(); it returns null, whereas Bytes.toLong() works. Why the difference?
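To make question 4 concrete, here is a small hedged sketch (hypothetical table and row names, old-style HTable client from the HBase 0.98/1.x era) of reading a cell back: the decoding call has to match the encoding used in the Put, which is why Bytes.toLong works for values written with Bytes.toBytes(long), while Bytes.toString only yields readable digits for values that were written as strings.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadOffsetSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "kafka_conn");          // hypothetical table name
        try {
            Get g = new Get(Bytes.toBytes("test_0909"));        // hypothetical row key
            Result r = table.get(g);
            byte[] raw = r.getValue(Bytes.toBytes("offset"), Bytes.toBytes("value"));

            if (raw.length == Bytes.SIZEOF_LONG) {
                // Cell written with Bytes.toBytes(long): 8 raw bytes.
                // Only Bytes.toLong recovers the number; Bytes.toString would just
                // reinterpret the bytes as (mostly unprintable) characters, not digits.
                System.out.println("binary long: " + Bytes.toLong(raw));
            } else {
                // Cell written as a string such as "15": Bytes.toString recovers the
                // text and Long.parseLong turns it back into a number.
                System.out.println("string value: " + Long.parseLong(Bytes.toString(raw)));
            }
        } finally {
            table.close();
        }
    }
}
```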
12-29-2016
08:54 AM
I have read in many places that HBase does not perform well for joins but performs well for random reads and writes. My question: would it still perform well for a bulk read of an HBase table by full rowkey, say reading 30% of the table where the requested rowkeys are random and well distributed rather than concentrated in a few regions? Consider an HBase table whose regions are evenly distributed across many region servers. If an external table is created for it in Hive and that external table is joined with a Hive managed table on the HBase rowkey, would the huge number of rowkey lookups during the join be a performance bottleneck in this scenario? If so, could you please explain why? Thanks!
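To make the access pattern concrete, here is a hedged sketch (hypothetical table name and key list) of what "many random rowkey reads spread across regions" looks like from the Java client: a batch of point Gets, which the client groups by region server, as opposed to one contiguous Scan over a key range. Whether a Hive-on-HBase join actually issues point lookups or falls back to scanning the table depends on the storage handler and predicate pushdown, so this only illustrates the read pattern being asked about.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomReadSketch {
    // joinKeys: the rowkeys coming from the other side of the join (hypothetical input).
    public static Result[] fetch(List<String> joinKeys) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "my_hbase_table");   // hypothetical table name
        try {
            List<Get> gets = new ArrayList<Get>();
            for (String key : joinKeys) {
                gets.add(new Get(Bytes.toBytes(key)));       // one point read per join key
            }
            // The client batches these per region server; the work is spread across
            // however many regions the requested keys happen to land in.
            return table.get(gets);
        } finally {
            table.close();
        }
    }
}
```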
12-21-2016
12:51 AM
@mqureshi I have one question based on your comment (point 8). What if I join on the entire HBase row key (say the HBase table is huge, ~20 million rows, and the join touches almost 20% of the table at random, so the reads may hit many different region servers, not just one specific RS)? Could you please let me know whether this kind of join on the HBase row key would be efficient as long as it does not read everything from the same RS? Also, if I create an external table in Hive for the HBase table and join another Hive table (whose key is bucketed and not skewed) with it on the row key through that external table, would the HBase scan be a bottleneck in this scenario? Please let me know your thoughts.
09-28-2016
07:00 PM
Thanks for the suggestion. I have not tried these parameters. What are they for? Are these the ones that help set the mapper memory size in Pig?
09-27-2016
10:37 PM
I am running my Pig scripts and Hive queries in Tez mode. For all of these scripts and queries, the mapper memory requested was more than the memory actually used, so I lowered mapreduce.map.memory.mb and also changed mapreduce.map.java.opts. Even after changing these values, the requested mapper memory is still more than the memory used and nothing seemed to change in the performance metrics (this was from analyzing the jobs in Dr. Elephant). But after changing these settings, the Pig script also aborted with the error below:

"java.lang.IllegalArgumentException: tez.runtime.io.sort.mb 1638 should be larger than 0 and should be less than the available task memory (MB):786"

I never set 786 MB anywhere in my settings; where does this value come from? Also, how do I configure the map and reduce memory in Tez execution mode? (I see documentation for Hive that says to set hive.tez.container.size, but nothing for Pig.) Is it possible to configure map and reduce memory differently in Tez mode? The Hive-on-Tez documentation only mentions a map memory setting, nothing for reducer memory. And since Tez creates a DAG of tasks, they are not like MapReduce, right? Are map and reduce each just seen as an individual task in the DAG, or can these DAG tasks still be classified into mapper/reducer actions? Thanks!
08-24-2016
09:43 PM
1 Kudo
Hi, I am unable to cast a column from chararray to int when the column comes from the OVER operator. I am using OVER to derive the value of a column from the next record. I define the operator like this: DEFINE Over org.apache.pig.piggybank.evaluation.Over('chararray'); so the output is chararray. The following code produces a chararray (I can see the type with the describe command), but I am unable to cast it to int. dist_dates = foreach expr_d_set {
distinct_dates = distinct LND_DATA_REQD.file_date;
sorted_ip = order distinct_dates by file_date;
stitched = Stitch(sorted_ip, Over(sorted_ip.file_date, 'lead',0,1,1,'99991231'));
generate flatten(group) as (queueid:chararray, acdid:int), flatten(stitched) as (file_date:int, expr_d:chararray) ;
};
grunt> describe dist_dates;
...
dist_dates: {queueid: chararray,acdid: int,file_date: int,expr_d: chararray} --> Notice the chararray here?
I try this to cast to integer: I = FOREACH dist_dates generate (int)expr_d; but it fails with the error "java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String". I don't understand why this cast is not working; I am able to cast other columns to int with the same syntax. I am, however, able to convert the same column to a date: I = FOREACH dist_dates generate ToDate(expr_d,'YYYYMMdd') as EXPR_D; Can someone please suggest what I am missing here?
06-20-2016
06:57 AM
Hi, my input data is in the format below.

row | col1 | col2 | col3 | effective date | expiry date
---|---|---|---|---|---
1 | Q1 | A1 | Value1 | 01/01 | 01/02
2 | Q1 | A1 | Value1 | 01/02 | 01/03
3 | Q1 | A1 | Value1 | 01/03 | 01/05
4 | Q1 | A1 | Value2 | 01/05 | 01/06
5 | Q1 | A1 | Value2 | 01/06 | 01/07
6 | Q1 | A1 | Value2 | 01/07 | 01/08
7 | Q1 | A1 | Value1 | 01/08 | 01/11
8 | Q1 | A1 | Value1 | 01/11 | 12/31

I need to remove duplicates based on the values of col1, col2, and col3, but not all of them: records count as duplicates only until the value of col3 changes. For example, in the data above, Value1 changes to Value2 at record 4, so among records 1, 2, and 3 only record 1 should be retained; among records 4, 5, and 6 only record 4 should be retained; and among records 7 and 8 only record 7 should be retained. The last two columns are date columns (effective and expiry date). A run of duplicates like records 1-3 could be longer (e.g. records 1 through 5 could have the same value) or there could be no duplicates at all. I have two approaches in mind, but I am not sure how to code either of them; see the sketch after this list.
1. Generate a "keychange" column (1 or 0) that is 0 for all the duplicates and is set to 1 whenever the key (the combination of col1, col2, col3) changes; then I could filter on this column. For this I would need to write a UDF (or is there an existing UDF with similar functionality?). Since this requires the input to be sorted before being passed to the UDF, is it possible to pass sorted data to a UDF, and if so, how? What kind of UDF should this be? Alternatively, if I write MapReduce code, how should I proceed: should I just emit the records in the mapper and do all the sorting and column generation in the reducer? Please share your inputs (I am new to MapReduce programming, so your ideas will help me a lot in learning, thanks!).
2. From the OVER function documentation, it only compares the same column of the previous and current records. If I could somehow compare col5 (expiry date) of the current record with col4 (effective date) of the next record after sorting by col4 ascending, I could group by col1, col2, and col3 and eliminate the records whose effective date equals the previous record's expiry date. But I am not sure how to compare two different columns using the OVER function. Please share your thoughts on this one too.

Please let me know if there is a better way to solve this. Thank you for your time!
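To make approach 1 concrete, here is a hedged sketch in plain Java of the key-change test (the record type and field names are hypothetical). If the records for one (col1, col2) group arrive sorted by effective date, for example via a secondary sort in a reducer, this loop keeps only the first record of every run of identical (col1, col2, col3) values, which are exactly the rows to retain in the example above.

```java
import java.util.List;

public class KeyChangeDedup {
    // Hypothetical record type matching the sample rows above.
    static class Rec {
        String col1, col2, col3, effDate, expDate;
    }

    // Given records already sorted by effective date within a (col1, col2) group,
    // keep only the first record of each run of identical (col1, col2, col3).
    // This is the "keychange" idea from approach 1: in a reducer, apply the same
    // loop to the sorted values received for one key.
    static void emitFirstOfEachRun(List<Rec> sorted) {
        String prevKey = null;
        for (Rec r : sorted) {
            String key = r.col1 + "\u0001" + r.col2 + "\u0001" + r.col3;
            if (!key.equals(prevKey)) {               // keychange == 1 -> keep this record
                System.out.println(String.join(",", r.col1, r.col2, r.col3, r.effDate, r.expDate));
            }
            prevKey = key;                            // the rest of the run is dropped
        }
    }
}
```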
06-15-2016
07:12 AM
Hi, I have a workflow with four actions like this:
A -> B -> C (on success, via the ok-to transition)
B -> D (if B fails, via the error-to transition)
When I run this workflow, A succeeds and B fails, so C is run and succeeds. When I rerun the workflow, A is skipped (I have set oozie.wf.rerun.failnodes=true in the properties file). Now B runs again and fails, but this time during the rerun C does not run again since it succeeded in the previous run. I want C to run every time B fails. I tried oozie.wf.rerun.skip.nodes=, but that causes all actions to rerun after failure; action A also executes again, and I don't want that behaviour. I want all actions on the workflow path after the failed action to be rerun when the failed action is rerun by resubmitting the workflow (in this case, since B is rerun, if it fails again I want C to run again irrespective of its status the previous time). Is there any way to achieve this without splitting the workflow? (B is a Hive action and C is an email action, so every time B fails I want the email action to be triggered with its error message.) Please suggest.
06-14-2016
08:56 AM
When I first added the hive-site.xml, I missed a few properties; I have now added all the properties mentioned by @allen huang in this link: https://community.hortonworks.com/questions/25121/oozie-execute-sqoop-falls.html#answer-25291. So even when Sqoop is called from an Oozie shell action, I had to add a hive-site.xml with the properties mentioned by Allen. Thank you Allen :). My script is working fine now.
06-14-2016
06:56 AM
Hi, I checked the logs but found no information about why the script aborted. This is all that is shown in the log: INFO hive.HiveImport: Loading uploaded data into Hive
WARN conf.HiveConf: HiveConf of name hive.metastore.pre-event.listeners does not exist
WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
Logging initialized using configuration in jar:file:/grid/8/hadoop/yarn/local/filecache/5470/hive-common-1.2.1.2.3.4.0-3485.jar!/hive-log4j.properties
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
06-10-2016
12:43 PM
Hi, I am running an Oozie shell action that runs a Sqoop command to import data into Hive. When I run the Sqoop command directly it works fine, but when I run it through the Oozie shell action it aborts with: Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]. Based on this link, https://community.hortonworks.com/questions/25121/oozie-execute-sqoop-falls.html#answer-25290, I added hive-site.xml using the <file> tag in the Oozie shell action, and based on another link I also added export HIVE_CONF_DIR=`pwd` before running the Sqoop command, but neither worked. When I add the full hive-site.xml I get the same error as above; when I add just the important properties mentioned in this link, http://ingest.tips/2014/11/27/how-to-oozie-sqoop-hive/, I get this error: FAILED: IllegalStateException Unxpected Exception thrown: Unable to fetch table XYZ. java.net.SocketException: Connection reset, followed by: Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]. Both times, the Sqoop command successfully creates the files in the target directory but fails while loading the data into Hive. The Hadoop cluster is Kerberos-enabled; I do a kinit before submitting the workflow and again inside the Oozie shell action. Can someone please shed some light on how to fix this? Below is the Sqoop command used: sqoop import \
--connect "jdbc:teradata://${server}/database=${db},logmech=ldap" \
--driver "com.teradata.jdbc.TeraDriver" \
--table "XYZ" \
--split-by "col1" \
--hive-import \
--delete-target-dir \
--target-dir "/user/test/" \
--hive-table "default.XYZ" \
--username "terauser" \
--password tdpwd \
--where "${CONDITION}" \
--m 2 \
--fetch-size 1000 \
--hive-drop-import-delims \
--fields-terminated-by '\001' \
--lines-terminated-by '\n' \
--null-string '\\N' \
--null-non-string '\\N'
06-10-2016
07:53 AM
Thanks, I was able to set up SSH and it is working. But I have a question: what is this "oozie" ID? I log in to Linux with my own ID and submit the workflow either with my ID or after a kinit to another ID. In the UI logs the workflow is shown as submitted by either my ID or the ID for which the ticket was obtained via kinit. What I don't understand is where the "oozie" user ID fits in. I even had to go to the home directory of the "oozie" user, take its keys, and add them to the destination server's authorized_keys file. Can you please explain the purpose of this ID, and how to find it? Is this user ID available in oozie-site.xml? Based on this article I searched for the oozie ID, but what if it were different? How would I find it? Thanks!
06-07-2016
06:07 PM
Hi, I need to execute a shell script on a remote server from the Hadoop cluster, so I am planning to use the Oozie SSH action. I have two basic questions about Oozie actions.
1. For passwordless SSH I need to exchange public keys between the two servers. With the Oozie SSH action, where does the workflow initiate the SSH connection from? Does it execute on an arbitrary data node? If so, how do I set up SSH and which public keys do I use?
2. Does the Oozie shell action execute on any of the available data nodes, or is there a specific way the execution host is chosen?
Thanks!
06-07-2016
06:01 PM
Hi, I am trying to read a file in HDFS with the hadoop fs -cat command inside an Oozie shell action, on a Kerberized cluster. The Oozie workflow is submitted with my ID, A, but the file can only be read by ID B. Inside the shell script I do a kinit -kt with B's keytab and then klist; klist shows B as the default principal with a valid ticket. Yet even with B's valid ticket, hadoop fs -cat executes as my ID (A), not B, and fails with an insufficient-privileges error. Why is hadoop fs -cat using my ID instead of B's ticket?

The same thing works when I run the commands individually from Linux instead of an Oozie shell action: I log in with my ID, klist shows only my principal, I kinit as B, klist now shows B's ticket, and in the same shell (from the Linux command line, not Oozie) hadoop fs -cat filename displays the file's contents. Why does this work directly from Linux but not from the Oozie shell action? After a kinit as a different user, all hadoop commands in the Linux CLI seem to run as that second user, and I assumed the Oozie shell action would behave the same way. Please help me understand this.

Note: when I log in with my ID, do a kinit as the second user before submitting the Oozie workflow, and then submit the workflow, it works: all commands inside the workflow's shell action now seem to execute as the second user rather than my ID, and there are no issues. Please help me understand this as well.
05-19-2016
11:16 PM
Question on Pig's exec command: I am executing a Pig script from inside another Pig script. The outer script runs in Tez mode; will the script executed via the exec command also run in Tez mode? Will the entire script run through exec be executed in a single vertex? Will there be any difference in the DAG created for the inner script called with exec compared to running it directly? I ask because my Pig script was aborting due to the max counters limit (I posted that question here: https://community.hortonworks.com/questions/34420/oozie-pig-action-mapreduce-job-counters-limitexcep.html). The script aborts when run from Oozie, but when I execute the same script with an exec statement inside another Pig script, with the same settings, it runs without any issues. What is the difference?
05-19-2016
09:38 PM
Hi, my Pig action called from an Oozie workflow fails with the error below. Although Pig is executing on Tez, it is hitting the MapReduce max counters limit, and this is only an issue when run through Oozie.

org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 131 max=130

I think this is a known issue (https://issues.apache.org/jira/browse/PIG-4760); my Pig version is 0.15.0. I am trying to manually set mapreduce.job.counters.max to a higher value, but I am not sure how to do this correctly without modifying mapred-site.xml. I tried the following and nothing worked:
1. Setting it in the Oozie workflow under the <configuration> tag in a <property> element.
2. Adding a modified mapred-site.xml and referencing it in the <job-xml> tag of the Oozie workflow.
3. Setting it inside Pig with set, and also in workflow.xml using <configuration> and <property> within the Pig action.
None of these works, so is the only fix to modify the mapred-site.xml in the Hadoop home directory? Any suggestions would be helpful. Thanks!
05-19-2016
04:11 AM
For some reason I am unable to post a reply to your last comment. Yes, the table is in HCatalog. I was able to run hcat -e "describe table;"; it is just truncate that is not working. hcat -e "truncate table tablename;" gives the same error I got from Pig. According to https://cwiki.apache.org/confluence/display/Hive/HCatalog+CLI, all commands that do not require a MapReduce job should run through hcat, right? Describe works. And all Hive managed tables are HCatalog tables, right?
05-19-2016
01:25 AM
My bad, I didn't look at your sample script properly. I tried these commands, but only show tables works. I tried select * from table and truncate table tablename and neither worked. Please find the error below. grunt> sql select * from temp_dim_w;
2016-05-18 20:24:13,705 [main] INFO org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: select * from temp_dim_w;
WARNING: Use "yarn jar" to launch YARN applications.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
16/05/18 20:24:15 WARN conf.HiveConf: HiveConf of name hive.metastore.pre-event.listeners does not exist
16/05/18 20:24:15 WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
FAILED: SemanticException Operation not supported.
grunt> sql truncate table temp_dim_w;
2016-05-18 20:24:31,858 [main] INFO org.apache.pig.tools.grunt.GruntParser - Going to run hcat command: truncate table temp_dim_w;
WARNING: Use "yarn jar" to launch YARN applications.
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
16/05/18 20:24:33 WARN conf.HiveConf: HiveConf of name hive.metastore.pre-event.listeners does not exist
16/05/18 20:24:33 WARN conf.HiveConf: HiveConf of name hive.semantic.analyzer.factory.impl does not exist
FAILED: SemanticException Operation not supported.
05-19-2016
12:52 AM
I am able to read from and write to the table, but the writes are always appends; I could not truncate the Hive table.
05-18-2016
11:13 PM
1 Kudo
Is there a way to truncate a Hive table from Pig? I need to overwrite a temp table every time I load data into it, but I am unable to find anything on how to overwrite the contents. The only way I can think of is to remove the files in the table's directory manually and rewrite them using the store interface. Is there a better way to do this? Please advise. Thanks!
05-18-2016
08:07 PM
Thank you!
05-18-2016
08:07 PM
Thank you!
05-18-2016
07:59 PM
Thanks for the suggestion. Actually, I am just using this sequence-file table as a landing area for data from an external source, with little to no modification, and using an ORC table as the final table for querying. But since the data in the sequence-file table will be retained for around 100 days, I wanted it compressed without compromising read performance; that's why I chose Snappy compression over the others, thinking it might be faster to query.
05-18-2016
07:54 PM
Hi, thanks for your reply. So if I set the compression codec for a textfile or sequencefile table to gzip, bzip2, LZO, or Snappy, will the files created by an insert statement be splittable, or do they need to be read entirely on one node? What about the other file formats like Parquet, Avro, and RCFile? Is whether a file can be split determined by the file format or by the compression method? It is the compression method, right? Which compression methods make the file data non-splittable? Also, what does "compressed" actually mean in the output of describe formatted? Is that some table-level compression, and how is it enabled?

If I understand the answers correctly, the compression codec used for a sequence file depends on the parameters set before the insert statement. If that is the case, what if I enable compression and set it to Snappy before the first insert, set a different codec for the second insert, and disable compression before the third insert? Would that create three differently encoded sets of files underneath? I tried this and I am able to select the data from the table, but I am not sure whether it actually created three different encodings in the background. Any idea? And if one of these compression methods is non-splittable, then only the mapper reading that particular file has to read the whole file, right? The other mappers reading the other files would still read their splits, right?
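On the splittability question specifically, here is a hedged sketch of one way to check it programmatically: Hadoop's CompressionCodecFactory picks a codec from the file extension (the same lookup plain-text input formats use), and a codec that implements SplittableCompressionCodec (bzip2, for example) can be split as a raw file, while gzip and raw Snappy streams cannot. Container formats such as SequenceFile, ORC, Parquet, and Avro compress blocks internally, so they remain splittable regardless of the codec; the check below only matters for plain compressed text files. The path argument is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;

public class CodecSplitCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        // Pick the codec from the file extension, e.g. /data/part-00000.gz or .snappy
        Path p = new Path(args[0]);                       // placeholder path
        CompressionCodec codec = factory.getCodec(p);

        if (codec == null) {
            System.out.println("uncompressed or unknown extension: splittable");
        } else {
            boolean splittable = codec instanceof SplittableCompressionCodec;
            System.out.println(codec.getClass().getSimpleName()
                    + (splittable ? " is splittable" : " is NOT splittable as a raw file"));
        }
    }
}
```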