Member since
05-16-2016
270
Posts
18
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1717 | 07-23-2016 11:36 AM |
 | 3053 | 07-23-2016 11:35 AM |
 | 1563 | 06-05-2016 10:41 AM |
 | 1157 | 06-05-2016 10:37 AM |
12-06-2017
06:28 AM
@Kit Menke - Hey Kit, I have a requirement where users need to execute a query using beeline from a script stored in HDFS. I tried your approach, but after trying several variations of it the outcome unfortunately contradicts your posts. Can beeline access an HDFS URI? It would be a great help if you could share your thoughts on this. Option 1: beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "City.sql" --verbose true --hivevar HDFSDIR="hdfs://namenode1.dc.corp.astro.com:8020/user/xxx"
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
Option 2:
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "hdfs://namenode1.dc.corp.astro.com:8020/user/xxx/City.sql" --verbose true
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
Option 3:
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "hdfs://user/xxx/City.sql" --verbose true
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
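Editor's note: all three outcomes above are consistent with how beeline behaves — the -f option resolves paths on the local filesystem of the machine running beeline and does not understand hdfs:// URIs. A common workaround, assuming the HDFS client is available on the same host, is to pull the script to a local temp file first and run that. (As an aside, the connection string above may have lost a ';' between zooKeeperNameSpace=hiveserver2 and sockettimeout=600000; the sketch below reuses the hosts from the post but trims the questionable parameters.)

# Copy the script out of HDFS, then run the local copy with beeline.
hdfs dfs -get hdfs://namenode1.dc.corp.astro.com:8020/user/xxx/City.sql /tmp/City.sql
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2" \
  -n xxx -p '*******' -f /tmp/City.sql --verbose true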
08-08-2016
09:04 AM
1 Kudo
@Simran Kaur The Hive CSV SerDe is built on OpenCSV, which supports all basic CSV operations. Basically, design the SerDe configuration around the type of data within the CSV file. Refer to the link below for details: http://opencsv.sourceforge.net/
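For reference, a table backed by the CSV SerDe can be declared roughly like this (table and column names are made up for illustration). One caveat worth knowing: OpenCSVSerde reads every column as STRING, so you have to cast in your queries or load into a typed table afterwards.

create table csv_demo (
  id string,
  city string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
stored as textfile;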
12-26-2016
10:38 PM
2 Kudos
@Simran Kaur You need to use a join or a sub-query within the sub-query. You can reference the outer query from the inner query, but not the other way around; that is by design in any SQL-like language.
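To illustrate with made-up tables: the correlated form below is legal because the inner query references the outer alias (o), whereas a query that tried to reference an inner alias from the outer level would not parse.

select o.id
from orders o
where exists (select 1 from refunds r where r.order_id = o.id);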
07-12-2016
02:08 PM
OK. That's good. I have not seen any specific issue with NOT IN. Without seeing the data it stands to be asked again - You have confirmed that (a) count(*) on the inner query returns results, (b) that it is not failing silently, and (c) that there are 1+ email addresses in June do not exist prior to June... Here are 2 things to check to see if it is indeed a Hive issue... (1) Try running the query once with MapReduce as the execution engine and then with Tez as the execution engine and see if you get differing results. set hive.execution.engine=mr; set hive.execution.engine=tez; (2) Change the query to use a temporary table and evaluate your results that way (double-check my syntax but it should be close to correct... create temporary table temp_table
as select customer_email FROM Table1
WHERE TO_DATE(created_at) < '2016-05-31';
select DISTINCT SF.customer_email
FROM Magento.sales_flat_order SF
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6'
AND SF.customer_email NOT IN (select customer_email from temp_table);
select DISTINCT SF.customer_email
FROM Magento.sales_flat_order SF
left join temp_table
on SF.customer_email = temp_table.customer_email
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6'
AND temp_table.customer_email is NULL
07-07-2016
01:45 PM
Hi @Simran Kaur. I added an answer to your other version of this question, found here: https://community.hortonworks.com/questions/43885/not-in-operator-in-hive-with-a-subquery.html#answer-43944 One other thing I like to look for is leading or trailing blanks in the comparison field (email address). But in this instance that should not cause a problem, since you are using the same table (and the same email addresses) in the inner and outer queries.
07-07-2016
01:39 PM
Hi @Simran Kaur. This query 'should' work. 😉 Quick things to double-check... 1. These two queries return counts > 0: select count(*)
FROM Table1 SF
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6';
select count(*) FROM Table1 SFO
WHERE TO_DATE(SFO.created_at) < '2016-05-31'; 2. Based on the counts returned: is it possible that there are no customers from June 2016 who did not receive an email prior to June 2016 (i.e. the correct result is in fact zero)?
07-23-2016
11:35 AM
Here's a workaround that fixed it for me: http://www.yourtechchick.com/sqoop/illegal-partition-exception-sqoop-incremental-imports/
07-01-2016
08:15 AM
1 Kudo
@Simran Kaur If you have a Hive metastore associated with your HDFS cluster, --hive-import and --hive-overwrite always write to the Hive warehouse directory. Arguments like --warehouse-dir <dir>, --as-avrodatafile, --as-sequencefile, --target-dir, etc. are not honoured. Thanks and Regards, Sindhu
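In other words, with a sqoop invocation along these lines (connection string and table names are made up for illustration), the data lands under the Hive warehouse directory regardless of any --target-dir you pass:

sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username xxx --password '*******' \
  --table customers \
  --hive-import \
  --hive-overwrite \
  --hive-table default.customers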
01-11-2018
05:54 AM
Yes, use UpdateAttribute and the expression language to add missing values as appropriate.
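For example (the attribute name here is made up), adding a property like the following to an UpdateAttribute processor supplies a default whenever the attribute is missing, using the Expression Language's replaceNull function:

category : ${category:replaceNull('unknown')}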
07-23-2016
11:36 AM
Here's a workaround that fixed it for me: http://www.yourtechchick.com/sqoop/illegal-partition-exception-sqoop-incremental-imports/