Member since
05-16-2016
270
Posts
18
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1717 | 07-23-2016 11:36 AM |
 | 3053 | 07-23-2016 11:35 AM |
 | 1563 | 06-05-2016 10:41 AM |
 | 1157 | 06-05-2016 10:37 AM |
12-06-2017
06:28 AM
@Kit Menke - Hey Kit, I have a requirement where users need to execute a query using beeline from a script stored in HDFS. I tried your approach, but after trying several variations of it the outcome unfortunately contradicts your posts. Can beeline access an HDFS URI? It would be a great help if you could share your thoughts on this. Option 1: beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "City.sql" --verbose true --hivevar HDFSDIR="hdfs://namenode1.dc.corp.astro.com:8020/user/xxx"
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
Option 2:
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "hdfs://namenode1.dc.corp.astro.com:8020/user/xxx/City.sql" --verbose true
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
Option 3:
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2sockettimeout=600000;tcpKeepAlive=true" -n xxx -p ******* -f "hdfs://user/xxx/City.sql" --verbose true
############ OUTPUT ##########################
Connected to: Apache Hive (version 1.2.1000.2.6.0.3-8)
Driver: Hive JDBC (version 1.2.1000.2.6.0.3-8)
Transaction isolation: TRANSACTION_REPEATABLE_READ
City.sql (No such file or directory)
##############################################
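Editor's note: all three outcomes above are consistent with how beeline behaves — the -f option resolves paths on the local filesystem of the machine running beeline and does not understand hdfs:// URIs. A common workaround, assuming the HDFS client is available on the same host, is to pull the script to a local temp file first and run that. (As an aside, the connection string above may have lost a ';' between zooKeeperNameSpace=hiveserver2 and sockettimeout=600000; the sketch below reuses the hosts from the post but trims the questionable parameters.)

# Copy the script out of HDFS, then run the local copy with beeline.
hdfs dfs -get hdfs://namenode1.dc.corp.astro.com:8020/user/xxx/City.sql /tmp/City.sql
beeline -u "jdbc:hive2://namenode2.dc.corp.astro.com:2181,namenode1.dc.corp.astro.com:2181,namenode3.dc.corp.astro.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNameSpace=hiveserver2" \
  -n xxx -p '*******' -f /tmp/City.sql --verbose true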
08-08-2016
09:04 AM
1 Kudo
@Simran Kaur The Hive CSV SerDe is built on OpenCSV, which supports all basic CSV operations. Basically, design the SerDe configuration around the type of data within the CSV file. Refer to the link below for details: http://opencsv.sourceforge.net/
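For reference, a table backed by the CSV SerDe can be declared roughly like this (table and column names are made up for illustration). One caveat worth knowing: OpenCSVSerde reads every column as STRING, so you have to cast in your queries or load into a typed table afterwards.

create table csv_demo (
  id string,
  city string
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
stored as textfile;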
12-26-2016
10:38 PM
2 Kudos
@Simran Kaur You need to use a join or a sub-query within the sub-query. You can reference the outer query from the inner query, but not the other way around; that is by design in any SQL-like language.
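To illustrate with made-up tables: the correlated form below is legal because the inner query references the outer alias (o), whereas a query that tried to reference an inner alias from the outer level would not parse.

select o.id
from orders o
where exists (select 1 from refunds r where r.order_id = o.id);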
07-12-2016
02:08 PM
OK. That's good. I have not seen any specific issue with NOT IN. Without seeing the data it stands to be asked again - You have confirmed that (a) count(*) on the inner query returns results, (b) that it is not failing silently, and (c) that there are 1+ email addresses in June do not exist prior to June... Here are 2 things to check to see if it is indeed a Hive issue... (1) Try running the query once with MapReduce as the execution engine and then with Tez as the execution engine and see if you get differing results. set hive.execution.engine=mr; set hive.execution.engine=tez; (2) Change the query to use a temporary table and evaluate your results that way (double-check my syntax but it should be close to correct... create temporary table temp_table
as select customer_email FROM Table1
WHERE TO_DATE(created_at) < '2016-05-31';
select DISTINCT SF.customer_email
FROM Magento.sales_flat_order SF
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6'
AND SF.customer_email NOT IN (select customer_email from temp_table);
select DISTINCT SF.customer_email
FROM Magento.sales_flat_order SF
left join temp_table
on SF.customer_email = temp_table.customer_email
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6'
AND temp_table.customer_email is NULL
07-07-2016
01:45 PM
Hi @Simran Kaur. I added an answer to your other version of this question, found here: https://community.hortonworks.com/questions/43885/not-in-operator-in-hive-with-a-subquery.html#answer-43944 One other thing I like to look for is leading or trailing blanks in the comparison field (email address). But in this instance that should not cause a problem, since you are using the same table (and the same email addresses) in the inner and outer queries.
07-07-2016
01:39 PM
Hi @Simran Kaur. This query 'should' work. 😉 Quick things to double-check... 1. These two queries return counts > 0: select count(*)
FROM Table1 SF
WHERE YEAR(TO_DATE(SF.created_at)) = '2016'
AND MONTH(TO_DATE(SF.created_at)) = '6';
select count(*) FROM Table1 SFO
WHERE TO_DATE(SFO.created_at) < '2016-05-31'; 2. Based on the counts returned: is it possible that there are no customers from June 2016 who did not receive an email prior to June 2016 (i.e. the correct result is in fact zero)?
07-23-2016
11:35 AM
Here's a workaround that fixed it for me: http://www.yourtechchick.com/sqoop/illegal-partition-exception-sqoop-incremental-imports/
07-01-2016
08:15 AM
1 Kudo
@Simran Kaur If you have a Hive metastore associated with your HDFS cluster, --hive-import and --hive-overwrite always write to the Hive warehouse directory. Arguments like --warehouse-dir <dir>, --as-avrodatafile, --as-sequencefile, --target-dir, etc. are not honoured. Thanks and Regards, Sindhu
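In other words, with a sqoop invocation along these lines (connection string and table names are made up for illustration), the data lands under the Hive warehouse directory regardless of any --target-dir you pass:

sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username xxx --password '*******' \
  --table customers \
  --hive-import \
  --hive-overwrite \
  --hive-table default.customers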
01-11-2018
05:54 AM
Yes, use UpdateAttribute and the expression language to add missing values as appropriate.
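For example (the attribute name here is made up), adding a property like the following to an UpdateAttribute processor supplies a default whenever the attribute is missing, using the Expression Language's replaceNull function:

category : ${category:replaceNull('unknown')}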
07-23-2016
11:36 AM
Here's a workaround that fixed it for me: http://www.yourtechchick.com/sqoop/illegal-partition-exception-sqoop-incremental-imports/