Member since: 09-25-2015
Posts: 112
Kudos Received: 88
Solutions: 12
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9518 | 03-15-2017 03:17 PM
 | 5962 | 02-17-2017 01:03 PM
 | 1749 | 01-02-2017 10:47 PM
 | 2626 | 11-16-2016 07:03 PM
 | 1053 | 10-07-2016 05:24 PM
02-25-2016
02:58 AM
1 Kudo
Thanks Predrag, that's what I thought. psql.py is an option for our smaller files...
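For reference, a psql.py load of a small CSV looks roughly like this (a sketch; the ZooKeeper connection string, table, and file name are placeholders, and the client path varies by HDP version):

# Load a small CSV into an existing Phoenix table with psql.py
# (hypothetical names throughout; -t names the target table)
/usr/hdp/current/phoenix-client/bin/psql.py \
  -t <DB>.<TBL> \
  <Zookeeper node>:2181:/hbase-unsecure \
  /path/to/smallfile.csv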
02-24-2016
08:27 PM
2 Kudos
Running the following command:

hadoop jar /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -z <Zookeeper nodes>:2181:/hbase-unsecure \
  -d $'\t' \
  -g \
  --table <DB>.<TBL> \
  --input /data/product/inbound/<FNAME>.TXT

Is there any way to skip the first line of the input file - is there a parameter on the CsvBulkLoadTool that would allow skipping a header row? Specifically, something like what Hive gives you with 'tblproperties ("skip.header.line.count"="1")'. Thanks!
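For context, CsvBulkLoadTool aside, one workaround would be stripping the header before the load with standard shell tools; a minimal sketch, with placeholder file names:

# Strip the header row locally, then push the file to HDFS
# (hypothetical file names; tail -n +2 emits everything from line 2 on)
tail -n +2 FNAME.TXT > FNAME_noheader.TXT
hadoop fs -put -f FNAME_noheader.TXT /data/product/inbound/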
Labels:
- Apache Phoenix
02-24-2016
11:54 AM
4 Kudos
There is also HPL/SQL: http://www.hplsql.org/doc

It was incorporated into the Hive 2.0 release (February 2016) and thus will automatically be included in the next HDP release that includes Hive 2.0. It can also be added very easily to an existing HDP install. It is an implementation of stored procedures, control-flow logic, temporary variables, error checking, on-the-fly SQL syntax conversion to HiveQL, RDBMS SQL syntax compatibility, etc. It brings almost all of the stored-procedure and SQL-syntax functionality of DB2, SQL Server, and Oracle into Hive. An added bonus is that you can mix and match SQL Server T-SQL, Oracle PL/SQL, and DB2 syntax. It is slick. For example, here is a sample script that runs in HPL/SQL (for real):

create procedure spLoadSalesFact (LoadMonth char(10))
BEGIN
  SET hplsql.onerror = seterror;
  print 'Beginning execution...';
  print '*' || LoadMonth || '*';
  /* Fail if invalid Input Parameter
  set LoadMonth = Upper(LoadMonth);
  if (LoadMonth not in ('LAST', 'CURRENT'))
  BEGIN
    print 'Bad Input Parameter';
    RETURN -1;
  END
  */
  /* Load Prior Month */
  IF (LoadMonth = 'LAST')
  BEGIN
    print 'Inserting Last Month''s Data...';
    insert into table saleslineitem
      select * from saleslineitem
      where TransactionDate >= '2011-06-01'
        and TransactionDate < '2011-07-01';
    if SQLCODE <> 0
      RETURN -2;
  END
  /* Load Current Month */
  ELSE IF (LoadMonth = 'CURRENT')
  BEGIN
    print 'Inserting Current month''s data...';
    insert into table saleslineitem
      select * from saleslineitem
      where TransactionDate >= '2011-07-01'
        and TransactionDate < '2011-08-01';
    if SQLCODE <> 0
      RETURN -3;
  END
  PRINT 'Execution Completed...';
END

/* Stored Proc invocation */
-- CALL spLoadSalesFact ('GARBAGE');
-- CALL spLoadSalesFact ('LAST');
CALL spLoadSalesFact ('CURRENT');
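For reference, a script like this runs from the shell with the hplsql launcher; a minimal sketch (the file name is hypothetical):

# Execute the HPL/SQL script file; the trailing CALL kicks off the proc
hplsql -f spLoadSalesFact.sql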
02-24-2016
11:40 AM
2 Kudos
I agree with Scott. Bringing data from Hadoop into Access is no big deal: we had an Access database use ODBC to retrieve data from Hadoop with effective results (fast enough). BUT if you try to send data from Access into Hadoop, it is painfully slow - row by row. Sending it off to SQL Server requires very little code in Access, and then you have a durable store of your data. Sending it from SQL Server to Hadoop is effective and fast. Here's a simple sample of a Sqoop script to send data from SQL Server to Hadoop. It reads the table Customer from the SQL Server database SQLTestDB and copies it into the Hadoop database TestDB, table Customer, overwriting any existing data in the Hive table and using a single mapper:

sqoop import --connect "jdbc:sqlserver://<IP Address>:1433;database=SQLTestDB" \
--username root \
--password hadoop \
--table Customer \
--hive-import --hive-overwrite \
--hive-table TestDB.Customer \
-m 1
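A quick sanity check after the import is a row count on the Hive side (a sketch, assuming the table name from the command above):

# Hypothetical post-import check: count the rows Sqoop landed in Hive
hive -e "SELECT COUNT(*) FROM TestDB.Customer;"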
12-09-2015
04:48 PM
Thanks Carter!
12-07-2015
09:38 PM
Will do. I will send the DDL and the SHOW PARTITIONS output in an email as-is. The difficulty is that I can't show you how the partition was created with the bad data - the person who did it is no longer available.
12-07-2015
09:14 PM
No error message - it just silently does not drop the partition. I checked the Hive logs - no underlying messages. I've also confirmed there are no security issues. When it does drop a partition, you get a positive message, something to the effect of 'PARTITION XXXXX dropped successfully'.
12-07-2015
08:58 PM
Cleaning up some bad partitions that were created in error: we currently have an external table with a Hive partition that I am unable to drop via an ALTER statement. The partition has control characters (%0D - what was a carriage return) in the partition name field. The table is external and is called <tableName>. It is partitioned by fiscal_year and erp.

show partitions <tableName>;

fiscal_year=2014%0D/erp=ae_na%0D%0D
fiscal_year=2014/erp=ae_na
fiscal_year=2014/erp=be_na
...

The underlying files in HDFS were deleted long ago and no longer exist. I have tried the following commands without success:

ALTER TABLE <tableName> DROP IF EXISTS PARTITION(fiscal_year='2014%0D', erp='ae_na%0D%0D');
ALTER TABLE <tableName> DROP PARTITION(fiscal_year='2014%0D', erp='ae_na%0D%0D');
ALTER TABLE <tableName> DROP PARTITION(fiscal_year>'2014');

Is there a way to drop this partition, or do I have to copy the data out, rebuild the table, and move the data back in? Thanks in advance.
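For reference, the rebuild fallback mentioned above would look roughly like this (a sketch with the same placeholder names; it assumes that dropping an EXTERNAL table leaves the HDFS data intact and that MSCK REPAIR only re-registers partitions whose directories still exist, so the %0D partitions would not come back):

# Hypothetical rebuild: drop and recreate the external table, then
# rediscover partitions from the directories still present in HDFS
hive -e "DROP TABLE <tableName>;"
hive -e "CREATE EXTERNAL TABLE <tableName> ( ...columns... ) PARTITIONED BY (fiscal_year string, erp string) LOCATION '/path/to/table';"
hive -e "MSCK REPAIR TABLE <tableName>;"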
Labels:
- Apache Hive
11-25-2015
08:30 PM
1 Kudo
Yes, it can do inserts, but it turns out that they're RBAR (row-by-agonizing-row) updates. We may try the Simba drivers to see if they improve performance (allow batching), but in the meantime we are moving ahead with inserts via Access -> SQL Server -> Sqoop into Hive.
11-25-2015
07:47 PM
1 Kudo
Here is an update on this task: ODBC reads from Hive -> Access work fine (not a surprise). ODBC writes from Access to Hive run as RBAR - one row at a time. That is sub-optimal in the RDBMS world, but REALLY crummy in the Hadoop world: it means one Hive session per row updated, and we were getting about 150 rows updated every 10 minutes. When sending data from Access to Hive, we will implement Access -> SQL Server -> Sqoop into Hive. We may try out the Simba drivers in the future - more out of curiosity - to see if they perform better with updates or have the ability to batch updates into Hive. Thanks Neeraj and Birender!