Member since: 09-25-2015
Posts: 112
Kudos Received: 88
Solutions: 12
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9518 | 03-15-2017 03:17 PM
 | 5962 | 02-17-2017 01:03 PM
 | 1749 | 01-02-2017 10:47 PM
 | 2626 | 11-16-2016 07:03 PM
 | 1053 | 10-07-2016 05:24 PM
02-25-2016
02:58 AM
1 Kudo
Thanks Predrag, that's what I thought. psql.py is an option for our smaller files...
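For reference, a psql.py load of a small CSV looks roughly like this (a sketch; the ZooKeeper connection string, table, and file name are placeholders, and the client path varies by HDP version):

# Load a small CSV into an existing Phoenix table with psql.py
# (hypothetical names throughout; -t names the target table)
/usr/hdp/current/phoenix-client/bin/psql.py \
  -t <DB>.<TBL> \
  <Zookeeper node>:2181:/hbase-unsecure \
  /path/to/smallfile.csv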
02-24-2016
08:27 PM
2 Kudos
Running the following command:

hadoop jar /usr/hdp/current/phoenix-client/phoenix-4.4.0.2.3.4.0-3485-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -z <Zookeeper nodes>:2181:/hbase-unsecure \
  -d $'\t' \
  -g \
  --table <DB>.<TBL> \
  --input /data/product/inbound/<FNAME>.TXT

Is there any way to skip the first line of the input file - is there a parameter on the CsvBulkLoadTool that would allow skipping a header row? Specifically, something like what Hive gives you with 'tblproperties ("skip.header.line.count"="1")'. Thanks!
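For context, CsvBulkLoadTool aside, one workaround would be stripping the header before the load with standard shell tools; a minimal sketch, with placeholder file names:

# Strip the header row locally, then push the file to HDFS
# (hypothetical file names; tail -n +2 emits everything from line 2 on)
tail -n +2 FNAME.TXT > FNAME_noheader.TXT
hadoop fs -put -f FNAME_noheader.TXT /data/product/inbound/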
Labels:
- Apache Phoenix
02-24-2016
11:54 AM
4 Kudos
There is also HPL/SQL: http://www.hplsql.org/doc

It was incorporated into the Hive 2.0 release (February 2016) and thus will automatically be included in the next HDP release that includes Hive 2.0. It can also be added very easily to an existing HDP install. It is an implementation of stored procedures, control-flow logic, temporary variables, error checking, on-the-fly SQL syntax conversion to HiveQL, RDBMS SQL syntax compatibility, etc. It brings almost all of the stored-procedure and SQL-syntax functionality of DB2, SQL Server, and Oracle into Hive. An added bonus is that you can mix and match SQL Server T-SQL, Oracle PL/SQL, and DB2 syntax. It is slick. For example, here is a sample script that runs in HPL/SQL (for real):

create procedure spLoadSalesFact (LoadMonth char(10))
BEGIN
  SET hplsql.onerror = seterror;
  print 'Beginning execution...';
  print '*' || LoadMonth || '*';
  /* Fail if invalid Input Parameter
  set LoadMonth = Upper(LoadMonth);
  if (LoadMonth not in ('LAST', 'CURRENT'))
  BEGIN
    print 'Bad Input Parameter';
    RETURN -1;
  END
  */
  /* Load Prior Month */
  IF (LoadMonth = 'LAST')
  BEGIN
    print 'Inserting Last Month''s Data...';
    insert into table saleslineitem
      select * from saleslineitem
      where TransactionDate >= '2011-06-01'
        and TransactionDate < '2011-07-01';
    if SQLCODE <> 0
      RETURN -2;
  END
  /* Load Current Month */
  ELSE IF (LoadMonth = 'CURRENT')
  BEGIN
    print 'Inserting Current month''s data...';
    insert into table saleslineitem
      select * from saleslineitem
      where TransactionDate >= '2011-07-01'
        and TransactionDate < '2011-08-01';
    if SQLCODE <> 0
      RETURN -3;
  END
  PRINT 'Execution Completed...';
END

/* Stored Proc invocation */
-- CALL spLoadSalesFact ('GARBAGE');
-- CALL spLoadSalesFact ('LAST');
CALL spLoadSalesFact ('CURRENT');
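For reference, a script like this runs from the shell with the hplsql launcher; a minimal sketch (the file name is hypothetical):

# Execute the HPL/SQL script file; the trailing CALL kicks off the proc
hplsql -f spLoadSalesFact.sql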
02-24-2016
11:40 AM
2 Kudos
I agree with Scott. Bringing data from Hadoop into Access is no big deal: we had an Access database use ODBC to retrieve data from Hadoop with effective results (fast enough). BUT if you try to send data from Access into Hadoop, it is painfully slow - row by row. Sending it off to SQL Server requires very little code in Access, and then you have a durable store of your data. Sending it from SQL Server to Hadoop is effective and fast. Here's a simple sample of a Sqoop script to send data from SQL Server to Hadoop. It reads the table Customer from the SQL Server database SQLTestDB and copies it into the Hadoop database TestDB, table Customer, overwriting any existing data in the Hive table and using a single mapper:

sqoop import --connect "jdbc:sqlserver://<IP Address>:1433;database=SQLTestDB" \
--username root \
--password hadoop \
--table Customer \
--hive-import --hive-overwrite \
--hive-table TestDB.Customer \
-m 1
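A quick sanity check after the import is a row count on the Hive side (a sketch, assuming the table name from the command above):

# Hypothetical post-import check: count the rows Sqoop landed in Hive
hive -e "SELECT COUNT(*) FROM TestDB.Customer;"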
12-09-2015
04:48 PM
Thanks Carter!
12-07-2015
09:38 PM
Will do. I will send the DDL and the SHOW PARTITIONS output in an email as-is. The difficulty is that I can't show you how the partition was created with the bad data - the person who did it is no longer available.
12-07-2015
09:14 PM
No error message - it just silently does not drop the partition. I checked the Hive logs - no underlying messages. I've also confirmed there are no security issues. When it does drop a partition, you get a positive message, something to the effect of 'PARTITION XXXXX dropped successfully'.
12-07-2015
08:58 PM
Cleaning up some bad partitions that were created in error: we currently have an external table with a Hive partition that I am unable to drop via an ALTER statement. The partition has control characters (%0D - what was a carriage return) in the partition name field. The table is external and is called <tableName>. It is partitioned by fiscal_year and erp.

show partitions <tableName>;

fiscal_year=2014%0D/erp=ae_na%0D%0D
fiscal_year=2014/erp=ae_na
fiscal_year=2014/erp=be_na
...

The underlying files in HDFS were deleted long ago and no longer exist. I have tried the following commands without success:

ALTER TABLE <tableName> DROP IF EXISTS PARTITION(fiscal_year='2014%0D', erp='ae_na%0D%0D');
ALTER TABLE <tableName> DROP PARTITION(fiscal_year='2014%0D', erp='ae_na%0D%0D');
ALTER TABLE <tableName> DROP PARTITION(fiscal_year>'2014');

Is there a way to drop this partition, or do I have to copy the data out, rebuild the table, and move the data back in? Thanks in advance.
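For reference, the rebuild fallback mentioned above would look roughly like this (a sketch with the same placeholder names; it assumes that dropping an EXTERNAL table leaves the HDFS data intact and that MSCK REPAIR only re-registers partitions whose directories still exist, so the %0D partitions would not come back):

# Hypothetical rebuild: drop and recreate the external table, then
# rediscover partitions from the directories still present in HDFS
hive -e "DROP TABLE <tableName>;"
hive -e "CREATE EXTERNAL TABLE <tableName> ( ...columns... ) PARTITIONED BY (fiscal_year string, erp string) LOCATION '/path/to/table';"
hive -e "MSCK REPAIR TABLE <tableName>;"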
Labels:
- Apache Hive
11-25-2015
08:30 PM
1 Kudo
Yes, it can do inserts, but it turns out that they're RBAR (row-by-agonizing-row) updates. We may try the Simba drivers to see if they improve performance (allow batching), but in the meantime we are moving ahead with inserts via Access -> SQL Server -> Sqoop into Hive.
11-25-2015
07:47 PM
1 Kudo
Here is an update on this task: ODBC reads from Hive -> Access work fine (not a surprise). ODBC writes from Access to Hive run as RBAR - one row at a time. That is sub-optimal in the RDBMS world, but REALLY crummy in the Hadoop world: it means one Hive session per row updated, and we were getting about 150 rows updated every 10 minutes. When sending data from Access to Hive, we will implement Access -> SQL Server -> Sqoop into Hive. We may try out the Simba drivers in the future - more out of curiosity - to see if they perform better with updates or have the ability to batch updates into Hive. Thanks Neeraj and Birender!