Member since: 12-09-2015
Posts: 34
Kudos Received: 12
Solutions: 3
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1945 | 09-26-2016 06:28 PM
 | 753 | 12-11-2015 02:58 PM
 | 1300 | 12-11-2015 02:50 PM
12-19-2016
09:29 PM
WORKING WORKAROUND: Set the options in your JDBC URL, so your URL goes from this:

Old: jdbc:hive2://hadoop.company.com:8443/default;transportMode=http;httpPath=gateway/default/hive

to this:

New: jdbc:hive2://hadoop.company.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled=true
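For anyone scripting this, here's a minimal sketch of how that new URL could be dropped into the Beeline call used elsewhere in this thread (the credential and script variables are placeholders, not from the original post):

# Hypothetical wrapper script; host, port, and credentials are placeholders.
jdbc_url='jdbc:hive2://hadoop.company.com:8443/default;transportMode=http;httpPath=gateway/default/hive?hive.execution.engine=tez;tez.queue.name=di;hive.exec.parallel=true;hive.vectorized.execution.enabled=true;hive.vectorized.execution.reduce.enabled=true'

# Every script run through this URL should pick up the session settings without per-script set commands.
beeline -n "$user" -p "$password" -u "$jdbc_url" -f "$script"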
12-19-2016
09:26 PM
Hive-Hcatalog 1.2.1
12-19-2016
09:14 PM
Using a set works fine, correct. However, that requires that I do it for every script I try to automate. My goal is to make these values a default part of this base command and just change which SQL I throw at it. Global values, if you will. This should be possible with --hiveconf or --property-file, but neither results in the session taking the values. Here's an example:

Command:
beeline -n $user -p $password -u "$jdbc_url" -f $script --verbose true --property-file "query.properties" --fastConnect

query.properties:
hive.execution.engine=tez;
tez.queue.name=di;
hive.exec.parallel=true;

The job shows up in the 'default' queue when it should show up in the 'di' queue. When I use a set command, it does in fact show up in the 'di' queue.
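One other way we could get such "global" defaults without editing every script would be Beeline's init-file option. This is just a sketch assuming your Beeline version supports -i; the file name below is made up:

# init_settings.sql is a hypothetical shared init file holding the default settings
cat > init_settings.sql <<'EOF'
set hive.execution.engine=tez;
set tez.queue.name=di;
set hive.exec.parallel=true;
EOF

# -i runs the init file in the session before the script passed with -f
beeline -n "$user" -p "$password" -u "$jdbc_url" -i init_settings.sql -f "$script"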
12-19-2016
08:57 PM
1 Kudo
I need to set Hive configuration options from the Beeline CLI command. The below, for example, doesn't work:

--hiveconf hive.execution.engine=tez

Neither --hiveconf nor --property-file appears to make the session take the settings. Here it is in full usage:

beeline -n $user -p $password -u "$jdbc_url" -f $script --verbose true --property-file "query.properties" --fastConnect

Is the only way to make it work to include the options in the JDBC URL as arguments, like hive.execution.engine=tez;next;next?
Labels:
- Apache Hive
09-26-2016
06:28 PM
I've resolved the issue by using the --boundary-query option in Sqoop, in tandem with my use of --query and --split-by. Will post answer shortly.

Edit: Posted answer at top of my question. Here's the code again for convenience:

sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 -Dmapred.job.queue.name=default --connect jdbc:db2://mybox.mycompany.com:1234/mydb2db --username myuser --password mypassword --query "select * from ( SELECT column1, column2, column3, current timestamp as load_time, CHAR(current date, iso) as load_dt FROM my_database.my_table where column1 <= 1000 )a WHERE $CONDITIONS with UR" --hcatalog-database myHiveDB --hcatalog-table myHiveTable --split-by column1 --fields-terminated-by \001 -m 5 --relaxed-isolation --fetch-size=100000 --boundary-query "SELECT MIN(column1), MAX(column1) FROM my_database.my_table where column1 <= 1000 with UR"
09-26-2016
05:09 PM
I've attempted that, but it still gives the same error as above. I'm now trying to use JDBC-level connection parameters as below, but it doesn't seem to pass them to the split sub-queries: the initial connection is with UR, but the subsequent subqueries are not.

--connection-param-file /just/some/path/sqoop.properties

sqoop.properties:
jdbc.transaction.isolation=TRANSACTION_READ_UNCOMMITTED
09-26-2016
02:51 PM
EDIT (SOLUTION):

Problem: Can't include the "with ur" statement DB2 requires to keep a table from locking, i.e. a "dirty read."

Solution: Make use of the --boundary-query option in Sqoop.

I modified my Sqoop import statement to leverage the --boundary-query option, which allows me to build the boundary query used to determine the splits for the mappers. Typically, since I'm using the --split-by and --query options, this query is generated automatically. By adding the new option at the end of my Sqoop statement, and adjusting both it and my standard query to include the desired "with ur", I get a Sqoop job whose multiple mappers fire off non-locking select queries, each over a range of data, with the "with ur" statement.

Working code:

sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 -Dmapred.job.queue.name=default
--connect jdbc:db2://mybox.mycompany.com:1234/mydb2db
--username myuser
--password mypassword
--query "select * from ( SELECT column1, column2, column3, current timestamp as load_time, CHAR(current date, iso) as load_dt FROM my_database.my_table where column1 <= 1000 )a WHERE $CONDITIONS with UR"
--hcatalog-database myHiveDB
--hcatalog-table myHiveTable
--split-by column1
--fields-terminated-by \001
-m 5
--relaxed-isolation
--fetch-size=100000
--boundary-query "SELECT MIN(column1), MAX(column1) FROM my_database.my_table where column1 <= 1000 with UR"

* This Sqoop command is being dynamically generated from a bash script. If you experience problems adopting the above code, try the following (see the sketch after this post):
- Put the whole sqoop command on one line.
- Use \$CONDITIONS, not $CONDITIONS (if the command comes from a text file rather than being hardcoded, you don't need the escape; if you're typing it directly into the sqoop command in a shell script, you do).
- Ensure the text following both --query and --boundary-query is in double quotes, as above.

---

ORIGINAL QUESTION:

I have this Sqoop command that nearly works... up until I get a syntax exception from DB2.

Sqoop command:

sqoop import -Ddb2.jcc.charsetDecoderEncoder=3 -Dmapred.job.queue.name=default --connect jdbc:db2://mybox.mycompany.com:1234/mydb2db --username myuser --password mypassword --query SELECT column1, column2, column3, current timestamp as load_time, CHAR(current date, iso) as load_dt FROM my_database.my_table where column1 <= 1000 AND $CONDITIONS with UR --hcatalog-database myHiveDB --hcatalog-table myHiveTable --split-by column1 --fields-terminated-by \001 -m 5 --relaxed-isolation --fetch-size=100000

It works great up to this point: it compares the source data with the hcat/hive table, and everything checks out. Then it tries to do the splits with the bounding values, and fails.

Error (and log):

16/09/26 10:32:12 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(column1), MAX(column1) FROM (SELECT
column1,
column2,
column3,
current timestamp as load_time,
CHAR(current date, iso) as load_dt
FROM my_database.my_table
where column1 <= 1000 AND (1 = 1) with UR) AS t1
Caused by: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-199, SQLSTATE=42601, SQLERRMC=WITH;AND OR HAVING GROUP INTERSECT ORDER ) FETCH EXCEPT MINUS UNION, DRIVER=4.16.53

I believe it has to do with DB2 needing the "with UR" phrase last, but Sqoop always tacks on the "AS t1", causing it to break. Any suggestions for fixing this? We need to do non-locking reads on a DB2 table.
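To illustrate the escaping tip, here's a minimal sketch of the command as it might sit inside a bash wrapper script (trimmed to the relevant options; hosts, names, and paths are the same placeholders as above):

#!/bin/bash
# Hypothetical wrapper: note the escaped \$CONDITIONS so the shell does not expand it;
# Sqoop substitutes each mapper's split predicate there at runtime.
sqoop import --connect jdbc:db2://mybox.mycompany.com:1234/mydb2db --username myuser --password mypassword --query "select * from ( SELECT column1, column2, column3 FROM my_database.my_table where column1 <= 1000 )a WHERE \$CONDITIONS with UR" --split-by column1 -m 5 --boundary-query "SELECT MIN(column1), MAX(column1) FROM my_database.my_table where column1 <= 1000 with UR"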
Labels:
- Apache HCatalog
- Apache Hive
- Apache Sqoop
01-28-2016
07:00 PM
2 Kudos
What's the horton-recommended way to get Hive data in Java code?
Thinking:
- hcat api
- jdbc query
- HDFS call

We're just trying to get Hive data into a data structure in Java, like a Collection, and operate on it. Typical Map/Reduce stuff. Specifically this would be used in a Crunch/Cascading setup. Thanks!
Labels:
- Apache HCatalog
- Apache Hive
01-19-2016
03:34 PM
Thanks! I gave that a try after posting this, and have updated my question with my findings accordingly. Thanks for answering!
01-18-2016
03:22 PM
We have Falcon jobs that use Oozie workflows (workflow.xml) in HDFS. However, we've made some changes to the Oozie workflow, specifically argument values for actions, but don't see them reflected in Falcon. I notice that the workflow.xml is present in the /apps/falcon/clusterName/staging area for the Falcon process, but it's the older version. How can I get Falcon to refresh/rebuild this area and incorporate the new workflow it points to in HDFS? Will falcon -update or -repair do it?

Edit: Resolved! falcon -update does the trick:

falcon entity -type process -suspend -name hdp0106x-my-process
falcon entity -type process -update -name hdp0106x-my-process

That provided this output:

falcon/update/default/Updated successfully(process) hdp0106x-my-process/Effective Time: 2016-01-20T09:30Z. Old workflow id: 0001498-151208005457707-oozie-oozi-C. New workflow id: 0007639-151208005457707-oozie-oozi-B

Explanation: In a recent project, we experienced a minor issue with a Falcon job not adapting future instances to changes we made to its Oozie workflow. In short, we made an enhancement, but Falcon didn't pick up the change just from our modifying the Oozie workflow.

Cause: When you submit and schedule a set of entities to Falcon, it:
- creates a home for them in /apps/falcon/clusterName/staging/falcon/workflows/process/
- places the workflow.xml you specified in the process.xml in that home (including a lib folder if you have one)
- uses that workflow.xml for future generated instances

The only issue is when you want to make changes to that Oozie workflow: Falcon doesn't re-ingest the workflow into its staging home at any frequency. It only does this at submit/schedule time. So if you make changes to an Oozie workflow, you need to tell Falcon that it has changed (because it technically runs copies of the workflow and /lib folder you give it, not the HDFS versions themselves). However, the above solution (using falcon -update) does the trick!
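For completeness, here's a sketch of the full cycle we'd run from the CLI. The process name is the one above; the final -resume step is my assumption about putting the entity back into a running state after the update:

# Suspend, update, then resume the process so future instances pick up the refreshed workflow.xml
falcon entity -type process -suspend -name hdp0106x-my-process
falcon entity -type process -update -name hdp0106x-my-process
falcon entity -type process -resume -name hdp0106x-my-process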
Labels:
- Apache Falcon
12-22-2015
08:32 PM
Hey Balu, that worked. Perfectly, actually! The only thing left I was curious about... is there a Falcon-based way to remove the _SUCCESS file after replication has completed? I know how we can do it after some time or after creating it in the process (and waiting a few minutes perhaps), but if there's a Falcon method or tag I'd love to leverage that. Please let me know! And thanks for all the great help.
12-22-2015
07:58 PM
Balu, thanks for that answer! We knew of this tag, but weren't sure how to truly use it. I'm testing this now and will let you know asap! Thank you kindly.
12-21-2015
08:08 PM
Scenario: we have data that is ingested to the cluster via a Falcon process. It leverages a Falcon feed as an output to replicate the ingested data to a backup cluster. We'd like the feed to not replicate until the process has completed. We currently use a delay to semi-accomplish this, but it's not perfect.

Question: How can we (if at all) tell Falcon to wait until the process has completed to begin replication? Currently we have process.xml code as below, which tells the feed to start "now" but with a delay of 2 hours (this delay being specified in the feed.xml). To be clear, we have replication working just fine; we're just after a more elegant way for Falcon to replicate only after a process has confirmed completion. Is there a way?

Process:
<outputs>
    <output name="hdp0001-my-feed" feed="hdp0001-my-feed" instance="now(0,0)"/>
</outputs>

Feed:
<cluster name="primary-cluster" type="source" delay="hours(1)">
    <validity start="2015-12-04T09:30Z" end="2099-12-31T23:59Z"/>
    <retention limit="months(9999)" action="archive"/>
    <locations>
        <location type="data" path="/hdfs/data/path/to/my_table/"/>
    </locations>
</cluster>
Labels:
- Apache Falcon
- Apache Hadoop
12-11-2015
02:58 PM
1 Kudo
We have solved this problem in our environment in a number of different ways. With Falcon, we have Sqoop imports on a regular frequency that run during predetermined "not busy" time intervals. We've also put logic in bash scripts run through Oozie that will execute, determine it's "not safe" to run at the moment, and sleep or terminate for that instance and try again later. If you have a window, say 3am-5am, in which you could feasibly connect and pull data, you could set up a sleep/wait loop until either a specific time has passed or the system is available (see the sketch below). Plenty of options; what you mentioned is definitely feasible!
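A rough sketch of that sleep/wait idea, purely for illustration; the 3am-5am window and the availability check below are made-up placeholders, not from our actual scripts:

#!/bin/bash
# Hypothetical guard: only proceed between 03:00 and 05:00, and only once the source host responds.
WINDOW_START=3
WINDOW_END=5
while true; do
  hour=$(date +%H)
  if [ "$hour" -ge "$WINDOW_START" ] && [ "$hour" -lt "$WINDOW_END" ] && ping -c 1 source-db.company.com > /dev/null 2>&1; then
    break
  fi
  sleep 300   # check again in 5 minutes
done
# ...kick off the Sqoop import here once the loop exits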
12-11-2015
02:50 PM
1 Kudo
Have you looked into the CompressedStorage features in Hive? You should be able to use this (for Snappy at least):

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
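As a quick illustration, here's a sketch of those settings applied when writing a table from the CLI; the table names are placeholders, not from the original question:

# Hypothetical example: write Snappy-compressed, BLOCK-level output into my_compressed_table
hive -e "
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
INSERT OVERWRITE TABLE my_compressed_table SELECT * FROM my_source_table;
"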
12-09-2015
06:29 PM
I have some notes but nothing formal cobbled together. Most stuff goes on hadoopsters.net when I learn about it. I'm hoping to publish some more on Falcon here very soon. The article feature makes it easy.
12-09-2015
06:19 PM
I've tried HDP on a Pi. Works pretty well. Will add NiFi to my list of things to POC...
12-09-2015
06:11 PM
Yeah, you should be able to pass any properties to the workflow.xml from Falcon's process.xml, like this (the block goes before the <workflow> tag and after the <output> tags; the properties are then used in the workflow.xml as ${workflowName}, ${hiveDB}, ${queueName}, and so on):

<properties>
    <property name="workflowName" value="1234-my-workflow" />
    <property name="rawDataDirectoryHDFS" value="/path/to/hdfs/files/" />
    <property name="hiveDB" value="my_db" />
    <property name="jobTracker" value="hdpcluster003.company.com:8050"/>
    <property name="nameNode" value="hdfs://MYHA:8020"/>
    <property name="queueName" value="dev"/>
</properties>
12-09-2015
06:00 PM
1 Kudo
SQuirreL is quite good, and free. Built in Java, and works well with Hive/Beeline. We've overall had success with:
- Teradata Studio
- SQuirreL
- Aqua Data Studio

http://squirrel-sql.sourceforge.net/
12-09-2015
03:50 PM
You can point to it directly via its address, or you can do as @bvellanki (Balu) mentioned and list its HA nameservice. For example, if the HA nameservice for your backup cluster is called DRHA, your address would be hdfs://DRHA:8020. See below:

<interface type="readonly" endpoint="hftp://DRHA.company.com:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://DRHA.company.com:8020" version="2.2.0"/>

You can also do this, depending on preference:

<interface type="readonly" endpoint="hftp://DRHA:50070" version="2.2.0"/>
<interface type="write" endpoint="hdfs://DRHA:8020" version="2.2.0"/>
12-09-2015
03:15 PM
We actually do this as well. Here's an example of retention/replication on a date-partitioned folder. This resolves, much like above, to a folder called /hdfs/path/to/data/in/table/file_dt=2015-12-09. Hope this helps!

<location type="data" path="/hdfs/path/to/data/in/table/file_dt=${YEAR}-${MONTH}-${DAY}"/>