Member since: 02-26-2016
Posts: 28
Kudos Received: 6
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 873 | 07-21-2016 12:21 PM
03-27-2018 07:35 AM
When you know your application ID, try accessing your YARN logs:

Usage: yarn logs -applicationId <application ID> [options]

COMMAND_OPTIONS | Description
---|---
-applicationId <application ID> | Specifies an application id
-appOwner <AppOwner> | AppOwner (assumed to be current user if not specified)
-containerId <ContainerId> | ContainerId (must be specified if node address is specified)
-help | Help
-nodeAddress <NodeAddress> | NodeAddress in the format nodename:port (must be specified if container id is specified)

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs
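A quick usage sketch (the application id, container id, and node address below are placeholders):

# all aggregated logs of a finished application
yarn logs -applicationId application_1499236528987_0001
# logs of a single container; per the options above, pair -containerId with -nodeAddress
yarn logs -applicationId application_1499236528987_0001 -containerId container_1499236528987_0001_01_000001 -nodeAddress [nodename]:[port]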
07-21-2017 07:33 AM
So if I have multiple Hive installations (not sure how to check, but I can ask someone who should know), then by adding --hive-home to my sqoop script the table should be overwritten? That is, with the new delimiter/columns and the comment recording when the table was created? The documentation only mentions that a create-table operation is generated; I'm missing a reference there saying the table will be dropped first if it exists.
07-19-2017 08:35 AM
1 Kudo
Thanks rbiswas, but I'm not sure how this helps. Sqoop creates the table fine the first time; it just doesn't recreate it with the new table definition and an updated load timestamp (as a table comment). My question is whether sqoop should recreate the table when using --hive-overwrite, or whether it is supposed to overwrite only the data.
07-18-2017 09:57 AM
If you also want the command itself included in the log, you can extend the block like this (with set -x and set +x):
{
echo $(date)
set -x
beeline -u ${hive2jdbcZooKeeperUrl} -f "file.hql"
set +x
echo $(date)
} 2>&1 | tee /tmp/sqoop.log
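The same wrapper works for the sqoop command itself; a sketch, with the connection details as placeholders:

{
echo $(date)
set -x
sqoop import --connect "[jdbc_url]" --username [username] --password-file [pwd_location] --table [table]
set +x
echo $(date)
} 2>&1 | tee /tmp/sqoop.log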
07-18-2017 09:51 AM
Dear all, perhaps my understanding is incorrect, but I'm trying to use sqoop import to reload a table that already exists in Hive. My expectation is that besides overwriting the actual data, the table will also be dropped and recreated. A table structure can of course change, and since sqoop cannot detect changes unless you are using the incremental option, it should assume this can happen and thus drop and recreate the table with the new structure. A delimiter used to separate records or attributes can change as well. Currently, though, the table is not recreated: the DDL stays untouched and the comment recording when data was loaded is not updated either. Is this a bug, or am I missing a command in my sqoop action? My sqoop action looks like this:

<sqoop xmlns="uri:oozie:sqoop-action:0.4">
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:sap://[server]:30015/?currentschema=[schema]</arg>
<arg>--username</arg>
<arg>[username]</arg>
<arg>--password-file</arg>
<arg>[pwd_location]</arg>
<arg>--driver</arg>
<arg>com.sap.db.jdbc.Driver</arg>
<arg>--query</arg>
<arg>SELECT * FROM "ZDLTABLES" WHERE $CONDITIONS</arg>
<arg>--hive-import</arg>
<arg>--hive-database</arg>
<arg>[hive_schema]</arg>
<arg>--hive-table</arg>
<arg>zdltables</arg>
<arg>--hive-delims-replacement</arg>
<arg>\040</arg>
<arg>--fields-terminated-by</arg>
<arg>\037</arg>
<arg>--hive-overwrite</arg>
<arg>--compress</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
<name-node>[name_node]</name-node>
<job-tracker>[job_tracker]</job-tracker>
<configuration>
<property xmlns="">
<name>oozie.launcher.mapred.job.queue.name</name>
<value>default</value>
<source>programatically</source>
</property>
<property xmlns="">
<name>mapreduce.job.queuename</name>
<value>default</value>
<source>programatically</source>
</property>
</configuration>
</sqoop>
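One workaround would be a drop-first pre-step so that the import recreates the table. A minimal sketch, assuming a beeline JDBC URL in ${hive2jdbcZooKeeperUrl} (a placeholder; schema and table as in the action above):

# drop the Hive table first so --hive-import recreates it with the new definition
beeline -u "${hive2jdbcZooKeeperUrl}" -e "DROP TABLE IF EXISTS [hive_schema].zdltables"
# ...then run the sqoop import with --hive-overwrite as above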
Labels:
- Apache Hadoop
- Apache Hive
- Apache Sqoop
05-16-2017 07:56 AM
It's not only an issue with split but also with e.g. concat. Escaping a semicolon works fine in Hue, but beeline fails with the same error as above. Replacing the semicolon with \073 works here as well.
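A minimal illustration, assuming a table t with a string column col (both placeholders) and a beeline JDBC URL variable:

# fails in beeline: the ; inside the string literal is treated as a statement terminator
beeline -u "${hive2jdbcZooKeeperUrl}" -e "SELECT concat(col, ';', col) FROM t"
# works: \073 is the octal escape for the semicolon
beeline -u "${hive2jdbcZooKeeperUrl}" -e "SELECT concat(col, '\073', col) FROM t"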
01-13-2017 08:38 AM
If you want to use crontab, then you have already decided to use a time trigger/interval, right? In that case you should really use a coordinator.
If you really want to stick with crontab, then the command is more or less correct. You have a typo (--oozie should be -oozie), the port is normally 11000 (but I guess you already confirmed the port?), and normally you refer to a job.properties file (stored locally, not on HDFS) with -config.
So it should look like: oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config /path/to/job.properties -run
In the job.properties file you would list parameters like namenode, jobtracker, hcmetastoreuri, and of course the one you provided via -D: oozie.wf.application.path. Normally the hdfs://namenode part can be omitted from the application path URL.
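A minimal job.properties sketch (host names, ports, and the application path are placeholders):

nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
# the parameter otherwise passed via -D; the hdfs://namenode prefix can normally be omitted
oozie.wf.application.path=/user/[username]/[workflow_dir]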
12-14-2016 11:58 AM
We are currently using a MySQL metastore in our test environment, so it is possible (we run HDP 2.5). The only thing is that oozie requires the sqoop-site.xml file to be placed somewhere on HDFS to access the metastore. We don't really like the idea that passwords are just stored like that...
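For reference, a sketch of the sqoop-site.xml entries involved (URL and credentials are placeholders); the password value is exactly the part that ends up in plain text:

<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:mysql://[metastore_host]:3306/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value>[username]</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value>[password]</value>
</property>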
11-03-2016 01:10 PM
The answer is outdated. It is possible to use a character attribute as the split-by attribute; you only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true after your 'sqoop job' statement, like this:

sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--create ${JOB_NAME} \
-- \
import \
--connect "${JDBC}" \
--username ${SOURCE_USR} \
--password-file ${PWD_FILE_PATH} \
...

No guarantees, though, that sqoop splits your records evenly over your mappers.
10-27-2016 04:52 PM
Worked! Many thanks, you saved my day.