Member since: 02-26-2016
Posts: 28
Kudos Received: 6
Solutions: 1
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 820 | 07-21-2016 12:21 PM
03-27-2018
07:35 AM
When you know your application ID, try accessing your YARN logs:
Usage: yarn logs -applicationId <application ID> [options]

COMMAND_OPTIONS | Description
---|---
-applicationId <application ID> | Specifies an application id
-appOwner <AppOwner> | AppOwner (assumed to be current user if not specified)
-containerId <ContainerId> | ContainerId (must be specified if node address is specified)
-help | Help
-nodeAddress <NodeAddress> | NodeAddress in the format nodename:port (must be specified if container id is specified)

https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs
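For example (the application id below is illustrative):
# fetch the aggregated logs for a finished application
yarn logs -applicationId application_1484650214222_0001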
07-21-2017
07:33 AM
So if I have multiple Hive installations (not sure how to check, but I can ask someone who should know this), then by adding --hive-home to my sqoop script the table should be overwritten? So with the new delimiter/columns and the comment from when the table was created? Because the documentation only mentions that a CREATE TABLE operation is generated, but I'm missing the reference there that the table will be dropped first if it exists.
07-19-2017
08:35 AM
1 Kudo
Thanks rbiswas, but I'm not sure how this helps. Sqoop creates the table fine the first time. It just doesn't recreate the table with the new table definitions and the updated timestamp from when the data was loaded (stored as a table comment). My question is whether sqoop should recreate the table when using the --hive-overwrite option, or whether it is supposed to overwrite only the data.
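One way to check whether --hive-overwrite regenerates the DDL (a sketch; the schema placeholder and the connection variable reuse names from elsewhere in this thread) is to diff the output of SHOW CREATE TABLE before and after the import:
# dump the table DDL before and after the sqoop import, then compare
beeline -u "${hive2jdbcZooKeeperUrl}" -e "SHOW CREATE TABLE [hive_schema].zdltables" > ddl_before.txt
# ... run the sqoop import with --hive-overwrite ...
beeline -u "${hive2jdbcZooKeeperUrl}" -e "SHOW CREATE TABLE [hive_schema].zdltables" > ddl_after.txt
diff ddl_before.txt ddl_after.txt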
07-18-2017
09:57 AM
If you also want your sqoop command included in the log, you can extend the command like this (with set -x and set +x):
{
echo $(date)
set -x
beeline -u ${hive2jdbcZooKeeperUrl} -f "file.hql"
set +x
echo $(date)
} 2>&1 | tee /tmp/sqoop.log
07-18-2017
09:51 AM
Dear all, perhaps my understanding is incorrect, but I'm trying to reload a table that already exists in Hive using sqoop import. My expectation is that besides overwriting the actual data, the table will also be dropped and recreated. A table structure can of course change, and as sqoop cannot detect changes unless you are using the incremental option, it should assume that this will happen, thus dropping and recreating the table with the new structure. A change in, for instance, a delimiter used to separate records or attributes can also happen. Currently the table is not recreated though. The DDL stays untouched, and the comment recording when data has been loaded is not updated either. Is this a bug or am I missing a command in my sqoop action? My sqoop action looks like this:
<sqoop xmlns="uri:oozie:sqoop-action:0.4">
<job-tracker>[job_tracker]</job-tracker>
<name-node>[name_node]</name-node>
<configuration>
<property xmlns="">
<name>oozie.launcher.mapred.job.queue.name</name>
<value>default</value>
<source>programatically</source>
</property>
<property xmlns="">
<name>mapreduce.job.queuename</name>
<value>default</value>
<source>programatically</source>
</property>
</configuration>
<arg>import</arg>
<arg>--connect</arg>
<arg>jdbc:sap://[server]:30015/?currentschema=[schema]</arg>
<arg>--username</arg>
<arg>[username]</arg>
<arg>--password-file</arg>
<arg>[pwd_location]</arg>
<arg>--driver</arg>
<arg>com.sap.db.jdbc.Driver</arg>
<arg>--query</arg>
<arg>SELECT * FROM "ZDLTABLES" WHERE $CONDITIONS</arg>
<arg>--hive-import</arg>
<arg>--hive-database</arg>
<arg>[hive_schema]</arg>
<arg>--hive-table</arg>
<arg>zdltables</arg>
<arg>--hive-delims-replacement</arg>
<arg>\040</arg>
<arg>--fields-terminated-by</arg>
<arg>\037</arg>
<arg>--hive-overwrite</arg>
<arg>--compress</arg>
<arg>--num-mappers</arg>
<arg>1</arg>
</sqoop>
Labels:
- Apache Hadoop
- Apache Hive
- Apache Sqoop
05-16-2017
07:56 AM
It's not only an issue with split, but also with e.g. concat. Escaping a semicolon in Hue works fine, but beeline fails with the same error as above. Replacing the semicolon with \073 works here as well.
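A minimal sketch of the workaround through beeline (the connection variable, table and column names are made up for the example):
# use the octal escape \073 instead of a literal ';' inside the function argument
beeline -u "$HIVE_JDBC_URL" -e "SELECT split(csv_col, '\073')[0] FROM some_table"
beeline -u "$HIVE_JDBC_URL" -e "SELECT concat(col_a, '\073', col_b) FROM some_table"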
01-13-2017
08:38 AM
If you want to use crontab then you have already decided to use a time trigger/interval, right? You should really use a coordinator.
If you really want to stick with crontab then the command is more or less correct. You have a typo (--oozie should be -oozie, and the port normally is 11000, but I guess you already confirmed the port?) and normally you refer to a job.properties file (stored locally, not on HDFS) with -config.
So it should look like: oozie job -oozie http://sandbox.hortonworks.com:11000/oozie -config /path/to/job.properties -run
In the job.properties file you would have parameters listed like namenode, jobtracker, hcmetastoreuri and of course the one you provide via -D: oozie.wf.application.path. Normally the hdfs://namenode part can be omitted from the app path URL.
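A minimal job.properties sketch (the port numbers and application path are illustrative, not taken from this thread):
# minimal job.properties; adjust host, ports and path to your cluster
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
# the hdfs://namenode prefix can normally be omitted here
oozie.wf.application.path=/user/[user]/workflows/my-workflow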
12-14-2016
11:58 AM
We are currently using a MySQL metastore in our test environment, so it is possible (we run HDP 2.5). The only thing is that Oozie requires the sqoop-site.xml file to be placed somewhere in HDFS to access the metastore. We don't really like the idea that passwords are just stored like that...
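For context, this is roughly the metastore section of sqoop-site.xml that has to be readable from HDFS; a sketch with placeholder values, which shows why the plain-text password is the concern:
<!-- sketch: metastore connection section of sqoop-site.xml; values are placeholders -->
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:mysql://[server].com/sqoop</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.username</name>
  <value>[metastore_user]</value>
</property>
<property>
  <name>sqoop.metastore.client.autoconnect.password</name>
  <value>[password_in_plain_text]</value>
</property>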
11-03-2016
01:10 PM
The answer is outdated. It is possible to use a character attribute as the split-by attribute. You only need to add -Dorg.apache.sqoop.splitter.allow_text_splitter=true after your 'sqoop job' statement, like this:
sqoop job -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
--create ${JOB_NAME} \
-- \
import \
--connect "${JDBC}" \
--username ${SOURCE_USR} \
--password-file ${PWD_FILE_PATH} \

No guarantees though that sqoop splits your records evenly over your mappers.
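Once the job is created, it can be listed and executed with the standard sqoop job subcommands:
# verify the job exists, then run it
sqoop job --list
sqoop job --exec ${JOB_NAME}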
10-27-2016
04:52 PM
Worked! Many thanks, you saved my day.
10-27-2016
04:02 PM
@Gayathri Reddy G or @Sindhu, can you tell me exactly how you added the -Dorg.apache.sqoop.splitter.allow_text_splitter=true to your import statement? I'm trying to create a sqoop job with this import statement but it keeps failing:
16/10/27 17:53:25 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
16/10/27 17:53:25 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dorg.apache.sqoop.splitter.allow_text_splitter=true
My shell script to create the sqoop job looks like this:
#!/bin/bash
sqoop job \
--create <tablename> \
-- \
import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--connect "jdbc:<URL>"
If I remove the -Dorg line all goes fine, but of course the execution of the job fails; that's why I need to pass this allow_text_splitter as a parameter. I tried with double quotes, without, but no luck unfortunately.
09-27-2016
02:59 PM
Hi njayakumar, we have sqoop working with the MySQL metastore, but Oozie gives errors that it can't find the driver to connect to the sqoop metastore:
Caused by: java.sql.SQLException: No suitable driver found for 'jdbc:mysql://<server>.com/sqoop'
The mysql-connector-java.jar is available in the java folder mentioned by you, and also in the /usr/hdp/current/oozie folders libserver, libtools and lib (as a symbolic link), and for sqoop in the /usr/hdp/current/sqoop/lib folder. Any thoughts on what we are missing here? I already got the same error trying to use the sqoop metastore 'service'. Oozie wasn't able to find that driver either...
07-21-2016
12:21 PM
1 Kudo
@Artem Ervits, we use a shell script to invoke an oozie workflow. Our script polls certain folders, and if there are files they will be passed to the newly invoked workflow. The shell script looks something like this:
#!/bin/bash -e
for file in $(hdfs dfs -ls -R $pollfolder | grep "^-" | grep -Po "($pollfolder/[a-zA-Z]{2}_.*/[a-zA-Z]{2}_.*-[0-9]{1,}-.*.csv.gz)" | grep -vE '('$automatedfolder'|'$quarantinefolder')')
do
oozie job -oozie $ooziebaseurl -config $jobproperties -run \
-D file=$file
done
This shell script can then be a shell action in a separate workflow that is triggered by a coordinator, or it can just be scheduled with cron. *I removed the creation of the variables that also happens in this script to save some space.
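For the cron variant, a hypothetical crontab entry could look like this (the schedule and paths are made up for the example):
# run the polling script every 15 minutes and append its output to a log
*/15 * * * * /path/to/poll_and_submit.sh >> /var/log/poll_and_submit.log 2>&1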
07-08-2016
11:38 AM
1 Kudo
I can confirm what @Josh Persinger is saying. The only way to get tables with forward slashes ('/') (and actually colons too) in the table name from SAP into Hadoop HDFS/Hive is by using the --query statement.
Some other things I found out when importing from SAP HANA:
- A table name can be something like 'MSG\TABLENAME' or even worse: '[SCHEMA]::database.[TABLENAME]'. Just make sure you put the complete table name between escaped double quotes, e.g. \"/SOMETHING/TABLENAME\" or \"[SCHEMA]::database.[TABLENAME]\".
- We needed to add the where clause '\$CONDITIONS' even though we did a select * without any filters.
- When limiting the result with a where clause, the values have to be between single quotes, e.g. WHERE DDLANGUAGE='E'.
- SAP columns can contain empty values called SPACE (not the same as NULL; shown as a '?' in the webIDE). If you want to exclude them, use the where clause <>'' (just two single quotes following each other): WHERE DDLANGUAGE<>''.
- When breaking the command over multiple lines for readability, I had to keep one extra parameter on the same line as the --query parameter. When I moved the --hive-import to the next line the command would fail (I think due to the ending quotes of the query).
The result should look something like this:
sqoop import --connect "jdbc:sap://[SERVER]:30015/?currentschema=[SCHEMA]" \
--username [USERNAME] \
--password-file file:///[PATH]/.pwd_file \
--driver com.sap.db.jdbc.Driver \
--query "select * from \"[/TABLENAME/WITH/FORWARDSLASHES]\" WHERE DDLANGUAGE='E' and [COLUMNNAME]<>'' and \$CONDITIONS" --hive-import \
--hive-database [HIVEDATABASE] \
--hive-table [TABLENAME] \
--hive-delims-replacement ' ' \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--hive-overwrite \
--num-mappers 1