Member since: 05-19-2016
Posts: 216
Kudos Received: 20
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 4281 | 05-29-2018 11:56 PM
 | 7154 | 07-06-2017 02:50 AM
 | 3848 | 10-09-2016 12:51 AM
 | 3658 | 05-13-2016 04:17 AM
02-13-2017
07:37 PM
@Simran Kaur Can you repost the link for setting MySQL as the metastore for Sqoop? The above link is broken.
05-07-2016
07:34 PM
1 Kudo
1) You essentially have two options. Use Sqoop import-all-tables with the exclude option, as you mention; in that case you have a single Sqoop action in Oozie and no parallelism at the Oozie level (although Sqoop itself might provide some), and you have some limitations (only straight imports of all columns, ...). Alternatively, you make an Oozie flow that uses a fork and one single-table Sqoop action per table. In that case you have fine-grained control over how much you want to run in parallel. For example, you could load 4 tables at a time by doing Start -> Fork -> 4 Sqoop Actions -> Join -> Fork -> 4 Sqoop Actions -> Join -> End (see the workflow sketch below).

2) If you want incremental loads, I don't think import-all-tables is possible, so one Sqoop action per table it is. Essentially you can either use Sqoop's incremental import functionality (using a property file) or use WHERE conditions and pass the date parameter through from the coordinator. You can use coord:dateformat to transform your execution date.

3) Run one coordinator for each table, OR have a decision node in the Oozie workflow that skips some Sqoop actions, like Start -> Sqoop1 where date = mydate -> Decision: if mydate % 3 = 0 then Sqoop2, else End.

4) Incremental imports load the new data into a folder in HDFS. If you re-run the import, that folder needs to be deleted first. If you use append, it doesn't delete the old data in HDFS. Now you may ask why you would ever not want append; the reason is that you usually do something with the data afterwards, like loading the new data into a partitioned Hive table. If you used append, it would load the same data over and over.
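A minimal sketch of the fork/join pattern, assuming two example tables (t1, t2); the workflow name, JDBC URL, credentials path, and warehouse directory are placeholders for illustration, not from the original thread:

<workflow-app name="table-imports" xmlns="uri:oozie:workflow:0.4">
    <start to="import-fork"/>
    <!-- the fork runs both Sqoop actions in parallel -->
    <fork name="import-fork">
        <path start="sqoop-t1"/>
        <path start="sqoop-t2"/>
    </fork>
    <action name="sqoop-t1">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://dbhost/mydb --username sqoop --password-file /user/oozie/.pw --table t1 --warehouse-dir /data/raw</command>
        </sqoop>
        <ok to="import-join"/>
        <error to="fail"/>
    </action>
    <action name="sqoop-t2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://dbhost/mydb --username sqoop --password-file /user/oozie/.pw --table t2 --warehouse-dir /data/raw</command>
        </sqoop>
        <ok to="import-join"/>
        <error to="fail"/>
    </action>
    <!-- the join waits for all forked imports before continuing -->
    <join name="import-join" to="end"/>
    <kill name="fail">
        <message>Sqoop import failed</message>
    </kill>
    <end name="end"/>
</workflow-app>

To load 4 tables at a time, chain a second Fork -> 4 Sqoop Actions -> Join after the first join, as described above.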
02-12-2018
03:14 PM
Can you please let me know where to try this?
... View more
05-06-2016
06:24 AM
1 Kudo
Try replacing --target-dir with --warehouse-dir. Table t1 will then be imported into the directory warehouse-dir/t1. Regarding Hive, add --hive-import; the very first time use --create-hive-table, and after that use --hive-overwrite. If trouble continues, test your Oozie Sqoop action on a single-table import into HDFS, just to make sure you have the right syntax. After that, retry import-all-tables.
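A minimal sketch of what the import-all-tables command could look like inside the Oozie Sqoop action; the JDBC URL, credentials path, and warehouse directory are placeholders, not values from the original thread:

<sqoop xmlns="uri:oozie:sqoop-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- first run uses --create-hive-table; on later runs replace it with --hive-overwrite -->
    <command>import-all-tables --connect jdbc:mysql://dbhost/mydb --username sqoop --password-file /user/oozie/.pw --warehouse-dir /data/raw --hive-import --create-hive-table</command>
</sqoop>

Each table t1, t2, ... is staged under /data/raw/t1, /data/raw/t2, ... and then loaded into Hive.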
09-05-2018
04:29 PM
@simran kaur
The link you provided is not accessible. Kindly send the solution you found, as I'm also facing the same problem.
02-07-2019
12:18 PM
Hi all, I would like to share that this has been fixed by changing the yarn-site.xml file.

Old parameters:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,spark_shuffle,spark2_shuffle</value>
</property>

New parameters:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

Please note: this is for users who have installed YARN + MRv2 only, not Spark and Spark 2. Thanks
Pawan Giri
05-04-2016
09:18 AM
1 Kudo
xmlns stands for XML namespace; you can find a general introduction here. In Oozie workflows there are two xmlns declarations. The one on top, <workflow-app name="once-a-day" xmlns="uri:oozie:workflow:0.1">, defines the XML tags for Oozie workflow files in general. The other one, <sqoop xmlns="uri:oozie:sqoop-action:0.2">, defines the XML tags specific to the Sqoop action; you can find its definition here. In your case xmlns is not the problem. If it were, Oozie would reject your workflow XML file as incorrect, for example because of using non-existent tags, or existing ones in the wrong way.
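For illustration, a minimal skeleton showing where the two namespaces sit; the "once-a-day" name comes from the question, everything else (connection string, table, paths) is a placeholder:

<workflow-app name="once-a-day" xmlns="uri:oozie:workflow:0.1">
    <start to="sqoop-import"/>
    <action name="sqoop-import">
        <!-- the action body declares its own, more specific namespace -->
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <command>import --connect jdbc:mysql://dbhost/mydb --table t1 --target-dir /data/t1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop import failed</message>
    </kill>
    <end name="end"/>
</workflow-app>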
02-24-2017
11:47 AM
Please post this as a new question and supply the relevant logs and configs.
02-24-2017
09:12 AM
@simran kaur Is the issue solved for you? I am getting the same error (java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SqoopMain not found). Can somebody please help?
10-03-2016
06:32 PM
1 Kudo
Do the following to increase the DFS size. Create additional directories or mount points in the HDFS data path; by default an Ambari-deployed cluster uses /hadoop/hdfs/data as the data directory, so with root privileges create a new directory:

1) mkdir /hadoop/hdfs/data1
2) chown -R hdfs:hadoop /hadoop/hdfs/data1
3) chmod -R 777 /hadoop/hdfs/data1

Now edit the HDFS configuration: in Ambari, click on HDFS, then Configs, and in the Settings add the new directory, separated by a comma, to the DataNode directories property (dfs.datanode.data.dir), e.g. /hadoop/hdfs/data,/hadoop/hdfs/data1. Save the changes and restart the affected services. That will increase the disk space; to increase it further, repeat the same steps, or resize the LVM volume behind the /hadoop/hdfs/data directory.
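For reference, a sketch of what the resulting entry looks like in hdfs-site.xml, assuming the example directories above:

<property>
    <!-- comma-separated list of DataNode storage directories -->
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/hdfs/data,/hadoop/hdfs/data1</value>
</property>

In Ambari you would normally edit this through the DataNode directories field rather than the file directly.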