Created on 02-04-2014 07:54 AM - edited 09-16-2022 01:53 AM
I knew this could come someday and now it is the day .... Oozie is running very slow and I believe it is due to using the embedded derby db.
Anyone has experience of migrating the data that is already in derby to an external mysql?
What do I loose if I just start with a clean mysql and resubmit all the oozie jobs?
Created 02-04-2014 10:48 AM
No problem!
The workflow and coordinator definitions are in the Hue DB and the xml for them are in HDFS, just none of the status history, that Hue pulls from Oozie. Everything you need to resubmit the coordinators and workflows is in the Hue DB and HDFS and clearing the Oozie DB won't cause any problems there.
Thanks
Chris
Created 02-04-2014 08:34 AM
Created on 02-04-2014 08:38 AM - edited 02-04-2014 08:39 AM
Hey,
Unfortunately this is something I've never been able to get working. The problem is Oozie stores the workflows in the DB as blobs, as a result, that makes migration very complex. The data in Oozie is the status of all the past workflows that have not been purged by the purge process and then info on all the current running oozie jobs. So if you resubmit the jobs, you will just lose status information about old jobs.
If you can afford to lose the historical data and then resubmit the jobs, I would strongly recommend going that route. If not, you can take a look at Razor SQL. It might be able to do it, however, it's not free, but there is a free trial for a trial run.
Hope this helps.
Created 02-04-2014 08:40 AM
These steps with Razor SQL will get you the historical data, but you would still need to resubmit the jobs:
1. Stop Oozie.
2. Backup the original Derby DB file somewhere.
Copy /var/lib/oozie/data/ to a backup location.
3. Gather the info for connecting to the Derby DB. On the Oozie server run the following grep command. It will give you all the DB connect info:
grep -A1 JPA "/var/run/cloudera-scm-agent/process/`ls -alrt /var/run/cloudera-scm-agent/process/ | grep OOZIE | tail -1 | awk '{print $9}'`/oozie-site.xml"
<name>oozie.service.JPAService.jdbc.driver</name>
<value>org.apache.derby.jdbc.EmbeddedDriver</value>
<name>oozie.service.JPAService.jdbc.username</name>
<value>sa</value>
<name>oozie.service.JPAService.jdbc.password</name>
<value></value>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:derby:/var/lib/oozie/data;create=true</value>
4. Install RazorSQL
5. Make a copy of your of the Derby DB files for Oozie in the path mentioned in step 2 above and copy to the host with RazorSQL.
6. From RazorSQL do the following:
Menu/Option
Action/Value
a. Connections
Add Connection Profile
b. Derby
-(continue)
c. Profile Name: Oozie DB
d. Database Directory
Point to the directory with the Oozie DB copy mentioned in step 5.
e. Click connect
f. DB Tools
Export Data
g. Check Multiple tables
h. Enter schema as "SA"
i. Generate SQL Statements
j. Generate SQL INSERT statements
k. Do not export the DDLS!!!
l. Export to single file, \ for escape single quotes, <SEMI-COLON> SQL statement separator
m. Select a filename (i.e. oozie.sql), then Save
8. Edit the resulting oozie.sql and replace "SA." with the name of the DB in your new database.
9. Verify the oozie.sql looks good.
10. In CM, reconfigure Oozie to point to the new DB.
11. In CM, run "Create Database" from the Actions drop down within the Oozie service.
12. Import the oozie.sql or oozie-processed.sql above.
Created 02-04-2014 10:44 AM
Thanks for all your replies. I think I will go with a fresh new mysql database, the oozie history isn't that important to me.
Just one more question ...... the workflows and coordinators configured via hue ... this info is stored in the Hue database right?
Created 02-04-2014 10:48 AM
No problem!
The workflow and coordinator definitions are in the Hue DB and the xml for them are in HDFS, just none of the status history, that Hue pulls from Oozie. Everything you need to resubmit the coordinators and workflows is in the Hue DB and HDFS and clearing the Oozie DB won't cause any problems there.
Thanks
Chris
Created 01-16-2018 06:20 AM
Hi @cconner
I've connected to the Hue database on Mysql. I see all the the tables prefixed with oozie_ however I do not see any meaninful data in these tables. Can you explain where in this DB schema the workflow definitions are stored?
Thanks
Shak