Support Questions

Find answers, ask questions, and share your expertise

Oozie embedded derby to mysql, what is the best way to go about this?

avatar
Contributor

I knew this could come someday and now it is the day .... Oozie is running very slow and I believe it is due to using the embedded derby db.

 

Anyone has experience of migrating the data that is already in derby to an external mysql? 

 

What do I loose if I just start with a clean mysql and resubmit all the oozie jobs?

1 ACCEPTED SOLUTION

avatar
Super Collaborator

No problem!

 

The workflow and coordinator definitions are in the Hue DB and the xml for them are in HDFS, just none of the status history, that Hue pulls from Oozie.  Everything you need to resubmit the coordinators and workflows is in the Hue DB and HDFS and clearing the Oozie DB won't cause any problems there.

 

Thanks

Chris

View solution in original post

6 REPLIES 6

avatar
Mentor
The only thing you'll lose out if you do the fresh approach is the
history, but not the functionality/correctness of the newly added
jobs. I'd pick using the fresh mysql approach if it is an option,
cause delving down into extracting data from Apache Derby DB for
moving it onto MySQL is a tad more painful operation.

avatar
Super Collaborator

Hey,

 

Unfortunately this is something I've never been able to get working.  The problem is Oozie stores the workflows in the DB as blobs, as a result, that makes migration very complex.  The data in Oozie is the status of all the past workflows that have not been purged by the purge process and then info on all the current running oozie jobs.  So if you resubmit the jobs, you will just lose status information about old jobs.

 

If you can afford to lose the historical data and then resubmit the jobs, I would strongly recommend going that route.  If not, you can take a look at Razor SQL.  It might be able to do it, however, it's not free, but there is a free trial for a trial run.

 

Hope this helps.

avatar
Super Collaborator

These steps with Razor SQL will get you the historical data, but you would still need to resubmit the jobs:

 

1. Stop Oozie. 
2. Backup the original Derby DB file somewhere. 
Copy /var/lib/oozie/data/ to a backup location. 
3. Gather the info for connecting to the Derby DB. On the Oozie server run the following grep command. It will give you all the DB connect info: 
grep -A1 JPA "/var/run/cloudera-scm-agent/process/`ls -alrt /var/run/cloudera-scm-agent/process/ | grep OOZIE | tail -1 | awk '{print $9}'`/oozie-site.xml" 
<name>oozie.service.JPAService.jdbc.driver</name> 
<value>org.apache.derby.jdbc.EmbeddedDriver</value> 
<name>oozie.service.JPAService.jdbc.username</name> 
<value>sa</value> 
<name>oozie.service.JPAService.jdbc.password</name> 
<value></value> 
<name>oozie.service.JPAService.jdbc.url</name> 
<value>jdbc:derby:/var/lib/oozie/data;create=true</value> 
4. Install RazorSQL 
5. Make a copy of your of the Derby DB files for Oozie in the path mentioned in step 2 above and copy to the host with RazorSQL. 
6. From RazorSQL do the following: 
Menu/Option 
Action/Value 
a. Connections 
Add Connection Profile 
b. Derby 
-(continue) 
c. Profile Name: Oozie DB 
d. Database Directory 
Point to the directory with the Oozie DB copy mentioned in step 5. 
e. Click connect 
f. DB Tools 
Export Data 
g. Check Multiple tables 
h. Enter schema as "SA" 
i. Generate SQL Statements 
j. Generate SQL INSERT statements 
k. Do not export the DDLS!!! 
l. Export to single file, \ for escape single quotes, <SEMI-COLON> SQL statement separator 
m. Select a filename (i.e. oozie.sql), then Save 
8. Edit the resulting oozie.sql and replace "SA." with the name of the DB in your new database. 
9. Verify the oozie.sql looks good. 
10. In CM, reconfigure Oozie to point to the new DB. 
11. In CM, run "Create Database" from the Actions drop down within the Oozie service. 
12. Import the oozie.sql or oozie-processed.sql above.

avatar
Contributor

Thanks for all your replies.  I think I will go with a fresh new mysql database, the oozie history isn't that important to me. 

 

Just one more question ...... the workflows and coordinators configured via hue ... this info is stored in the Hue database right?

avatar
Super Collaborator

No problem!

 

The workflow and coordinator definitions are in the Hue DB and the xml for them are in HDFS, just none of the status history, that Hue pulls from Oozie.  Everything you need to resubmit the coordinators and workflows is in the Hue DB and HDFS and clearing the Oozie DB won't cause any problems there.

 

Thanks

Chris

avatar
Explorer

Hi @cconner

 

I've connected to the Hue database on Mysql.  I see all the the tables prefixed with oozie_ however I do not see any meaninful data in these tables.  Can you explain where in this DB schema the workflow definitions are stored?

 

Thanks

 

Shak