I am testing BDR functionality and have not yet managed to create a working Hive replication job. Currently, when I run it, I get:
Message: The remote command failed with error message: Another Hive replication command is already running for Database: MY_TABLE_NAME Table: . on service HIVE-2.
I previously had Hive replication failing immediately because I had not specified the port (443) for the CM peering.
What is causing this to fail immediately? I cannot see any logs apart from the above error message.
I happen to be copying between two clusters within the same Cloudera Manager, but won't always be.
Cloudera Manager does some basic checks to find out if there are other Hive Replication commands running that involve the same databases and tables.
The fact that your error says "The remote command failed with error message" indicates that the Hive Export command failed on the source Cloudera Manager server.
I would open up the peer (source) Cloudera Manager and check to see what commands are running. Based on the response, there may be one or more Hive Export commands running. If there are, you can Abort them if you want to continue testing.
After doing that, you can try running Hive replication from your destination cluster's Cloudera Manager again. If there are no other Hive replication commands running, you should not see this failure.
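If the Commands view in the UI is inconclusive, the same check can be approximated through the Cloudera Manager REST API. The sketch below is only an illustration: the host, the API version (v19), the cluster name, and the sample response are all placeholders, and the endpoint path and the "items" / "active" / "name" fields should be verified against the API documentation for your CM release. It builds the URL that lists a service's active commands and filters a response body for Hive-related entries.

```python
import json
from urllib.parse import quote


def service_commands_url(base_url, api_version, cluster, service):
    """Build the URL that lists active commands for one service.

    The path layout is an assumption to check against your CM API docs.
    """
    return "{}/api/{}/clusters/{}/services/{}/commands".format(
        base_url.rstrip("/"),
        api_version,
        quote(cluster, safe=""),
        quote(service, safe=""),
    )


def active_hive_commands(payload):
    """Return (id, name) pairs for active commands whose name mentions Hive."""
    items = json.loads(payload).get("items", [])
    return [
        (cmd["id"], cmd["name"])
        for cmd in items
        if cmd.get("active") and "hive" in cmd.get("name", "").lower()
    ]


# Hypothetical response body, made up for illustration only.
sample = json.dumps({"items": [
    {"id": 101, "name": "HiveReplicationCommand", "active": True},
    {"id": 102, "name": "Restart", "active": True},
    {"id": 103, "name": "HiveReplicationCommand", "active": False},
]})

print(service_commands_url("https://cm.example.com:443", "v19", "Cluster 1", "HIVE-2"))
print(active_hive_commands(sample))
```

Any command IDs found this way are the ones to look up, and abort if appropriate, before retrying the replication.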
> "The remote command failed with error message" indicates that the Hive Export command failed on the source Cloudera Manager server.
OP said: I happen to be copying between two clusters within the same Cloudera Manager
I can clearly see that there are no running Hive Replications. I am the only person who has tried BDR in the entire company. The source Cloudera Manager is the same as the target Cloudera Manager. Only the cluster is different.
There are no working/running Hive Replications.
I know this is frustrating; however, in order to isolate the cause, we need to be clear about the details.
When you run a Hive Replication command, Cloudera Manager checks for any other "Active" Hive Replication Commands.
If it finds any Hive Replication Commands listed in a STARTED state based on a query of its database, it checks their arguments and returns the error you see if there are any conflicts.
The question becomes: why is a Hive Replication command detected as STARTED if no replication command is running?
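To make the described check concrete, here is a rough sketch of that kind of conflict detection. It is not Cloudera Manager's actual code: the Command class, the field names, and the rule that an empty table set means "whole database" are all assumptions made for illustration. The point is that any record still marked STARTED with an overlapping database/table target is enough to trigger the error, even if nothing is really running.

```python
from dataclasses import dataclass, field


@dataclass
class Command:
    """Hypothetical stand-in for a row in the COMMANDS table."""
    command_id: int
    name: str
    state: str
    database: str
    tables: frozenset = field(default_factory=frozenset)  # empty = whole database


def targets_overlap(a, b):
    """Two commands overlap if they touch the same database and share tables."""
    if a.database != b.database:
        return False
    if not a.tables or not b.tables:
        return True  # one of them covers the whole database
    return bool(a.tables & b.tables)


def find_conflicts(new_cmd, recorded):
    """Return recorded STARTED replication commands that clash with new_cmd."""
    return [
        cmd for cmd in recorded
        if cmd.name == "HiveReplicationCommand"
        and cmd.state == "STARTED"
        and cmd.command_id != new_cmd.command_id  # do not conflict with yourself
        and targets_overlap(new_cmd, cmd)
    ]


recorded = [
    Command(1, "HiveReplicationCommand", "FINISHED", "sales"),
    Command(2, "HiveReplicationCommand", "STARTED", "sales", frozenset({"orders"})),
    Command(3, "HiveReplicationCommand", "STARTED", "hr"),
]
new_cmd = Command(4, "HiveReplicationCommand", "STARTED", "sales")
print([cmd.command_id for cmd in find_conflicts(new_cmd, recorded)])
```

In this sketch only command 2 conflicts: it is STARTED and targets a table in the same database. Command 1 is FINISHED and command 3 targets a different database.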
From what you mention, it could be that there is something out of sync if there are no Hive commands listed as running in CM.
I would recommend restarting Cloudera Manager (service cloudera-scm-server restart, from the command line) when you have an opportunity.
My feeling is that it is more likely there is something transient going on here that may be very complex to debug over a community board. Hopefully restarting will clear out JVM objects and rebuild them from the database, thereby eliminating the condition that led to the issue.
If this does not help, then let us know... it is possible to mimic the database query that is responsible for detecting active commands.
> If this does not help, then let us know...
Cloudera Manager has been restarted without success.
> it is possible to mimic the database query that is responsible for detecting active commands.
That would be excellent. If you could give me that query, I could see what is causing it to report incorrectly. Thanks!
I can see records in the COMMANDS table with NAME HiveReplicationCommand appearing with STATE STARTED, and then STATE immediately changes to FINISHED, but I cannot see why Hive Replication does not recognize this command as itself.
Field                    Type          Null  Key  Default
COMMAND_ID               bigint(20)    NO    PRI  NULL
NAME                     varchar(255)  NO         NULL
STATE                    varchar(255)  YES   MUL  NULL
START_INSTANT            bigint(20)    YES   MUL  NULL
END_INSTANT              bigint(20)    YES        NULL
ACTIVE                   int(11)       YES   MUL  NULL
RESULT_MESSAGE           longtext      YES        NULL
RESULT_DATA              mediumblob    YES        NULL
RESULT_DATA_MIME_TYPE    varchar(255)  YES        NULL
RESULT_DATA_FILENAME     varchar(255)  YES        NULL
SUCCESS                  bit(1)        YES        NULL
SERVICE_ID               bigint(20)    YES   MUL  NULL
ROLE_ID                  bigint(20)    YES   MUL  NULL
PARENT_ID                bigint(20)    YES   MUL  NULL
HOST_ID                  bigint(20)    YES   MUL  NULL
RESULT_DATA_PATH         varchar(255)  YES        NULL
RESULT_DATA_REAPED       bit(1)        YES        b'0'
CLUSTER_ID               bigint(20)    YES   MUL  NULL
OPTIMISTIC_LOCK_VERSION  bigint(20)    NO         0
SCHEDULE_ID              bigint(20)    YES   MUL  NULL
ARGUMENTS                longtext      YES        NULL
AUDITED                  bit(1)        NO         b'0'
FIRST_UPDATED_INSTANT    bigint(20)    YES        NULL
CREATION_INSTANT         bigint(20)    YES        NULL
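Given the COMMANDS columns listed above, a lookup for replication commands still recorded as STARTED could be sketched as follows. This is only an approximation of whatever query Cloudera Manager really issues. To keep the example self-contained it runs against an in-memory SQLite table with a reduced set of columns and made-up rows; against the real CM database you would run just the SELECT with your own database client.

```python
import sqlite3

# The SELECT is the part of interest; NAME and STATE values come from the thread.
QUERY = """
SELECT COMMAND_ID, STATE, ACTIVE, START_INSTANT, END_INSTANT, ARGUMENTS
FROM COMMANDS
WHERE NAME = 'HiveReplicationCommand'
  AND STATE = 'STARTED'
ORDER BY START_INSTANT
"""

# Stand-in table with a subset of the real columns (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE COMMANDS (
    COMMAND_ID INTEGER PRIMARY KEY,
    NAME TEXT NOT NULL,
    STATE TEXT,
    ACTIVE INTEGER,
    START_INSTANT INTEGER,
    END_INSTANT INTEGER,
    ARGUMENTS TEXT
)
""")

# Made-up rows: one finished replication, one stuck in STARTED, one unrelated command.
conn.executemany(
    "INSERT INTO COMMANDS VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "HiveReplicationCommand", "FINISHED", 0, 1000, 2000, "{}"),
        (2, "HiveReplicationCommand", "STARTED", 1, 3000, None, "{}"),
        (3, "Restart", "STARTED", 1, 4000, None, "{}"),
    ],
)

stuck = conn.execute(QUERY).fetchall()
for row in stuck:
    print(row)
```

If the real table returns rows here while no replication is running, the ARGUMENTS column of those rows is the place to look for the database and table names named in the error.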
While no replication commands are running, does the query for HiveReplicationCommand with STATE STARTED return any results?
If not, this is indeed quite a mystery.
The only curiosity here for me is that you are replicating from one Hive Service to another on the same cluster.
I can't explain how that would lead to this particular condition, so it may not be involved.
The code shows that the error you get comes directly from finding a STARTED HiveReplicationCommand that is configured to copy the same database/tables, so the answer must be there.
Thanks for the clarification on the CM / Hive Service situation ... I get it now.
Can you confirm that when you query your database for NAME = HiveReplicationCommand and STATE = STARTED, nothing is returned if you have no replication schedules running? I wasn't quite sure based on your previous comment.
The query that is performed to generate the result you are seeing shouldn't care about clusters or anything I think. I'll double check and let you know if I find differently.
I checked the docs and the indications are that you should not need to configure a peer if both source and target Hive services are managed in the same Cloudera Manager.
If you did configure a peer for the replication schedule, maybe let's try creating a new replication schedule that does not use a peer. You should be able to select the source Hive service from the desired source without the peer. I am wondering if the remote execution of that Hive Export command is failing since the parent command is already running...
Perhaps without the peer, the conflict check handles that...
Just a thought.