Data Protector jobs may hang and never finish

This article describes symptoms that have occurred with HP Data Protector for years and that can become critical if you do not notice them, because DP sends no notification at all when this happens. The main cause of these bugs seems to be very unreliable internal error handling in Data Protector. I have personally seen this for many years, and nothing in this product really improves over time unless you keep pressing HP about your bugs.

Symptoms:

  • The job is running and cannot be killed. You can try to kill it 1000 times and wait days for the kill to take effect, but the job never gets killed.
  • Scheduled backup jobs with the same job name no longer run, because DP is waiting for the hanging job to finish.
  • Restarting all Data Protector services does not kill the job.

How to identify:

  • Check whether your zombie job reported one of these errors:
    • Defective tape
    • Cannot rewind media
    • Cannot eject media
    • Other errors are also possible. The issue has been seen with backup-to-disk jobs, too.
      • Internal error in ("ma/xma/bma.c /main/hsl_dp61/hsl_hpit2_2/dp611_patch/3":3714) => process aborted
      • Unexpected close reading NET message => aborting.
      • Ipc subsystem reports: "IPC Invalid Handle Number"
  • Future backup jobs with the same name no longer run.
  • A server reboot kills the zombie job.
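
The literal messages quoted in the list above can be searched for automatically. The following is only a minimal sketch in Python, assuming you have already exported the session messages to plain text by whatever means you normally use; the names SIGNATURES and find_signatures are made up for this example and are not part of Data Protector. The tape errors (defective tape, cannot rewind, cannot eject) are not matched, because the list does not quote their exact wording.

```python
import re

# Message fragments quoted in the list above. The source-file path inside the
# "Internal error" message differs between patch levels, so only the fixed
# parts of that message are matched.
SIGNATURES = {
    "internal_error": re.compile(r"Internal error in \(.*\) => process aborted"),
    "net_close": re.compile(r"Unexpected close reading NET message"),
    "ipc_handle": re.compile(r"IPC Invalid Handle Number"),
}


def find_signatures(session_text):
    """Return the sorted names of all known signatures found in the text."""
    return sorted(
        name
        for name, pattern in SIGNATURES.items()
        if pattern.search(session_text)
    )
```

A match only tells you that the job hit one of the errors seen before a hang. Whether the job has really turned into a zombie still has to be confirmed by the symptoms described earlier.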

Workaround:

  • In all your backup specifications, go to Options > Backup Specifications Options > Advanced and enable Reconnect broken connections. This at least seems to prevent the zombie process from hanging forever. It does not solve the job failure bug.

Quick help to fix a broken environment:

Keep in mind that this is not a solution for the bug reported here. It only helps you get rid of the zombie process so that your future backup jobs will run.

  1. Open Task Manager
  2. Sort the processes by name
  3. Kill bsm.exe
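
If you prefer a script over Task Manager, the same kill can be sketched with the standard Windows taskkill command. This is only an illustration, not an HP-provided procedure; the function names and the dry_run switch are made up for this example. Be aware that taskkill /IM terminates every process with the given image name, not only the one that belongs to the hanging job.

```python
import subprocess
import sys


def build_kill_command(image_name="bsm.exe"):
    """Build the taskkill arguments that force-terminate all processes
    with the given image name (/F = force, /IM = match by image name)."""
    return ["taskkill", "/F", "/IM", image_name]


def kill_session_managers(image_name="bsm.exe", dry_run=True):
    """Return the command; run it only on Windows and only if dry_run is False."""
    cmd = build_kill_command(image_name)
    if not dry_run and sys.platform == "win32":
        # check=False: taskkill exits with a non-zero code when no matching
        # process exists, which is not an error for this purpose.
        subprocess.run(cmd, check=False)
    return cmd
```

Calling kill_session_managers() only returns the command that would be executed. Pass dry_run=False from an elevated prompt on the affected Windows server to actually run it.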

NOTE: HP has not provided a fix for this issue. They had been trying to identify the cause since November 2010, and by the end of 2011 I still had not received a fix. We have fully migrated to the Commvault backup solution, and after 5 years with Commvault I can only recommend that everyone do the same.
