Error monitoring and management

From All n One's bxp software Wixi

1 Intro

Error monitoring and management are a major part of the All n One operation, and there are a number of levels and redundancy checks performed on an ongoing basis to support bxp.

This document provides some background to our operations.

2 Security overlap

There is overlap between security and error management. For example, if a DDoS (Distributed Denial of Service) attack were happening and not managed, it would appear to the user as an error. So, firstly, a number of live monitoring solutions are applied 24x7 to our solutions.

3 Live hosting monitoring

Sungard Availability Services provide 24x7 monitoring and management of the network system. This monitoring, together with our high availability architecture, ensures that connectivity remains at 100%.

If issues are detected, they are reported by email to the directors and security department of All n One. All notifications are put through our triage process (Bxp_Support_Triage_Process).

4 Operational error monitoring

The primary hours of operation of our clients and our Dublin offices are 08:00 to 18:00. For this reason, All n One operate error monitoring in the office through the Contact department.

The primary capability behind this approach is an extensive error management system which is integrated into the operational levels of the solution. So while Sungard watch the hardware, All n One watch the software.

The error management solution reports errors using our Hamsters (Meet_the_Hamsters). Errors are grouped by type.

  • Not Found (404): White hamster
  • Error (500): Green hamster

As 500 errors can occur for various reasons, we continuously review the errors and add extra intelligence to them. For this reason, green hamsters are graded.

  • Lime: A temporary error due to connection resets or a similarly fleeting issue (clicking back and trying again is the usual instant fix)
  • Jade: A recognised programming error.
  • Olive: An error created by one of the All n One staff.
  • Emerald: An error, usually with the configuration of an account (very rare)
  • Green: Unknown error (for the moment!)
  • Apple: bxp API Error
  • Mint: bxp server IIS Recycle
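The grading above can be pictured as a small classification step. The following Python sketch is illustrative only: the `Hamster` enum, the `classify` function and the keyword rules in `GREEN_RULES` are hypothetical names and criteria invented for this example, since this page does not describe how bxp actually decides a grade. Grades such as Jade, Olive and Emerald are listed but given no automatic rule here, because they would presumably need context that a plain error message does not carry.

```python
from enum import Enum
from typing import Optional


class Hamster(Enum):
    """Hamster colours as listed on this page, with their meanings."""
    WHITE = "Not Found (404)"
    LIME = "Temporary error (connection reset or similar)"
    JADE = "Recognised programming error"
    OLIVE = "Error created by All n One staff"
    EMERALD = "Account configuration error"
    GREEN = "Unknown error"
    APPLE = "bxp API error"
    MINT = "bxp server IIS recycle"


# ASSUMPTION: keyword rules invented for illustration, checked in order.
# The real grading criteria are not documented on this page.
GREEN_RULES = [
    ("connection reset", Hamster.LIME),
    ("iis recycle", Hamster.MINT),
    ("bxp api", Hamster.APPLE),
]


def classify(status: int, message: str = "") -> Optional[Hamster]:
    """Map an HTTP status and error message to a hamster, or None if untracked."""
    if status == 404:
        return Hamster.WHITE
    if status == 500:
        text = message.lower()
        for keyword, hamster in GREEN_RULES:
            if keyword in text:
                return hamster
        # Nothing matched: stays plain Green until someone reviews it.
        return Hamster.GREEN
    return None
```

In this sketch an unmatched 500 error falls through to plain Green, which mirrors the "unknown for the moment" grade: as errors are reviewed, new rules would be added so that fewer errors stay ungraded.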

There are also numerous software messages we track and monitor in our operation.

  • Grey: Security access issues within bxp
  • Red: Suggestion / Help Me!

The monitoring team have a live wall board on a 50" screen, which looks like this:

Hamster Board.PNG

This report is also available on the team's own machines, with click-through for instant error management.
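As a rough idea of what sits behind a board like this, the Python sketch below tallies a stream of reported hamster colours into rows for display. It is a minimal sketch under stated assumptions: `BOARD_ORDER`, `board_rows` and `render` are hypothetical names, and the actual layout and data source of the bxp wall board are not described here.

```python
from collections import Counter

# ASSUMPTION: display order follows the order the colours are listed on this page.
BOARD_ORDER = [
    "White", "Lime", "Jade", "Olive", "Emerald",
    "Green", "Apple", "Mint", "Grey", "Red",
]


def board_rows(colours):
    """Count reported colours and return (colour, count) rows in board order."""
    counts = Counter(colours)
    # Colours with no reports are left off the board.
    return [(colour, counts[colour]) for colour in BOARD_ORDER if counts[colour] > 0]


def render(rows):
    """Format rows as fixed-width text lines, one colour per line."""
    return "\n".join(f"{colour:<8}{count:>4}" for colour, count in rows)


if __name__ == "__main__":
    reported = ["Lime", "White", "Lime", "Red"]
    print(render(board_rows(reported)))
```

Grouping by colour rather than listing individual errors keeps the board readable at a glance; the click-through on each row would then lead to the individual errors.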

5 Triage

Once an error is detected, it enters our triage management process (Bxp_Support_Triage_Process).