Overview
Enterprise deployments of Beeks Analytics should be integrated with each customer’s main operational monitoring and alerting system(s).
This section highlights the main components involved, what should be monitored, and with what mechanisms.
Main Components
Incident Management Platform
This is provided by the customer and is the central endpoint to which alerts should be routed. It is usually responsible for onward routing, silencing, and the general lifecycle of alerts and incidents.
Examples include Opsgenie, PagerDuty, and Jira Service Desk.
However, it could also be as simple as a Slack or Teams channel, or even a mailing list.
Appliance iDRAC Monitoring
Beeks Analytics appliances are deployed on Dell PowerEdge servers equipped with Integrated Dell Remote Access Controller (iDRAC). iDRAC provides out-of-band hardware monitoring for server health, event logging, and alert generation for all hardware components including power supplies, fans, memory, and storage.
Hardware Alerts and Notifications
iDRAC can be configured to send notifications via:
Email (SMTP) — for direct alert delivery
SNMP traps — to external monitoring systems
Remote syslog — to centralized log aggregation
These mechanisms allow proactive notification of hardware issues before they impact analytics operations.
To configure email alerting for system events, refer to Dell’s official guidance.
For SNMP trap forwarding and remote syslog configuration, Dell provides step-by-step instructions in their support knowledge base. Stats can also be polled via SNMP; however, we capture most of those internally as part of the Appliance Internal Monitoring, and they can be viewed easily in the VMX Health dashboards.
See this Dell knowledge base article for how to configure alerting rules in iDRAC, and see the list of alerts here.
Event and Error Message Reference
Dell publishes a reference of iDRAC Error and Event Messages (EEMs). This is useful for interpreting the codes and conditions reported by iDRAC.
Supplementing the Self-Monitoring with Customer-Specific Monitoring Tools
Beeks Analytics is self-monitoring, but we always recommend supplementing Beeks Analytics’ own monitoring with customer-specific monitoring tools. In particular, the self-monitoring is performed by the Grafana component on the appliance, so it cannot detect every failure of Grafana itself or a severe OS failure. We recommend putting additional alerts in place for these cases using a customer-approved monitoring tool.
Monitoring Server Health with Customer-Specific Monitoring Tools
The most common failure is disk failure. Beeks Analytics appliances specify RAID configuration for all drives, which largely eliminates the chance of a single disk failure being catastrophic. External monitoring remains critical, however, because gradual, unnoticed degradation of the RAID array will eventually result in catastrophic failure.
Refer to the above Dell articles on iDRAC monitoring to ensure monitoring is in place for these critical considerations.
Monitoring Appliance Server Availability with Customer-Specific Monitoring Tools
The following endpoints should be monitored by an external infrastructure monitoring system. Any endpoint that is down represents a high-severity alert that needs immediate attention, as the service may be compromised.
Appliance iDRAC
IP is up
HTTPS port is up
SSH port is up
Appliance OS
IP is up
HTTPS port is up
SSH port is up
Examples of suitable infrastructure monitoring systems: Geneos, Nagios, Zabbix, Datadog, Checkmk, LibreNMS.
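The availability checks listed above can be illustrated with a minimal sketch. It is not a replacement for a proper infrastructure monitoring system, only a demonstration of what "port is up" means in practice. The IP addresses are placeholders from a documentation range, and the port numbers (443 for HTTPS, 22 for SSH) are the conventional defaults, which may differ in a given deployment. The sketch checks TCP ports only; a successful connection also implies that the IP is reachable, but a dedicated ICMP check is not shown.

```python
import socket


def tcp_port_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical addresses: replace with the real iDRAC and OS IPs.
ENDPOINTS = {
    "Appliance iDRAC": "192.0.2.10",
    "Appliance OS": "192.0.2.11",
}

# Assumed default ports: adjust if the deployment uses others.
PORTS = {"HTTPS": 443, "SSH": 22}


def check_endpoints(endpoints, ports, probe=tcp_port_is_up):
    """Return (name, service, host, port) for every probe that fails."""
    failures = []
    for name, host in endpoints.items():
        for service, port in ports.items():
            if not probe(host, port):
                failures.append((name, service, host, port))
    return failures


if __name__ == "__main__":
    for name, service, host, port in check_endpoints(ENDPOINTS, PORTS):
        print(f"HIGH SEVERITY: {name} {service} ({host}:{port}) is down")
```

The `probe` parameter exists so the check logic can be exercised without network access, for example by passing a stub function.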
Appliance Internal Monitoring
Appliance Health
Beeks Analytics collates a comprehensive set of health metrics via Telegraf into an integrated database on the appliance.
We provision the most important health monitoring alert rules, implemented in Grafana, alongside the appliance software.
https://grafana.com/docs/grafana/latest/alerting/
https://grafana.com/docs/grafana/latest/alerting/fundamentals/
Grafana Alerting is extremely flexible. The alert rules are provisioned with the appliance, but the routing of alerts to contact points must be configured as part of commissioning. More details are provided in the following sections.
The full range of contact points is documented at the link below and includes options such as Webhook, Email, and Slack: https://grafana.com/docs/grafana/latest/alerting/fundamentals/contact-points/
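When a Webhook contact point is used, the receiving system has to interpret the JSON body that Grafana posts. The following is a minimal sketch of that step. The field names (`alerts`, `status`, `labels`, `annotations`, `alertname`, `summary`) are assumptions based on the commonly documented Grafana webhook payload and should be checked against the Grafana version deployed on the appliance. The function name is illustrative only.

```python
import json


def summarize_grafana_webhook(body: str) -> list[str]:
    """Turn a Grafana webhook JSON body into one summary line per alert.

    Field names are assumed from the Grafana webhook payload format;
    missing fields fall back to placeholder values rather than raising.
    """
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown")
        name = labels.get("alertname", "unnamed")
        summary = annotations.get("summary", "")
        lines.append(f"[{status}] {name}: {summary}".rstrip())
    return lines
```

A receiver built this way tolerates payloads with missing optional fields, which is useful because annotations depend on how each alert rule was written.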
Application and Data Alerts
Beeks Analytics can be configured with Grafana alert rules based on the data collated from the infrastructure and applications being monitored, for example latency, loss, and traffic patterns.
See the Beeks Analytics Data Guide for the list of metrics which you can alert on in a standard Beeks Analytics deployment.
The same Grafana contact point integrations as the Appliance Health alerts can be used.
How Beeks Monitors Beeks-Managed Appliances
Some clients choose to have Beeks monitor their appliances for them. This section provides information about the technologies that are used by Beeks to perform this function.
Incident Management Platform
Beeks use Jira as an incident management platform. It offers flexible service queue management, with rules to reroute and escalate issues. We make use of the following endpoints for integration:
REST API
Email
Webhook
Appliance iDRAC Monitoring
Beeks use LibreNMS as an infrastructure monitoring solution. We configure iDRAC to send alerts to a remote syslog server running on the LibreNMS server, with rules that scrape the syslog and generate an alert for each entry. The severity of each iDRAC alert is extracted from the entry and used to add context to the alert that is raised. Alerts are transmitted to Jira via the Jira REST API plugin in LibreNMS.
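The text above does not specify how the severity is extracted. One standard approach, shown here purely as an illustrative sketch, is to decode the priority (PRI) value at the start of a raw syslog line, where severity is the PRI value modulo 8 as defined by the syslog RFCs. This assumes the raw line, including its `<PRI>` header, is available; the function name and sample message are hypothetical, and LibreNMS may expose the severity differently.

```python
import re

# Standard syslog severity names, indexed by severity code 0-7.
SEVERITY_NAMES = [
    "emergency",
    "alert",
    "critical",
    "error",
    "warning",
    "notice",
    "informational",
    "debug",
]

# A syslog line starts with "<PRI>", where PRI = facility * 8 + severity.
_PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def syslog_severity(line: str):
    """Return the severity name encoded in a syslog line, or None if absent."""
    match = _PRI_PATTERN.match(line)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # Highest valid PRI: facility 23, severity 7.
        return None
    return SEVERITY_NAMES[pri % 8]


if __name__ == "__main__":
    # Hypothetical message text; PRI 131 is facility 16 (local0), severity 3.
    print(syslog_severity("<131>Jan  1 00:00:00 idrac example hardware event"))
```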
Appliance Server Monitoring
Beeks also use LibreNMS to monitor the health of the Appliance Server itself. LibreNMS is configured to poll all of the recommended endpoints and to raise alerts to Jira using the same Jira REST API integration described above.
Appliance Internal Monitoring
Grafana is configured with a Webhook Contact Point for all alerts which sends the alert details to Jira.
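How the alert details are mapped onto a Jira issue is not described here. As a sketch of what such a mapping could look like, the function below builds the JSON body for Jira's issue-creation endpoint (`POST /rest/api/2/issue`) from a single Grafana alert. The project key `OPS` and issue type `Incident` are hypothetical placeholders, the alert field names are assumed from the Grafana webhook payload format, and the plain-text `description` assumes version 2 of the Jira REST API. No request is sent; only the payload is constructed.

```python
def jira_issue_fields(
    alert: dict,
    project_key: str = "OPS",      # Hypothetical project key.
    issue_type: str = "Incident",  # Hypothetical issue type name.
) -> dict:
    """Build a Jira create-issue body from one Grafana alert dictionary."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})

    name = labels.get("alertname", "Grafana alert")
    status = alert.get("status", "unknown")
    summary = f"{name} ({status})"

    # Prefer the longer description annotation, then fall back to summary.
    description = (
        annotations.get("description")
        or annotations.get("summary")
        or "No description provided."
    )

    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            "summary": summary[:255],  # Jira limits summaries to 255 characters.
            "description": description,
        }
    }
```

Keeping payload construction separate from the HTTP call makes the mapping easy to test and to adapt if the Jira project uses custom required fields.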