A Deep Dive into Graphiant’s Site Health Dashboard: Simplifying Day 2 Operations
Operations teams can be overwhelmed by the thousands of alerts a modern network generates. Graphiant’s Site Health Dashboard aims to simplify this by consolidating and streamlining alerts into a manageable set. Let’s look at how this tool supports Day 2 operational simplicity for Graphiant’s customers, partners, and service providers.
What is the Graphiant Site Health Dashboard?
The Graphiant Site Health Dashboard is designed to process large amounts of network data and present it in an easy-to-understand format. Using a rules-based streaming engine, it collects logs, metrics, and events from various sources, helping to pinpoint critical issues that need immediate attention.
Key Features of the Site Health Dashboard:
- Rules-Based Streaming Engine: The dashboard processes millions of events, using predefined rules to deliver consolidated, correlated alerts that are actionable.
- Color-Coded Status Indicators: The dashboard uses color codes to indicate the health of different sites. Green means healthy, yellow suggests suboptimal performance, and red signals critical issues requiring immediate attention.
- Granular Breakdown by Planes: The system is divided into three planes: Data, Control, and System. This separation allows operators to quickly identify where an issue resides, whether in data handling, control mechanisms, or the overall system.
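To make the color codes and the three planes concrete, here is a minimal Python sketch of how a per-site status could be rolled up from per-plane statuses. The "worst plane wins" policy, the `Health` enum, and the function names are assumptions made for illustration; the article does not describe how Graphiant actually computes a site's color.

```python
from enum import IntEnum
from typing import Dict, List


class Health(IntEnum):
    GREEN = 0   # healthy
    YELLOW = 1  # suboptimal performance
    RED = 2     # critical issue requiring immediate attention


# The three planes named in the article.
PLANES = ("data", "control", "system")


def site_health(plane_health: Dict[str, Health]) -> Health:
    """Return the worst status across the planes (assumed roll-up policy).

    A plane with no reported status is treated as GREEN, which is also an assumption.
    """
    return max(plane_health.get(plane, Health.GREEN) for plane in PLANES)


def affected_planes(plane_health: Dict[str, Health]) -> List[str]:
    """List the planes that are not GREEN, so an operator knows where to look."""
    return [p for p in PLANES if plane_health.get(p, Health.GREEN) != Health.GREEN]
```

For example, a site whose system plane is red and whose other planes are green would be reported as red overall, with `affected_planes` returning `["system"]`.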
Navigating the Site Health Dashboard
The dashboard provides a simple, high-level view of a network and allows for more detailed examination. Operators can click on specific sites or devices to drill into issues affecting particular interfaces or systems.
Visual Indicators for Easy Troubleshooting
For example, if there’s a system plane issue, an exclamation mark appears next to it, highlighting the problem immediately. Clicking into the site reveals details about the affected device or interface, such as an SFP (Small Form-factor Pluggable) issue, along with the time of occurrence, whether the issue is ongoing, and the plane where the problem was detected.
Handling Alerts: A Practical Example
Let’s walk through a real-world example of troubleshooting using Graphiant’s Site Health Dashboard. In this scenario, an operator notices an alert regarding a down interface. After drilling into the site, they see that the issue relates to a test interface that was purposely left unplugged. Once the operator acknowledges this and disables the alarm, the alert is resolved and the site status returns to green.
This functionality provides a quick, visual way for operators to focus on what matters, filtering out noise and false positives that don’t require attention.
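The acknowledge-and-disable flow above can be sketched in a few lines of Python. This is an illustrative model only: the `Alarm` class, its field names, and the mapping from alarms to colors are hypothetical and are not taken from Graphiant's product or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alarm:
    site: str
    plane: str
    description: str
    critical: bool = True
    disabled: bool = False  # set once an operator acknowledges and disables the alarm


def site_status(alarms: List[Alarm]) -> str:
    """Compute a site color from its alarms, ignoring disabled ones (assumed policy)."""
    live = [a for a in alarms if not a.disabled]
    if any(a.critical for a in live):
        return "red"
    if live:
        return "yellow"
    return "green"


# Usage: a test interface left unplugged raises an alarm until it is disabled.
alarm = Alarm(site="branch-1", plane="data", description="test interface down")
before = site_status([alarm])
alarm.disabled = True
after = site_status([alarm])
```

In this sketch `before` is `"red"` and `after` is `"green"`, mirroring the scenario in which a known, intentional condition is silenced so it no longer affects the site view.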
Managing Alarms and Rules
The “Alarms” section is another critical part of the dashboard. It displays all actionable alerts in real time, such as BGP (Border Gateway Protocol) down events and WAN interface flaps, along with recommended actions such as reboots. Alarms are categorized into two types:
- Active Alarms: These indicate ongoing issues that need to be addressed.
- Recovered Alarms: These alarms represent issues that have been resolved automatically, such as when an interface that was down has recovered.
Rules Engine
The heart of the system is the rules engine. Each alarm is triggered by a predefined rule. For example, an alarm is triggered if CPU utilization remains above 80% for more than five minutes. Once utilization drops below 80%, the alarm is cleared. These customizable rules give operators the flexibility to tailor the system to specific network environments.
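The CPU example can be expressed as a small threshold-with-hold-time rule. The Python sketch below is one possible reading of that behavior, not Graphiant's implementation: the `ThresholdRule` class, its fields, and the choice to treat a reading of exactly 80% as "not above" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    threshold: float = 80.0      # percent CPU utilization
    hold_seconds: float = 300.0  # five minutes
    breach_start: Optional[float] = None  # when the current breach began, if any
    active: bool = False         # True while the alarm is raised

    def observe(self, timestamp: float, value: float) -> bool:
        """Feed one sample (timestamp in seconds); return whether the alarm is active."""
        if value > self.threshold:
            if self.breach_start is None:
                self.breach_start = timestamp
            # Raise only after the value has stayed high for MORE than the hold time.
            if timestamp - self.breach_start > self.hold_seconds:
                self.active = True
        else:
            # Dropping back to or below the threshold clears the alarm and the timer.
            self.breach_start = None
            self.active = False
        return self.active
```

Feeding the rule one sample per minute at 90% utilization leaves the alarm inactive through the five-minute mark and raises it on the next sample; a single reading at or below 80% clears it and restarts the timer.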
Flexibility and Customization
While the rules engine comes with preset conditions, Graphiant’s customers can request specific rules to be added to the system, offering a high degree of flexibility. This ensures that the dashboard evolves with the needs of each network, becoming an even more powerful tool for operations teams.
Wrapping Up: Why the Graphiant Site Health Dashboard Matters
Graphiant’s Site Health Dashboard is a robust tool that helps organizations manage complex networks efficiently. It simplifies network management and troubleshooting by providing clear visual indicators, actionable alerts, and customizable rules. Operators can focus on critical issues without getting bogged down by unnecessary noise, improving overall network health and operational efficiency.
If you want to enhance your network management strategy, Graphiant’s Site Health Dashboard is a valuable asset in simplifying Day 2 operations.