Management of Complicated Systems
Kevin O’Malley
November 14, 2018
Complicated systems, like the ones created here at Mercury, call for a management module that can monitor the health and control the behavior of the modules that make up these systems. These management modules can be separate, integrated on each module/board, or strictly software applications. Our SMP Engineering team has worked with all of these types and has incorporated them depending on the customer’s application and the level of security required.
Intelligent Platform Management Interface
Mercury's system management uses the Intelligent Platform Management Interface (IPMI), which is available on most modules. Some systems use the System Management Module (SMM) to control payload configurations and processes distributed throughout the system. Others use a Management Controller (MC) and/or Baseboard Management Controller (BMC). All of these communicate with each other over the Intelligent Platform Management Bus (IPMB).
IPMI has been around for quite some time and is the standard for system monitoring, employed by many system technologies including AdvancedTCA, MicroTCA, VME, VPX, VXS, CompactPCI, and CompactPCI Serial. IPMI also works alongside other management standards, such as Simple Network Management Protocol (SNMP) and Desktop Management Interface (DMI).
This architecture enables IPMI to be used throughout a system, from board-level management to system-level management, on just about anything that can be monitored. System management capabilities defined by IPMI include event logging, sensor monitoring, manual and automatic system recovery when sensor readings exceed predefined thresholds, and the collection of inventory information such as serial numbers.
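The threshold behavior described above can be illustrated with a minimal Python sketch. It is not Mercury's implementation or an IPMI library API; the `Sensor` and `Event` types, the `check_reading` function, and the threshold values are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sensor:
    """Hypothetical sensor record with optional critical thresholds."""
    name: str
    lower_critical: Optional[float] = None
    upper_critical: Optional[float] = None


@dataclass
class Event:
    """Hypothetical event that a management controller might log."""
    sensor: str
    reading: float
    description: str


def check_reading(sensor: Sensor, reading: float) -> Optional[Event]:
    """Return an Event if the reading crosses a threshold, else None."""
    if sensor.upper_critical is not None and reading > sensor.upper_critical:
        return Event(sensor.name, reading, "upper critical threshold exceeded")
    if sensor.lower_critical is not None and reading < sensor.lower_critical:
        return Event(sensor.name, reading, "lower critical threshold exceeded")
    return None


# Illustrative values only: a board temperature sensor with an 85 C limit.
board_temp = Sensor("board_temp", upper_critical=85.0)
event = check_reading(board_temp, 91.5)
```

In a real system the event would be written to the event log and could trigger a recovery action; here it is simply returned to the caller.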
Chassis Management
The new VPX Chassis Manager (CM) enables remote access to any board and chassis that supports VITA 46.11. The primary functions of the VITA 46.11 Chassis Manager are inventory management, sensor management, system configuration, recovery and diagnostic management. The layers of management referenced by VITA 46.11 are IPMC, Chassis Manager, and System Manager, which are hierarchical in nature. The IPMC (board-level management) communicates with the Chassis Manager, which in turn reports to the System Manager, which can monitor multiple chassis.
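The three-layer hierarchy can be pictured with a small Python sketch. This is only a conceptual model under simplifying assumptions: the class names mirror the layer names, but the fields and methods (`healthy`, `unhealthy_slots`, `report`) are hypothetical and are not defined by VITA 46.11.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IPMC:
    """Board-level controller, identified here by its slot number."""
    slot: int
    healthy: bool = True


@dataclass
class ChassisManager:
    """Collects status from every IPMC in one chassis."""
    name: str
    ipmcs: List[IPMC] = field(default_factory=list)

    def unhealthy_slots(self) -> List[int]:
        return [board.slot for board in self.ipmcs if not board.healthy]


@dataclass
class SystemManager:
    """Top layer: aggregates reports from multiple chassis managers."""
    chassis: List[ChassisManager] = field(default_factory=list)

    def report(self) -> Dict[str, List[int]]:
        return {cm.name: cm.unhealthy_slots() for cm in self.chassis}


# Illustrative topology: two chassis, one board reporting a fault.
system = SystemManager([
    ChassisManager("chassis-a", [IPMC(1), IPMC(2, healthy=False)]),
    ChassisManager("chassis-b", [IPMC(1)]),
])
status = system.report()
```

Each layer only talks to the layer directly below it, which is the point of the hierarchy: the System Manager never needs to address an individual IPMC directly.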
The Chassis Management Controller (ChMC) can be implemented in several ways: on one of the front-loading VPX plug-in modules, as a mezzanine board that plugs onto the backplane, or as a standalone board. Advantages of a VPX Chassis Management Controller:
- External access - The Chassis Manager, which has access to every IPMC in the system over the IPMB, supports an Ethernet interface to the System Manager.
- Cooling control - A typical method is to monitor board and chassis temperatures, then adjust the fan speed to keep them within a predetermined range.
- Inventory management - The Chassis Manager maintains a full list of all intelligent Field Replaceable Units (FRUs) and boards as well as any other components that support VITA 46.11.
- Sensor management - The Chassis Manager keeps a list of all sensors connected to each intelligent FRU in the system, along with any thresholds or limits.
- Sensor event log - The log provides a history of all events, such as an over-temperature or under-voltage condition. The actual log size varies among chassis managers; when the log is full, it typically begins to overwrite itself, removing the oldest messages first.
- Diagnostics and recovery - The specific VPX boards in a system, and their compatibility with VITA 46.11, will determine the level at which the Chassis Manager can diagnose and respond to system events.
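Two of the items above, cooling control and the sensor event log, lend themselves to a short Python sketch. This is an illustration under stated assumptions, not how any particular chassis manager works: the linear fan curve, the temperature limits, the duty-cycle values, and the names `fan_duty` and `EventLog` are all hypothetical.

```python
from collections import deque
from typing import List


def fan_duty(temp_c: float, low_c: float = 40.0, high_c: float = 70.0,
             min_duty: int = 30, max_duty: int = 100) -> int:
    """Map a temperature to a fan duty cycle (percent) on a linear ramp.

    Below low_c the fan runs at min_duty; above high_c at max_duty.
    All limits are illustrative defaults, not values from any specification.
    """
    if temp_c <= low_c:
        return min_duty
    if temp_c >= high_c:
        return max_duty
    fraction = (temp_c - low_c) / (high_c - low_c)
    return round(min_duty + fraction * (max_duty - min_duty))


class EventLog:
    """Fixed-size log that drops the oldest entry once it is full."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen discards items from the opposite end on append.
        self._entries: deque = deque(maxlen=capacity)

    def record(self, message: str) -> None:
        self._entries.append(message)

    def entries(self) -> List[str]:
        return list(self._entries)


log = EventLog(capacity=2)
for message in ("over temperature", "under voltage", "fan fault"):
    log.record(message)
```

After the three `record` calls, the two-entry log holds only the two most recent messages, which mirrors the oldest-first overwrite behavior described in the list.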
System Manager
At the top of the logical management layer is the System Manager, which oversees multiple chassis and communicates with multiple chassis managers. The System Manager capability can be based on middleware, an SNMP MIB browser, custom software, the Remote Management Control Protocol (RMCP), or something as simple as the ability to Telnet into a system. As more advanced monitoring functions are implemented, existing shelf management capabilities will continue to increase the effectiveness, value, and reliability of these embedded systems well into the future.
Using an SMM gives you access to a system that may otherwise be closed, with only one way in or out. The SMM provides another way to reach the modules within the system if a payload module becomes inaccessible for whatever reason. Without system management, these types of systems would be down and unusable, and would need to be removed and sent to where they could be fixed, delaying the execution of their assigned task. With an SMM, you can access the troubled module, investigate what happened, and possibly even repair the issue without extended downtime.
Think of it like this: You hear someone calling for help from your neighbor's house, but when you walk inside, you realize the power has been knocked out. The house is pitch-black and you can't see anything. Since you can't see, you're not able to help unless you brought your "System Management" flashlight. Once you turn it on, you can see the furniture and pathways, so you can find the person calling for help and rescue them.
Open-Source Tools
Open-source tools such as OpenIPMI and IPMItool support and provide access to IPMI systems. OpenIPMI consists of a device driver and a set of user libraries used by IPMI applications; it provides access to all IPMI functions, allowing its tools to manage any IPMI system from a Linux system. IPMItool is an application that can interface with an IPMI system using IPMI-over-LAN interfaces or device driver interfaces such as OpenIPMI.
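As a rough illustration of the IPMI-over-LAN path, the Python sketch below builds an IPMItool command line and, optionally, runs it. The host address and user name are placeholders, and the helper names `ipmitool_argv` and `run_ipmitool` are hypothetical. The sketch assumes `ipmitool` is installed, that the target supports the `lanplus` interface, and that the password is supplied through the `IPMI_PASSWORD` environment variable (the `-E` option) rather than on the command line.

```python
import subprocess
from typing import List


def ipmitool_argv(host: str, user: str, subcommand: List[str]) -> List[str]:
    """Build an ipmitool command line for an IPMI-over-LAN session.

    -I lanplus selects the IPMI v2.0 LAN interface, -H the remote host,
    -U the user name, and -E reads the password from IPMI_PASSWORD.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            *subcommand]


def run_ipmitool(argv: List[str]) -> str:
    """Run the command and return its standard output as text."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Placeholder host and user; nothing is executed until run_ipmitool is called.
sensor_cmd = ipmitool_argv("192.0.2.10", "admin", ["sensor", "list"])
sel_cmd = ipmitool_argv("192.0.2.10", "admin", ["sel", "list"])
```

Calling `run_ipmitool(sensor_cmd)` against a reachable management controller would return the sensor listing as text, and `sel_cmd` would do the same for the system event log. Building the argument list separately from running it keeps the command easy to inspect before anything touches the network.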