Description:
Satellite on-board software must be able to detect abnormal situations, react appropriately, and recover from failures whenever possible, sometimes without ground intervention. This is the role of Fault Detection, Isolation and Recovery (FDIR).
The goal of this project is to design and implement the FDIR architecture of the CHESS Flight Software based on existing best practices and standards (SOURCE paper, ECSS standard, ESA Savoir FDIR handbook, ESA Engineering Guidelines for CubeSat Projects, internal CHESS FDIR documentation). The Flight Software is based on the F´ framework developed by NASA and runs on a redundant on-board computer architecture composed of two CPUs. When the Flight Software heartbeat is lost, the system switches to the backup CPU.
The project will focus on integrating FDIR concepts into the existing Flight Software architecture and the mechanisms already provided by the F´ framework. The student will also work closely with subsystem teams (EPS, ADCS, Telecom, Payloads, etc.) to define fault detection and recovery behaviors matching the constraints of each subsystem.
Finally, the implementation will be validated through dedicated fault injection and recovery scenarios executed inside the NEST simulation framework developed by the Flight Software Team.
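As an illustration of the heartbeat-driven switchover described above, the following is a minimal C++ sketch, not the CHESS design. It assumes a periodic supervision cycle, a bounded number of local recovery attempts per component, and a fixed number of missed global heartbeats before the switch is requested. All names (`Health`, `ComponentMonitor`, `globalHeartbeat`, `SwitchoverSupervisor`) are hypothetical, and the sketch deliberately uses no F´ API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical health states; illustrative only, not F´ types.
enum class Health { Ok, Recovering, Failed };

// Supervises one component: tries local recovery a bounded number of
// times before declaring the component failed (escalation).
class ComponentMonitor {
public:
    explicit ComponentMonitor(std::uint32_t maxRecoveryAttempts)
        : maxAttempts_(maxRecoveryAttempts) {}

    // Call once per supervision cycle with whether the component checked in.
    Health update(bool checkedIn) {
        if (checkedIn) {
            attempts_ = 0;
            state_ = Health::Ok;
        } else if (attempts_ < maxAttempts_) {
            ++attempts_;  // a local recovery action would be triggered here
            state_ = Health::Recovering;
        } else {
            state_ = Health::Failed;  // local recovery exhausted: escalate
        }
        return state_;
    }

    Health state() const { return state_; }

private:
    std::uint32_t maxAttempts_;
    std::uint32_t attempts_ = 0;
    Health state_ = Health::Ok;
};

// The global heartbeat is emitted only while no component has failed.
inline bool globalHeartbeat(const std::vector<ComponentMonitor>& monitors) {
    for (const auto& m : monitors) {
        if (m.state() == Health::Failed) {
            return false;
        }
    }
    return true;
}

// Watchdog-side view: requests the switch to the backup CPU after
// `missedLimit` consecutive cycles without a global heartbeat.
class SwitchoverSupervisor {
public:
    explicit SwitchoverSupervisor(std::uint32_t missedLimit)
        : limit_(missedLimit) {
        assert(missedLimit > 0);
    }

    // Returns true when the switch to the backup CPU should be triggered.
    bool update(bool heartbeatSeen) {
        if (heartbeatSeen) {
            missed_ = 0;
        } else if (missed_ < limit_) {
            ++missed_;
        }
        return missed_ >= limit_;
    }

private:
    std::uint32_t limit_;
    std::uint32_t missed_ = 0;
};
```

In this sketch a component in the `Recovering` state does not suppress the global heartbeat, so a CPU switch is requested only once local recovery has been exhausted. Whether every failed component should stop the global heartbeat, or only a critical subset, is a design decision left open here.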
Tasks:
- Design and implementation of the FDIR architecture for the Flight Software.
- Implementation of hierarchical fault handling and heartbeat supervision mechanisms:
1) monitoring the health of individual Flight Software components
2) attempting local recovery when possible, escalating only when recovery fails
3) aggregating component health into a global Flight Software heartbeat
4) triggering a switch from CPU A to CPU B if the global heartbeat is lost
5) distinguishing recoverable and irrecoverable failures
- Design of the FDIR architecture around existing F´ mechanisms/components and clean integration with the current Flight Software architecture.
- Extension and improvement of the current TlmMonitor work for telemetry monitoring and fault detection.
- Coordination with subsystem teams to define subsystem-specific fault detection and recovery behaviors.
- Implementation of telemetry filtering mechanisms to avoid reacting to short telemetry spikes or inconsistent readings.
- Design of mechanisms to persist critical information across reboots:
1) reboot cause
2) Safe Mode state
3) current state during LEOP sequence
- Development of validation scenarios in NEST to test the FDIR implementation.
- Production of an FDIR architecture document.
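One common way to avoid reacting to short telemetry spikes, as required in the telemetry filtering task above, is a persistence (debounce) filter: a limit violation is reported only after a number of consecutive out-of-range samples. The C++ sketch below illustrates the idea under that assumption. The class name `PersistenceFilter`, its parameters, and the example limits are hypothetical and are not taken from the existing TlmMonitor.

```cpp
#include <cassert>
#include <cstdint>

// Reports a fault only after `requiredCount` consecutive samples fall
// outside the [low, high] range. A single in-range sample resets the count.
class PersistenceFilter {
public:
    PersistenceFilter(double low, double high, std::uint32_t requiredCount)
        : low_(low), high_(high), required_(requiredCount) {
        assert(low <= high);
        assert(requiredCount > 0);
    }

    // Feed one telemetry sample; returns true while the violation has
    // persisted for at least `requiredCount` consecutive samples.
    bool update(double sample) {
        const bool outOfRange = (sample < low_) || (sample > high_);
        if (!outOfRange) {
            count_ = 0;
        } else if (count_ < required_) {
            ++count_;  // saturate at `required_` to avoid overflow
        }
        return count_ >= required_;
    }

private:
    double low_;
    double high_;
    std::uint32_t required_;
    std::uint32_t count_ = 0;
};
```

For example, with illustrative limits of 3.0 to 4.2 and `requiredCount` set to 3, a single reading of 5.0 between two nominal readings is ignored, while three consecutive readings below 3.0 raise the fault. The count and the limits would have to be tuned per telemetry channel together with the subsystem teams.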
Background and skills:
• Good knowledge of embedded C/C++ programming
• Experience with microcontrollers
• Understanding of digital communication protocols
• Basic understanding of power electronics or willingness to learn
• Familiarity with debugging tools (JTAG/SWD, logic analyzer, oscilloscopes)
• Interest in reliable and fault-tolerant embedded systems for space applications