Umbrella Issue: Performance Counters #12882
Labels: perf, performance monitoring
Introduction
Performance monitoring is essential for optimizing system performance. This umbrella issue covers the implementation of a performance counters system that includes both hardware and software counters and is accessible from both kernel and user-space environments. The system aims to provide deep insight into the accelerator's operation and to support detailed performance analysis and debugging.
Note: Individual issues for each task will be opened and linked to this umbrella issue.
Hardware Counters
The hardware counters are exposed through debug registers that the RISC cores can access directly. Five physical hardware units provide access to a number of different counters. These counters provide valuable data on various aspects of the accelerator's performance, such as Floating Point Unit (FPU) utilization, Level 1 (L1) cache behavior, and other critical metrics.
Software Counters
In addition to hardware counters, there is a need for software counters to measure the duration of specific functions or events within the firmware running on the RISC cores. These include measuring times between key operations like:
Data Collection Considerations
Data collection strategies must balance the need for detailed information with performance overhead. Options include collecting all values for specific counters, sampling every nth value, or summarizing data. Enabling software counters affects firmware performance, so it's crucial to enable them judiciously, possibly through compile-time options.
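The three options above can be compared with a small host-side model. This is only a sketch under stated assumptions: the class names `KeepAll`, `EveryNth`, and `Summary` are hypothetical and do not come from this issue, and real firmware would implement the equivalent logic on the RISC cores rather than in Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeepAll:
    """Stores every sample: most detail, highest memory cost."""
    values: List[int] = field(default_factory=list)

    def record(self, value: int) -> None:
        self.values.append(value)


@dataclass
class EveryNth:
    """Stores the 1st, (n+1)th, (2n+1)th, ... sample and drops the rest."""
    n: int
    values: List[int] = field(default_factory=list)
    _seen: int = 0  # number of samples offered so far

    def record(self, value: int) -> None:
        if self._seen % self.n == 0:
            self.values.append(value)
        self._seen += 1


@dataclass
class Summary:
    """Keeps only running statistics: constant memory, no per-sample detail."""
    count: int = 0
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def record(self, value: int) -> None:
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    samples = [5, 3, 9, 7, 1, 8]  # made-up counter readings
    collectors = [KeepAll(), EveryNth(n=3), Summary()]
    for sample in samples:
        for collector in collectors:
            collector.record(sample)
    print(collectors[0].values, collectors[1].values, collectors[2])
```

The trade-off is visible in the state each collector holds: `KeepAll` grows with every sample, `EveryNth` grows at 1/n of that rate, and `Summary` stays at four integers regardless of how many samples arrive.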
Objective
The goal is to develop a robust performance counters system that integrates both hardware and software counters, provides efficient data collection and storage mechanisms, and allows for detailed performance analysis without significantly impacting system performance.
Milestone 1: Establishing the Hardware Performance Counters Foundation
Objective: Lay the groundwork by defining all hardware performance counters and developing the kernel-level API, with comprehensive documentation and initial testing. Well-documented hardware counters and a robust kernel-level API need to be in place before the project moves on to more complex work, and initial testing confirms that the core components function correctly, reducing the risk of issues later on.
Definition and Documentation of Hardware Counter Types: Define and document all hardware performance counters, including their usage across different architectures and RISC cores.
Kernel-Level API Development for Hardware Performance Counters: Develop a kernel-level API to interact with hardware performance counters effectively, accessible from the RISC cores via debug registers.
Testing (Initial Phase)
Milestone 2: Development of Data Collection Strategies and Software Performance Counters
Objective: Formulate efficient data collection strategies and implement software performance counters within firmware, addressing challenges related to limited shared memory and data transfer mechanisms.
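One common way to cope with limited memory is a fixed-capacity ring buffer that overwrites the oldest entries and counts how many were lost. The sketch below models that idea on the host together with a scoped software timer. All names here (`RingBuffer`, `timed_scope`, the `clock` parameter) are hypothetical illustrations, not part of any existing API, and the issue does not state that a ring buffer is the chosen design.

```python
import time
from contextlib import contextmanager
from typing import Any, Callable, Iterator, List


class RingBuffer:
    """Fixed-capacity buffer that overwrites the oldest entry when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[Any] = [None] * capacity
        self._next = 0      # index of the slot to write next
        self._written = 0   # total number of pushes so far

    def push(self, item: Any) -> None:
        self._slots[self._next] = item
        self._next = (self._next + 1) % len(self._slots)
        self._written += 1

    @property
    def dropped(self) -> int:
        """Number of entries that have been overwritten."""
        return max(0, self._written - len(self._slots))

    def snapshot(self) -> List[Any]:
        """Returns the surviving entries, oldest first."""
        if self._written < len(self._slots):
            return self._slots[: self._written]
        return self._slots[self._next:] + self._slots[: self._next]


@contextmanager
def timed_scope(
    name: str,
    sink: RingBuffer,
    clock: Callable[[], int] = time.perf_counter_ns,
) -> Iterator[None]:
    """Measures the duration of the enclosed block and pushes (name, duration)."""
    start = clock()
    try:
        yield
    finally:
        sink.push((name, clock() - start))


if __name__ == "__main__":
    buffer = RingBuffer(capacity=2)
    for label in ("a", "b", "c"):
        with timed_scope(label, buffer):
            sum(range(1000))  # stand-in for the work being measured
    print(buffer.snapshot(), "dropped:", buffer.dropped)
```

Passing the clock in as a parameter keeps the timer testable with a deterministic tick source; on a device the same role would be played by whatever cycle or wall-clock counter the firmware has available.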
Data Collection Strategies: Formulate strategies for collecting performance data efficiently, balancing detail and performance overhead.
Software Performance Counters
Milestone 3: Comprehensive Testing and Integration with Existing Toolchains
Testing (Advanced Phase)
Integration with Existing Toolchains
Milestone 4: System Optimization and Documentation Updates
Objective: Optimize the performance counters system to minimize overhead, implement compile-time options for flexibility, and update documentation to reflect best practices and performance considerations.
fyi @ttmtrajkovic