Let MindShare Bring "Intel® x86 Code Performance Analysis" to Life for You
This course is offered in two variants – a Linux variant and a Windows variant, reflecting the differences between the tools available in the two environments. The class examines the performance of application code executing on the processor cores, with the assistance of the various tools available, and explores ways to optimize that performance. The class includes live demonstrations of the tools, with the students’ questions driving the demonstrations. The processors discussed in the class are those based on Intel’s Core architecture.
MindShare Courses Related to Intel x86 Code Performance Analysis:
All of MindShare's classroom and virtual classroom courses can be customized to fit the needs of your group.
Intel x86 Code Performance Analysis
You Will Learn:
- The x86 architectural features impacting the performance of application code, and possibilities for taking advantage of them:
- The cache system
- Multiple processor cores
- Out-of-order execution of instructions
- Virtual memory mappings and the TLB
- Some of the tools available for analyzing the performance of application code
- The methodology for getting to good performance
Note that this is not an operating system tuning class. The examples analyzed are all applications, although the ideas are also relevant to kernel code. No student exercises are included in the class.
Course Length: 3 Days
Who Should Attend?
This is a code optimization class geared toward programmers.
Course Outline:
- Power and performance
- Importance of algorithm and data structures – design for performance
- Advantages and disadvantages of optimizing to a particular processor implementation (microarchitecture)
- Ways that code can be made to be more flexible
- The memory system
- Cache line bounce
- Microarchitectural optimizations
- Core pipeline overview
- Out-of-order execution
- Handling “slow” instructions
- Execution serialization (e.g. microcode, MMIO)
- Example optimizations, e.g. loop unrolling
- Multiple processor cores
- Distribution of the workload across cores
- Core affinity
- Sharing issues
- Synchronization issues
- Compiler optimizations (Windows)
- Microsoft compiler
- Intel compiler
- Compiler optimizations (Linux)
- Hotspot analysis using perf (Linux)
- Hotspot analysis using VTune (Linux)
- Use of vector instructions (SSE, AVX)
- Analysis using performance counters
- Cache misses, using perf (Linux) or WPR (Windows)
- Profile Guided Optimization
- System Topics
- Direct cache injection
- Overview of virtualization optimization
- Analysis in a virtualized environment
- Ways to make code “virtualization friendly”
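To give a flavor of the "loop unrolling" item in the outline above, here is a minimal sketch in C. It is not taken from the course materials; the function names `sum_simple` and `sum_unrolled4` are hypothetical and chosen only for illustration. The baseline loop carries a single dependency chain through one accumulator, while the unrolled version keeps four independent accumulators that an out-of-order core may be able to execute in parallel.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: a single accumulator, so each addition depends on the
 * result of the previous one (one long dependency chain). */
uint64_t sum_simple(const uint32_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Illustrative sketch: unrolled by four with four independent
 * accumulators. The four additions in the loop body do not depend
 * on each other, which gives the core more work it can overlap. */
uint64_t sum_unrolled4(const uint32_t *data, size_t n)
{
    uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main unrolled loop: handles four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += data[i];
        s1 += data[i + 1];
        s2 += data[i + 2];
        s3 += data[i + 3];
    }

    /* Remainder loop: handles the last 0 to 3 elements. */
    for (; i < n; i++)
        s0 += data[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same result because integer addition can be reordered freely. Whether the unrolled version is actually faster depends on the compiler, the optimization level, and the processor; an optimizing compiler may already unroll or vectorize the baseline loop, so any gain should be confirmed by measurement rather than assumed.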
Some experience of reading Intel assembly language is desirable, as the class will look at assembly code to see the details of execution. Familiarity with Intel processor and system architecture is also assumed.
Students will be provided with an electronic version of the slides.