Lobachevsky State University of Nizhny Novgorod
ITLab Laboratory
Current news

Seminar "On the Experience of Optimizing Applications for Intel Xeon Phi"

The seminar will take place on October 15 at 18:00 in room 114.
Speaker: Roman Lygin (Intel).
Abstract:
The Intel(R) Xeon Phi(TM) coprocessor is gaining momentum in the HPC community, and software developers are looking forward to trying it with their legacy applications. However, developers often find it challenging to use the available compute power efficiently, which may undermine their first experience and lead to disappointment. Although Intel does deliver on its promise of the same programming models and tools across Xeon and Xeon Phi processors, developers still need to invest reasonable effort to port their applications to the coprocessor efficiently.
This talk will present a practical case study of porting Tachyon, an open source ray tracer that is part of the SpecMPI suite, to Xeon Phi. The initial port showed disappointing performance: for example, the combined Xeon and Xeon Phi version ran 2.6x slower than the Xeon-only version. Achieving good performance required code modifications that improved both the Xeon and Xeon Phi parts. The talk will demonstrate the use of Intel(R) Cluster Studio XE to pinpoint the problems and will highlight the key code changes that helped achieve significant improvements (up to 7x over the initial baseline, and a 1.8x speedup over the improved Xeon version). The application exploits parallelism at multiple levels: a symmetric MPI execution model, OpenMP-based multithreading, and explicit SIMD (using SSE2/AVX/Xeon Phi instructions).
The talk will highlight the use of software tools, namely Intel(R) Trace Analyzer and Collector and Intel(R) VTune(TM) Amplifier XE, in combination with the MPI* and OpenMP* programming models, as well as a SIMD-enabled 3D vector operations library (reused and extended from Embree, the open source ray tracer by Intel Labs). Algorithmic changes include MPI-based dynamic scheduling, the introduction of explicit intrinsics-based SIMD support, and greater OpenMP parallelism capacity.
The case study demonstrates Intel's message that "optimization for Xeon pays off on Xeon Phi, and vice versa". The talk may be of interest to software developers who could apply this practical experience, these ideas, and these approaches to their own projects.
© ITLab, Nizhny Novgorod, 2009