Friday, March 12, 2021

Speed of code using AVX compared to normal code (C99)

So I decided to take a look at how to use SSE, AVX, etc. in C via Intel® Intrinsics. Not because I actually want to use them for anything, but out of pure curiosity. When I tried to check whether code using AVX is actually faster than non-AVX code, I was a bit surprised by the results. Here is my C code:

#include <stdio.h>
#include <stdlib.h>

#include <emmintrin.h>
#include <immintrin.h>


/*** Sum up two vectors using AVX ***/
#define __vec_sum_4d_d64(src_vec1, src_vec2, dst_vec) \
  _mm256_store_pd(dst_vec, _mm256_add_pd(_mm256_load_pd(src_vec1), _mm256_load_pd(src_vec2)));

/*** Sum up two vectors without AVX ***/
#define __vec_sum_4d(src_vec1, src_vec2, dst_vec) \
  dst_vec[0] = src_vec1[0] + src_vec2[0];\
  dst_vec[1] = src_vec1[1] + src_vec2[1];\
  dst_vec[2] = src_vec1[2] + src_vec2[2];\
  dst_vec[3] = src_vec1[3] + src_vec2[3];


int main (int argc, char *argv[]) {
  unsigned long i;

  double dvec1[4] = {atof(argv[1]), atof(argv[2]), atof(argv[3]), atof(argv[4])};
  double dvec2[4] = {atof(argv[5]), atof(argv[6]), atof(argv[7]), atof(argv[8])};

#if 1
  for (i = 0; i < 3000000000; i++) {
    __vec_sum_4d(dvec1, dvec2, dvec2);
  }
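One detail in the listing above is worth a note: _mm256_load_pd and _mm256_store_pd require 32-byte aligned addresses, and a plain double[4] on the stack is not guaranteed to be aligned that way. The following is only a minimal sketch of the same two additions written as functions, not the code that was measured. It swaps in the unaligned variants _mm256_loadu_pd and _mm256_storeu_pd to sidestep the alignment requirement. The function names vec_sum_4d_scalar, vec_sum_4d_avx and have_avx are made up for this illustration, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions, so this assumes one of those compilers on x86.

```c
#include <immintrin.h>

/* Scalar reference: dst[i] = a[i] + b[i] for four doubles.
 * dst may be the same array as a or b, as in the listing above. */
void vec_sum_4d_scalar(const double *a, const double *b, double *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}

/* AVX version (hypothetical helper name). The unaligned load/store
 * intrinsics are used so that the arrays do not need 32-byte alignment.
 * The target attribute (GCC/Clang extension) lets this one function use
 * AVX even when the file is compiled without -mavx. */
__attribute__((target("avx")))
void vec_sum_4d_avx(const double *a, const double *b, double *dst)
{
    __m256d va = _mm256_loadu_pd(a);   /* load 4 doubles from a */
    __m256d vb = _mm256_loadu_pd(b);   /* load 4 doubles from b */
    _mm256_storeu_pd(dst, _mm256_add_pd(va, vb));
}

/* Runtime check (GCC/Clang builtin): non-zero if the CPU supports AVX. */
int have_avx(void)
{
    return __builtin_cpu_supports("avx");
}
```

Calling vec_sum_4d_avx(dvec1, dvec2, dvec2) only when have_avx() returns non-zero avoids an illegal-instruction crash on CPUs without AVX. Keeping the aligned intrinsics instead would mean declaring the arrays with a 32-byte alignment, for example via _Alignas(32) in C11.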
