So I decided to take a look at how to use SSE, AVX, ... in C via Intel® Intrinsics. Not because I actually need it for anything, but out of pure curiosity. When I tried to check whether code using AVX is actually faster than non-AVX code, I was a bit surprised by the results. Here is my C code:
#include <stdio.h>
#include <stdlib.h>
#include <emmintrin.h>
#include <immintrin.h>

/*** Sum up two vectors using AVX ***/
#define __vec_sum_4d_d64(src_vec1, src_vec2, dst_vec) \
    _mm256_store_pd(dst_vec, _mm256_add_pd(_mm256_load_pd(src_vec1), _mm256_load_pd(src_vec2)));

/*** Sum up two vectors without AVX ***/
#define __vec_sum_4d(src_vec1, src_vec2, dst_vec) \
    dst_vec[0] = src_vec1[0] + src_vec2[0];\
    dst_vec[1] = src_vec1[1] + src_vec2[1];\
    dst_vec[2] = src_vec1[2] + src_vec2[2];\
    dst_vec[3] = src_vec1[3] + src_vec2[3];

int main (int argc, char *argv[])
{
    unsigned long i;
    double dvec1[4] = {atof(argv[1]), atof(argv[2]), atof(argv[3]), atof(argv[4])};
    double dvec2[4] = {atof(argv[5]), atof(argv[6]), atof(argv[7]), atof(argv[8])};

#if 1
    for (i = 0; i < 3000000000; i++) {
        __vec_sum_4d(dvec1, dvec2, dvec2);
    }
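For comparison, here is a minimal sketch of the same two additions written as functions instead of macros. It is not the code from the listing above: the names `vec_sum_4d_scalar`, `vec_sum_4d_avx` and `vec_sum_4d` are made up for illustration, and it assumes GCC or Clang on x86-64, since `__attribute__((target("avx")))` and `__builtin_cpu_supports` are compiler-specific extensions. It uses the unaligned `_mm256_loadu_pd` / `_mm256_storeu_pd` intrinsics so that it works on ordinary `double[4]` arrays.

```c
#include <immintrin.h>

/* Scalar reference: dst[i] = a[i] + b[i] for four doubles. */
static void vec_sum_4d_scalar(const double *a, const double *b, double *dst)
{
    for (int i = 0; i < 4; i++)
        dst[i] = a[i] + b[i];
}

/* AVX variant (hypothetical helper). The target attribute lets this one
 * function use AVX intrinsics even when the file is built without -mavx.
 * Unaligned loads/stores are used, so no 32-byte alignment is required. */
__attribute__((target("avx")))
static void vec_sum_4d_avx(const double *a, const double *b, double *dst)
{
    __m256d va = _mm256_loadu_pd(a);   /* load 4 doubles from a */
    __m256d vb = _mm256_loadu_pd(b);   /* load 4 doubles from b */
    _mm256_storeu_pd(dst, _mm256_add_pd(va, vb));
}

/* Pick the AVX path only if the CPU reports AVX support at run time. */
static void vec_sum_4d(const double *a, const double *b, double *dst)
{
    if (__builtin_cpu_supports("avx"))
        vec_sum_4d_avx(a, b, dst);
    else
        vec_sum_4d_scalar(a, b, dst);
}
```

Calling `vec_sum_4d(dvec1, dvec2, dvec2)` mirrors the loop body in the listing; writing the result back into one of the inputs is fine here because both loads complete before the store. Note that `_mm256_load_pd` and `_mm256_store_pd`, as used in the `__vec_sum_4d_d64` macro, require 32-byte aligned addresses, which a plain `double dvec1[4]` on the stack does not guarantee.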