Math SIMD

newpolaris 2020. 1. 31. 15:32

glm

The target architecture is decided at compile time.

https://github.com/g-truc/glm/blob/master/glm/simd/platform.h

#    if defined(__x86_64__) || defined(_M_X64) || defined(_M_IX86) || defined(__i386__)
#        define GLM_ARCH (GLM_ARCH_X86)
#    elif defined(__arm__) || defined(_M_ARM)
#        define GLM_ARCH (GLM_ARCH_ARM)

There are also defines that force a specific instruction set:

#elif defined(GLM_FORCE_NEON)
#    if __ARM_ARCH >= 8
#        define GLM_ARCH (GLM_ARCH_ARMV8)
#    else
#        define GLM_ARCH (GLM_ARCH_NEON)
#    endif
#    define GLM_FORCE_INTRINSICS
#elif defined(GLM_FORCE_AVX2)
#    define GLM_ARCH (GLM_ARCH_AVX2)
#    define GLM_FORCE_INTRINSICS
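
The two excerpts above combine into one pattern: a user-supplied force macro wins, otherwise the compiler's predefined macros decide. Below is a minimal sketch of that pattern. All `MYMATH_*` names are hypothetical and are not part of glm; only the compiler-predefined macros (`__x86_64__`, `_M_X64`, `__arm__`, and so on) are real.

```cpp
// Hypothetical architecture IDs (not glm's actual values).
#define MYMATH_ARCH_PURE 0
#define MYMATH_ARCH_X86  1
#define MYMATH_ARCH_ARM  2
#define MYMATH_ARCH_AVX2 3
#define MYMATH_ARCH_NEON 4

// 1) A force macro defined by the user (e.g. -DMYMATH_FORCE_AVX2) takes priority.
// 2) Otherwise fall back to what the compiler reports about the target.
#if defined(MYMATH_FORCE_AVX2)
#    define MYMATH_ARCH MYMATH_ARCH_AVX2
#elif defined(MYMATH_FORCE_NEON)
#    define MYMATH_ARCH MYMATH_ARCH_NEON
#elif defined(__x86_64__) || defined(_M_X64) || defined(_M_IX86) || defined(__i386__)
#    define MYMATH_ARCH MYMATH_ARCH_X86
#elif defined(__arm__) || defined(_M_ARM) || defined(__aarch64__) || defined(_M_ARM64)
#    define MYMATH_ARCH MYMATH_ARCH_ARM
#else
#    define MYMATH_ARCH MYMATH_ARCH_PURE
#endif

// Human-readable name of the architecture chosen at compile time.
inline const char* mymath_arch_name() {
    switch (MYMATH_ARCH) {
        case MYMATH_ARCH_X86:  return "x86";
        case MYMATH_ARCH_ARM:  return "arm";
        case MYMATH_ARCH_AVX2: return "avx2 (forced)";
        case MYMATH_ARCH_NEON: return "neon (forced)";
        default:               return "pure";
    }
}
```

Because everything is resolved by the preprocessor, the choice is fixed per build: a binary built this way does not adapt to the CPU it later runs on.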

Some code is additionally branched on Windows-specific flags:

https://github.com/g-truc/glm/blob/master/glm/gtc/integer.inl

Detection via cpuid

https://github.com/Mysticial/FeatureDetector

https://gist.github.com/coderluna/82478e4e4d5258b6bf6f

Read the feature flags with cpuid, then verify that the OS supports them

https://msparkms.tistory.com/entry/SIMD%EB%A5%BC-%EC%9D%B4%EC%9A%A9%ED%95%9C-%EC%88%98%ED%95%99-%EB%9D%BC%EC%9D%B4%EB%B8%8C%EB%9F%AC%EB%A6%AC-%EB%A7%8C%EB%93%A4%EA%B8%B0-2-CPU-%EC%8B%9D%EB%B3%84%ED%95%98%EA%B8%B0?category=380167

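// MSVC only: structured exception handling plus inline assembly (32-bit builds)
__try
{
    __asm xorps    xmm0, xmm0               // SSE instruction; faults if SSE is not usable
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
    if (STATUS_ILLEGAL_INSTRUCTION == _exception_code())
        return false;                       // illegal instruction: SSE is not available
}

return true;                                // the instruction executed without faulting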
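
The probe above depends on MSVC inline assembly, which is not available in 64-bit builds. As an alternative, here is a rough sketch of the same "cpuid first, then check the OS" idea for AVX. It assumes an x86 target; the function name `cpu_and_os_support_avx` is hypothetical. It reads CPUID leaf 1 (ECX bit 28 = AVX, bit 27 = OSXSAVE) and then uses XGETBV to check that the OS saves the XMM and YMM state (XCR0 bits 1 and 2).

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#    include <intrin.h>      // __cpuid
#    include <immintrin.h>   // _xgetbv
#elif defined(__x86_64__) || defined(__i386__)
#    include <cpuid.h>       // __get_cpuid (GCC / Clang)
#endif

// Hypothetical helper: true only if the CPU reports AVX *and* the OS has
// enabled saving of the XMM/YMM registers on context switches.
bool cpu_and_os_support_avx() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4] = {0, 0, 0, 0};                  // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    const std::uint32_t ecx = static_cast<std::uint32_t>(regs[2]);
    const bool osxsave = (ecx & (1u << 27)) != 0; // OS uses XSAVE/XRSTOR
    const bool avx     = (ecx & (1u << 28)) != 0; // CPU implements AVX
    if (!osxsave || !avx) return false;
    const unsigned long long xcr0 = _xgetbv(0);
    return (xcr0 & 0x6) == 0x6;                   // XMM (bit 1) and YMM (bit 2) state
#elif defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    const bool osxsave = (ecx & (1u << 27)) != 0;
    const bool avx     = (ecx & (1u << 28)) != 0;
    if (!osxsave || !avx) return false;
    std::uint32_t lo = 0, hi = 0;
    // XGETBV with ECX = 0 reads XCR0; only legal because OSXSAVE is set.
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
    return (lo & 0x6) == 0x6;
#else
    return false;                                 // not an x86 target
#endif
}
```

The OSXSAVE test must come before XGETBV, since executing XGETBV without it raises the kind of illegal-instruction fault the SEH probe was catching.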

Enabling AVX2 in CMake

https://stackoverflow.com/questions/54114287/visual-studio-not-recognizing-avx2-or-avx

vectorclass

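The supported instruction set is chosen at compile time.
It seems the intended approach is to build with SSE2 as the minimum version and leave dynamic dispatch to the user.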

// Dispatcher
float myfunc_dispatch(float * f) {
    int iset = instrset_detect();                          // Detect supported instruction set
    if      (iset >= 10) myfunc_pointer = &myfunc_AVX512;  // AVX512 version
    else if (iset >=  8) myfunc_pointer = &myfunc_AVX2;    // AVX2 version
    else if (iset >=  5) myfunc_pointer = &myfunc_SSE41;   // SSE4.1 version
    else if (iset >=  2) myfunc_pointer = &myfunc_SSE2;    // SSE2 version
    else {
        // Error: lowest instruction set not supported
        fprintf(stderr, "\nError: Instruction set SSE2 not supported on this computer");
        return 0.f;
    }
    // continue in dispatched version of the function
    return (*myfunc_pointer)(f);
}
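
To make the dispatcher pattern above concrete without depending on VCL, here is a self-contained sketch. Everything in it is an assumption for illustration: `sum_scalar` and `sum_wide` are plain C++ stand-ins for separately compiled SSE2 and AVX2 builds, and `cpu_has_avx2` uses the GCC/Clang builtin `__builtin_cpu_supports` instead of `instrset_detect()`. The function pointer starts out pointing at the dispatcher, so detection runs only on the first call.

```cpp
#include <cstddef>

using SumFn = float (*)(const float*, std::size_t);

// Stand-in for the baseline build (hypothetical; plain scalar loop).
float sum_scalar(const float* p, std::size_t n) {
    float acc = 0.f;
    for (std::size_t i = 0; i < n; ++i) acc += p[i];
    return acc;
}

// Stand-in for a wider build (hypothetical; four independent accumulators).
float sum_wide(const float* p, std::size_t n) {
    float acc[4] = {0.f, 0.f, 0.f, 0.f};
    for (std::size_t i = 0; i < n; ++i) acc[i % 4] += p[i];
    return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

// Runtime check; returns false on compilers or targets without the builtin.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

float sum_dispatch(const float* p, std::size_t n);

// Initially points at the dispatcher; replaced on the first call.
SumFn sum_pointer = &sum_dispatch;

float sum_dispatch(const float* p, std::size_t n) {
    sum_pointer = cpu_has_avx2() ? &sum_wide : &sum_scalar;
    return sum_pointer(p, n);   // continue in the selected version
}

// Public entry point: always calls through the pointer.
inline float sum(const float* p, std::size_t n) {
    return sum_pointer(p, n);
}
```

After the first call to `sum`, later calls cost a single indirect call and no further detection. In a real build each variant would live in its own translation unit compiled with the matching instruction-set flag.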

https://www.agner.org/optimize/vcl_manual.pdf

9.9 Instruction sets and CPU dispatching



https://www.youtube.com/watch?v=TKjYdLIMTrI

The target architecture is selected through defines:

#ifndef INSTRSET
#if defined ( __AVX512VL__ ) && defined ( __AVX512BW__ ) && defined ( __AVX512DQ__ ) 
#define INSTRSET 10
#elif defined ( __AVX512F__ ) || defined ( __AVX512__ )
#define INSTRSET 9
#elif defined ( __AVX2__ )
#define INSTRSET 8
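
As a sketch of how such a compile-time level can be used, the block below maps the compiler's predefined macros to an integer and derives a vector width from it. The names `kInstrSet`, `float_lanes`, and `kFloatLanes` are hypothetical. The level numbers reuse only those visible in the excerpts above (10, 9, 8, 5, 2), so the intermediate levels of the real header are deliberately left out.

```cpp
// Simplified, hypothetical version of a compile-time instruction set level.
#if defined(__AVX512VL__) && defined(__AVX512BW__) && defined(__AVX512DQ__)
constexpr int kInstrSet = 10;   // AVX512 with VL/BW/DQ
#elif defined(__AVX512F__)
constexpr int kInstrSet = 9;    // AVX512F
#elif defined(__AVX2__)
constexpr int kInstrSet = 8;    // AVX2
#elif defined(__SSE4_1__)
constexpr int kInstrSet = 5;    // SSE4.1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
constexpr int kInstrSet = 2;    // SSE2
#else
constexpr int kInstrSet = 0;    // no SIMD assumed
#endif

// Number of 32-bit floats in the widest register assumed at each level:
// 512-bit -> 16, 256-bit -> 8, 128-bit -> 4, scalar -> 1.
constexpr int float_lanes(int level) {
    return level >= 9 ? 16 : level >= 8 ? 8 : level >= 2 ? 4 : 1;
}

constexpr int kFloatLanes = float_lanes(kInstrSet);

static_assert(float_lanes(10) == 16, "AVX512 registers hold 16 floats");
static_assert(float_lanes(8) == 8, "AVX2 registers hold 8 floats");
```

Building the same file with different compiler flags (for example `-mavx2` on GCC or Clang, `/arch:AVX2` on MSVC) changes `kInstrSet`, and with it any code that branches on the constant.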

https://github.com/vectorclass/version2/blob/master/instrset.h
https://github.com/vectorclass/version2/blob/master/instrset_detect.cpp

DirectXMath

The instruction set is selected internally.

https://github.com/newpolaris/Mikudayo/blob/master/Core/Math/Matrix4.h

https://github.com/microsoft/DirectXTK/wiki/Mixing-SimpleMath-and-DirectXMath
