1. 19 May, 2022 1 commit
    • Init/finalize via bli_pthread_switch_t API (#634). · 4603324e
      Field G. Van Zee authored
      - Defined and implemented a new pthread-like abstract datatype and API
        in bli_pthread.c. The new type, bli_pthread_switch_t, is similar to
        bli_pthread_once_t in some respects. The idea is that like a switch in
        your home that controls a light or ceiling fan, it can either be on or 
        off. The switch starts in the off state. Moving from one state to the 
        other (on to off; off to on) causes some action (i.e., a startup or
        shutdown function) to be executed. Trying to move from one state to 
        the same state (on to on; off to off) is safe in that it results in
        no action. Unlike bli_pthread_once(), the API for bli_pthread_switch_t 
        contains both _on() and _off() interfaces. Also, unlike the _once()
        function, the _on() and _off() functions return error codes so that
        the 'int' error code returned from the startup or shutdown functions
        may be passed back to the caller. Thanks to Devin Matthews for his
        input and feedback on this feature.
      - Replaced the previous implementation of bli_init_once() and 
        bli_finalize_once() -- both of which used bli_pthread_once() -- with 
        ones that rely upon bli_pthread_switch_on() and _switch_off(),
        respectively. This also required updating the return types of 
        _init_apis() and _finalize_apis() to match the function pointer type 
        required by bli_pthread_switch_on()/_switch_off().
      - Comment updates.
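      The on/off semantics described in this commit can be sketched roughly
      as follows. This is a minimal illustration and not the code in
      bli_pthread.c: every demo_* name is hypothetical, and leaving the
      switch in its old state when the action returns nonzero is an
      assumption of this sketch.

```c
#include <pthread.h>

/* Hypothetical stand-in for a switch type; not the BLIS definition. */
typedef struct
{
    pthread_mutex_t mutex;
    int             is_on;   /* 0 = off (initial state), 1 = on */
} demo_switch_t;

#define DEMO_SWITCH_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Off -> on runs startup() and passes its return value back to the
   caller. On -> on does nothing and returns 0. */
int demo_switch_on( demo_switch_t* sw, int (*startup)( void ) )
{
    int r = 0;
    pthread_mutex_lock( &sw->mutex );
    if ( !sw->is_on )
    {
        r = startup();
        if ( r == 0 ) sw->is_on = 1;   /* assumption: flip only on success */
    }
    pthread_mutex_unlock( &sw->mutex );
    return r;
}

/* On -> off runs shutdown(); off -> off does nothing and returns 0. */
int demo_switch_off( demo_switch_t* sw, int (*shutdown)( void ) )
{
    int r = 0;
    pthread_mutex_lock( &sw->mutex );
    if ( sw->is_on )
    {
        r = shutdown();
        if ( r == 0 ) sw->is_on = 0;
    }
    pthread_mutex_unlock( &sw->mutex );
    return r;
}

/* Example actions that count how often they actually run. */
static int demo_n_startups  = 0;
static int demo_n_shutdowns = 0;
static int demo_startup( void )         { ++demo_n_startups;  return 0; }
static int demo_shutdown( void )        { ++demo_n_shutdowns; return 0; }
static int demo_failing_startup( void ) { return 7; }
```

      With a switch initialized via DEMO_SWITCH_INIT, calling
      demo_switch_on() twice runs demo_startup() only once, and a nonzero
      code such as the 7 from demo_failing_startup() reaches the caller
      unchanged.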
  2. 10 May, 2022 1 commit
    • Fixed misspelling of 'xpbys' in gemm macrokernel. · 64a9b061
      Field G. Van Zee authored
      - Fixed a functionally harmless typo in bli_gemm_ker_var2.c where a few
        instances of the substring "xpbys" were misspelled as "xbpys". The
        misspellings were harmless because they were consistent, and because
        they referenced only local symbols.
  3. 28 Apr, 2022 1 commit
  4. 14 Apr, 2022 1 commit
    • Added missing 'const' to zen bli_gemm_small.c. · 6431c9e1
      Field G. Van Zee authored
      - Added missing 'const' qualifiers to signatures of functions defined in
        kernels/zen/3/bli_gemm_small.c. This fixes compile-time errors when
        targeting 'zen3' subconfig (which apparently is enabling AMD's
        gemm_small code path by default). Thanks to Devin Matthews for
        reporting this error.
  5. 13 Apr, 2022 1 commit
    • Partial addition of 'const' to all interfaces above the (micro)kernels. (#625) · 9fea6337
      Devin Matthews authored
      - Added 'const' qualifier to applicable function arguments wherever
        the pointed-to object is not internally modified. This change affects
        all interfaces that reside above the level of the (micro)kernels.
      - Typecast certain function return values to discard 'const' qualifier.
      - Removed 'restrict' from various arguments, including cntx_t*,
        auxinfo_t*, rntm_t*, thrinfo_t*, mem_t*, and others.
      - Removed parts of some APIs, such as bli_cntx_*(), due to limited use.
      - Merged some variable declarations with their corresponding 
        initialization statements.
      - Whitespace changes.
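      As a rough illustration of the two 'const' bullets above, the sketch
      below shows a read-only argument qualified with 'const' and an
      accessor whose return value is typecast to discard the qualifier.
      Both functions and the demo_obj_t type are hypothetical and are not
      BLIS interfaces.

```c
#include <stddef.h>

/* Hypothetical dot product: x and y are only read, so the pointed-to
   data is const-qualified in the signature. */
double demo_dotv( size_t n, const double* x, const double* y )
{
    double rho = 0.0;
    for ( size_t i = 0; i < n; ++i ) rho += x[ i ] * y[ i ];
    return rho;
}

/* Hypothetical object with an embedded buffer. */
typedef struct { double buf[ 4 ]; } demo_obj_t;

/* The accessor takes a const object, so obj->buf has type
   'const double[4]'. Callers want a plain 'double*', so the return
   value is typecast to discard the qualifier. */
double* demo_obj_buffer( const demo_obj_t* obj )
{
    return ( double* )obj->buf;
}
```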
  6. 07 Apr, 2022 1 commit
    • Simplify and rewrite reference packm kernels. (#610) · ae10d949
      Devin Matthews authored
      - Reorganized the way kernels are stored within the cntx_t structure so
        that rather than having a function pointer for every supported size of
        unrolled packm kernel (2xk, 3xk, 4xk, etc.), we store only two packm
        kernels per datatype: one to pack MRxk micropanels and one to pack
        NRxk micropanels.
        - NOTE: The "bb" (broadcast B) reference kernels have been merged into
          the "standard" kernels (packm [including 1er and unpackm], gemm, 
          trsm, gemmtrsm). This replication factor is controlled by 
          BLIS_BB[MN]_[sdcz] etc. Power9/10 needs testing since only a 
          replication factor of 1 has been tested. armsve also needs testing 
          since the MR value isn't available as a macro.
      - Simplified the bli_cntx_*() APIs to conform to the new unified kernel
        array within the cntx_t. Updated existing bli_cntx_init_<subconfig>()
        function definitions for all subconfigurations.
      - Consolidated all kernel id types (e.g. l1vkr_t, l1mkr_t, l3ukr_t,
        etc.) into one kernel id type: ukr_t.
      - Various edits, updates, and rewrites of reference kernels pursuant to 
        the aforementioned changes.
      - Define compile-time macro constants (BLIS_MR_[sdcz], BLIS_NR_[sdcz], 
        and friends) in bli_kernel_macro_defs.h, but only when the macro
        BLIS_IN_REF_KERNEL is defined by the build system.
      - Loose ends:
        - Still need to update documentation, including:
          - docs/ConfigurationHowTo.md
          - docs/KernelsHowTo.md
          to reflect changes made in this commit.
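      The idea of a single packm kernel that handles any panel dimension
      and replication factor can be sketched as below. This is only an
      illustration under assumptions: demo_packm_panel and its parameters
      are hypothetical, the input is taken to be column-major, and the
      layout chosen for the replicated ("bb") elements is a guess rather
      than the layout used by the reference kernels.

```c
#include <stddef.h>

/* Pack an m x k block of a column-major matrix (leading dimension lda)
   into a micropanel with panel_dim rows (think MR or NR). Each element
   is stored bb times in a row (bb = 1 means no broadcast), and rows
   m .. panel_dim-1 are zero-padded so the panel is always full height. */
void demo_packm_panel( size_t panel_dim, size_t bb,
                       size_t m, size_t k,
                       const double* a, size_t lda,
                       double* p )
{
    for ( size_t j = 0; j < k; ++j )
    {
        for ( size_t i = 0; i < panel_dim; ++i )
        {
            /* Read from the source block, or pad with zero at the edge. */
            const double v = ( i < m ) ? a[ i + j * lda ] : 0.0;

            /* Write bb copies of the element contiguously. */
            for ( size_t d = 0; d < bb; ++d )
                p[ ( j * panel_dim + i ) * bb + d ] = v;
        }
    }
}
```

      Because the panel dimension is a runtime argument here, the same
      routine covers what previously required one unrolled kernel per
      size (2xk, 3xk, 4xk, and so on).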
  7. 05 Apr, 2022 1 commit
  8. 01 Apr, 2022 4 commits
  9. 31 Mar, 2022 1 commit
  10. 29 Mar, 2022 2 commits
    • Fixed typo in BLAS gemm3m call to _check(). · cf063643
      Field G. Van Zee authored
      - Fixed an unresolved symbol issue leftover from #590 whereby ?gemm3m_()
        as defined in bla_gemm3m.c was referencing bla_gemm3m_check(), which
        does not exist. It should have simply called the _check() function for
    • AMD kernel updates; frame-specific AMD updates. (#597) · 1ec020b3
      Dipal M Zambare authored
      - Allow building BLIS with certain framework files (each with the '_amd'
        suffix) that have been customized by AMD for Zen-based hardware. These
        customized files were derived from portable versions of the same files
        (i.e., those without the '_amd' suffix). Whether the portable or AMD-
        specific files are compiled is now controlled by a new configure
        option, --[en|dis]able-amd-frame-tweaks. This option is disabled by
        default in vanilla BLIS, though AMD may choose to enable it by default
        in their fork. For now, the added AMD-specific files are:
        - bli_gemv_unf_var2_amd.c
        - bla_copy_amd.c
        - bla_gemv_amd.c
        These files reside in 'amd' subdirectories found within the directory
        housing their generic counterparts.
      - Register optimized real-domain copyv, setv, and swapv kernels in
      - Various minor updates to level-1v kernels in 'zen' kernel set.
      - Added caxpyf kernel as well as saxpyf and multiple daxpyf kernels to
        the 'zen' kernel set.
      - If the problem passed to ?gemm_() in bla_gemm.c has a unit m or n dim,
        call gemv instead and return early.
      - Combined variable declarations with their initialization in various
        level-2 and level-3 BLAS compatibility files, and also inserted
        'const' qualifier in those same declaration statements.
      - Moved frame/compat/bla_gemmt.c and .h to frame/compat/extra/ .
      - Added copyv and swapv test drivers to 'test' directory.
      - Whitespace, comment changes.
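      The early return to gemv for a unit m or n dimension can be
      sketched as follows. This is a simplified illustration, not the
      code in bla_gemm.c: the demo_* routines are hypothetical, handle
      only column-major storage without transposition, and use naive
      loops in place of optimized kernels.

```c
#include <stddef.h>

static int demo_gemv_calls = 0;   /* counts how often the gemv path is taken */

/* y := alpha * A * x + beta * y, where A is m x n with row stride rs
   and column stride cs. */
static void demo_gemv( size_t m, size_t n, double alpha,
                       const double* a, size_t rs, size_t cs,
                       const double* x, size_t incx,
                       double beta, double* y, size_t incy )
{
    ++demo_gemv_calls;
    for ( size_t i = 0; i < m; ++i )
    {
        double t = 0.0;
        for ( size_t j = 0; j < n; ++j )
            t += a[ i * rs + j * cs ] * x[ j * incx ];
        y[ i * incy ] = alpha * t + beta * y[ i * incy ];
    }
}

/* C := alpha * A * B + beta * C with column-major A (m x k),
   B (k x n), and C (m x n). */
void demo_gemm( size_t m, size_t n, size_t k, double alpha,
                const double* a, size_t lda,
                const double* b, size_t ldb,
                double beta, double* c, size_t ldc )
{
    if ( n == 1 )
    {
        /* B and C are single columns: c := alpha * A * b + beta * c. */
        demo_gemv( m, k, alpha, a, 1, lda, b, 1, beta, c, 1 );
        return;
    }
    if ( m == 1 )
    {
        /* A and C are single rows: c^T := alpha * B^T * a^T + beta * c^T.
           B^T is n x k with row stride ldb and column stride 1. */
        demo_gemv( n, k, alpha, b, ldb, 1, a, lda, beta, c, ldc );
        return;
    }

    /* General case: plain triple loop. */
    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
        {
            double t = 0.0;
            for ( size_t p = 0; p < k; ++p )
                t += a[ i + p * lda ] * b[ p + j * ldb ];
            c[ i + j * ldc ] = alpha * t + beta * c[ i + j * ldc ];
        }
}
```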
  11. 25 Mar, 2022 1 commit
    • Added BLAS/CBLAS APIs for gemm3m. (#590) · 0db2bd53
      Bhaskar Nallani authored
      - Created ?gemm3m_() and cblas_?gemm3m() APIs that (for now) simply
        invoke the 1m implementation unconditionally. (Note that these APIs
        bypass sup handling.)
      - Added BLAS prototypes for gemm3m in frame/compat/bla_gemm3m.h.
      - Added CBLAS prototypes for gemm3m in frame/compat/cblas/src/cblas.h.
      - Relocated: 
        files into
      - Relocated frame/compat/bla_gemmt.? into frame/compat/extra/ .
      - Minor reorganization of prototypes and cpp macro directives in 
        bli_blas.h, cblas.h, and cblas_f77.h.
      - Trivial whitespace change to cblas_zgemm.c.
  12. 14 Mar, 2022 1 commit
    • Update Multithreading.md · d6810000
      Devin Matthews authored
      Add notes about `BLIS_IR_NT` (should typically be 1) and `BLIS_JR_NT` (should typically be small, e.g. <= 4). [ci skip]
  13. 11 Mar, 2022 2 commits
    • Trivial whitespace change; commit log addendum. · f1dbb0e5
      Field G. Van Zee authored
      - A co-attribution to Mithun Mohan was inadvertently omitted from the
        commit log for the headline change in the previous commit, 7c07b477.
    • Avoid gemmsup barriers when not packing A or B. (#622) · 7c07b477
      Field G. Van Zee authored
      - Implemented a multithreaded optimization for the special (and common)
        case of employing the gemmsup code path when the user requests
        (implicitly or explicitly) that neither A nor B be packed during
        computation. This optimization takes the form of a greatly reduced
        code branch in bli_thrinfo_sup_create_for_cntl(), which avoids a
        broadcast and two barriers, and results in higher performance when
        obtaining two-way or higher parallelism within BLIS. Thanks to
        Bhaskar Nallani of AMD for proposing this change via issue #605.
      - Added an early return branch to bli_thrinfo_create_for_cntl() that
        detects and quickly handles cases where no parallelism is being
        obtained within BLIS (i.e., single-threaded execution). Note that
        this special case handling was/is already present in
      - CREDITS file update.
  14. 10 Mar, 2022 1 commit
  15. 09 Mar, 2022 1 commit
    • Fixed level-3 performance bug in haswell ukernels. · 71851a05
      Field G. Van Zee authored
      - Fixed a performance regression affecting nearly all level-3 operations
        that use the 'haswell' sgemm and dgemm microkernels. This regression
        was introduced in 54fa28bd, caused by an ill-formed conditional
        expression in the assembly code that controls whether cache lines of C
        should be prefetched as rows or as columns. Essentially, the two
        branches were reversed, causing incomplete prefetching to occur for
        both row- and column-stored instances of matrix C. Thanks to Devin
        Matthews for his help finding and fixing this bug.
  16. 28 Feb, 2022 1 commit
    • Revamp how tools are handled/checked by configure. · 84732bf9
      Field G. Van Zee authored
      - Consolidate handling of tools that are specifiable via CC, CXX, FC, 
        PYTHON, AR, and RANLIB into one bash function, select_tool_w_env().
        - If the user specifies a tool via an environment variable (e.g. 
          CC=gcc) and that tool does not seem valid, print an error message 
          and abort configure, unless the tool is optional (e.g. CXX or FC), 
          in which case a warning message is printed instead.
        - The definition of "seems valid" above amounts to:
          - responding to at least one of a basic set of command line options 
            (e.g. --version, -V, -h) if the os_name is Linux (since GNU tools 
            tend to respond to flags such as --version) or if the tool in 
            question is CC, CXX, FC, or PYTHON (which tend to respond to the 
            expected flags regardless of OS)
          - the binary merely existing for AR and RANLIB on Darwin/OSX/BSD. 
            (These OSes tend to have non-GNU versions of ar and ranlib, which 
            typically do not respond to --version and friends.)
      - This PR addresses #584. Thanks to Devin Matthews for suggesting some
        of the changes in this commit.
  17. 22 Feb, 2022 2 commits
    • ArmSVE Ensure Non-zero Block Size (#615) · d5146582
      RuQing Xu authored
      Fixes #613. There are several macros/environment variables which need to be tuned to get good cache block sizes. It would be nice to have a way of getting values automatically.
    • Add armsve to arm64 Metaconfig (#614) · 4d835230
      RuQing Xu authored
      Availability of the `armsve` subconfig is controlled by the compiler version (gcc/clang). Tested for SVE and non-SVE. Fixes #612.
  18. 15 Feb, 2022 2 commits
    • Renamed SIMD-related macro constants for clarity. · c9700f36
      Field G. Van Zee authored
      - Renamed the following macros defined in bli_kernel_macro_defs.h:
          BLIS_SIMD_SIZE          -> BLIS_SIMD_MAX_SIZE
        Also updated all instances of these macros elsewhere, including
        subconfigurations, source code, and documentation. Thanks to Devin
        Matthews for suggesting this change.
    • Move edge cases to gemmtrsm ukrs; doc updates. · ee9ff988
      Field G. Van Zee authored
      - Moved edge-case handling into the gemmtrsm microkernel. This required
        changing the microkernel API to take m and n dimension parameters as
        well as updating all existing gemmtrsm microkernel function pointer
        types, function signatures, and related definitions to take m and n
        dimensions. Also updated all existing gemmtrsm kernels in the
        'kernels' directory (which for now is limited to haswell and penryn
        kernel sets, plus native and 1m-based reference kernels in
        'ref_kernels') to take m and n dimensions, and implemented edge-case
        handling within those microkernels via a collection of new C
        preprocessor macros defined within bli_edge_case_macro_defs.h. Note
        that the edge-case handling for gemm-like operations had already
        been relocated into the gemm microkernel in 54fa28bd.
      - Added descriptive comments to GEMM_UKR_SETUP_CT() and related macros in
        bli_edge_case_macro_defs.h to allow for easier reading.
      - Updated docs/KernelsHowTo.md to reflect above changes. Also cleaned up
        the bullet under "Implementation Notes for gemm" that covers alignment
        issues. (Thanks to Ivan Korostelev for pointing out the confusing and
        outdated language in issue #591.)
      - Other minor tweaks to KernelsHowTo.md.
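      Edge-case handling inside a microkernel that receives m and n can
      be sketched as below. This is a plain-C illustration of the general
      idea (compute a full tile in temporary storage, then write back
      only the m x n part), not the macros in bli_edge_case_macro_defs.h.
      The name demo_gemm_ukr, the 4x4 tile size, and the packed layouts
      are assumptions of this sketch.

```c
#include <stddef.h>

enum { DEMO_MR = 4, DEMO_NR = 4 };   /* hypothetical register tile size */

/* C := alpha * A * B + beta * C for an m x n corner of a tile, where
   m <= DEMO_MR and n <= DEMO_NR. A is a packed DEMO_MR x k micropanel
   (column by column) and B is a packed k x DEMO_NR micropanel (row by
   row). C has row stride rs_c and column stride cs_c. */
void demo_gemm_ukr( size_t m, size_t n, size_t k,
                    double alpha, const double* a, const double* b,
                    double beta, double* c, size_t rs_c, size_t cs_c )
{
    /* Always compute the full tile into temporary storage. */
    double ab[ DEMO_MR * DEMO_NR ] = { 0.0 };

    for ( size_t p = 0; p < k; ++p )
        for ( size_t j = 0; j < DEMO_NR; ++j )
            for ( size_t i = 0; i < DEMO_MR; ++i )
                ab[ i + j * DEMO_MR ] += a[ p * DEMO_MR + i ] * b[ p * DEMO_NR + j ];

    /* Edge case: touch only the m x n elements of C that exist. */
    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
        {
            double* cij = &c[ i * rs_c + j * cs_c ];
            *cij = alpha * ab[ i + j * DEMO_MR ] + beta * ( *cij );
        }
}
```

      The caller no longer needs its own temporary tile for partial
      blocks at the matrix edge; it passes the true m and n and lets the
      microkernel restrict the update.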
  19. 14 Feb, 2022 2 commits
  20. 13 Feb, 2022 1 commit
  21. 07 Feb, 2022 4 commits
  22. 31 Jan, 2022 1 commit
  23. 27 Jan, 2022 1 commit
    • Updated zen3 macro constant names. · 0be9282c
      Field G. Van Zee authored
      - In config/zen3/bli_family_zen3.h, renamed:
        Thanks to Jeff Diamond for helping spot the stale _SYRK naming.
  24. 17 Jan, 2022 1 commit
    • the Apple local label thing is required by Clang in general · 0ab20c0e
      Jeff Hammond authored
      @egaudry and I both saw this issue on Linux with Clang 10.
      Compiling obj/thunderx2/kernels/armv8a/3/sup/bli_gemmsup_rv_armv8a_asm_d4x8m.o ('thunderx2' CFLAGS for kernels)
      kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:171:49: fatal error: invalid symbol redefinition
              "                                            \n\t"
      <inline asm>:90:5: note: instantiated into assembly here
      1 error generated.
      Signed-off-by: Jeff Hammond <jehammond@nvidia.com>
  25. 11 Jan, 2022 2 commits
  26. 06 Jan, 2022 1 commit
    • Added m, n dims to gemmd/gemmlike ukernel calls. · 3f2440b0
      Field G. Van Zee authored
      - Updated the gemmd addon and the gemmlike sandbox code to use the new
        microkernel calling sequence, which now includes m and n dimensions so
        that the microkernel has all the information necessary to handle edge
        cases. Thanks to Jeff Diamond for catching this, which ideally would
        have been included in commit 54fa28bd.
      - Retired var2 of both gemmd and gemmlike to 'attic' directories and
        removed their corresponding prototypes. In both cases, var2 was a
        variant of the block-panel algorithm where edge-case handling was
        abstracted away to a microkernel wrapper. (Since this is now the
        official behavior of BLIS microkernels, I saw no need to have it
        included as a separate code path.)
      - Comment updates.
  27. 04 Jan, 2022 1 commit
  28. 02 Jan, 2022 1 commit
    • Add unique tag to branch labels for Apple ARM64. · 466b68a3
      Devin Matthews authored
      Add `%=` tag to branch labels, which expands to a unique identifier for each inline assembly block. This prevents duplicate symbol errors on Apple Silicon (#594). Fixes #594. [ci skip] since we can't test Apple Silicon anyways...
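      A minimal sketch of the `%=` mechanism, assuming a GCC- or
      Clang-compatible compiler with extended inline assembly. The
      function and label names are hypothetical, and the block defines
      only a label (no branch) so that it stays architecture-neutral.
      The BLIS kernels use real branch instructions that target such
      labels.

```c
/* Each expansion of this inline function carries its own asm block.
   Without the %= suffix, inlining it twice into one function could
   define the same label twice and trigger a symbol redefinition error. */
static inline void demo_marked_region( void )
{
    __asm__ volatile
    (
        "demo_label_%=:   \n\t"   /* %= becomes a number unique to this asm instance */
        :                         /* no outputs  */
        :                         /* no inputs   */
        : "memory"                /* extended asm form, so %= is expanded */
    );
}

/* Uses the region twice in one function; both labels stay distinct. */
int demo_use_twice( void )
{
    demo_marked_region();
    demo_marked_region();
    return 2;
}
```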