/*! \page installation Installation and Requirements
\tableofcontents

# CMake Based Build

The STRUMPACK package uses the CMake build system (CMake version ≥
3.11). The recommended way of building the STRUMPACK library is as
follows:

\code {.bash}
> tar -xvzf strumpack-x.y.z.tar.gz
> cd strumpack-x.y.z
> mkdir build
> mkdir install
> cd build
> export METIS_DIR=/path/to/metis/install
> cmake ../ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install
> make -j4
> make install
> make examples -j4
\endcode
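
Before configuring, it can help to confirm that the installed CMake
meets the 3.11 minimum stated above. The following is a minimal
sketch and not part of STRUMPACK: the helper name version_ge is
hypothetical, and it assumes a sort that supports the -V (version
sort) option, as GNU coreutils does.

\code {.bash}
# Return success if version $1 is greater than or equal to version $2.
# Assumes "sort -V" (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required=3.11
if command -v cmake > /dev/null 2>&1; then
  # The first line of "cmake --version" reads "cmake version X.Y.Z".
  found=$(cmake --version | head -n 1 | awk '{print $3}')
  if version_ge "$found" "$required"; then
    echo "CMake $found is new enough"
  else
    echo "CMake $found is older than $required"
  fi
else
  echo "cmake not found in PATH"
fi
\endcode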

The build will only succeed if the following dependencies are installed and
CMake can find them:


- __C++11__, __C__ and __FORTRAN77__ compilers. CMake looks for these
  compilers in the standard locations. If they are installed
  elsewhere, you can specify them as follows:
\code {.bash}
> cmake ../ \
     -DCMAKE_BUILD_TYPE=Release \
     -DCMAKE_CXX_COMPILER=g++ \
     -DCMAKE_C_COMPILER=gcc \
     -DCMAKE_Fortran_COMPILER=gfortran
\endcode


- __BLAS__ and __LAPACK__ libraries. For performance it is crucial to
  use optimized BLAS/LAPACK libraries such as Intel® MKL,
  AMD® ACML, Cray® LibSci, IBM ESSL or OpenBLAS.  If BLAS or LAPACK is
  not found automatically, you can specify the libraries using
\code {.bash}
 > cmake ../  \
      -DTPL_BLAS_LIBRARIES="..." \
      -DTPL_LAPACK_LIBRARIES="..."
\endcode
  For Intel MKL, make sure to use the lp64 interface instead of ilp64. \n

  The default versions of the Intel® MKL and Cray® LibSci BLAS libraries
  use multithreaded kernels, except when they are called from
  within an OpenMP parallel region, in which case they run
  sequentially. STRUMPACK relies on this behavior to achieve
  good performance when running in MPI+OpenMP hybrid mode.

  The IBM ESSL library provides some highly optimized BLAS routines,
  but does not provide a complete BLAS library. When linking with
  ESSL, you therefore also need to provide another BLAS library for the
  routines not available in ESSL:
\code {.bash}
 > cmake ../  \
      -DTPL_BLAS_LIBRARIES="${OLCF_ESSL_ROOT}/lib64/libessl.so;${OLCF_NETLIB_LAPACK_ROOT}/lib64/libblas.so" \
      -DTPL_LAPACK_LIBRARIES="${OLCF_ESSL_ROOT}/lib64/libessl.so;${OLCF_NETLIB_LAPACK_ROOT}/lib64/liblapack.so" \
      -DTPL_SCALAPACK_LIBRARIES="${OLCF_NETLIB_SCALAPACK_ROOT}/lib/libscalapack.so"
\endcode


- __METIS__ (≥ 5.1.0 <b>required</b>) for the nested dissection matrix
  reordering. Metis can be obtained from: \n
  http://glaros.dtc.umn.edu/gkhome/metis/metis/download \n
  CMake looks for the Metis include files and the library in the default
  locations as well as at $METIS_DIR.
\code {.bash}
 > export METIS_DIR=...
 > cmake ../
\endcode
  You can download and install METIS as follows:
\code {.bash}
wget http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz
tar -xvzf metis-5.1.0.tar.gz
cd metis-5.1.0
make config cc=gcc prefix=`pwd`/install
make install
export METIS_DIR=`pwd`/install
\endcode
  Alternatively, you can specify the
  location of the header and library as follows:
\code {.bash}
 > cmake ../  \
      -DTPL_METIS_INCLUDE_DIRS=/usr/local/metis/include \
      -DTPL_METIS_LIBRARIES=/usr/local/metis/lib/libmetis.a
\endcode


- __OpenMP v3.1__ support is required in the C++ compiler to use the
  shared-memory parallelism in the code. OpenMP support can be
  disabled by adding the CMake option
\code {.bash}
-DSTRUMPACK_USE_OPENMP=OFF
\endcode
  OpenMP v3.1 introduces task parallelism, which is used
  extensively throughout the code. STRUMPACK also uses tasks with
  dependencies, an OpenMP feature introduced in version 4.0, as well
  as the taskloop construct, introduced in OpenMP 4.5. It is assumed
  you have at least OpenMP 3.1; CMake will check whether your
  compiler supports task dependencies and the taskloop construct and
  enable those only if found.


- __CUDA__, __cuBLAS__ and __cuSOLVER__ (optional) can be used to
  accelerate the sparse direct solver:
\code {.bash}
 > cmake ../ -DSTRUMPACK_USE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75"
\endcode
  CUDA support is enabled by default, and CMake will look for the CUDA
  compiler and libraries in the default locations. Additionally, one can
  specify the CUDAToolkit_ROOT path to the CUDA libraries:
\code {.bash}
 > cmake ../ -DSTRUMPACK_USE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="75" -DCUDAToolkit_ROOT=/some/path
\endcode
or
\code {.bash}
 > export CUDAToolkit_ROOT=..
\endcode
  For full GPU support in the distributed memory sparse direct solver,
  one should also compile with support for SLATE, see below. \n
  See the page on \link GPU_Support GPU Support \endlink for more
  details.


- __MPI__ (Message Passing Interface) library. Support for MPI is
  enabled by default in STRUMPACK, but can be disabled by adding
  \code{.bash} -DSTRUMPACK_USE_MPI=OFF \endcode to the CMake
  command. You should not need to manually specify the MPI compiler
  wrappers. CMake will look for MPI options and libraries and set the
  appropriate compiler and linker flags. When MPI is enabled,
  STRUMPACK also requires ScaLAPACK (and BLACS), see below.


- __ScaLAPACK__ (included in Intel® MKL or Cray® LibSci) is not
  required when compiling without MPI. ScaLAPACK depends on the BLACS
  communication library and on PBLAS (parallel BLAS), both of which
  are typically included with the ScaLAPACK installation (from
  ScaLAPACK 2.0.2, the BLACS library is included in the ScaLAPACK
  library file). If CMake cannot locate these libraries, you can
  specify their path by setting the environment variable $SCALAPACK_DIR
\code {.bash}
 > export SCALAPACK_DIR=/some/path/
 > cmake ../
\endcode
  or by specifying the libraries manually:
\code {.bash}
 > cmake ../ -DTPL_SCALAPACK_LIBRARIES="..."
\endcode
  Alternatively, one can directly modify the linker flags, for instance
  to add the ScaLAPACK and BLACS libraries:
\code {.bash}
 > cmake ../ -DCMAKE_EXE_LINKER_FLAGS="-L/usr/lib64/mpich/lib/ -lscalapack -lmpiblacs"
\endcode
  When using Intel MKL we recommend using the link advisor: \n
  https://software.intel.com/en-us/articles/intel-mkl-link-line-advisor \n
  Make sure to select the lp64 interface instead of ilp64.


- __getopt_long__: This is a GNU extension to the POSIX getopt() C
  library function.


- __SLATE__ is a modern ScaLAPACK alternative, bringing support for
  GPU acceleration. It can be found at: \n
  https://github.com/icl-utk-edu/slate \n
  Support for SLATE in STRUMPACK can be enabled with for instance:
\code {.bash}
> export SLATE_DIR=/slate/install/dir/
> cmake ../ -DTPL_ENABLE_SLATE=ON
\endcode
  This requires that SLATE was configured/installed using CMake, and
  the SLATE_DIR variable points to the installation path.


- __PARMETIS__ (optional, only used when compiling with MPI) for
  parallel nested dissection. ParMetis can be downloaded from
  http://glaros.dtc.umn.edu/gkhome/metis/parmetis/download \n
  The steps to make sure CMake can find ParMetis are similar to those for
  Metis. Enable with -DTPL_ENABLE_PARMETIS=ON. Then either set the CMake
  variables TPL_PARMETIS_INCLUDE_DIRS and TPL_PARMETIS_LIBRARIES, or set
  the environment variable ParMETIS_DIR.
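
  For example, with ParMetis installed under a hypothetical prefix
  /usr/local/parmetis (the paths and the static library name below are
  assumptions; adjust them to your installation), the CMake variables
  could be set as follows:
\code {.bash}
 > cmake ../  \
      -DTPL_ENABLE_PARMETIS=ON \
      -DTPL_PARMETIS_INCLUDE_DIRS=/usr/local/parmetis/include \
      -DTPL_PARMETIS_LIBRARIES=/usr/local/parmetis/lib/libparmetis.a
\endcode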


- __SCOTCH__ and __PT-SCOTCH__ (≥ 5.1.12) (optional) for matrix
  reordering. Scotch can be downloaded from: \n
  http://www.labri.fr/perso/pelegrin/scotch/ \n
  Configuring CMake to find (PT-)Scotch is similar to Metis. Enable with
  -DTPL_ENABLE_SCOTCH=ON and -DTPL_ENABLE_PTSCOTCH=ON. For (PT-)Scotch,
  set either the environment variables SCOTCH_DIR (PTSCOTCH_DIR) or the
  CMake variables TPL_SCOTCH_INCLUDE_DIRS, TPL_PTSCOTCH_INCLUDE_DIRS,
  TPL_SCOTCH_LIBRARIES and TPL_PTSCOTCH_LIBRARIES (make sure to specify all
  libraries: libscotch, libscotcherr, libptscotch and libptscotcherr).
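
  As a sketch, assuming Scotch and PT-Scotch are both installed under a
  hypothetical prefix /usr/local/scotch (the paths and library file names
  are assumptions; adjust them to your installation), the CMake variables
  could be passed as semicolon-separated lists:
\code {.bash}
 > cmake ../  \
      -DTPL_ENABLE_SCOTCH=ON \
      -DTPL_ENABLE_PTSCOTCH=ON \
      -DTPL_SCOTCH_INCLUDE_DIRS=/usr/local/scotch/include \
      -DTPL_SCOTCH_LIBRARIES="/usr/local/scotch/lib/libscotch.a;/usr/local/scotch/lib/libscotcherr.a" \
      -DTPL_PTSCOTCH_INCLUDE_DIRS=/usr/local/scotch/include \
      -DTPL_PTSCOTCH_LIBRARIES="/usr/local/scotch/lib/libptscotch.a;/usr/local/scotch/lib/libptscotcherr.a"
\endcode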


- __ButterflyPACK__ (≥ 1.2.0): https://github.com/liuyangzhuan/ButterflyPACK \n
  STRUMPACK supports the HODLR (Hierarchically Off-Diagonal Low-Rank) and
  Butterfly matrix formats, see also \ref hodlr_matrices, through the
  ButterflyPACK library. To enable this from CMake, add the following:
\code {.bash}
 > export ButterflyPACK_DIR=...
 > cmake ../ -DTPL_ENABLE_BPACK=ON
\endcode


- __ZFP__ (≥ 0.5.5) https://computing.llnl.gov/projects/floating-point-compression :
  ZFP is used for lossy compression in the sparse solver
  (preconditioner).
\code {.bash}
 > export ZFP_DIR=...
 > cmake ../ -DTPL_ENABLE_ZFP=ON
\endcode


- __Combinatorial BLAS__ (≥ 2.0) (optional) can be used for parallel
  reordering for stability, i.e., to get nonzeros on the diagonal of
  the matrix, as an alternative to the sequential MC64 (included). Get
  it from: \n
     https://bitbucket.org/berkeleylab/combinatorial-blas-2.0/src/master/ \n
  Enable by adding -DTPL_ENABLE_COMBBLAS=ON to the CMake
  command. Also set the following environment variables:
\code {.bash}
 > export COMBBLAS_DIR=/path/to/combinatorial-blas-2.0/CombBLAS/install/
 > export COMBBLASAPP_DIR=/path/to/combinatorial-blas-2.0/CombBLAS/Applications/
\endcode
  to the appropriate directories for your setup.


The code was tested on GNU/Linux and Mac with the GNU and Intel®
compilers and the OpenBLAS, Intel® MKL®, Cray® LibSci® and IBM ESSL
numerical libraries. If you encounter issues on other platforms or
with other BLAS/LAPACK implementations, please let us know.

# SPACK, xSDK \& E4S


STRUMPACK can also be installed through Spack (https://spack.io),
a package manager for supercomputers, Linux, and macOS. It
makes installing scientific software easy. With Spack, you can build a
package with multiple versions, configurations, platforms, and
compilers, and all of these builds can coexist on the same machine:
\code {.bash}
git clone https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh
spack install strumpack
\endcode
This can take a while, since by default Spack will build all
dependencies.


Furthermore, STRUMPACK is also part of the ECP __xSDK__
(https://xsdk.info) and __E4S__ (https://e4s-project.github.io)
software development kits. By installing the full xSDK, you get
STRUMPACK, all of its dependencies, as well as PETSc (which has a
STRUMPACK interface) and several other scientific software libraries.


# Using STRUMPACK in Your Code

Please see the examples in the examples/ folder, which can be built using
\code {.bash}
 > make examples
\endcode
in the __build__ directory.


Using STRUMPACK in your code is easiest when using the CMake build
system. Below is an example CMakeLists.txt file for a small project
called myapp using STRUMPACK:

\code {.bash}
 cmake_minimum_required(VERSION 3.13)
 project(myapp LANGUAGES CXX)

 find_package(STRUMPACK REQUIRED)

 add_executable(myexe main.cpp)
 target_link_libraries(myexe PRIVATE STRUMPACK::strumpack)
\endcode

Then invoke CMake with the path to the STRUMPACK installation folder
set:

\code {.bash}
> export STRUMPACK_DIR=/some/path/STRUMPACK/install
> cd myapp
> mkdir build
> cd build
> cmake ../
> make
\endcode
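
As noted in the BLAS requirements, STRUMPACK can run in MPI+OpenMP
hybrid mode. The sketch below shows one way to pick the number of
OpenMP threads per MPI rank on a single node. It is illustrative only:
the helper threads_per_rank, the choice of 4 ranks and the mpirun
launcher are assumptions, and myexe is the executable from the
CMakeLists.txt example above.

\code {.bash}
# Threads per MPI rank so that ranks * threads does not exceed the core count.
threads_per_rank() {
  local cores=$1 ranks=$2
  local t=$(( cores / ranks ))
  # Always use at least one thread, even with more ranks than cores.
  [ "$t" -ge 1 ] || t=1
  echo "$t"
}

ranks=4                                # hypothetical number of MPI ranks
cores=$(getconf _NPROCESSORS_ONLN)     # online cores on this node
export OMP_NUM_THREADS=$(threads_per_rank "$cores" "$ranks")
echo "Using $ranks ranks with OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Launch with your MPI launcher, for instance:
#   mpirun -n "$ranks" ./myexe
\endcode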

*/
