Commit a0b15a19 authored by Guido Giuntoli

Merge branch 'documentation'

parents 24287a76 ec8870aa
......@@ -29,6 +29,13 @@ extract_data.py
compile_commands.json
scripts/output*
# Latex
*.blg
*.bbl
*.synctex.gz
*.fls
*.fdb_latexmk
\#*
.#*
*.swp
# Micropp
[![Build Status](https://travis-ci.org/GG1991/Micropp.svg?branch=master)](https://travis-ci.org/GG1991/Micropp)
Code to localize strains and homogenize stress in a Representative Volume Element (RVE) using the Finite Element Method (FEM).
# Characteristics
1. Works with 2D or 3D structured grids
2. Plastic non-linear material model for testing memory storage and efficiency
3. Supported boundary condition: uniform strains (pure Dirichlet)
4. Runs sequentially
5. Own ELL matrix routines with a CG iterative solver (diagonal preconditioner)
6. Different kinds of micro-structures
1. Works with 3D structured finite element problems
2. OpenACC acceleration support for GPUs
3. OpenMP support for multi-core CPUs
4. Solver: Conjugate Gradients with Diagonal Preconditioner (CGPD)
5. Different varieties of micro-structures and material laws
6. Native instrumentation to measure performance
7. C and Fortran Wrappers
# Main Characteristics
Micropp solves the FE problem on heterogeneous RVEs composed of more than one material and computes their average properties. The next figure shows a typical micro-structure being solved.
<img src="./pics/mic_1.png" alt="drawing" width="300"/>
Micropp is designed to be coupled with a macro-scale code in order to simulate multi-scale physical systems, such as a composite aircraft panel:
<img src="./pics/coupling-micropp-macro.png" alt="drawing" width="300"/>
Micropp has been coupled with high-performance codes such as [Alya](http://bsccase02.bsc.es/alya), developed at the Barcelona Supercomputing Center ([BSC](https://www.bsc.es/)), to perform **FE2** calculations. It has also been coupled with [MacroC](https://github.com/GG1991/macroc), an FE code that uses the PETSc library on structured meshes; with this coupling, good performance was achieved on up to 30720 processors of the MareNostrum IV supercomputer.
<img src="./pics/scala.png" alt="drawing" width="350"/>
Micropp has its own ELL matrix format routines, optimized for the structured grid geometries it has to manage. This yields very good performance in the matrix assembly stage: the ratio of assembly time to solving time can be below 1%, depending on the problem size. The linear system of equations is solved with a Conjugate Gradient algorithm with a diagonal preconditioner.
Build steps with CMake:
-----------------------
1. Clone the repository
2. `cd` into the cloned directory
3. `mkdir build` (any other name, e.g. `build-debug`, also works)
4. `cd build`
......@@ -53,20 +56,8 @@ and the debug version:
cmake -DCMAKE_BUILD_TYPE=Debug ..
```
Other possible build types are: Debug, Release, RelWithDebInfo, MinSizeRel. Read the CMake documentation for more information.
A **TIMER** option was added to insert time-measurement instrumentation into the execution. You can enable it at CMake configuration time:
```bash
cmake -DTIMER=on ..
```
A **CGDEBUG** option was included to study the CG solver and inspect its convergence under different conditions:
```bash
cmake -DCGDEBUG=on ..
```
This option is independent of the Debug or Release mode.
Other possible options are:
1. `TIMER=[ON|OFF]` activates the native instrumentation for measuring execution times
2. `OPENACC=[ON|OFF]` compiles with OpenACC (only supported by some compilers such as PGI)
3. `OPENMP=[ON|OFF]` compiles with OpenMP for multi-core CPUs
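For example, a Release build with the native instrumentation and OpenMP support enabled could be configured as follows (a minimal sketch; it assumes the default Makefile generator, so the trailing `make` step is not specific to Micropp):
```bash
# Release build with the TIMER and OPENMP options enabled
cmake -DCMAKE_BUILD_TYPE=Release -DTIMER=ON -DOPENMP=ON ..
make
```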
all:
pdflatex manual.tex
\section{Benchmarks}
\subsection{\texttt{benchmarks-sol-ass}}
This benchmark is intended to measure the computing times of the assembly of the Residue vector ($\times 2$) and of the
Jacobian matrix, and of the solver (CGPD).
Basically, the benchmark executes a typical Newton-Raphson iteration:
\begin{lstlisting}[language=c++, backgroundcolor=\color{lightgray} ]
// Assembly of the Residue vector
double norm = assembly_rhs_acc(u, nullptr, b);

// Assembly of the Jacobian matrix (GPU or CPU version)
#ifdef _OPENACC
assembly_mat_acc(&A, u, nullptr);
#else
assembly_mat(&A, u, nullptr);
#endif

// CGPD solver (GPU or CPU version)
#ifdef _OPENACC
int cg_its = ell_solve_cgpd_acc(&A, b, du, &cg_err);
#else
int cg_its = ell_solve_cgpd(&A, b, du, &cg_err);
#endif

// Solution update and Residue re-assembly to check convergence
for (int i = 0; i < nn * dim; ++i)
u[i] += du[i];

norm = assembly_rhs_acc(u, nullptr, b);
\end{lstlisting}
To execute this benchmark:
\begin{lstlisting}[language=Bash,backgroundcolor=\color{lightgray} ]
./benchmark-sol-ass
\end{lstlisting}
\begin{figure}[!htbp]
\centering
\begin{tikzpicture}[]
\pgfplotsset{every tick label/.append style={font=\small}}
\pgfplotstableread{data/benchmark-sol-ass-macintosh-intel-I7-3520M.dat}{\mac}
\pgfplotstableread{data/benchmark-sol-ass-CTEPOWER-PGI19.4-cpu.dat}{\ibmcpu}
\pgfplotstableread{data/benchmark-sol-ass-CTEPOWER-PGI19.4-gpu.dat}{\ibmgpu}
\begin{loglogaxis}[
grid=major,
y unit=s,
legend style={at={(1.7,0.45)},anchor=south east},
legend cell align={left},
ylabel=Computing Time,
xlabel=Micro-Scale Mesh Resolution,
x unit=\# Elements,
scaled y ticks=false,
%xmin=-1,
%ymin=-1,
xtick = {20,40,60,80,100},
xticklabels = {20\tst,40\tst,60\tst,80\tst,100\tst},
]
\addplot [color=blue,mark=*,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{1}*1.0e-3] {\mac};
\addplot [color=green,mark=*,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{2}*1.0e-3] {\mac};
\addplot [color=blue,mark=+,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{1}*1.0e-3] {\ibmcpu};
\addplot [color=green,mark=+,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{2}*1.0e-3] {\ibmcpu};
\addplot [color=blue,mark=x,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{1}*1.0e-3] {\ibmgpu};
\addplot [color=green,mark=x,line width = 0.2mm] table [x index={0}, y
expr=\thisrowno{2}*1.0e-3] {\ibmgpu};
\legend{Assembly (Intel I7 CPU), Solver (Intel I7 CPU),
Assembly (IBM P9 CPU), Solver (IBM P9 CPU),
Assembly (V100 GPU), Solver (V100 GPU)}
\end{loglogaxis}
\end{tikzpicture}
\caption{\label{fig:ass_vs_sol}
Computing time of the assembly of the Jacobian matrix and the Residue vector, and of the solver
algorithm, for the micro-scale FE calculation performed by Micropp on an Intel Core i7-3520M CPU
(2.90 GHz), an IBM Power 9 CPU core and an NVIDIA V100 GPU.
}
\end{figure}
\subsection{\texttt{benchmarks-cpu-gpu}}
This benchmark measures the speedup achieved when the CPU/GPU strategy is used instead of the pure-CPU one. To execute
this test, Micropp should be compiled with OpenACC (a configuration sketch is given after the listing below):
\begin{lstlisting}[language=Bash,backgroundcolor=\color{lightgray} ]
./benchmark-cpu-gpu
\end{lstlisting}
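The OpenACC build is selected at CMake configuration time through the \texttt{OPENACC} option (a minimal sketch; it
assumes a compiler with OpenACC support, e.g.\ PGI, and the default Makefile generator):
\begin{lstlisting}[language=Bash,backgroundcolor=\color{lightgray} ]
cmake -DOPENACC=ON ..
make
\end{lstlisting}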
The output of this program is stored in \texttt{benchmark-cpu-gpu.dat}.
Fig.~\ref{fig:cpu-vs-gpu} shows the computing time of one Newton-Raphson iteration of the micro-scale problem for
different mesh resolutions. The comparison is performed between one IBM Power 9 CPU core and an NVIDIA V100
Tesla GPU of the CTE-POWER cluster~\cite{cte-power} using \texttt{benchmark-cpu-gpu}~\cite{micropp-doc}. The
Newton-Raphson iteration includes the assembly of the Jacobian matrix, two assemblies of the Residue vector (the
second one is done to check that the final norm is close to zero after the solver) and the CGPD solver. This is the
most fundamental procedure performed at the micro-scale in the linear or elastic cases when the One-Way or the Full
coupling schemes are activated. Moreover, in non-linear cases, this procedure is repeated several times until
convergence is achieved. Fig.~\ref{fig:cpu-vs-gpu} demonstrates the speedup gained with the CPU/GPU acceleration
over a pure CPU core. The speedup increases with the micro-scale mesh resolution; for instance, a speedup of
$\times$25 is obtained for the mesh resolution of 100\tst elements. The speedup grows with the mesh resolution
because the CGPD solver dominates the algorithm, especially for large meshes; since the solver is the part that is
best parallelized on the GPU, its computations (principally the matrix-vector product) are accelerated and this
translates to the whole calculation.
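As a concrete check, the measured values plotted in Fig.~\ref{fig:cpu-vs-gpu} for the 100\tst case give
\begin{equation}
\frac{t_{\text{CPU}}}{t_{\text{GPU}}} = \frac{182.470 \, \text{s}}{7.315 \, \text{s}} \approx 24.9,
\end{equation}
which is the $\times$25 speedup quoted above.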
\begin{figure}
\centering
\begin{tikzpicture}
\pgfplotstableread[row sep=\\,col sep=&]{
interval & CPU & GPU & Speedup \\
%10 & 53 & 176 & $\times$0.3 \\
%20 & 496 & 349 & $\times$1.4 \\
30 & 2.183 & 0.648 & $\times$3.3 \\
40 & 6.221 & 1.146 & $\times$5.4 \\
50 & 13.824 & 1.541 & $\times$9.0 \\
60 & 24.751 & 1.464 & $\times$16.9 \\
70 & 41.142 & 2.369 & $\times$17.4 \\
80 & 73.224 & 4.204 & $\times$17.4 \\
90 & 105.634 & 5.365 & $\times$19.7 \\
100 & 182.470 & 7.315 & $\times$24.9 \\
}\mydata
\begin{axis}[
grid=major,
ybar=0.5cm,
xlabel=Micro-Scale Mesh Resolution,
x unit=\# Elements,
ylabel=Computing Time,
y unit=s,
x=0.15cm,
ymin=-1,
xtick={30,40,50,60,70,80,90,100},
xticklabels={30\tst,40\tst,50\tst,60\tst,70\tst,80\tst,90\tst,100\tst},
x tick label style={anchor=east},
scaled y ticks=false,
legend style={anchor=north west},
legend pos= north west
]
\addplot [fill=blue, ybar, point meta = explicit symbolic, nodes near coords]
table[meta=Speedup,x=interval,y=CPU] {\mydata};
\addplot [fill=green, ybar, point meta = explicit symbolic, nodes near coords]
table[x=interval,y=GPU] {\mydata};
\legend{CPU, GPU};
\end{axis}
\end{tikzpicture}
\caption{\label{fig:cpu-vs-gpu}
Computing time of one Newton-Raphson iteration step at the micro-scale for different mesh resolutions
(\texttt{benchmark-cpu-gpu}~\cite{micropp-doc}). Comparison between one IBM Power 9 CPU core and an
NVIDIA V100 GPU (computing node of the CTE-POWER cluster).
}
\end{figure}
\subsection{\texttt{benchmarks-elastic}}
\subsection{\texttt{benchmarks-plastic}}
\subsection{\texttt{benchmarks-damage}}
\subsection{\texttt{benchmarks-mic-1}}
\subsection{\texttt{benchmarks-mic-2}}
\subsection{\texttt{benchmarks-mic-3}}
\section{Coding Style}
The compilation process is based on \texttt{CMake}.
# CPU GPU
#N time_ass time_sol total ass% sol% time_ass time_sol total ass% sol%
10 32 21 53 60.3774 39.6226 24 152 176 13.6364 86.3636
20 190 306 496 38.3065 61.6935 79 270 349 22.6361 77.3639
30 674 1509 2183 30.8749 69.1251 158 490 648 24.3827 75.6173
40 1644 4577 6221 26.4266 73.5734 386 760 1146 33.6824 66.3176
50 3117 10707 13824 22.5477 77.4523 751 790 1541 48.7346 51.2654
60 5424 19327 24751 21.9143 78.0857 997 467 1464 68.1011 31.8989
70 7560 33582 41142 18.3754 81.6246 1574 795 2369 66.4415 33.5585
80 11341 61883 73224 15.4881 84.5119 2734 1470 4204 65.0333 34.9667
90 16185 89449 105634 15.3218 84.6782 3459 1906 5365 64.4734 35.5266
100 22098 160372 182470 12.1105 87.8895 4536 2779 7315 62.0096 37.9904
#N time_ass time_sol ass% sol%
10 23 21 52.2727 47.7273
20 224 307 42.1846 57.8154
30 767 1432 34.8795 65.1205
40 1569 4049 27.9281 72.0719
50 3190 9420 25.2974 74.7026
60 5947 20801 22.2334 77.7666
70 9272 33667 21.5934 78.4066
80 12640 56104 18.3871 81.6129
90 18008 89818 16.701 83.299
100 24949 138888 15.2279 84.7721
#N time_ass time_sol ass% sol%
10 36 12 75 25
20 40 31 56.338 43.662
30 124 70 63.9175 36.0825
40 281 140 66.7458 33.2542
50 585 265 68.8235 31.1765
60 1000 477 67.7048 32.2952
70 1574 817 65.8302 34.1698
80 2344 1277 64.7335 35.2665
90 3331 1941 63.1829 36.8171
100 4523 2876 61.1299 38.8701
#N time_ass time_sol ass% sol%
10 17 15 53.125 46.875
20 112 175 39.0244 60.9756
30 375 821 31.3545 68.6455
40 885 2493 26.1989 73.8011
50 1731 5708 23.2693 76.7307
60 3002 11567 20.6054 79.3946
70 4798 20730 18.795 81.205
80 7471 35229 17.4965 82.5035
90 10739 55554 16.1993 83.8007
100 14860 82273 15.2986 84.7014
all:
pdflatex manual
#bibtex manual
#pdflatex manual
#pdflatex manual
\documentclass[conference, onecolumn]{IEEEtran}
% \usepackage[pdftex]{graphicx}
% \usepackage[dvips]{graphicx}
%\usepackage{amsmath}
\usepackage{algorithm}
\usepackage{algorithmic}
%\usepackage{array}
%\usepackage[caption=false,font=normalsize,labelfont=sf,textfont=sf]{subfig}
%\usepackage{fixltx2e}
%\usepackage{stfloats}
%\usepackage{url}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{siunitx}
\usepackage{standalone}
\usepackage{tikz}
\usetikzlibrary{matrix,backgrounds,calc,shapes,arrows,arrows.meta,fit,positioning}
\usetikzlibrary{chains,shapes.multipart}
\usepackage{pgfplots, pgfplotstable}
\usepgfplotslibrary{units}
\usepackage{xcolor}
\usepackage{xspace}
\usepackage{listings}
\usepackage{color}
\usepackage{framed}
\definecolor{dkgreen}{rgb}{0,0.6,0}
\definecolor{gray}{rgb}{0.5,0.5,0.5}
......@@ -43,18 +41,29 @@ tabsize=4
%columns=flexible,
\hyphenation{op-tical net-works semi-conduc-tor}
\newcommand{\tst}{\textsuperscript{3}\xspace}
\begin{document}
\title{Micropp: Reference Manual}
%\author{\IEEEauthorblockN{
% Guido Giuntoli\IEEEauthorrefmark{1}\IEEEauthorrefmark{3},
% Jimmy Aguilar\IEEEauthorrefmark{1}\IEEEauthorrefmark{4}
% Judica\"el Grasset\IEEEauthorrefmark{2}\IEEEauthorrefmark{5}}
%\IEEEauthorblockA{\IEEEauthorrefmark{1}Barcelona Supercomputing Center, Spain}
%\IEEEauthorblockA{\IEEEauthorrefmark{2}STFC Daresbury Laboratory, UK}
%\IEEEauthorblockA{\IEEEauthorrefmark{3}guido.giuntoli@bsc.es}
%\IEEEauthorblockA{\IEEEauthorrefmark{4}jimmy.aguilar@bsc.es}
%\IEEEauthorblockA{\IEEEauthorrefmark{5}judicael.grasset@stfc.ac.uk}}
\author{
Guido Giuntoli (gagiuntoli@gmail.com) \\
Jimmy Aguilar (spacibba@aol.com) \\
Judica\"el Grasset (judicael.grasset@stfc.ac.uk)
}
\maketitle
......@@ -69,33 +78,21 @@ tabsize=4
\begin{figure}
\centering
\resizebox{\textwidth}{!}{
\begin{tikzpicture}[]
\node[draw=none,fill=none,scale=2.0] at (0,0) {\includegraphics[width=0.95\linewidth]{figures/mac_1.png}};
\node[draw=none,fill=none,scale=1.4] at (20,0) {\includegraphics[width=0.95\linewidth]{figures/mic_1.png}};
\node[draw=none,fill=none,scale=3.0] at (5, -9) {macro-scale};
\node[draw=none,fill=none,scale=3.0] at (13,-9) {\emph{micropp}};
\draw[-{Latex[length=8mm, width=4mm]}, black,line width=0.5mm] (0,10) .. controls ++(8.5,2) .. ++(17,0);
\draw[-{Latex[length=8mm, width=4mm]}, black,line width=0.5mm] (17,-10) .. controls ++(-8.5,-2) .. ++(-17,0);
\end{tikzpicture}
\includegraphics[width=0.95\linewidth]{figures/coupling-micropp-macro.png}
\caption{\label{fig:disp}
Coupling between a macro-scale solid mechanics code and Micropp to simulate a composite material
problem.
}
\end{figure}
\input{governing_equations_and_fe.tex}
\section{Implementation}
\begin{figure}[!htbp]
\centering
\resizebox{5cm}{!}{\input{figures/work_basis.tikz}}
\vspace{0.5cm}
\caption{\label{fig:comp_scheme}}
\end{figure}
The Voigt convention used here is the same as in Ref.~\cite{simo}.
\begin{equation}
\epsilon = \left[\epsilon_{11} \quad \epsilon_{22} \quad \epsilon_{33} \quad \epsilon_{12} \quad \epsilon_{13} \quad \epsilon_{23} \right]^T
\end{equation}
\section{Geometries}
......@@ -182,7 +179,7 @@ f_{n+1}^{\text{trial}} = || s_{n+1}^{\text{trial}} || - \sqrt{\frac{2}{3}} (\sig
\begin{array}{ll}
\epsilon_{n+1}^{p} = \epsilon_{n}^{p} - \Delta \gamma \mathbf{n}_{n+1} \\[5pt]
\alpha_{n+1} = \alpha_{n} + \sqrt{\frac{2}{3}} \Delta \gamma \\[5pt]
\sigma_{n+1} = k \, \text{tr} (\epsilon_{n+1}) + s_{n+1}^{\text{trial}} - 2 \mu \Delta \gamma \mathbf{n}_{n+1}
\end{array}
\right.
\end{equation}
......@@ -202,7 +199,11 @@ f_{n+1}^{\text{trial}} = || s_{n+1}^{\text{trial}} || - \sqrt{\frac{2}{3}} (\sig
\end{algorithmic}
\end{algorithm}
\input{Sections/compilation.tex}
\input{Sections/coding_style.tex}
\input{Sections/benchmarks.tex}
\section{Conclusion}
The conclusion goes here.
......@@ -210,12 +211,33 @@ The conclusion goes here.
% use section* for acknowledgment
\section*{Acknowledgment}
The authors would like to thank the Barcelona Supercomputing Center for the resources provided to develop and test the Micropp code on the MareNostrum IV \& CTE-POWER architectures between September 2016 and December 2019. The simulations were primarily done by coupling Micropp with the multi-physics code Alya to solve the macro-scale equations.
\begin{thebibliography}{1}
\bibitem{paper1}{
G. Giuntoli, J. Aguilar, M. Vazquez, S. Oller and G. Houzeaux.
``An FE$^2$ multi-scale implementation for modeling composite materials on distributed architectures''.
Coupled Systems Mechanics, 8(2), 2018
}
\bibitem{simo}{
J.C. Simo \& T.J.R. Hughes.
``Computational Inelasticity''.
Springer, 2000.
}
\bibitem{cte-power}{
Barcelona Supercomputing Center (2019),
``Power9 CTE User's Guide''.
}
\bibitem{oller}{
S. Oller.
``Numerical Simulation of Mechanical Behavior of Composite Materials''.
Springer, 2014.
}
\end{thebibliography}
\end{document}
......@@ -10,17 +10,12 @@
#SBATCH --exclusive
#SBATCH --time=01:30:00
### #SBATCH --nodes=1
### #SBATCH --ntasks-per-node=4
### #SBATCH --qos=bsc_case
### #SBATCH --qos=debug
N=100
NGP=32
N=5
NGP=5
STEPS=5
N_MPI=5
#EXEC="../build-gpu/test/test3d_2"
EXEC="../test/multi-gpu-mpi"
EXEC="../build/multi-gpu-mpi"
export OMP_NUM_THREADS=1
export CUDA_VISIBLE_DEVICES="0,1,2,3"
......@@ -31,10 +26,10 @@ export CUDA_VISIBLE_DEVICES="0,1,2,3"
rm -rf times.txt
for i in {1..9}; do
for (( i=1; i<=$N_MPI; i++ )); do
echo "running" $i MPI processes
time mpirun -np $i $EXEC $N $NGP $STEPS > output-${N}-${NGP}-${i}.out
mpirun -np $i $EXEC $N $NGP $STEPS > output-${N}-${NGP}-${i}.out
tim=$(awk '/time =/{print $3}' output-${N}-${NGP}-${i}.out)
echo $i $tim >> times.txt
......
......@@ -184,6 +184,8 @@ micropp<tdim>::~micropp()
{
INST_DESTRUCT;
cout << "Calling micropp<" << dim << "> destructor" << endl;
free(elem_stress);
free(elem_strain);
free(elem_type);
......
......@@ -29,6 +29,8 @@ set(testsources
benchmark-elastic.cpp
benchmark-plastic.cpp
benchmark-damage.cpp
benchmark-sol-ass.cpp
benchmark-cpu-gpu.cpp
)
# Iterate over the list above
......
/*
* This is a test example for MicroPP: a finite element library
* to solve microstructural problems for composite materials.
*
* Copyright (C) - 2018 - Jimmy Aguilar Mena <kratsbinovish@gmail.com>
* Guido Giuntoli <gagiuntoli@gmail.com>
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <https://www.gnu.org/licenses/>.
*/
#include <iostream>
#include <iomanip>
#include <fstream>
#include <ctime>
#include <cassert>
#include <chrono>
#include <cstring>
#include <cstdlib>
#include "micropp.hpp"
using namespace std;
using namespace std::chrono;
const double strain[6] = { 0.001, 0.0, 0., 0., 0., 0. };
class test_t : public micropp<3> {
public:
test_t(micropp_params_t mic_params):micropp<3>(mic_params) {};
~test_t() {};
void newton_raphson(ofstream & file)
{
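/*
 * newton_raphson() runs one Newton-Raphson iteration twice: first with
 * the CPU routines (assembly_mat, ell_solve_cgpd) and then with the
 * OpenACC-accelerated ones (assembly_mat_acc, ell_solve_cgpd_acc).
 * The Residue assemblies, the Jacobian assembly and the CGPD solve are
 * timed separately and written to the output file together with their
 * percentages of the total time.
 */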
double cg_err;
ell_matrix A; // Jacobian
const int ns[3] = { nx, ny, nz };
ell_init(&A, dim, dim, ns, CG_ABS_TOL, CG_REL_TOL, CG_MAX_ITS);
double *b = (double *) calloc(nndim, sizeof(double));
double *du = (double *) calloc(nndim, sizeof(double));
double *u = (double *) calloc(nndim, sizeof(double));
cout << "CPU Execution" << endl;
memset(u, 0, nndim * sizeof(double));
set_displ_bc(strain, u);
auto time_1 = high_resolution_clock::now();
double norm = assembly_rhs_acc(u, nullptr, b);
auto time_2 = high_resolution_clock::now();
cout << "|r| : " << norm << endl;
auto time_3 = high_resolution_clock::now();
assembly_mat(&A, u, nullptr);
auto time_4 = high_resolution_clock::now();
auto time_5 = high_resolution_clock::now();
int cg_its = ell_solve_cgpd(&A, b, du, &cg_err);
auto time_6 = high_resolution_clock::now();
cout << "CG Its : " << cg_its << endl;
cout << "CG Err : " << cg_err << endl;
for (int i = 0; i < nn * dim; ++i)
u[i] += du[i];
auto time_7 = high_resolution_clock::now();
norm = assembly_rhs_acc(u, nullptr, b);
auto time_8 = high_resolution_clock::now();
cout << "|r| : " << norm << endl;
auto ass_res =
duration_cast<milliseconds>(time_2 - time_1) +
duration_cast<milliseconds>(time_8 - time_7);
auto ass_mat =
duration_cast<milliseconds>(time_4 - time_3);
auto solver =
duration_cast<milliseconds>(time_6 - time_5);
auto ass_tot = ass_res.count() + ass_mat.count();
auto total = solver.count() + ass_tot;
double percentage_ass = (100.0 * ass_tot) / total;
double percentage_sol = (100.0 * solver.count()) / total;
cout << "ass_res : " << ass_res.count() << " ms" << endl;
cout << "ass_mat : " << ass_mat.count() << " ms" << endl;
cout << "ass_tot : " << ass_tot << " ms" << endl;
cout << "solver : " << solver.count() << " ms" << endl;
cout
<< "ass : " << percentage_ass << " % "
<< "sol : " << percentage_sol << " %" << endl;
file
<< nx - 1 << "\t"
<< ass_res.count() + ass_mat.count() << "\t"
<< solver.count() << "\t"
<< total << "\t"
<< percentage_ass << "\t"
<< percentage_sol << "\t";
//------------------------------------------------------------
cout << "GPU Execution" << endl;
memset(u, 0, nndim * sizeof(double));
set_displ_bc(strain, u);
time_1 = high_resolution_clock::now();
norm = assembly_rhs_acc(u, nullptr, b);
time_2 = high_resolution_clock::now();
cout << "|r| : " << norm << endl;
time_3 = high_resolution_clock::now();
assembly_mat_acc(&A, u, nullptr);
time_4 = high_resolution_clock::now();
time_5 = high_resolution_clock::now();
cg_its = ell_solve_cgpd_acc(&A, b, du, &cg_err);
time_6 = high_resolution_clock::now();
cout << "CG Its : " << cg_its << endl;
cout << "CG Err : " << cg_err << endl;
for (int i = 0; i < nn * dim; ++i)
u[i] += du[i];
time_7 = high_resolution_clock::now();
norm = assembly_rhs_acc(u, nullptr, b);
time_8 = high_resolution_clock::now();
cout << "|r| : " << norm << endl;
ass_res =
duration_cast<milliseconds>(time_2 - time_1) +
duration_cast<milliseconds>(time_8 - time_7);
ass_mat =
duration_cast<milliseconds>(time_4 - time_3);
solver =
duration_cast<milliseconds>(time_6 - time_5);
ass_tot = ass_res.count() + ass_mat.count();
total = solver.count() + ass_tot;
percentage_ass = (100.0 * ass_tot) / total;
percentage_sol = (100.0 * solver.count()) / total;
cout << "ass_res : " << ass_res.count() << " ms" << endl;
cout << "ass_mat : " << ass_mat.count() << " ms" << endl;
cout << "ass_tot : " << ass_tot << " ms" << endl;
cout << "solver : " << solver.count() << " ms" << endl;
cout
<< "ass : " << percentage_ass << " % "
<< "sol : " << percentage_sol << " %" << endl;
file
<< ass_res.count() + ass_mat.count() << "\t"
<< solver.count() << "\t"
<< total << "\t"
<< percentage_ass << "\t"
<< percentage_sol << endl;
ell_free(&A);
free(b);