E. Rustico, J. Jankowski, A. Hérault, G. Bilotta and C. Del Negro
Abstract: We present a restructured version of GPUSPH, a CUDA-based implementation of SPH. The new version can run on multiple GPUs across one or more host nodes, making it possible to exploit hundreds of devices concurrently over a network and thus to simulate larger domains at higher resolutions. Partitioning of the computational domain is no longer limited to parallel planes: it can follow arbitrary, user-defined shapes at the resolution of individual cells, where a cell is defined by the auxiliary grid used for fast neighbor search. This allows optimal partitioning even for complex domains, such as rivers with U-turns. The version we present also includes many additional features that have been developed for GPUSPH. Particularly important are: the uniform precision work by Hérault et al., which is essential for numerical robustness when the ratio between domain size and particle resolution is very large; a compact neighbor list, which allows larger subdomains to be loaded on each device; the semi-analytical boundary conditions by Ferrand et al.; and support for floating objects. All of these features are seamlessly supported in single-GPU, multi-GPU and multi-node modes.
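The cell-level partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not GPUSPH code: the function names (`cell_of`, `build_device_map`, `split_l_shape`), the grid dimensions and the cell size are all hypothetical, chosen only to show how a user-defined rule could assign each cell of the neighbor-search grid to a device, with every particle inheriting the device of the cell that contains it.

```python
from collections import Counter

def cell_of(pos, origin, cell_size):
    """Return the integer (ix, iy, iz) index of the grid cell containing pos."""
    return tuple(int((p - o) // cell_size) for p, o in zip(pos, origin))

def build_device_map(grid_dims, assign):
    """Call the user-defined assign(ix, iy, iz) once per cell and store the result."""
    nx, ny, nz = grid_dims
    return {(ix, iy, iz): assign(ix, iy, iz)
            for ix in range(nx) for iy in range(ny) for iz in range(nz)}

def split_l_shape(ix, iy, iz):
    # Hypothetical non-planar split: device 1 owns one corner block,
    # device 0 owns the remaining L-shaped region.
    return 1 if (ix >= 2 and iy >= 2) else 0

# Assumed toy setup: a 4 x 4 x 1 grid of cubic cells with edge 0.5.
grid_dims = (4, 4, 1)
origin = (0.0, 0.0, 0.0)
cell_size = 0.5

device_map = build_device_map(grid_dims, split_l_shape)

# Each particle is owned by the device that owns its cell.
particles = [(0.1, 0.1, 0.2), (1.6, 1.7, 0.1), (1.9, 0.2, 0.3)]
owners = [device_map[cell_of(p, origin, cell_size)] for p in particles]

# Number of cells per device, useful for judging load balance.
cells_per_device = Counter(device_map.values())
```

Because the assignment is made per cell rather than per plane, the boundary between devices can take any shape the rule expresses, which is what makes a subdomain that follows a bend in a river possible.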
Reference: E. Rustico, J. Jankowski, A. Hérault, G. Bilotta and C. Del Negro (2014). Multi-GPU, Multi-Node SPH Implementation with Arbitrary Domain Decomposition. Proceedings of the 9th International SPHERIC Workshop, Paris, France, June 3-5, 2014.