Sunday, April 11, 2021

Passing CuPy CUDA device pointer to pybind11

I am trying to instantiate an array in GPU memory using CuPy and then pass the pointer to this array to C++ using pybind11.

A minimal example of the problem I am running into is shown below.

Python

import demolib  # compiled pybind11 library
import cupy as cp

x = cp.ones(100000)
y = cp.ones(100000)

demolib.pyadd(len(x), x.data.ptr, y.data.ptr)

C++/CUDA

#include <iostream>
#include <math.h>
#include <cuda_runtime.h>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Error Checking Function
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
   if (code != cudaSuccess)
   {
      fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
      if (abort) exit(code);
   }
}

// Simple CUDA kernel
__global__
void cuadd(int n, float *x, float *y)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

// Simple wrapper function to be exposed to Python
int pyadd(int N, float *x, float *y)
{
  // Run kernel on 1M elements on the GPU
  int blockSize = 256;
  int numBlocks = (N + blockSize - 1) / blockSize;
  cuadd<<<numBlocks, blockSize>>>(N,x,y);

  // Wait for GPU to finish before accessing on host
  gpuErrchk( cudaPeekAtLastError() );
  gpuErrchk( cudaDeviceSynchronize() );

  return 0;
}

PYBIND11_MODULE(demolib, m) {
  m.doc() = "pybind11 example plugin"; // optional module docstring
  m.def("pyadd", &pyadd, "A function which adds two numbers");
}

The code throws the following error:

GPUassert: an illegal memory access was encountered /home/tbm/cuda/add_pybind.cu 47  
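One plausible cause, offered as a sketch and not a confirmed diagnosis: x.data.ptr is a plain Python integer. When a bound function declares a float * parameter, pybind11 may convert that integer to a temporary float on the host and pass the address of the temporary, which is not a device pointer. The block below shows the alternative of accepting the address as an integer and casting it back to a typed pointer. The names add_from_addresses and demo are hypothetical, and a host-side loop stands in for the cuadd kernel so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical wrapper: the addresses arrive as integers (the kind of value
// x.data.ptr holds on the Python side) instead of as float*.
int add_from_addresses(int n, std::uintptr_t x_addr, std::uintptr_t y_addr)
{
    assert(x_addr != 0 && y_addr != 0);

    // Recover typed pointers from the integer addresses.
    float *x = reinterpret_cast<float *>(x_addr);
    float *y = reinterpret_cast<float *>(y_addr);

    // Host-side stand-in for the cuadd kernel launch (assumption: in the
    // real wrapper this would be cuadd<<<numBlocks, blockSize>>>(n, x, y)).
    for (int i = 0; i < n; ++i)
        y[i] = x[i] + y[i];
    return 0;
}

// Hypothetical driver: passes host buffers by integer address, mirroring
// how the Python side would pass x.data.ptr and y.data.ptr.
float demo()
{
    std::vector<float> x(8, 1.0f), y(8, 1.0f);
    add_from_addresses(8,
                       reinterpret_cast<std::uintptr_t>(x.data()),
                       reinterpret_cast<std::uintptr_t>(y.data()));
    return y[0];
}
```

If this approach is applied to pyadd, the arrays would also need to be created with dtype=cp.float32, because cp.ones defaults to float64 while the kernel reads float.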

I realize that this specific example could be implemented using a cupy user defined kernel, but the end goal is to be able to do zero-copy passes of cupy arrays into a larger codebase which would be prohibitive to rewrite in this paradigm.

I have also located this GitHub Issue, which is the reverse of what I'm trying to do.

https://stackoverflow.com/questions/66989716/passing-cupy-cuda-device-pointer-to-pybind11 April 07, 2021 at 11:54PM
