Friday, May 7, 2021

Monte Carlo Pi estimation in Python on GPU using Numba Cuda.jit

I am trying to run my program on Google Colab using the Tesla T4 GPU available there. I am using Numba's @cuda.jit decorator, and when I time my estimation, the CPU version runs faster than the GPU version. Is there something wrong with my implementation of the GPU code, or should it just not run faster for this? I assumed the Monte Carlo method would benefit from the GPU. I am sure there are faster ways to do this, but I am trying to keep it simple and understandable to me first before I optimize it further.
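For context on the estimator in the code below: the volume of the 10-dimensional unit ball is pi^5 / Gamma(6) = pi^5 / 120, and drawing points uniformly from [0,1]^10 samples only the positive orthant, i.e. 1/2^10 of that ball, so

    hit / trials ≈ (pi^5 / 120) / 2^10 = pi^5 / 122880,   hence   pi ≈ (122880 * hit / trials)^(1/5)

which is where the constant 122880 and the fifth root in the code come from.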

import numpy as np
import matplotlib.pyplot as plt
import time
from random import *
from numba import jit, cuda, njit
from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32

# This is the 10-sphere pi estimation using Monte Carlo, run on the CPU.
def pi_value(trial):
    hit = 0
    for i in range(trial):
        x1 = random()
        x2 = random()
        x3 = random()
        x4 = random()
        x5 = random()
        x6 = random()
        x7 = random()
        x8 = random()
        x9 = random()
        x10 = random()
        if (x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 + x9**2 + x10**2)**(1/2) <= 1:
            hit += 1
    return hit

iter10 = 10000000
dimen = 10

# First run (its result and timing are not used below).
start = time.time()
hit = pi_value(iter10)
end = time.time()

# Second run, which is the one timed and printed.
start1 = time.time()
hit1 = pi_value(iter10)
end1 = time.time()
run_time = end1 - start1

piv = (122880 * (hit1 / iter10))**(1/5)
print("For the {dimen} sphere with {trials} random points, the value of pi is estimated to be {pi}, and executed in {run_time} seconds.".format(dimen=dimen, trials=iter10, pi=piv, run_time=run_time))

# This is the 10-sphere estimation run on the GPU.
@cuda.jit
def pi_value(rng_states, iterations, out):
    thread_id = cuda.grid(1)
    hit = 0
    for i in range(iterations):
        x1 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x2 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x3 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x4 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x5 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x6 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x7 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x8 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x9 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        x10 = xoroshiro128p_uniform_float32(rng_states, thread_id)
        if (x1**2 + x2**2 + x3**2 + x4**2 + x5**2 + x6**2 + x7**2 + x8**2 + x9**2 + x10**2)**(1/2) <= 1:
            hit += 1
    out[thread_id] = (122880 * (hit / iterations))**(1/5)

threads_per_block = 128
blocks = 32
rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)
out = np.zeros(threads_per_block * blocks, dtype=np.float32)

pi_value[blocks, threads_per_block](rng_states, 10000000, out)
print('pi:', out.mean())
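One thing worth noting about the comparison: the kernel above runs 10,000,000 iterations in every one of the 128 * 32 = 4096 threads, while the CPU loop runs 10,000,000 iterations in total, and the CPU function is interpreted Python rather than Numba-compiled code, so the two timings are not measuring the same work. Below is a rough benchmarking sketch, not part of the original Stack Overflow code; the helper name pi_value_cpu and the specific sample counts are illustrative assumptions, and it reuses the @cuda.jit kernel defined above.

import time
import numpy as np
from random import random
from numba import njit, cuda
from numba.cuda.random import create_xoroshiro128p_states

# Numba-compiled CPU baseline, so both sides run compiled code.
@njit
def pi_value_cpu(trials):
    hit = 0
    for i in range(trials):
        s = 0.0
        for d in range(10):        # ten coordinates per sample
            x = random()
            s += x * x
        if s <= 1.0:               # inside the unit 10-ball
            hit += 1
    return hit

threads_per_block = 128
blocks = 32
n_threads = threads_per_block * blocks
total_samples = 10000000
per_thread = total_samples // n_threads    # match the CPU's total work

rng_states = create_xoroshiro128p_states(n_threads, seed=1)
out = np.zeros(n_threads, dtype=np.float32)

# Warm-up calls so the one-time JIT compilation is excluded from the timings.
pi_value_cpu(1000)
pi_value[blocks, threads_per_block](rng_states, 10, out)
cuda.synchronize()

t0 = time.time()
hit = pi_value_cpu(total_samples)
cpu_time = time.time() - t0

t0 = time.time()
pi_value[blocks, threads_per_block](rng_states, per_thread, out)
cuda.synchronize()                 # kernel launches are asynchronous; wait for the GPU
gpu_time = time.time() - t0

print('CPU: pi ~', (122880 * hit / total_samples)**(1/5), 'in', cpu_time, 's')
print('GPU: pi ~', out.mean(), 'in', gpu_time, 's')

With only about 2400 samples per thread the GPU may still not win; the point of the sketch is just that compile time, asynchronous launches, and unequal sample counts all need to be accounted for before the CPU and GPU numbers can be compared.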

Any help would be appreciated, thanks!

https://stackoverflow.com/questions/67443978/monte-carlo-pi-estimation-in-python-on-gpu-using-numba-cuda-jit May 08, 2021 at 12:04PM
