11/19
New Project Schedule:
11/20 - 11/26: Develop a plan for parallelizing kernel 5 (Gabi and Elora)
11/26 - 12/3: Implement the plan for parallelizing kernel 5
12/3 - 12/5: Implement kernel 6 in parallel (Gabi)
12/5 - 12/9: Clean up and optimize all code
- Kernel 1: Gabi
- Kernel 2: Elora
- Kernel 3: Gabi
- Kernel 5: TBD
During the first two weeks of the project, we were stuck on implementing the first two kernels in parallel. The first kernel, as described in the paper we are following, requires a CUDA array reduction that finds the minimum and maximum coordinates of all bodies in order to construct the bounding box for the root of our tree. The second kernel requires building the tree inside arrays, so that CUDA can operate on arrays rather than on objects. Both kernels were much more difficult than we initially expected, which led us to meet with Professor Railing and revise our project schedule. Instead of parallelizing every kernel, we will parallelize only one or two. At this time, kernels 1, 2, 3, and 5 work sequentially: the octree is constructed sequentially, the centers of mass are found sequentially, and the forces applied to each body are computed sequentially.
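To make the first kernel's job concrete, here is a minimal sequential sketch of the min/max reduction that produces the root bounding box. It is written in plain C++ rather than CUDA, and the `Bodies` and `BoundingBox` types and the `compute_bounding_box` name are hypothetical, not taken from our code or from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one array per coordinate, so the same data could
// later be handed to a CUDA kernel without restructuring.
struct Bodies {
    std::vector<float> x, y, z;
};

struct BoundingBox {
    float min_x, min_y, min_z;
    float max_x, max_y, max_z;
};

// Sequential min/max reduction over all body positions.
// A parallel version would have each thread reduce a slice of the arrays
// and then combine the partial minimums and maximums.
BoundingBox compute_bounding_box(const Bodies& b) {
    assert(!b.x.empty());
    assert(b.x.size() == b.y.size() && b.x.size() == b.z.size());

    // Start from the first body so the box is valid for any input values.
    BoundingBox box{b.x[0], b.y[0], b.z[0], b.x[0], b.y[0], b.z[0]};
    for (std::size_t i = 1; i < b.x.size(); ++i) {
        box.min_x = std::min(box.min_x, b.x[i]);
        box.min_y = std::min(box.min_y, b.y[i]);
        box.min_z = std::min(box.min_z, b.z[i]);
        box.max_x = std::max(box.max_x, b.x[i]);
        box.max_y = std::max(box.max_y, b.y[i]);
        box.max_z = std::max(box.max_z, b.z[i]);
    }
    return box;
}
```

The six running values are independent of one another, which is what lets the loop be expressed as a reduction.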
With regard to our goals and deliverables, we will no longer implement the algorithm in the manner the paper describes. We still plan to use the paper's idea of representing the octree through arrays, but most of the steps will be implemented sequentially. Our main goal is to implement kernel 5, which calculates the forces on each body, in parallel. We also want to update the positions of the bodies in parallel. All kernels have to run in each time step, so we will attempt to optimize all of them at the end. Unfortunately, since the visualization is not an aspect of parallelization that we are concerned with, we won’t spend time on this “nice to have” part. At the poster session, we hope to use graphs to show the difference between running data through the fully sequential algorithm and through the version that computes the forces on the bodies and updates their positions in parallel.
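As an illustration of what representing the octree through arrays can look like, the sketch below stores every node field in its own flat array and refers to children by integer index instead of by pointer. The field names, the `kEmpty` marker, and the octant bit convention are all assumptions made for this example; the paper's actual layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kEmpty = -1;  // assumed marker for "no child in this octant"

// Hypothetical array-based octree: node i is described by entry i of each
// array, and its eight child indices live at child[8*i] .. child[8*i + 7].
struct OctreeArrays {
    std::vector<float> cx, cy, cz;  // cell center
    std::vector<float> half;        // half of the cell's side length
    std::vector<int> child;         // 8 slots per node, kEmpty if unused

    // Appends a new cell and returns its index.
    int add_node(float x, float y, float z, float h) {
        cx.push_back(x);
        cy.push_back(y);
        cz.push_back(z);
        half.push_back(h);
        child.insert(child.end(), 8, kEmpty);
        return static_cast<int>(cx.size()) - 1;
    }
};

// Octant of point (px, py, pz) inside `node`: bit 0 is set for the upper
// half in x, bit 1 for y, bit 2 for z (an assumed convention).
int octant_of(const OctreeArrays& t, int node, float px, float py, float pz) {
    int o = 0;
    if (px >= t.cx[node]) o |= 1;
    if (py >= t.cy[node]) o |= 2;
    if (pz >= t.cz[node]) o |= 4;
    return o;
}

// Returns the child of `node` in octant `o`, creating the cell if needed.
int get_or_create_child(OctreeArrays& t, int node, int o) {
    // Use an index, not a reference: add_node may reallocate the arrays.
    const std::size_t slot = 8 * static_cast<std::size_t>(node) + o;
    if (t.child[slot] == kEmpty) {
        const float h = t.half[node] * 0.5f;
        const float x = t.cx[node] + ((o & 1) ? h : -h);
        const float y = t.cy[node] + ((o & 2) ? h : -h);
        const float z = t.cz[node] + ((o & 4) ? h : -h);
        const int idx = t.add_node(x, y, z, h);
        t.child[slot] = idx;
    }
    return t.child[slot];
}
```

Because nodes are addressed by index, the arrays can be copied to the GPU as they are, with no pointers to fix up.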
List of New Goals and Deliverables:
- Represent the octree using arrays
- Compute the forces acting on each body and update the positions of the bodies based on these forces in parallel
- Make and present graphs showing the difference between the fully sequential algorithm and the partially parallel algorithm
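The second goal above rests on the fact that each body's force computation reads shared data but writes only that body's own result. The sketch below shows this structure in sequential C++. For brevity it uses direct pairwise summation as a stand-in for the octree traversal that kernel 5 would actually perform, and it stores acceleration (force divided by mass) rather than force. All names, the softening term, and the simple Euler-style update are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays body storage.
struct BodyArrays {
    std::vector<float> x, y, z;     // positions
    std::vector<float> vx, vy, vz;  // velocities
    std::vector<float> ax, ay, az;  // accelerations
    std::vector<float> mass;
};

// Work for one body, analogous to one thread of the force kernel:
// reads every position, writes only entry i of the acceleration arrays.
// Direct summation stands in for the tree traversal here.
void compute_acceleration(BodyArrays& b, std::size_t i, float G, float softening) {
    float ax = 0.0f, ay = 0.0f, az = 0.0f;
    for (std::size_t j = 0; j < b.x.size(); ++j) {
        if (j == i) continue;
        const float dx = b.x[j] - b.x[i];
        const float dy = b.y[j] - b.y[i];
        const float dz = b.z[j] - b.z[i];
        const float r2 = dx * dx + dy * dy + dz * dz + softening * softening;
        const float inv_r = 1.0f / std::sqrt(r2);
        const float s = G * b.mass[j] * inv_r * inv_r * inv_r;
        ax += s * dx;
        ay += s * dy;
        az += s * dz;
    }
    b.ax[i] = ax;
    b.ay[i] = ay;
    b.az[i] = az;
}

// Work for one body of the update step: touches only entry i.
void update_body(BodyArrays& b, std::size_t i, float dt) {
    b.vx[i] += b.ax[i] * dt;
    b.vy[i] += b.ay[i] * dt;
    b.vz[i] += b.az[i] * dt;
    b.x[i] += b.vx[i] * dt;
    b.y[i] += b.vy[i] * dt;
    b.z[i] += b.vz[i] * dt;
}

// One time step. All accelerations are computed before any position moves,
// which mirrors running the two steps as separate kernels.
void step(BodyArrays& b, float G, float softening, float dt) {
    const std::size_t n = b.x.size();
    for (std::size_t i = 0; i < n; ++i) compute_acceleration(b, i, G, softening);
    for (std::size_t i = 0; i < n; ++i) update_body(b, i, dt);
}
```

Since neither loop in `step` carries a dependency between iterations, each iteration could in principle be mapped to its own CUDA thread.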
Our concerns include finding an efficient parallel algorithm for computing the forces acting on each body, and getting our program to run at a reasonable rate while the rest of the steps run sequentially. There are many unknowns about whether our ideas for kernel 5 will work, and whether the parallel version will be much faster than the sequential one.