Interactive apps
Architecture & design
Movie production
Both matter
Effective performance
[Plot: effective performance in Mrays/s (0 to 450) versus rays per frame (1M to 1T); SODA (2.2M tris), NVIDIA GTX Titan, diffuse rays]
SBVH [Stich et al. 2009] (CPU, 4 cores)
  Build time dominates
HLBVH [Garanzha et al. 2011] (GPU)
???
Our method (GPU)
  30M to 500G rays/frame
  97% of SBVH
Best quality-speed tradeoff for a wide range of applications
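A minimal sketch of why both build time and trace speed matter, assuming that effective performance means the number of rays divided by the sum of BVH build time and tracing time. The function name and the two example builders are hypothetical, and the numbers are illustrative rather than measured.

```python
def effective_performance(num_rays, build_time_s, trace_rate_rays_per_s):
    """Rays per second once the BVH build time is charged to the frame.

    Assumption: effective performance = rays / (build time + trace time).
    """
    trace_time_s = num_rays / trace_rate_rays_per_s
    return num_rays / (build_time_s + trace_time_s)


# Hypothetical builders: (build time in seconds, trace rate in rays/s).
SLOW_HIGH_QUALITY = (1.0, 400e6)
FAST_LOW_QUALITY = (0.01, 200e6)

if __name__ == "__main__":
    for rays in (1e6, 1e9, 1e12):
        slow = effective_performance(rays, *SLOW_HIGH_QUALITY)
        fast = effective_performance(rays, *FAST_LOW_QUALITY)
        print(f"{rays:.0e} rays: slow builder {slow / 1e6:.1f} Mrays/s, "
              f"fast builder {fast / 1e6:.1f} Mrays/s")
```

With few rays per frame the build time dominates and the fast builder wins; with many rays the tree quality dominates and the slow builder wins.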
Treelet restructuring
Idea
  Build a low-quality BVH
  Optimize its node topology
  Look at multiple nodes at once
Treelet
  Subset of a node's descendants
  Grow by turning leaves into internal nodes
  Largest leaves → best results
  Valid binary tree in itself
  Leaves can represent arbitrary subtrees of the BVH
[Figure: treelet with root R and treelet leaves A to G; n = 7 treelet leaves, n - 1 = 6 treelet internal nodes; a treelet leaf can be an actual BVH leaf or an arbitrary subtree]
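The growth step can be sketched as follows. This is an illustration under stated assumptions, not the authors' GPU code: `Node` and `form_treelet` are hypothetical names, "largest" is taken to mean largest AABB surface area, and the treelet root is assumed to be an internal node.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)  # identity comparison, so list.remove() targets the exact node
class Node:
    name: str
    area: float                       # surface area of the node's AABB
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def form_treelet(root: Node, n: int = 7) -> Tuple[List[Node], List[Node]]:
    """Grow a treelet below `root` until it has n leaves (or cannot grow).

    Returns (treelet internal nodes, treelet leaves).
    """
    internal = [root]
    leaves = [root.left, root.right]
    while len(leaves) < n:
        # Only BVH-internal nodes can be turned into treelet internal nodes.
        candidates = [node for node in leaves if not node.is_leaf]
        if not candidates:
            break
        biggest = max(candidates, key=lambda node: node.area)
        leaves.remove(biggest)
        internal.append(biggest)
        leaves += [biggest.left, biggest.right]
    return internal, leaves
```

Each growth step removes one treelet leaf and adds two, so a treelet with n leaves always has n - 1 internal nodes.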
Restructuring
  Construct optimal binary tree for the same set of leaves
  Replace old treelet
  Reuse the same nodes
  Update connectivity and AABBs
  New AABBs should be smaller
  Perfectly localized operation
  Leaves and their subtrees are kept intact
  No need to look at subtree contents
[Figure: the treelet with root R and leaves A to G, before and after restructuring]
Processing stages
Input triangles
Initial BVH construction
  Parallel LBVH [Karras 2012]
  60-bit Morton codes for accurate spatial partitioning
  One triangle per leaf
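As an illustration of what a 60-bit Morton code is, the sketch below interleaves three quantized coordinates. It assumes 20 bits per axis and coordinates already normalized to [0, 1) within the scene bounds; both are assumptions made for the example, and `morton60` is a hypothetical name. A GPU implementation would use bit tricks rather than a loop.

```python
def morton60(x: float, y: float, z: float) -> int:
    """Interleave three coordinates in [0, 1) into a 60-bit Morton code.

    Assumption: 20 bits per axis, with x occupying the most significant
    bit of each 3-bit group.
    """
    bits = 20
    scale = 1 << bits
    # Quantize each coordinate to an integer in [0, 2^20 - 1].
    quantized = [min(max(int(c * scale), 0), scale - 1) for c in (x, y, z)]
    code = 0
    for i in range(bits - 1, -1, -1):     # from most to least significant bit
        for value in quantized:
            code = (code << 1) | ((value >> i) & 1)
    return code
```

Sorting triangle centroids by such a code orders them along a space-filling curve, which is what the LBVH construction relies on.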
Optimization
  Parallel bottom-up traversal [Karras 2012]
  Restructure multiple treelets in parallel
  Strict bottom-up order → no overlap between treelets
  Rinse and repeat (3 times is plenty)
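The bottom-up order can be illustrated with a sequential simulation of the usual counter-based scheme: every leaf starts a walk toward the root, the first walker to reach a node stops, and the second one processes it. This is a sketch assuming one walker per leaf and a per-node counter that would be an atomic increment on the GPU; `bottom_up` and its parameters are hypothetical names.

```python
from typing import Callable, List, Sequence


def bottom_up(parent: Sequence[int], leaves: Sequence[int],
              visit: Callable[[int], None]) -> None:
    """Visit every internal node exactly once, after both of its children.

    parent[i] is the parent index of node i, or -1 for the root.
    """
    arrivals: List[int] = [0] * len(parent)   # stands in for atomic counters
    for leaf in leaves:                       # one GPU thread per leaf
        node = parent[leaf]
        while node != -1:
            arrivals[node] += 1               # an atomic add on the GPU
            if arrivals[node] == 1:
                break                         # sibling subtree not finished yet
            visit(node)                       # both children are done
            node = parent[node]
```

Because a node is only processed by the second arrival, everything below it is already final when it is visited, which is what keeps concurrently processed treelets from overlapping.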
Post-processing
  Collapse subtrees into leaf nodes
  Collect triangles into linear lists
  Prepare them for Woop's intersection test [Woop 2004]
Fast GPU ray traversal [Aila et al. 2012]
Triangle splitting
Build times for DRAGON (870K tris), NVIDIA GTX Titan (no splits / 30% splits)
  Triangle splitting: none / 0.4 ms
  Initial BVH construction: 5.4 ms / 6.6 ms
  Optimization: 17.0 ms / 21.4 ms
  Post-processing: 1.2 ms / 1.6 ms
  Total: 23.6 ms / 30.0 ms
Cost model
Surface area cost model [Goldsmith and Salmon 1987], [MacDonald and Booth 1990]
[Equation: SAH cost of the BVH, with node surface areas normalized by that of the root; the individual terms are not recoverable from the extracted text]
Track cost and triangle count of each subtree
Minimize SAH cost of the final BVH
Make collapsing decisions already during optimization
Unified processing of leaves and internal nodes
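A sketch of how cost and triangle count can be tracked per subtree, using a commonly cited form of the surface area heuristic: each internal node costs C_i times its surface area, and each leaf costs C_t times its surface area times its triangle count, all divided by the root's surface area. The constants, the `Node` class, and the function names are assumptions made for the example. Taking the minimum of the "keep" and "collapse" costs at every node is one way to make the collapsing decision during the same pass.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

C_I = 1.2   # assumed cost of visiting an internal node
C_T = 1.0   # assumed cost of one ray-triangle test


@dataclass
class Node:
    area: float                       # surface area of the node's AABB
    triangles: int = 1                # only meaningful for leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def subtree_cost(node: Node) -> Tuple[float, int]:
    """Return (cost, triangle count) of a subtree, in units of surface area."""
    if node.left is None:
        return C_T * node.area * node.triangles, node.triangles
    left_cost, left_count = subtree_cost(node.left)
    right_cost, right_count = subtree_cost(node.right)
    count = left_count + right_count
    keep = C_I * node.area + left_cost + right_cost   # stay an internal node
    collapse = C_T * node.area * count                # become one big leaf
    return min(keep, collapse), count


def sah(root: Node) -> float:
    """SAH cost of the whole tree, normalized by the root's surface area."""
    cost, _ = subtree_cost(root)
    return cost / root.area
```

For a root of area 4 over two single-triangle leaves of area 1, keeping the split costs 1.2 * 4 + 1 + 1 = 6.8 while collapsing costs 4 * 2 = 8, so the split is kept.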
Optimal restructuring
Finding the optimal node topology is NP-hard
Naive algorithm → O(n!)
Our approach → O(3^n)
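Dynamic programming over subsets of the treelet leaves is one way to avoid enumerating every topology: the best tree for each subset is found by trying every split into two smaller subsets, which takes work on the order of 3^n for n leaves. The sketch below shows that general technique with a simplified cost (C_i times the surface area of every internal node, plus fixed leaf costs). It is not necessarily the authors' implementation; the function names, the box representation, and the value of `c_i` are assumptions.

```python
def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b ((min xyz), (max xyz))."""
    return (tuple(map(min, a[0], b[0])), tuple(map(max, a[1], b[1])))


def area(box):
    dx, dy, dz = (hi - lo for lo, hi in zip(box[0], box[1]))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def optimal_treelet(boxes, leaf_costs, c_i=1.2):
    """Return (cost, tree); tree is nested 2-tuples of leaf indices."""
    n = len(boxes)
    full = (1 << n) - 1
    bbox = [None] * (full + 1)
    cost = [0.0] * (full + 1)
    split = [0] * (full + 1)
    for s in range(1, full + 1):          # subsets in increasing mask order
        low = s & -s                      # lowest leaf in the subset
        rest = s ^ low
        i = low.bit_length() - 1
        if rest == 0:                     # single leaf
            bbox[s], cost[s] = boxes[i], leaf_costs[i]
            continue
        bbox[s] = union(boxes[i], bbox[rest])
        best = float("inf")
        q = (rest - 1) & rest             # largest proper subset of rest
        while True:
            p = q | low                   # side that contains the lowest leaf
            c = cost[p] + cost[s ^ p]
            if c < best:
                best, split[s] = c, p
            if q == 0:
                break
            q = (q - 1) & rest
        cost[s] = c_i * area(bbox[s]) + best

    def build(s):
        if s & (s - 1) == 0:
            return s.bit_length() - 1
        return (build(split[s]), build(s ^ split[s]))

    return cost[full], build(full)
```

Forcing the lowest-numbered leaf onto one side of every split halves the number of partitions without losing any topology, since the two sides of a split are interchangeable.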
Triangle splitting
  Crosses an important spatial median plane?
  Has large potential for reducing surface area?
  Concentrate on triangles where both apply
  ... but leave something for the rest, too
Results
Compare against 4 CPU and 3 GPU builders
4-core i7 930, NVIDIA GTX Titan
Average of 20 test scenes, multiple viewpoints
Ray tracing performance
[Bar chart: ray tracing performance relative to SweepSAH (SweepSAH = 100%), axis from 0% to 140%]
High-quality CPU builders
  SweepSAH [MacDonald] = 100%
  SBVH [Stich] = 131%
  Tree rotations [Kensler]
  Iterative reinsertion [Bittner]
Fast GPU builders
  LBVH [Karras], HLBVH [Garanzha], GridSAH [Garanzha]
  67%, 69%
  SBVH = 131%: almost 2×
Our method
  TRBVH (no splits)
  TRBVH + 30% splits
  96% of SweepSAH
  91% of SBVH
Effective performance
[Plot: effective performance (0% to 140%) versus rays per frame (1M to 1T)]
Tree rotations [Kensler], Iterative reinsertion [Bittner]: not Pareto-optimal
HLBVH [Garanzha], GridSAH [Garanzha]: not Pareto-optimal
Remaining builders: SBVH [Stich], SweepSAH [MacDonald], LBVH [Karras], our method (no splits), our method (30% splits)
7M to 60G rays/frame: our method is the best choice
Below 7M: LBVH
Above 60G: SBVH
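Ranges like these are bounded by crossover points between pairs of builders. As a rough sketch, assuming the total frame time is build time plus rays divided by trace rate, the ray count at which a slower but higher-quality builder catches up with a faster one has a closed form. The function name and the example numbers are hypothetical, not measurements from the talk.

```python
from typing import Optional


def crossover_rays(fast_build_s: float, fast_rate: float,
                   slow_build_s: float, slow_rate: float) -> Optional[float]:
    """Ray count at which both builders give the same total frame time.

    Solves fast_build + N / fast_rate == slow_build + N / slow_rate for N.
    Returns None when one builder is never worse than the other, in which
    case there is no crossover.
    """
    if fast_build_s >= slow_build_s or fast_rate >= slow_rate:
        return None
    return (slow_build_s - fast_build_s) / (1.0 / fast_rate - 1.0 / slow_rate)


if __name__ == "__main__":
    # Hypothetical builders: 10 ms build at 200 Mrays/s vs 1 s build at 400 Mrays/s.
    print(crossover_rays(0.01, 200e6, 1.0, 400e6))
```

Below the crossover the faster builder gives the higher effective performance; above it the higher-quality tree pays for its longer build.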
Conclusion
General framework for optimizing trees
Inherently parallel
Approximate restructuring → larger treelets?
Practical GPU-based BVH builder
Best choice in a large class of applications
Adjustable quality-speed tradeoff
Will be integrated into NVIDIA OptiX
Thank you
Acknowledgements
Samuli Laine
Jaakko Lehtinen
Sami Liedes
David McAllister
Anonymous reviewers
Anat Grynberg and Greg Ward for CONFERENCE
University of Utah for FAIRY
Marko Dabrovic for SIBENIK
Ryan Vance for BUBS
Samuli Laine for HAIRBALL and VEGETATION
Guillermo Leal Llaguno for SANMIGUEL
Jonathan Good for ARABIC, BABYLONIAN and ITALIAN
Stanford Computer Graphics Laboratory for ARMADILLO, BUDDHA and DRAGON
Cornell University for BAR
Georgia Institute of Technology for BLADE