The select
expression in Halide is similar to if
or switch
statement in C. That is to say, the expression select(cond, a, b)
will evaluate to a
if cond
is true
, otherwise it evaluates to b
.
So, I thought select
evaluates only one branch between a
and b
, just like if
does. But it turns out not true, I found select
nearly always evaluates both branches. This is annoying especially when a
or b
is very costly. Maybe I'm not using Halide in the right way?
This problem was asked a long time ago in here, however the answer in that post cannot solve my problem. Here is some actual code:
Var x("x"), y("y"); Func f("f"), g("g"); f(x, y) = x >= 10000; // a1. f is calculated // f(x, y) = cast<bool>(0); // a2. f is constant Expr power = pow(pow(x, x), y); g(x, y) = select(f(x, y), power, 0.f); f.compute_root(); // b1. compute f in advance // f.compute_inline(); // b2. compute f inline Buffer<float> result(1000, 1000); g.realize(result); double time = Tools::benchmark(3, 100, [&] { g.realize(result); }); cout << "time: " << time * 1e3 << " ms" << endl;
In this example, select
evaluates to power
(a costly function) or 0.f
depending on f(x, y)
. And f(x, y)
always evaluates to false
, which means select
should always evaluates to 0.f
. There are some different test choices:
- a1: The value of
f
is calculated, and the result will always befalse
- a2:
f
is set to the constant valuefalse
- b1: Compute
f
in advance - b2: Compute
f
inline
The test result (time cost) is:
- a1+b1: 4.22 ms
- a1+b2: 4.15 ms
- a2+b1: 4.15 ms
- a2+b2: 0.0788 ms
It seems clearly that power
is evaluated in the first three cases, and it should not happen.
An other example is my FAST corner detector. If the threshold
is set to 256 (which can never be reached), is_corner
will always be false
. However, select
in L50 always evaluates score_val
(very time-consuming) unless vectorize
in L64 is disabled.
没有评论:
发表评论