A few days ago, I discovered I had at least one, maybe two mathematical bugs. The first bug is in the line clipping algorithm. Here’s a quick reminder of what’s going on there:
There are six points of interest: the four intersection points, and the two endpoints of the line. We sort the six points according to their parameter, and take the two innermost points as the endpoints of the clipped line. To check if the line is outside the box we make sure that the result points’ parameters are between 0 and 1 (which correspond to the original endpoints).
Turns out this is not always true – here’s a counter example:
In the counter example you can see a line, whose endpoints lie between the intersections. The algorithm described above will say this line lies completely inside the box. Clearly this is not true. How do we fix this? Simple actually. We check for each intersection point whether or not it is on the edges of the clipping rectangle. If at most one point is on – our line is outside. I hope this is the end of it. It will require more rigorous testing.
The second bug was in my solution to the random selection challenge. I wasn’t too sure of my solution, but I really liked it, as its code is rather elegant. So when meeting someone I used to work with, I told her about it. After thinking about my solution a bit, she said it is probably wrong, and gave me the reason: While the probability of a point x passing the first stage is indeed (proportional to) p(x), the probability of it passing the second stage is not 1 (or constant). I still need to think about this one, so no easy fix yet. I guess this reopens the challenge, so if anyone feels like writing a good proof or a counter example for my solution, I’d be happy to see it. I’ll be even happier to see a completely different solution.
Actually, I think your solution is correct. Essentially you are sampling a random point from the area under the curve.
More formally, denote your random variable by z. Then, for x to the left of the curve’s symmetry axis, Pr[z <= x] is the integral from P(a) to P(x) of (x – P^{-1}(y)) dy, which if you think about it, is exactly the integral of P(x) over [a,x]. The argument for the other x values is analogous. (All of this assumes that P(x) is itself a proper distribution, otherwise you need to divide by the area under the curve.)
I don’t see why the fact that the probability for passing the 2nd stage isn’t constant is relevant, since these are not independent samples…
Thanks for the proof!
Just the statement “sampling a random point from the area under the curve” is more elegant and understandable than my original proof.