A colleague dropped some code on me and asked why applying a semi-transparent mask to images was so slow. Apparently it was written by a “strong algorithm engineer” from another team. Here it is:
| |
My first reaction: this is pretty careless. Python is easy to write — that’s true — but that doesn’t mean anything goes.
The problem is obvious: the mask is blown up to the full image size before calling addWeighted. addWeighted is already not the fastest operation; doing it on a full-resolution frame when your mask is tiny makes it much worse. Why not blend only the small bounding-box region and copy it back?
Here’s the fix — this isn’t even an algorithm, it’s just basic engineering intuition: process fewer pixels.
| |
The result was dramatically faster — the sluggishness completely disappeared.
Why? Our mask is typically 200×200 pixels; the source image is 1960×1080. That’s roughly a 50× difference in pixel count per operation. Multiply that by tens of thousands of images processed per hour, and a 50× per-frame inefficiency becomes very significant in aggregate.
Write code that respects efficiency. End of rant.