16.Aug.2007
After looking into hardware-accelerated Bézier curve computation, I tried something more difficult and closer to reality: hardware-accelerated tessellation of arbitrary polygons with OpenGL. Zack covered this topic some time ago, which sparked a lot of flaming (as most GNOME vs. KDE performance comparisons do). All benchmarks are flawed, of course.
I used the same setup as in my previous experiment. This time I measured the real framerate to make sure that no anomaly occurs because of the asynchronous GL API. Each frame of the test consists of flowers drawn to the screen with random parameters. Each flower is a polygon outlined by eight Bézier curves. The flower shape is not special or optimized in any way; any closed polygon made out of any number of curves could be used instead. Summarizing:

The OpenGL rendering seems to be a pixel “fatter” than the cairo version (probably a bug in my code). The GL output also seems to be blended slightly brighter. I guess the significant difference between the cairo xlib surface and cairo image surface performance also comes from the intensive blending.
Some optimizations could speed up the GL code even further:
To render the polygons I’m using a variation of the stencil-based algorithm described in the OpenGL Red Book. It relies on a 1-bit stencil buffer, which is commonly available. The basic method is:
A worthwhile benefit of this approach is that it works with all the standard OpenGL matrix transformations, depth buffer testing, the texturing model, etc. It can easily be extended with 2D boolean operations. The CPU performs no calculations (except the original path calculation, which can be offloaded to the GPU with a vertex shader).
Once in a while I get questions about how I implemented hardware-accelerated video color space conversions in Diva. I’m going to write a bit about that soon, along with some coverage of boolean operations.
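To make the stencil idea more concrete, here is a minimal CPU-side sketch in Python. It is a simulation, not the GL code used in this benchmark: the grid, the sample polygon and the helper names (`crosses`, `in_triangle`, `stencil_fill`) are all assumptions made for illustration. It mimics what the GPU would do when a triangle fan is drawn with the stencil operation set to invert: every triangle flips the stencil bit of the pixels it covers, and pixels flipped an odd number of times end up inside the polygon.

```python
def crosses(p, q, x, y):
    """True if a ray from (x, y) towards +x crosses the edge p-q."""
    (x0, y0), (x1, y1) = p, q
    if y0 > y1:  # normalize direction so shared edges evaluate identically
        x0, y0, x1, y1 = x1, y1, x0, y0
    if (y0 > y) == (y1 > y):
        return False  # edge lies entirely above or below the ray
    x_at_y = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
    return x < x_at_y


def in_triangle(a, b, c, x, y):
    """Point-in-triangle test via crossing parity of the three edges."""
    hits = crosses(a, b, x, y) + crosses(b, c, x, y) + crosses(c, a, x, y)
    return hits % 2 == 1


def stencil_fill(polygon, width, height):
    """Simulate a 1-bit stencil fill; returns rows of 0/1 stencil values."""
    stencil = [[0] * width for _ in range(height)]
    pivot = polygon[0]
    # Pass 1: draw the fan (pivot, v[i], v[i+1]) with color writes off,
    # inverting the stencil bit wherever a triangle covers a pixel center.
    for i in range(1, len(polygon) - 1):
        a, b = polygon[i], polygon[i + 1]
        for py in range(height):
            for px in range(width):
                if in_triangle(pivot, a, b, px + 0.5, py + 0.5):
                    stencil[py][px] ^= 1
    # Pass 2 (not simulated): draw a covering quad, passing only where
    # the stencil bit is set, and clear the bit on the way.
    return stencil


# Hypothetical concave test shape: a square with a notch cut into the top.
NOTCHED = [(0, 0), (8, 0), (8, 8), (4, 2), (0, 8)]

if __name__ == "__main__":
    for row in reversed(stencil_fill(NOTCHED, 8, 8)):
        print("".join("#" if bit else "." for bit in row))
```

Fan triangles that overlap outside the polygon flip the same pixels twice, so the concave notch cancels out without any CPU-side tessellation. The price is that a single inverted bit can only record parity, which is the even-odd fill rule.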
5 Comments
There is another way to do this with OpenGL. You implement the flower drawing algorithm inside the shader. For each x/y presented to the shader, the flower algorithm decides what color to make the pixel. Then you just draw 200 rectangles using the shader and parameters.
Tessellation occurs inside the shader on a per-pixel basis as you evaluate the color to draw the pixel. You can also implement antialiasing using the same technique by oversampling.
http://www.loria.fr/%7Elevy/publications/papers/2005/VTM/vtm.pdf
http://research.microsoft.com/%7Ecloop/LoopBlinn05.pdf
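A rough sketch of the per-pixel idea with oversampling, written in Python as a stand-in for a fragment shader. The polar "flower" formula below is a made-up placeholder, not the Bézier flower from the benchmark, and `inside_flower` and `pixel_coverage` are hypothetical names.

```python
import math


def inside_flower(x, y, petals=8):
    """Placeholder implicit shape: an 8-petal rose in [-1, 1] x [-1, 1]."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    return r <= 0.6 + 0.3 * math.cos(petals * theta)


def pixel_coverage(px, py, size, samples=4):
    """Fraction of samples x samples sub-pixel points that fall inside."""
    hits = 0
    for sy in range(samples):
        for sx in range(samples):
            # Map the sub-pixel sample position to shape space [-1, 1].
            x = (px + (sx + 0.5) / samples) / size * 2.0 - 1.0
            y = (py + (sy + 0.5) / samples) / size * 2.0 - 1.0
            if inside_flower(x, y):
                hits += 1
    return hits / (samples * samples)


if __name__ == "__main__":
    size = 32
    shades = " .:*#"
    for py in range(size):
        row = [pixel_coverage(px, py, size) for px in range(size)]
        print("".join(shades[round(c * (len(shades) - 1))] for c in row))
```

The coverage value would be used as the alpha of the fragment. The cost grows with the square of `samples`, which is the usual trade-off of brute-force oversampling.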
Hi,
cool stuff,
I ran sysprof quickly on the cairo xlib code and it shows about a third of the time being spent in creating the gradient (gradientwalker_pixel)
Hi Michael,
Thanks for sharing another very interesting demo/benchmark.
Of course, I look at something like this and think about how a technique like this might be used inside cairo. Here are a couple of issues that would need to be addressed first:
Does this technique handle complex, (self-intersecting), polygons that need to be filled with the winding rather than the even-odd rule? Cairo applications can select and draw with either fill rule, but the winding rule is particularly important for getting correct results for the internal case of generating a polygon to be filled when computing cairo_stroke for example.
Of course, it would be necessary to fix the bugs you mentioned above, (things are too large, and the blending is wrong somewhere). And there does appear to be at least another bug where a single pixel is noticeably wrong on some of the flowers, (could these be the “anti-aliasing artifact pixels” you mention? I wasn’t sure if you were pointing out a bug there, or a technique for avoiding a bug).
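To illustrate the fill-rule question raised above, here is a small Python sketch that computes the winding number of a point against a self-intersecting polygon and applies both rules. The pentagram and the function names are assumptions chosen for illustration; this is not code from cairo or from the demo.

```python
import math


def winding_number(polygon, x, y):
    """Signed number of times the polygon winds around (x, y)."""
    wn = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # > 0 when the point lies to the left of the directed edge.
        is_left = (x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)
        if y0 <= y < y1 and is_left > 0:
            wn += 1  # upward edge passing to the right of the point
        elif y1 <= y < y0 and is_left < 0:
            wn -= 1  # downward edge passing to the right of the point
    return wn


def filled_winding(polygon, x, y):
    """Non-zero winding rule."""
    return winding_number(polygon, x, y) != 0


def filled_even_odd(polygon, x, y):
    """Even-odd rule: only the parity of the winding number matters."""
    return winding_number(polygon, x, y) % 2 != 0


# Pentagram: five points on the unit circle, visiting every second one.
PENTAGRAM = [
    (math.cos(math.pi / 2 + k * 4 * math.pi / 5),
     math.sin(math.pi / 2 + k * 4 * math.pi / 5))
    for k in range(5)
]

if __name__ == "__main__":
    for name, (x, y) in [("center", (0.0, 0.0)), ("tip", (0.0, 0.8))]:
        print(name, filled_winding(PENTAGRAM, x, y),
              filled_even_odd(PENTAGRAM, x, y))
```

The center of the pentagram is wound twice, so the winding rule fills it and the even-odd rule leaves a hole. A stencil that only inverts a single bit records parity and therefore gives the even-odd result; the winding rule needs a stencil that can count up and down.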
Assuming those were resolved, (and presumably anything that’s just bug fixing would be simple enough to fix), there’s still a question of where/how this code could integrate with cairo. Here are some options along with advantages/disadvantages:
1. An OpenGL-using backend separate from the xlib backend. This would be putting the implementation into something like glitz or a glitz replacement. There is a fill() function in the backend interface that accepts the complete polygon, so everything from tessellation down would be available to be accelerated. This approach has the drawback that applications/toolkits would have to be specifically ported to use this new backend rather than using the xlib backend.
2. Inside the xlib backend. That is, using OpenGL implicitly within the xlib backend when available. I’m not sure if there’s any OpenGL-specific setup information that would be required from the application, (I don’t think so as an early prototype of GTK+/cairo integration allowed an environment variable to select whether the xlib or glitz backend was used). One difficulty would be to avoid using OpenGL when it was entirely unaccelerated, (where the stencil technique might actually lead to a slowdown).
3. Inside the X server, using OpenGL to accelerate Render. Here the question of whether or not OpenGL is accelerated and usable is much easier to answer, (since the X server knows what driver is being used, users can configure the X server, etc.). The current Render extension does not have a request to draw an entire polygon, (it assumes client-side tessellation), but that’s something that could be added to the X server if necessary. Meanwhile, I don’t think it is necessary, as a very quick look at profiles of this demo, (I’ll post more details when I’m back from vacation), suggests that only a very minimal amount of time is spent tessellating in the flower-cairo program, (instead it’s software-generated gradients and software fallbacks for rasterization and compositing that are killing things).
4. Inside the X server, without using OpenGL. It’s really just a minor implementation detail that separates this approach from #3. And again, if it’s not tessellation that’s making the current cairo approaches go slow, then just accelerating trapezoid rasterization and compositing inside the X server should be sufficient.
I’ve been working toward #4 in my recent efforts to accelerate EXA, particularly on the Intel 965. I’ve started that with text rendering which includes the compositing, but trapezoids will come next. (And a demo like this should be quite useful.)
So thanks again, and as I said before, I’ll probably have more details to share when I’m back next week.
-Carl
The OpenGL version seems fatter because the cairo version is filled but not stroked, whereas the OpenGL version is both filled and stroked. It is filled using the stencil buffer trick, but it’s effectively a 1-bit deep stencil, and hence aliased. You then stroke the flowers with a 1-pixel wide (OpenGL’s default), GL_LINE_SMOOTH line to get an anti-aliased look. Since the stroke is centered on the path (rather than entirely inside it), the result is fatter.
The OpenGL version seems brighter because the cairo version’s gradient goes from y=-1 to y=+1 in user-space (which is what you pass to cairo_pattern_create_linear), but the OpenGL gradient goes from -0.9 to 0.9, since you’re going off the actual min/max x and y values (i.e. the bounding box) of your paths. You go to the effort of calculating the maximum x in your loop over all the vertices, but it’s always just 0.9 for your flower Beziers, and I suspect that, in general, they could be calculated analytically really easily.
P.S. your comments field is turning words surrounded by _underscores_ into italics.
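As a sketch of what "calculated analytically" could look like: the extrema of a cubic Bézier lie either at its endpoints or where the derivative of a coordinate is zero, and that derivative is a quadratic in t. The Python below is an illustration under that assumption; the control points in the example are made up and are not the flower’s.

```python
import math


def cubic_point(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic Bezier at parameter t."""
    u = 1.0 - t
    return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3


def cubic_range(p0, p1, p2, p3):
    """Exact (min, max) of one coordinate over t in [0, 1]."""
    # B'(t) / 3 = a*t^2 + b*t + c
    a = p3 - 3 * p2 + 3 * p1 - p0
    b = 2 * (p2 - 2 * p1 + p0)
    c = p1 - p0
    roots = []
    if abs(a) < 1e-12:
        if abs(b) > 1e-12:
            roots.append(-c / b)  # derivative is linear
    else:
        disc = b * b - 4 * a * c
        if disc >= 0:
            s = math.sqrt(disc)
            roots += [(-b + s) / (2 * a), (-b - s) / (2 * a)]
    values = [p0, p3]
    values += [cubic_point(p0, p1, p2, p3, t) for t in roots if 0 < t < 1]
    return min(values), max(values)


def cubic_bbox(q0, q1, q2, q3):
    """Bounding box (xmin, ymin, xmax, ymax) of a 2D cubic Bezier."""
    xmin, xmax = cubic_range(q0[0], q1[0], q2[0], q3[0])
    ymin, ymax = cubic_range(q0[1], q1[1], q2[1], q3[1])
    return xmin, ymin, xmax, ymax


if __name__ == "__main__":
    # Hypothetical arch: the control points reach y = 1, the curve does not.
    print(cubic_bbox((0, 0), (0, 1), (1, 1), (1, 0)))
```

For this arch the control points go up to y = 1 while the curve itself peaks at y = 0.75, so the analytic box is tighter than the hull of the control points and needs no loop over flattened vertices.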
@Carl: Thanks for the excellent feedback. I’ll get back to you on the cairo mailing list, since I think it’s a better place to discuss details. The use case that’s most interesting to me at this point would be solved with a glitz-like backend (ie. using cairo with clutter). I think it’s a good start and things can be moved down the stack later on when they prove worthy.
@Nigel: Thanks for spotting the problems. I’m working on a different algorithm altogether (using shaders and shader-implemented anti-aliasing) that should solve the problems. Will publish results soon.
The comment field accepts Markdown, hence the italics.