I wrote a ray-tracer in C on an Amiga in 1987, when I was in high school, for a science fair project. Here's an image captured from an emulator in May of 2013, thanks to my brother, Bart Grantham. He ran the original executable to save this image.
[Image: a render from the 26-year-old AmigaDOS executable. It would have taken all night to render in 1987.]

[Interactive WebGL viewer: click and drag with the left button; mouse/trackpad scrollwheel zooms.]
I'm one of the generation of programmers who grew up and discovered personal computers as the term was being invented. We had a TRS-80 Model II, then a TI 99/4A, then an Apple //e, then an Amiga 500. I learned how to program in various dialects of BASIC and had a little success with assembly language.
I saw Tron in 1982 in the theater and was thrown. Computers could make images like that? I wanted to learn how. I was 12; I was full of excitement and energy but not very focused and didn't really know how to systematically learn the things I needed to learn. I looked around at computer graphics books at the library, not understanding much about the algorithms. I asked other middle school students about "cosine" and "sine," and how they and other things worked.
I had the coffee table book Creative Computer Graphics, bought on a trip to the Smithsonian. In it was the promise of rich images created entirely by computers. One of the fascinating things to me about that book now is the diversity of images, running the gamut from interactive line rendering for CAD to entertainment to flight simulators.
In fact, we had the subLOGIC flight simulator FS1 on our Apple //e, which amazingly drew a 3D view of (admittedly very simple) terrain interactively. I think this is what made me wonder how straight lines were drawn on a screen using pixels.
I started high school in 1984 and spent a lot of my time playing on the Apple //e, writing little bits of BASIC code. I got as far as writing a line-drawing routine in assembler (which never did quite work right), and some assembly routines that could save bits from the cassette port and play them back to the speaker.
When Christmas 1986 brought us an Amiga 500, of course we were dazzled by the Amiga "Boing" demo. But what really gave me momentum was "the Juggler" by Eric Graham, which came out in 1987 and I probably saw later that year.
I needed to find out if I could do that same thing.
We had a copy of the Aztec C compiler (probably version 1 or 2). I seem to remember not having much in the way of documentation for Aztec C, but rather had the published books for AmigaDOS' runtime, including the Intuition programming interface.
But the most sophisticated feature of AmigaDOS that I used was just HAM mode.
The data file format represents only matte colored spheres, and looks something like this:
| File Contents | Meaning |
|---|---|
| Some spheres | Title line |
| 0;80;50 | interlace, width, height |
| 2;2 | X and Y subsampling |
| 15;15;15;20;7 | background color and randomness |
| 3 | number of spheres |
| 30;5;30;5;15;15;0;80 | x, y, z, r, then color |
| 30;10;30;5;15;15;0;80 | ... |
| 30;15;30;5;15;15;0;80 | ... |
| 18;15;-40 | eye x, y, z |
| 0;0;0 | yaw, pitch, roll |
| 40 | field of view angle |
The program read all of this with a very simple tokenizer into a structure allocated on the heap, rendered the image to the screen, and then waited. I think I must have used a screenshot tool to copy the output to disk if I saved any images at all, as I don't see image saving in the source.
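For illustration, a parser for that format doesn't need to be much more than the sketch below. This is my reconstruction in modern C++, not the original code: the struct layout and field names are mine, and the grouping of the background line's five values and the four color values per sphere is a guess from the example above.

```cpp
#include <cctype>
#include <cstdio>
#include <cstdlib>

// Hypothetical structures matching the file format in the table above.
struct Sphere { float x, y, z, r, color[4]; };

struct Scene {
    char   title[256];
    int    interlace, width, height;
    int    xsub, ysub;                 // X and Y subsampling
    float  background[4], randomness;  // my grouping of the five values
    int    nspheres;
    Sphere *spheres;                   // allocated on the heap, as in the original
    float  eye[3], yaw, pitch, roll, fov;
};

// Skip semicolons and whitespace, then read the next number.
static float next_number(FILE *fp)
{
    int c;
    while ((c = fgetc(fp)) != EOF && (c == ';' || isspace(c)))
        ;
    float v;
    if (c == EOF || ungetc(c, fp) == EOF || fscanf(fp, "%f", &v) != 1) {
        fprintf(stderr, "parse error\n");
        exit(EXIT_FAILURE);
    }
    return v;
}

static Scene *read_scene(FILE *fp)
{
    Scene *s = new Scene;
    if (!fgets(s->title, sizeof(s->title), fp))       // title line
        s->title[0] = '\0';
    s->interlace = (int)next_number(fp);
    s->width     = (int)next_number(fp);
    s->height    = (int)next_number(fp);
    s->xsub      = (int)next_number(fp);
    s->ysub      = (int)next_number(fp);
    for (int i = 0; i < 4; i++) s->background[i] = next_number(fp);
    s->randomness = next_number(fp);
    s->nspheres  = (int)next_number(fp);
    s->spheres   = new Sphere[s->nspheres];
    for (int i = 0; i < s->nspheres; i++) {
        Sphere &sp = s->spheres[i];
        sp.x = next_number(fp); sp.y = next_number(fp);
        sp.z = next_number(fp); sp.r = next_number(fp);
        for (int j = 0; j < 4; j++) sp.color[j] = next_number(fp);
    }
    for (int i = 0; i < 3; i++) s->eye[i] = next_number(fp);
    s->yaw   = next_number(fp);
    s->pitch = next_number(fp);
    s->roll  = next_number(fp);
    s->fov   = next_number(fp);
    return s;
}
```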
Here's the data file for just my first initial.
Operation is pretty straightforward; rays are generated for each pixel, those rays are tested against every sphere in the scene, and then the closest hit is shaded with the Z component of the normal at that hit. The shading is thus just Lambertian, assuming a light behind the viewer, infinitely far away. Spheres couldn't be transformed, meaning they couldn't be squashed or stretched.
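As a rough reconstruction (not the original source; the names and conventions here are mine), the per-ray work looks something like this:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Distance t along the (normalized) ray to the nearest intersection with a
// sphere, or a negative value if the ray misses it.
static float intersect_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius)
{
    Vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float b = 2.0f * dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * c;
    if (disc < 0.0f) return -1.0f;          // missed
    return (-b - sqrtf(disc)) / 2.0f;       // nearer of the two roots
}

// Test a ray against every sphere, keep the closest hit, and shade it with
// the Z component of the unit normal (Lambertian, light behind the viewer).
static float trace_and_shade(Vec3 origin, Vec3 dir,
                             const Vec3 *centers, const float *radii, int n)
{
    float best_t = 1e30f;
    int best = -1;
    for (int i = 0; i < n; i++) {
        float t = intersect_sphere(origin, dir, centers[i], radii[i]);
        if (t > 0.0f && t < best_t) { best_t = t; best = i; }
    }
    if (best < 0) return 0.0f;              // background
    Vec3 hit = { origin.x + dir.x * best_t,
                 origin.y + dir.y * best_t,
                 origin.z + dir.z * best_t };
    float nz = (hit.z - centers[best].z) / radii[best];
    return fabsf(nz);   // sign handling depends on which way the eye looks
}
```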
I used HAM mode in order to maximize my color palette. In HAM mode, during scanout left-to-right, each pixel could change the 4-bit red, green, or blue portion of the currently displayed 12-bit color. While rendering, I kept track of the currently drawn color and updated only the components that had changed. In practice, two or three of the components changed often, so the image has a kind of horizontal smeared quality to it.
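Here's a sketch of what that bookkeeping amounts to, assuming each HAM pixel can modify exactly one 4-bit component while holding the other two; the choice of which component to fix first is my guess, not necessarily what the original code did.

```cpp
#include <cstdlib>

// 4-bit-per-component color, 0..15 each.
struct Rgb4 { int r, g, b; };

enum HamOp { HAM_MODIFY_RED, HAM_MODIFY_GREEN, HAM_MODIFY_BLUE, HAM_HOLD };

// Decide how to encode the next pixel during left-to-right scanout: fix the
// component with the largest error and hold the others, which is exactly what
// produces the horizontal smearing when two or three components change.
static HamOp next_ham_pixel(Rgb4 desired, Rgb4 &current /* updated in place */)
{
    int dr = abs(desired.r - current.r);
    int dg = abs(desired.g - current.g);
    int db = abs(desired.b - current.b);
    if (dr == 0 && dg == 0 && db == 0) return HAM_HOLD;
    if (dr >= dg && dr >= db) { current.r = desired.r; return HAM_MODIFY_RED; }
    if (dg >= db)             { current.g = desired.g; return HAM_MODIFY_GREEN; }
    current.b = desired.b;
    return HAM_MODIFY_BLUE;
}
```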
There are no reflections, no shadows, no infinite planes with checkerboards, no texturing, and no polygons. Kevin Beason wrote a sphere path-tracer in 99 lines of C++ (which is pretty amazing) but I think what I did was reasonable for 500 lines of clumsy C written when I was 16.
I didn't have a good understanding of an "image plane" or a "camera model" at the time. I thought that camera rays were generated by dividing the field of view into equal angles in X and Y. It's probably closest to an equidistant fisheye lens projection, but it's essentially latitude and longitude lines on a sphere. I realize now that the visualization in my head at the time of what the math was doing was not quite right.
There were several things that seemed suspicious to me even then: what happens when the field of view angle is near or greater than 180 degrees? What if the Y field of view is very tall but the X field is very narrow? I don't think I ever rendered any field of view beyond 40 degrees.
I combined that with the idea of camera yaw (rotation around Y), pitch (rotation around rotated X), and roll (rotation around the subsequent view direction). These implement three gimbals.
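Here's a sketch of that ray generation as I understand my old scheme. This is a reconstruction, not the original code: the axis and angle conventions are assumptions, how the single field-of-view angle splits between X and Y is a guess, and roll is left out for brevity.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Yaw: rotate around Y; pitch: rotate around the rotated X axis.
// Roll (around the resulting view direction) is omitted here.
static Vec3 rotate_yaw_pitch(Vec3 v, float yaw, float pitch)
{
    Vec3 a = {  v.x * cosf(yaw) + v.z * sinf(yaw),
                v.y,
               -v.x * sinf(yaw) + v.z * cosf(yaw) };
    Vec3 b = {  a.x,
                a.y * cosf(pitch) - a.z * sinf(pitch),
                a.y * sinf(pitch) + a.z * cosf(pitch) };
    return b;
}

// Divide the field of view into equal angles in X and Y: essentially
// latitude and longitude on a sphere, close to an equidistant fisheye.
static Vec3 eye_ray(int px, int py, int width, int height,
                    float fov, float yaw, float pitch)   // fov in radians
{
    float aspect = (float)width / (float)height;
    float ax = ((px + 0.5f) / width  - 0.5f) * fov;            // azimuth
    float ay = ((py + 0.5f) / height - 0.5f) * fov / aspect;   // elevation
    Vec3 dir = { sinf(ax) * cosf(ay), sinf(ay), cosf(ax) * cosf(ay) };
    return rotate_yaw_pitch(dir, yaw, pitch);
}
```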
I never generated enough images to test it, but it turns out the camera model in the code almost works. Turning left and right (yaw) works fine. The projection appears fine for small to medium angles and has the odd characteristic that a given sphere at a given distance from the eye covers the same image area no matter where it falls in image X and Y.
OpenGL samples the image plane on a rectangular grid. This better matches display on a flat monitor, like an LCD. But the perspective projection in OpenGL distorts shapes toward the edges of the image if the angle the image subtends on the display device from the viewer is not the same as the field of view in the camera model.
I did have the notion of aspect ratio. I think the above image was more-or-less correct on the monitor I had at the time.
In any case, it turns out that my camera ray generation code was actually broken. Tilting up or down really distorted the image, and rolling had some kind of trig error and really just didn't work.
I guess I knew at the time that I wanted to move beyond simple shading. However, I knew bump-mapping, anisotropy, and distribution ray-tracing were beyond my understanding at the time, so I simply added a random value to the shaded color. This simple technique did add visual detail and looked better to my eyes, and added an interesting character to the images in combination with HAM mode.
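In code it's as small as it sounds; something along these lines (a sketch, with the scaling of the random value being a guess):

```cpp
#include <cstdlib>

// Add a small random offset to the shaded value.  "randomness" would come
// from the background/randomness line of the scene file; the exact scaling
// here is an assumption.
static float dither(float shaded, float randomness)
{
    float noise = ((float)rand() / (float)RAND_MAX - 0.5f) * randomness;
    return shaded + noise;
}
```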
These days you can see the value of path-tracing, texturing, BRDFs, and area light sources in the beauty of synthetic images, where the noise matches the actual energy distribution in the scene. I rendered the following image with a different ray-tracer I wrote later.
[Image: rendered with a later ray-tracer.]
Finally, I think at the time it was apparent that the edges of objects would have jaggies (aliasing) so I had a supersampling factor for both X and Y. The noisiness of HAM pixels probably overwhelmed that.
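The supersampling itself would look something like this, with the X and Y factors coming from the scene file's subsampling line (a sketch; trace_at stands in for the per-ray work sketched earlier):

```cpp
// Average xsub * ysub samples on a regular subgrid within the pixel.
static float sample_pixel(int px, int py, int xsub, int ysub,
                          float (*trace_at)(float sx, float sy))
{
    float sum = 0.0f;
    for (int j = 0; j < ysub; j++)
        for (int i = 0; i < xsub; i++) {
            float sx = px + (i + 0.5f) / xsub;   // subpixel X position
            float sy = py + (j + 0.5f) / ysub;   // subpixel Y position
            sum += trace_at(sx, sy);
        }
    return sum / (float)(xsub * ysub);
}
```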
Unfortunately I'm missing a machine-readable version of a few of the files. But here's main() and the most significant functions. It's what you'd expect from a 16-year-old who just learned C: a couple of errors, no spaces around operators, cryptic error messages, gotos...
I set about bringing this relic to life, mostly out of curiosity.
Here's the source for the restored command-line renderer.
I ported the code to C++ and libc and abandoned windowed output in favor of the very simple PNM image file format. I had to replace AllocMem with new (and malloc in a couple of instances) and FreeMem with delete and free. I added an OpenMP pragma to parallelize the pixel loop. This took just a couple of hours. It was really gratifying to see new pixels made from code that was largely still 26 years old. And it took just seconds for a 512 by 512 image; back in high school a 320 by 200 image took many hours. It was a relief to know that I could look at my old code, which had taken me months to write, and port it and bring it up to date in just hours.
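The OpenMP change itself is a one-line pragma on the scanline loop, roughly like this (the per-pixel function here is just a placeholder, not the real renderer):

```cpp
// Compile with -fopenmp (gcc/clang) to enable the pragma; without it the
// code still builds and simply runs serially.
static float trace_pixel(int x, int y)          // placeholder per-pixel work
{
    return ((x ^ y) & 1) ? 1.0f : 0.0f;
}

void render(int width, int height, float *pixels)
{
    // Each scanline is independent, so the outer loop parallelizes trivially.
    #pragma omp parallel for
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            pixels[y * width + x] = trace_pixel(x, y);
}
```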
With a working version of the program in hand, I thought it would be fun to push it a little further.
I left much of the old code in place and kept to the 26-year-old style.
You can have the code if you'd like, as a reference.
I limited myself to the spirit of the intersection code, shading method, projection model, and camera model I used 26 years ago. I guess I wanted to imagine what would have happened if I had decided "okay, this is good" and then gone on to add more optimization and more features. I wanted to be able to use the original inits.ray data file that I still had lying around, and then create more models using the same format that I could potentially have read with the original program. In this way I can squint and pretend that this was where I was headed if I had stayed focused.
I added a simple bounding volume hierarchy.
First the BVH construction code finds the centroid of the current set of spheres by averaging their centers.
The algorithm makes a split plane from the centroid and the axis which has the largest extent in the axis-aligned bounding box containing the current set of spheres.
The algorithm creates two new lists of spheres whose centers lie in the half-spaces on either side of the plane.
Construction recurses on those two lists, making leaf nodes when a list is smaller than a predefined maximum number of spheres, or at a predefined maximum number of levels. The maximum depth restriction takes precedence, so leaf nodes could contain many spheres.
Leaf nodes contain the list of spheres to trace.
Every node in the hierarchy has an axis-aligned bounding box that contains all the spheres of its children. The bounding boxes of the two children of a branching node may overlap each other, and this is okay as long as the volumes get progressively smaller.
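Here's a compact restatement of that construction. The names, leaf size, and depth limit below are hypothetical, not the values in my code, and the sphere layout differs slightly from the parsing sketch so the split axis can be indexed directly.

```cpp
#include <algorithm>
#include <vector>

struct Sphere { float center[3]; float radius; };

struct BvhNode {
    float boxmin[3], boxmax[3];            // axis-aligned bounding box
    BvhNode *left = nullptr, *right = nullptr;
    std::vector<Sphere> spheres;           // filled only for leaf nodes
};

static const int kMaxLeafSpheres = 8;      // placeholder limits, not the real ones
static const int kMaxDepth       = 30;

static BvhNode *build(const std::vector<Sphere> &spheres, int depth)
{
    BvhNode *node = new BvhNode;

    // Bounding box containing all of this node's spheres.
    for (int a = 0; a < 3; a++) { node->boxmin[a] = 1e30f; node->boxmax[a] = -1e30f; }
    for (const Sphere &s : spheres)
        for (int a = 0; a < 3; a++) {
            node->boxmin[a] = std::min(node->boxmin[a], s.center[a] - s.radius);
            node->boxmax[a] = std::max(node->boxmax[a], s.center[a] + s.radius);
        }

    // The depth limit takes precedence, so leaves can end up holding many spheres.
    if ((int)spheres.size() <= kMaxLeafSpheres || depth >= kMaxDepth) {
        node->spheres = spheres;
        return node;
    }

    // Centroid of the sphere centers, and the axis of largest box extent.
    float centroid[3] = { 0.0f, 0.0f, 0.0f };
    for (const Sphere &s : spheres)
        for (int a = 0; a < 3; a++)
            centroid[a] += s.center[a] / (float)spheres.size();
    int axis = 0;
    for (int a = 1; a < 3; a++)
        if (node->boxmax[a] - node->boxmin[a] > node->boxmax[axis] - node->boxmin[axis])
            axis = a;

    // Partition sphere centers by the plane through the centroid on that axis.
    std::vector<Sphere> lo, hi;
    for (const Sphere &s : spheres)
        (s.center[axis] < centroid[axis] ? lo : hi).push_back(s);

    // Degenerate split (all centers on one side): fall back to a leaf.
    if (lo.empty() || hi.empty()) { node->spheres = spheres; return node; }

    node->left  = build(lo, depth + 1);
    node->right = build(hi, depth + 1);
    return node;
}
```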
During intersection, each node checks whether the ray intersects the node's bounding box. If it does not, traversal of that subtree terminates. If it does, the node's children are traversed in turn. In C++, the traversal of the BVH is depth-first, using a stack. In the fragment shader, I used "ropes" (Popov et al., 2007) to avoid using a stack, since a stack caused very high register usage and thus very few threads could be active at a time.
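The stack version is only a few lines; the sketch below continues with the BvhNode and Sphere types from the construction sketch, and assumes ray_hits_box (a slab test) and intersect_sphere (the quadratic from earlier) exist with these signatures.

```cpp
bool  ray_hits_box(const float o[3], const float d[3],
                   const float boxmin[3], const float boxmax[3]);
float intersect_sphere(const float o[3], const float d[3], const Sphere &s);

struct Hit { const Sphere *sphere = nullptr; float t = 1e30f; };

static Hit traverse(const BvhNode *root, const float o[3], const float d[3])
{
    Hit best;
    const BvhNode *stack[64];       // plenty for the depth limit used above
    int top = 0;
    stack[top++] = root;
    while (top > 0) {
        const BvhNode *node = stack[--top];
        if (!ray_hits_box(o, d, node->boxmin, node->boxmax))
            continue;                               // prune this whole subtree
        if (node->left == nullptr) {                // leaf: test its spheres
            for (const Sphere &s : node->spheres) {
                float t = intersect_sphere(o, d, s);
                if (t > 0.0f && t < best.t) { best.t = t; best.sphere = &s; }
            }
        } else {                                    // branch: push both children
            stack[top++] = node->left;
            stack[top++] = node->right;
        }
    }
    return best;
}
```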
The existing eye position and orientation from the file is discarded and a new one created that places the eye at Z with the scene fully in view. The mouse rotates the camera and light using simple trackball math.
While working on this, I fixed the bug in the camera ray generation which distorted the image when the pitch angle wasn't 0. (Oldest bug I've ever fixed, I think.)
The program can render on the CPU and copy to a GL texture and just render a textured quad, or it can render on the GPU entirely using a GLSL 3.00 shader to traverse the BVH tree, intersect the spheres, and shade them. I limited myself to OpenGL 3 Core Profile, in anticipation of possibly running in OpenGL ES 2 later.
The original implementation only removed hidden surfaces, showing off no other effects that ray-tracing does well. So in the interactive C++ version I added shadows. This required some enhancements to the code; in particular, I snap intersection points to the surface of the intersected sphere in order to reduce self-shadowing effects.
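The snapping is the interesting part; here's a sketch of it (my reconstruction, with an arbitrary outward bias that isn't the value in the real code):

```cpp
#include <cmath>

struct V3 { float x, y, z; };

// Re-project a computed hit point onto the sphere's surface and push it
// slightly outward, so the shadow ray toward the light doesn't immediately
// re-intersect the same sphere.  The bias factor is a guess.
static V3 snap_to_sphere(V3 hit, V3 center, float radius)
{
    V3 n = { hit.x - center.x, hit.y - center.y, hit.z - center.z };
    float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
    float scale = (radius / len) * 1.0001f;
    V3 snapped = { center.x + n.x * scale,
                   center.y + n.y * scale,
                   center.z + n.z * scale };
    return snapped;
}
```

A shadow ray is then traced from the snapped point toward the light, and any hit along the way marks the point as shadowed.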
I converted some existing models I had in a format for another ray-tracer I'd written. I also wrote a converter from the widely-available PDB format to the old format. The "large-bearing" and "gears" images are courtesy of Mark Sims at Nanorex, and the caffeine, DNA, and tomato bushy stunt virus are from the PDB. Here are some JPEG images of these models, and a .zip of the models is here.
[Images: the large-bearing, gears, caffeine, DNA, and tomato bushy stunt virus models.]
I also generated a 10K sphere dataset and 4M sphere dataset to test performance and resource limits.
All of these models are interactive in CPU-only mode on a 4-core laptop (Q820 at 1.7GHz).
Performance on my GPU (Quadro FX 380M) is close to that of the CPU but the GPU is outpaced further and further as the model complexity goes up. On other GPUs (e.g. whatever NVIDIA model is in the MacBook Pro Retina) the GPU renders faster than the CPU. I've put some work into optimizing the shader but mostly I rely on the compiler for optimization. I don't do much to improve SIMD utilization but since most of my rays are either from the eye or to the light source, they are fairly coherent.
Finally, it seemed to me that this code's journey would only be complete if it was visible to any viewer with a capable web browser. It's at the top of this page. I wrote a C++ program to produce JavaScript for the OpenGL textures that encapsulate the spheres and BVH. I ported the main loop of my GLUT program to JavaScript and passed the texture data and shader to WebGL. For fun I also threw in one reflection bounce and a shadow occlusion test at the reflection hit. Because there's a wide range of WebGL performance out there, the script runs a rudimentary performance characterization and tries to show the most complicated model that is still likely to run at 5 fps on your machine.
WebGL is based on OpenGL ES 2.0 and specifically mandates that implementations must only accept shaders conforming to the minimum GLSL ES 1.00 specification. That means constant-length for-loops and no while-loops or do-loops. Luckily for me, the "continue" and "break" and "return" keywords are allowed. (These restrictions allow a finite-size unrolled GPU instruction stream containing only forward branching as flow control.)
My bounding volume hierarchy has arbitrary depth and is not a perfect binary tree, traversal doesn't visit every node, and nodes contain variable-sized arrays of objects. Those don't map directly to constant-length loops! I made a large constant for the outer BVH tree traversal and "return" out of it when terminating, and I used a smaller constant for the leaf-node object loops and "break" when hitting the end of the object list. One could probably algorithmically determine the constant needed for the BVH traversal based on the worst-possible ray path; I just used inspection for this document.
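Mirrored here in C++ for readability (the real thing is the GLSL ES shader, where both loop bounds must be compile-time constants), the loop structure looks roughly like this. The flat node encoding with a "miss" rope index is a simplification of mine, and the bounds and helper callbacks are placeholders.

```cpp
#include <vector>

// Hypothetical texture-style node layout: each node has an index to visit on
// a box hit, a "rope" index to visit on a miss (-1 ends traversal), and a
// sphere range if it is a leaf.
struct FlatNode {
    int hit_next;      // node to visit if the ray hits this node's box
    int miss_next;     // rope: node to visit if the ray misses (-1 = done)
    int first_sphere;  // index of the first sphere if this is a leaf, else -1
    int sphere_count;
};

const int MAX_BVH_STEPS    = 512;   // chosen by inspection, as in the text
const int MAX_LEAF_SPHERES = 16;    // placeholder leaf size limit

float trace(const std::vector<FlatNode> &nodes,
            bool (*box_hit)(const FlatNode &),      // stand-ins for the real
            float (*sphere_hit)(int sphere_index))  // intersection routines
{
    float best_t = 1e30f;
    int node = 0;                                   // start at the root
    for (int step = 0; step < MAX_BVH_STEPS; step++) {
        if (node < 0) return best_t;                // followed a rope off the tree
        const FlatNode &n = nodes[node];
        if (!box_hit(n)) { node = n.miss_next; continue; }
        if (n.first_sphere >= 0) {                  // leaf: constant-bound loop
            for (int i = 0; i < MAX_LEAF_SPHERES; i++) {
                if (i >= n.sphere_count) break;     // past the end of the list
                float t = sphere_hit(n.first_sphere + i);
                if (t > 0.0f && t < best_t) best_t = t;
            }
            node = n.miss_next;                     // leaves continue along the rope
        } else {
            node = n.hit_next;
        }
    }
    return best_t;                                  // bound exhausted
}
```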
WebGL also doesn't provide any extension at this time for integer textures (where the components are read by shaders as integer components and not normalized) so I stored all my values as floats instead. I can't exactly represent any integers larger than 2^24 and my models and hierarchy are limited by that. Thus I couldn't have any more than four million spheres nor a BVH with more than four million nodes. Four million spheres wouldn't be interactive on my laptop on WebGL, so I'm okay with that.
After all this, I also found WebGL implementations vary in some frustrating ways.
I started development on my laptop, under Linux, with an NVIDIA GPU. NVIDIA's compiler is pretty forgiving, which makes porting to a stricter compiler a little painful. I found I could not have a variable with the same name as its struct on any other compiler; for example, "inout ray ray" failed on MacOSX. Presumably the GLSL ES spec is clear about this, but there's not much incentive to be more strict if a vendor already handles a superset. The error message was also nonsensical, pointing to an empty line and stating only "syntax error syntax error", which made it very hard to debug.
Chrome on Windows times out on linking my shaders when I have a larger outer loop count (this appears to be Chrome issue 113009, which is noted as fixed by Chrome 30 but wasn't fixed in 30 for me...). I assume many viewers of this web page will use Chrome on Windows, so I reduced my BVH loop size on Chrome, but the larger models probably will not look right.
Firefox 24 hangs completely running this page. This is filed as Firefox Issue 919886.
Chrome on Android on my Galaxy Nexus starts to render, but appears to time out. (I am pretty impressed I may be able to run my WebGL app on my phone, though.) Other phones and tablets either don't support float textures or high precision floats, and I'm just not willing to shoehorn my code into platforms that limited.
The existing code can visualize molecular models of moderate size in many web browsers, so it actually has some use. With a little work, I could build a browser for one or more repositories of PDB files. Although I'm pretty pleased with the result of a few months' tinkering, this isn't much more than a toy. PDB files these days are statically rendered with more sophisticated methods than a bunch of reflective spheres, but this could be an interactive camera placement tool on the front end of a high-quality renderer.
There's been a lot of work in interactive ray-tracing over the years. There are a couple dozen ways to relatively quickly improve this program. If I didn't have a day job, I'd import more sophisticated models including textured triangles, optimize it more heavily using SSE and packets, add shaders for surface properties, move to WebCL, implement Metropolis Light Transport, etc.
Looking back on what I wrote then, I'm sort of a weird combination of embarrassed and pleased. I was able to update that 26-year-old C pretty quickly to relatively modern data structures and technologies by reading papers and checking documentation.
Should I have tried harder back then to implement from papers and other people's work, rather than try and figure things out for myself? Maybe the skills I have now only developed because I struggled through and internalized those early concepts, rather than just porting from a paper. Or maybe I didn't go as far as I could because I wasted time on incomplete or incorrect understanding of the concepts. I am not sure how I would assess that now.
The advice I would give myself at 16 (or any other 16-year-old, or any person anywhere) would be to learn how to make a plan and stick to it, make priorities and honor them, and read the documentation and papers more thoroughly.