gunsmoke on 30/9/2011 at 22:57
^^^ This.
demagogue on 1/10/2011 at 00:46
I imagine our threads would be 1/3 the size and half as interesting if we didn't have the memedrop commentary, then the commentary on the commentary and the commentary on the commentary on the commentary... :)
Al_B on 1/10/2011 at 21:17
Quote Posted by zombe
Basically, i need to construct a quaternion + handedness to describe the normal and tangent space per vertex. The quaternion MUST preserve the normal (so, the constructed tangent space will pretty much always differ from the surface one - plus the real / surface tangent space is usually skewed anyway).
I know this is one of your older posts but one thing that's been missing is why you want to do this. A lot of what you're discussing depends on the context - i.e. whether you're optimising for modelling, shading or geometry culling. I appreciate you're sounding out ideas but knowing the background would help a lot.
zombe on 2/10/2011 at 14:29
Well ... why not unravel my thought process a bit here.
Starting point: I need to know the texture tangent space per pixel - it simply is an absolute necessity for per-pixel lighting with normal maps (Quite a few other non-lighting techniques also require it, but i do not intend to use any of those currently).
-------------------------------------
So, the question becomes: Where do i get it? There are 4 options i am aware of:
Option 1: Pixel shader - using derivatives.
* Considerable per-pixel overhead for every frame.
* Assuming a depth-only pre-pass (which i absolutely require anyway) - the overhead is quite stable and predictable (which is nice for real-time stuff).
* No need to precompute the vectors for tangent space.
* The least amount of artifacts (ie. it matches what it should be within each 2x2 pixel quad).
* There might be support problems on older cards (Derivatives are required for mipmap selection, and therefore one might expect them to be free since the rasterizer calculates them anyway - BUT, until relatively recently there apparently was no way for the rasterizer to forward that information to the fragment program, and [primarily on ATI garbage GPUs] the derivatives would always just return 0 [Yes, the spec does not allow it, and all of them did that anyway - for example, all the noise functions on all the top-of-the-top gaming GPUs still always return 0]. Some of the cards worked around the problem by tacking on extra code to calculate the derivatives separately - which, last i tried it on an older gpu, was VERY expensive).
In short: if you have static data and you can precompute the tangent vectors - you should do it (the per-pixel math this option would redo every frame is sketched below). I can precompute - so, this option is off the table.
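For illustration, this is roughly the per-pixel math in question (a minimal CPU-side sketch, assuming GLM for the vector types; in an actual fragment shader the first four inputs would come from dFdx/dFdy - none of this is my real code):
Code:
#include <glm/glm.hpp>

// Tangent frame from screen-space derivatives (the well-known
// "cotangent frame" construction). Inputs: derivatives of world position
// (dp1, dp2) and of the texcoord (duv1, duv2), plus the interpolated normal.
void tangentFromDerivatives(const glm::vec3& dp1, const glm::vec3& dp2,
                            const glm::vec2& duv1, const glm::vec2& duv2,
                            const glm::vec3& n,
                            glm::vec3& t, glm::vec3& b)
{
    // Solve the 2x2 system tying position deltas to texcoord deltas,
    // projected into the plane of the normal.
    glm::vec3 dp2perp = glm::cross(dp2, n);
    glm::vec3 dp1perp = glm::cross(n, dp1);
    t = dp2perp * duv1.x + dp1perp * duv2.x;
    b = dp2perp * duv1.y + dp1perp * duv2.y;

    // One uniform scale for both axes keeps their relative lengths intact.
    float invmax = glm::inversesqrt(glm::max(glm::dot(t, t), glm::dot(b, b)));
    t *= invmax;
    b *= invmax;
}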
Option 2: Geometry shader - you have the whole primitive accessible.
* No need to precompute the vectors for tangent space.
* No per-pixel overhead.
* Per primitive overhead (Much higher than getting the information from derivatives in fragment program. However, assuming relatively healthy fragment/primitive ratio - might be faster).
* Very unstable and unpredictable overhead (can bite you badly in worst case scenarios - ie. precisely at the time you least want it).
* Can badly affect stupid/older GPUs as it messes with the flow. It should not have much effect in itself, as the program is strictly one-primitive-in => one-primitive-out ... but one cannot guarantee that the GPU has a special case for it (at least Nvidia has said that they have a special case [ie. rules you must follow to get the fast path] for quads [ie. point-sprite-like behavior] and i assume they have one for triangles too, but i have not seen any of them confirming that).
* Insufficient adjacency information - ie. expect bad quality on curved non-trivial surfaces in comparison to any of the other options (the whole triangle gets one flat frame - see the sketch below).
While it does have some merit, this option is just riddled with problems - i pass.
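For reference, the per-primitive math a geometry shader would run looks roughly like this (an illustrative CPU-side sketch, assuming GLM) - and it also shows exactly why adjacency is the weak spot, since every pixel of the triangle ends up with the same flat frame:
Code:
#include <glm/glm.hpp>

// Flat tangent + bitangent for one triangle: invert the 2x2 matrix of
// texcoord deltas against the position deltas.
void triangleTangent(const glm::vec3 p[3], const glm::vec2 uv[3],
                     glm::vec3& t, glm::vec3& b)
{
    glm::vec3 e1 = p[1] - p[0], e2 = p[2] - p[0];
    glm::vec2 d1 = uv[1] - uv[0], d2 = uv[2] - uv[0];

    // Guard against degenerate texcoords (zero-area UV triangle).
    float det = d1.x * d2.y - d1.y * d2.x;
    float r = (det != 0.0f) ? 1.0f / det : 0.0f;

    t = (e1 * d2.y - e2 * d1.y) * r;
    b = (e2 * d1.x - e1 * d2.x) * r;
}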
Option 3: Vertex shader - all sorts of projections.
* No need to precompute the vectors for tangent space.
* Usually quite fast to calculate (tri-planar, planar, cylindrical - pretty much covers all i probably ever need)
* Can only be used when ... well ... possible :p.
I will use it whenever i can, obviously (see the tri-planar sketch below). So, continuing to fish for solutions for the cases where this one is not an option.
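As an example of how cheap this gets, a tri-planar variant is little more than picking the dominant axis (an illustrative sketch, assuming GLM - not my actual shader code):
Code:
#include <glm/glm.hpp>

// Tri-planar projection: the tangent frame falls out of the chosen axis
// for free, so there is nothing to precompute or store.
void triplanarFrame(const glm::vec3& n, glm::vec3& t, glm::vec3& b)
{
    glm::vec3 a = glm::abs(n);
    if (a.z >= a.x && a.z >= a.y) {        // project along Z: uv = xy
        t = glm::vec3(1, 0, 0); b = glm::vec3(0, 1, 0);
    } else if (a.y >= a.x) {               // project along Y: uv = xz
        t = glm::vec3(1, 0, 0); b = glm::vec3(0, 0, 1);
    } else {                               // project along X: uv = yz
        t = glm::vec3(0, 1, 0); b = glm::vec3(0, 0, 1);
    }
}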
Option 4: Precomputed tangent space is given in vertex attributes.
* Well, data comes from external sources and hence we do not have to do anything (Except unpacking if needed + the standard transformation you would have to do anyway regardless of where the tangent info comes from).
Here we go then.
-------------------------------------
Now the question of tangent space data transfer (ie. option-4 continuation).
Two vectors are needed for it: Tangent and Bitangent (U and V respectively, or S and T, whichever notation one prefers). That would be 2 vectors * 3 floats * 4 bytes = 24 bytes of extra data per vertex. My option-3 vertex data fits nicely into 32B:
* Position, normal, texcoords (or additional misc params), two-materials for array textures (when needed) + interpolant, additional misc per pixel data.
* It fits into a cache line very nicely.
* No need to have the position data in a separate buffer, as the overhead is not big (in a depth-only pass context) - anything that makes my life easier is welcome. Separate buffers would be annoying to deal with (filling, separate formats, extra buffers to manage).
* 2 bytes unused (Could use them for something, but no idea so far).
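To make the layout concrete, one plausible packing along those lines (pure guesswork on the exact field sizes - only the 32B total and the 2 spare bytes are stated above):
Code:
#include <cstdint>

// Hypothetical 32-byte vertex - illustrative only, not the actual format.
struct PackedVertex {                 // offset size
    float    pos[3];                  //  0     12  position
    int16_t  normal[3];               // 12      6  snorm16 normal
    int16_t  spare;                   // 18      2  the 2 unused bytes
    uint16_t uv[2];                   // 20      4  half-float texcoords (raw bits)
    uint8_t  material[2];             // 24      2  array-texture layers
    uint8_t  materialLerp;            // 26      1  interpolant between them
    uint8_t  misc;                    // 27      1  misc per-pixel data
    uint32_t misc2;                   // 28      4  more misc params
};                                    // total: 32
static_assert(sizeof(PackedVertex) == 32, "one vertex = 32 bytes");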
24 extra bytes for the option-4 stuff is very bad. It would require a strong separation of option-3 and option-4 stuff - which is bad for the depth fill pass (ie. considerably more complexity for me to deal with - and my time is finite). The best solution would be to squeeze both options into one vertex format. So, what can one do with 2 extra bytes:
* Reconstruct the bitangent from the tangent and normal (+ a handedness bit somewhere): that still leaves 3 values for the tangent ... which does not even get close to fitting into 2 bytes - even at some more reasonable precision.
* Well, i am already storing the normal - add one component and i have enough for a quaternion, which is enough to describe all 3 vectors with sufficient precision (+ the handedness bit somewhere, of course). How much space do i need for one component ... 2 bytes ... how much do i have unused ... 2 ... YAY! (the construction is sketched below)
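To make that concrete, a minimal sketch of the construction and the shader-side unpack (assuming GLM; names are illustrative, not my actual code). Note the Gram-Schmidt step: it is exactly the "constructed tangent space will differ from the surface one" part, because the normal MUST be preserved:
Code:
#include <glm/glm.hpp>
#include <glm/gtc/quaternion.hpp>

// Build a quaternion encoding the whole frame while preserving n exactly.
// t, b, n: surface tangent, bitangent, normal (n assumed unit length).
glm::quat tangentFrameToQuat(glm::vec3 t, const glm::vec3& b,
                             const glm::vec3& n, bool& flipped)
{
    // Gram-Schmidt: push the tangent into the plane perpendicular to n,
    // so rotating (0,0,1) by the quaternion gives back n exactly.
    t = glm::normalize(t - n * glm::dot(n, t));

    // Handedness bit: does the real bitangent agree with cross(n, t)?
    flipped = glm::dot(glm::cross(n, t), b) < 0.0f;

    // Orthonormal right-handed frame -> proper rotation -> quaternion.
    return glm::quat_cast(glm::mat3(t, glm::cross(n, t), n)); // cols: T, B, N
}

// What the vertex shader would do with it: rotate the basis vectors back out.
void unpackFrame(const glm::quat& q, bool flipped,
                 glm::vec3& t, glm::vec3& b, glm::vec3& n)
{
    t = q * glm::vec3(1, 0, 0);
    n = q * glm::vec3(0, 0, 1);
    b = (flipped ? -1.0f : 1.0f) * glm::cross(n, t);
}
As for where the bit itself lives: q and -q encode the same rotation, so one known trick is to normalize the sign of one component when quantizing (say, force w >= 0) and then reuse that component's sign to carry the handedness.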
Having only one vertex format for all major occluders (world geometry primarily) and everything else (my intent is to use quite a lot of geometry generation - a uniform format is bliss there) is a MAJOR implementation complexity simplification and just too good to pass up (it probably is good for performance too, but that is not my primary concern - premature optimization is, well, premature).
-------------------------------------
And, finally, the question of calculating the data ... which i (probably) solved in the previous posts (With simplifications, as precomputing it offline is probably not an option due to the generative nature of the world - it should be fast enough for initialization-time precomputing. + it is probably amortized considerably by the post-transform optimization step which, although it has linear complexity, has a hefty constant part).
-------------------------------------
Did some coding yesterday and got the renderer to work (well, there are still bits of duct tape hanging here-and-there - you know the drill):
* fills the vertex/index buffers with constructed data (joined per material into one patch, regardless of which generators it came out of) on a sector-by-sector basis when the need arises (currently, my duct-tape approach is => just do all of them).
* draw commands are automatically grouped per material per pass.
Tests "map" (well, it does come from map data structures - just that the data there is just random test-shit and none of it resembles anything one would call a map) worked nicely :)
Funnily enough, i actually do not need the tangent space stuff YET - i just needed to decide my vertex formats (+ by extension: everything related to map markup) and for that i needed to know how much of a problem calculating those is (and whether i can afford to squeeze everything into one format).
My implementation specific terms:
* Sector: aka. localized drawing group. For example: a room with different textures used for different surfaces (+ non-static entities, which are absent atm), all either drawn or not (with exceptions for stuff determined to be not visible).
* Material: a state object that has all the GL state it needs (a state object can actually describe much more than any material itself needs - ie, it is not only for materials, but for any state grouping you need) + a lookup to determine the least wasteful ordering in relation to any other state.
* Pass: aka. order-independent drawing group. The renderer is allowed to reorder any draw command within a pass to prevent state thrashing - however, it must preserve order where changing it does not help (ie: stable sort! - sketched after this list).
* Every pass is part of a queue, which itself is constructed out of order (For example - ui passes are recorded before the main scene although they actually end up after the main scene passes.)
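The pass-sorting rule in (hypothetical) code form - the whole point is that std::stable_sort keeps submission order between commands whose state keys tie:
Code:
#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative types only - not the actual renderer.
struct DrawCmd {
    uint32_t stateKey;    // derived from the material/state object ordering
    uint32_t firstIndex;  // patch location in the index buffer
    uint32_t indexCount;
};

// Reorder within a pass to minimize state changes; where reordering does
// not help (equal keys), submission order is preserved - hence stable sort.
void sortPass(std::vector<DrawCmd>& pass)
{
    std::stable_sort(pass.begin(), pass.end(),
        [](const DrawCmd& a, const DrawCmd& b) {
            return a.stateKey < b.stateKey;
        });
}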
Koki on 5/10/2011 at 09:47
Quote Posted by Yakoob
Oh yea. I love MS for those kinds of reasons. People shit on MS because they are the big bad corporation but honestly, they treat their devs so damn well. With pretty exhaustive documentation, amazing tools (VS, debugger, PIX), good cross-platform APIs (DX, XAudio, XNA, .Net) and even options for indies to develop on their proprietary console (via XNA), it is really nice developing for their platforms (especially compared to stuff like Wii or PS3 development).
Weird, just yesterday I read in PC Gamer that the Braid dev said GFWL adds so much bullshit it makes the development process twice as hard.
Yakoob on 5/10/2011 at 10:06
We never used GFWL in our games.
sNeaksieGarrett on 6/10/2011 at 05:07
True, not everyone uses GFWL, but it is terrible, and it is from Microsoft.
Yakoob on 6/10/2011 at 07:58
And thus, Microsoft is terrible too.
Koki on 6/10/2011 at 08:25
And so is America.
The Alchemist on 6/10/2011 at 08:34
Fuck off Koki.
What? I missed you. :(