vfig on 20/1/2023 at 10:25
jan 20, 2023
woke up this morning with the joyful realisation that for this mission's needs, i dont need my merge tool to be smart at all. each of the pieces i am merging is taken from the same .mis anyway, so i only need to merge the worldreps and can simply copy all other tagblocks from just one of the files into the output. i shouldnt need to merge texture lists or anything like that.
(it would be nice to merge the "csg data" at the end of the worldrep that dromed uses for e.g. hit testing polys to find brushes. the game doesnt need it, so its not vital. maybe i will implement that, i will have to see what my workflow feels like without it.)
so lets skip the boring incremental tests and see what happens when i throw some of the real mission data at it!
Inline Image:
https://i.imgur.com/3mNYF9m.png
well, that looks promising... lets go a bit closer?
Inline Image:
https://i.imgur.com/d4l3Cmn.png
ah. ahaha. yeah no. something is quite wrong.
also monolog is yelling at me about various objects (including the player!) being outside the world. and sure enough, if i unfly, i fall right through this terrain.
okay, back to the boring incremental tests, lets try and find the fault.
vfig on 26/1/2023 at 19:26
jan 26
the complexities of the worldrep merge have temporarily defeated me. something is going wrong that i havent been able to debug yet: olddark dromed simply doesnt render the problem wr, while newdark dromed crashes. normally a crash is the easier kind of problem to debug, because it gives you a definite starting point for investigating; but the lack of symbols for newdark dromed makes it hard. msvc isnt the best debugger when you dont have source code: either it doesnt have the tools to annotate the disassembly to keep track of things—or if it has those, theyre hidden enough that i havent found them. but i dont at the moment have any better debugger to use.
so for now i set that aside, and got back to the ordinary business of dromeding: the ruined version of the mansion still has only placeholders for most interior rooms, so i have begun the drudge work of copying over the detailed rooms from the unruined version, in preparation for utterly trashing them.
i started out with the "conservatory". but instead of just copying it as i should have, i looked at it and decided i didnt like the layout. it felt too bland:
Inline Image:
https://i.imgur.com/ygmrjYY.png
so i reworked it to have the secure room in the center instead of in a corner. is this layout better? for a typical thief mission, i would say yes, it makes traversing the room more interesting. for this mission, im not actually sure! but for now the new layout is staying. what do you think, is it better?
Inline Image:
https://i.imgur.com/ZTYGFwd.png
the other significant change to this room is the planters themselves. up to now they had been made of brushwork. as part of a general effort to reduce cell counts, i decided they would be better off as objects, so i made a model for them. one downside of an object is it can only have a single material, so while the planter objects are wood, i have a second, unrendered object on top of each that has the dirt material.
the room still needs plants and AIs and stuff, but that will happen in a later detail pass.
gamophyte on 26/1/2023 at 20:15
VFIG I am not sure if you read this whole thread yet but it was a life saver for me on cell count (https://www.ttlg.com/forums/showthread.php?t=148845&highlight=reduce+cell+count)
I believe it was this one that talked about how the order of your giant air brush you're populating matters. I think there is a benefit to making sure it is moved to the earliest possible time. But don't quote me, I've suspended my building as I am taking a break and it's been a long time since I had to read it. But it's worth the read and whatnot, hopefully that part comes up sooner than later. Cheers.
vfig on 26/1/2023 at 21:52
yes, ive read that one, and most if not all the old threads about optimizing cell counts. but right now the low hanging fruit in my mission has already been harvested. there are a few smaller places where i could mess about with brush timing and probably get some small benefits, but im pretty sure reducing detail in various places is going to be necessary—at least if i cant get the worldrep merge working when im ready to go back to it.
nicked on 27/1/2023 at 07:11
Compositionally, the first conservatory is more interesting. I would leave the secure room in the centre but do more interesting arrangements than a circle with the planters. The new layout also looks like it'll be easy to cheese AI with if there's no gaps in the ring of planters.
It's a good model - don't forget you can always convert worldrep into models via the export tool, useful if you have any complex brushwork left with minimal collision/lighting requirements.
vfig on 27/1/2023 at 14:17
compositionally, youre right. but as a gameplay space, it was just: go in through the window, proceed straight across the room and out the door on the other side. not particularly interesting even with a civilian ai wandering around. but the secure room in the middle gives you a way to avoid ai contact even without much shadow, which i like. and if you fail at stealth, well cheesing the ai is always fun! :D
though as soon as i posted the screenshots i was thinking of undoing the change: not for the sake of the room layout itself, which i definitely think is better, but because theres another reason to want the secure room in the corner instead of the middle. so who knows, i might change it back later!
Quote Posted by nicked
It's a good model - don't forget you can always convert worldrep into models via the export tool, useful if you have any complex brushwork left with minimal collision/lighting requirements
personally i think its a terrible model, but appropriately terrible :D — and yeah, i nearly always start a model by exporting a brushwork blockout (even if just a cube) for size. this one is not much more than a direct conversion.
vfig on 28/1/2023 at 23:05
jan xx-28, 2023
no screenshots today, because today was all about debugging the worldrep merge tool. all very technical stuff, so skip this if youre not here for programming blather.
so, the problem i had been wrestling with on and off for the past week or two was this: i merged the two worldreps into one. it seemed cromulent. but if i loaded the level into dromed, dromed crashed the moment i moved the camera into any "in the world" space, as it tried to render the view from inside a cell.
unfortunately, the olddark dromed with debug symbols wasnt at all helpful here, because it didnt crash! it just didnt render anything. so i was having to wade through the newdark dromed disassembly in visual studio, while correlating it with the disassembly/decompile in ghidra, and correlating that with the leaked source, just to try to begin to figure out what was happening. problem was, keeping track of things in visual studio was a mess, because of ASLR. Address Space Layout Randomisation is a security feature in windows that loads a program at a different base address in memory every time it launches, which means that the addresses of the functions and variables in the program change too. which means that i would launch dromed, put breakpoints in various functions and inspect various variables for one run, and then hit the crash, realise i needed more info from what was happening before the crash, and so have to relaunch dromed—and now all my breakpoints and memory references were wrong. just far too much overhead, i couldnt juggle all that and try to understand what dromed was actually doing.
all i had managed to discover up to this point was that it was trying to access the render info for a cell that was offscreen, and had never been prepared for rendering. if i had messed up the bsp nodes when doing the merge, or the "destination cell" ids in the portal info, that would have been an obvious cause. but no, i double checked that. even wrote a function to dump the merged bsp tree to a graphviz file so i could generate a diagram, making it much easier to check for correctness. here are the two trees that were the input:
Inline Image:
https://i.imgur.com/Ueef34X.png
and merged, with a new root node inserted, and all the node and cell ids renumbered:
Inline Image:
https://i.imgur.com/Fqp72mX.png
and all that looked just fine. i couldnt for the life of me figure out why this offscreen cell was ending up in the list of cells to be rendered! so i set the whole merging thing aside for a while to focus on building out the ruins (i.e. the stuff from the previous post).
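as a rough illustration of that kind of graphviz dump (not the merge tool's actual code), here is a minimal python sketch. the `Node` class, its field names, and the label format are all assumptions made up for this example; the only things taken from the post are that the tree has split nodes and leaf cells, and that marked nodes get an 'M' in the diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical BSP node: a leaf carries a cell id, a split node has children."""
    node_id: int
    cell_id: Optional[int] = None      # set only on leaf nodes
    inside: Optional["Node"] = None
    outside: Optional["Node"] = None
    marked: bool = False


def to_dot(root: Node) -> str:
    """Return the tree as Graphviz DOT text, one statement per line."""
    lines = ["digraph bsp {"]
    stack = [root]
    while stack:
        node = stack.pop()
        if node.cell_id is not None:
            label, shape = f"cell {node.cell_id}", "box"
        else:
            label, shape = f"node {node.node_id}", "ellipse"
        if node.marked:
            label += " M"              # flag marked nodes in the diagram
        lines.append(f'  n{node.node_id} [label="{label}", shape={shape}];')
        for child in (node.inside, node.outside):
            if child is not None:
                lines.append(f"  n{node.node_id} -> n{child.node_id};")
                stack.append(child)
    lines.append("}")
    return "\n".join(lines)
```

writing the returned text to a .dot file and running it through graphviz's `dot` command (e.g. `dot -Tpng tree.dot -o tree.png`) would produce the diagram.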
anyway, yesterday i learned of a way to disable ASLR by patching dromed.exe with this very helpful utility (https://blog.didierstevens.com/2010/10/17/setdllcharacteristics/). with that, dromed.exe now launched with the same base address every time, so i could finally keep consistent breakpoints and memory dumps between runs. which meant i could finally effectively trace what was happening. i set up a few breakpoints at the entry point of various functions that would print out the relevant arguments, and similarly a few leading up to the crash point itself. here's what they printed (with some explanatory comments afterwards):
Code:
// at the beginning of the render pass:
initialize_first_region( 2 ) // camera is in cell 2, so put it in the list to be rendered.
setup_cell( 0x0a6b0970 ) // prep cell 2's pointer for rendering.
examine_portals( 0x0a6b0970 ) // find other cells connected to 2 that also need rendering.
add_region( 1, ..., 0x0a6b0970 ) // found cell 1, add it to the list.
setup_cell( 0x0a6b0850 ) // prep cell 1's pointer for rendering.
examine_portals( 0x0a6b0850 ) // find other cells connected to 1 that also need rendering.
// (there are no more to find that would be onscreen).
// now later in the render pass, just before the crash:
> active_regions[ 0 ] // look up the first cell in the list...
> wr_cell[ 2 ] // its cell 2 (just as expected)
> cell ptr: 0x0x0a6b0970 // and here is its pointer again.
> active_regions[ 1 ] // look up the second cell in the list...
> wr_cell[ 0 ] // its cell 0?? how? it was cell 1 that got added to the list above!
> cell ptr: 0x0x0a6b07f0 // yeah this is cell 0's pointer. wtf is going on?
Exception thrown at 0x00552801 in DromEd.exe: 0xC0000005: Access violation writing location 0x00000028.
it still didnt make sense, but now that i could step through all this in repeated runs, exploring different pieces of the code along the way, i finally discovered where the list of cells to render was getting mangled. and it was in a function called "sort_via_bsp()", whose job was to sort the list of cells to render in front-to-back order. this didnt make sense to me: this function has a simple job, which obviously worked just fine ordinarily. and my bsp tree all looked correct! how could it be messing up when walking my merged bsp tree but just fine with either of the original two? and then i stopped looking at the logic and maths that sort_via_bsp() was doing, and noticed something small, something obvious and usually unremarkable: it was checking a "Marked" flag on each leaf node it encountered; and if the flag was set, writing that node's cell id into the (sorted) list of cells to render. to make that work, just before sort_via_bsp() is called, a function called setup_bsp() is responsible for setting those flags: it first calls unmark_bsp() with the root node, to clear all the Marked flags; and then for each cell in the (unsorted) list of cells to render, it sets the flag. so this Marked flag simply means "this cell is going to be rendered this turn". all very ordinary, and normally i would never have batted an eye at this code. but…
but i remembered the flags i had seen days earlier in the bsp trees i was using as input. two of the flags made sense for the data structure, but the third flag, "Marked", had been set on some nodes, even in the .mis on disk, and i didnt understand what it meant. i had looked up where this flag was used, seen it was only used for this common clear-and-mark pattern that meant it was transient and its value on disk didnt matter at all, and disregarded it. i wasnt even checking the flag when generating the visual graphs. but now that it was implicated, i regenerated the graphs with an 'M' for the marked nodes. the pictures above are output from this updated graph generation, and you can see that all of the non-leaf nodes in the "top" worldrep are marked; and all of the nodes in the "bottom" worldrep are unmarked. and when merged, my root node obviously was unmarked, because why would i set this flag on it? after all, every node gets the marked flag cleared at the start of the render pass, right? right?
nope! turns out the unmark_bsp() function responsible for clearing the flags is written recursively. and to avoid walking more of the bsp tree than it needs to, if it encounters a node that isnt marked, it concludes that its subtrees are also entirely unmarked, and takes a shortcut: it doesnt recurse any deeper from there. and this shortcut is perfectly reasonable and in line with how the marked flag is set later on. but with my new root node that was not marked, this shortcut meant that the entire tree was never getting unmarked.
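to make the failure concrete, here is a small python sketch of that clear-and-mark pattern. it is not dromed's code: the `Node` class, `mark_cell()`, `marked_cells()` and `cells_to_render()` are hypothetical stand-ins, the marking step is assumed to flag every node on the path down to a cell's leaf, and the front-to-back sorting itself is left out. only the shortcut in `unmark_bsp()` follows the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical BSP node; leaves carry a cell id."""
    cell_id: Optional[int] = None
    inside: Optional["Node"] = None
    outside: Optional["Node"] = None
    marked: bool = False


def unmark_bsp(node: Optional[Node]) -> None:
    # The shortcut: an unmarked node is assumed to have no marked descendants.
    if node is None or not node.marked:
        return
    node.marked = False
    unmark_bsp(node.inside)
    unmark_bsp(node.outside)


def mark_cell(node: Optional[Node], cell_id: int) -> bool:
    """Assumed marking step: flag the cell's leaf and every node above it."""
    if node is None:
        return False
    found = (node.cell_id == cell_id
             or mark_cell(node.inside, cell_id)
             or mark_cell(node.outside, cell_id))
    if found:
        node.marked = True
    return found


def marked_cells(node: Optional[Node], out: Optional[List[int]] = None) -> List[int]:
    """Collect the cell ids of marked leaves (stand-in for sort_via_bsp)."""
    out = [] if out is None else out
    if node is None:
        return out
    if node.cell_id is not None:
        if node.marked:
            out.append(node.cell_id)
    else:
        marked_cells(node.inside, out)
        marked_cells(node.outside, out)
    return out


def cells_to_render(root: Node, visible: List[int]) -> List[int]:
    unmark_bsp(root)                 # meant to clear every flag first
    for cell_id in visible:
        mark_cell(root, cell_id)
    return marked_cells(root)


def merged_tree(stale: bool) -> Node:
    """Two subtrees under a fresh, unmarked root; 'top' may carry stale flags."""
    top = Node(inside=Node(cell_id=0, marked=stale),
               outside=Node(cell_id=1, marked=stale),
               marked=stale)
    bottom = Node(cell_id=2)
    return Node(inside=top, outside=bottom)


print(cells_to_render(merged_tree(stale=True), [2, 1]))   # [0, 1, 2]
print(cells_to_render(merged_tree(stale=False), [2, 1]))  # [1, 2]
```

with stale flags under an unmarked root, `unmark_bsp()` returns immediately, so cell 0 stays marked and sneaks into the render list even though only cells 1 and 2 were requested.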
so that finally explained how cell 0 had ended up in the sorted list: as you can see in the merged graph, its parent, node 11, had the marked flag set (as do the parents for cells 1 and 2, which were the cells that were actually supposed to be rendered; so they wouldve been marked regardless).
so this little unremarkable flag, that honestly, being transient, should never have been written to disk at all, turned out to be the cause of days of pain for me.
like most bugs, once the cause is understood, the fix is easy. when i merge the two worldreps, i go through every node and clear the marked flag. it really should never have been written to disk, so lets just clear it from the on-disk bsp tree. and with that, my crash went away! the merged worldrep rendered perfectly.
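the fix amounts to an unconditional pass over every node rather than a recursive walk that trusts the flags. a minimal python sketch, assuming the nodes are held in a flat list and the flags are a small bitfield; the `MARKED` bit value and the function name are made up for illustration, since the real values arent given here:

```python
from typing import List

MARKED = 0x04  # hypothetical bit value for the "Marked" flag


def clear_marked_flags(node_flags: List[int]) -> List[int]:
    """Clear the Marked bit on every node, leaving all other flag bits alone."""
    return [flags & ~MARKED for flags in node_flags]


print(clear_marked_flags([0x07, 0x04, 0x01, 0x00]))  # [3, 0, 1, 0]
```

because this visits every node regardless of its current flags, it cannot be fooled by an unmarked node sitting above marked ones.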
of course, i immediately encountered a new crash! but that will be a job to debug another day…
vfig on 2/2/2023 at 07:47
feb 2, 2023
big progress on the worldrep merge in the last few days. first new thing to be merged in is lighting, including animlights. while the lightmaps themselves are stored alongside each cell, and so end up correctly merged "for free", the table of light data at the end of the worldrep, which is used for calculating the amount of light shining on the player and on objects, does not come quite so easily. up until now my code had just recklessly skipped writing this light table entirely, since my test maps did not have any lighting at all.
but merging lighting was more of a pain than i expected, particularly animlights. there is a table at the end of the worldrep that maps each animlight to the cells that it illuminates; and each cell has a small table of the animlights that illuminate it. so thats a double set of references that need to all be correctly fixed up when i merge together two sets of cells and two sets of lights-that-reached-cells.
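a simplified python sketch of that double fixup, under loud assumptions: each table is modelled as a plain list of lists (the real on-disk layout is different), the lights in the two inputs are treated as distinct, and the second worldrep's cells and lights are simply appended after the first's. the function names are hypothetical.

```python
from typing import List, Tuple

Table = List[List[int]]


def merge_animlight_tables(a_light_to_cells: Table, a_cell_to_lights: Table,
                           b_light_to_cells: Table, b_cell_to_lights: Table
                           ) -> Tuple[Table, Table]:
    """Append B's tables after A's, shifting B's cell and light indices."""
    cell_offset = len(a_cell_to_lights)    # B's cells come after A's cells
    light_offset = len(a_light_to_cells)   # B's lights come after A's lights
    light_to_cells = [list(cells) for cells in a_light_to_cells]
    light_to_cells += [[c + cell_offset for c in cells]
                       for cells in b_light_to_cells]
    cell_to_lights = [list(lights) for lights in a_cell_to_lights]
    cell_to_lights += [[l + light_offset for l in lights]
                       for lights in b_cell_to_lights]
    return light_to_cells, cell_to_lights


def is_consistent(light_to_cells: Table, cell_to_lights: Table) -> bool:
    """Both tables must describe the same set of (light, cell) pairs."""
    forward = {(l, c) for l, cells in enumerate(light_to_cells) for c in cells}
    backward = {(l, c) for c, lights in enumerate(cell_to_lights) for l in lights}
    return forward == backward


# A: one animlight reaching both of its two cells. B: one animlight, one cell.
l2c, c2l = merge_animlight_tables([[0, 1]], [[0], [0]], [[0]], [[0]])
print(l2c)                      # [[0, 1], [2]]
print(c2l)                      # [[0], [0], [1]]
print(is_consistent(l2c, c2l))  # True
```

a check like `is_consistent()` is cheap to run after a merge and catches a fixup applied to one direction but not the other.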
but for the first time i had to move beyond the worldrep chunk of data (named "WR", "WRRGB", or "WREXT") in the .mis file: because the Renderer>AnimLight property itself on animlight objects also stores references to the animlight-to-cell mapping table in the worldrep. so now i needed to read and write the "P$AnimLight" chunk too. and, of course, the data format has changed slightly in newdark (thankfully PinkDot's data structure notes had forewarned me of what exactly was different), so i need to support two versions of the chunk. well, for the actual mission im making, only support for the newdark version would be needed; but supporting olddark is important for developing this tool because i can debug olddark dromed problems a lot more easily.
anyway, after a bunch more code and the usual number of flubs along the way, i got that all working. it doesnt look like much, but heres screenshots from my test map generated by merging separate "top" and "bottom" .mis files; each with one normal light and one animlight:
the top:
Inline Image:
https://i.imgur.com/vsDzTRT.png
Inline Image:
https://i.imgur.com/WZ1IzXF.png
the bottom:
Inline Image:
https://i.imgur.com/LnnWL3v.png
Inline Image:
https://i.imgur.com/RUGV157.png
but there was still the "other crash" i mentioned in the previous post, and, as luck would have it, it only happened in newdark dromed, not olddark! all i knew going in was that newdark dromed was trying to clear some memory that olddark dromed didnt bother about; but the memory hadnt actually been allocated, so a crash resulted. thankfully without aslr anymore, it was now easy to correlate the address of the crash with my annotated disassembly, and by studying that a little more, identify what this piece of memory was supposed to be.
after all the lighting bits, there is one more set of data tables at the end of the worldrep, that is called "csg internal data". its only used by dromed; thief.exe itself never bothers to read it from the .mis. its a set of tables that map the polygon faces in the worldrep to the brush faces that they belong to, and vice-versa, so that you can click on a face in the 3d view and select it; and so that dromed knows which polygons in the worldrep to redraw when you change the texture on a brush face (without reportalizing). again, up until now i had been skipping outputting this data while i focused on all the rest of the worldrep. but it was easy to see that the crash was happening because newdark dromed was trying to clean up this csg internal data—which wasn't there in my test map!
but these half-dozen tables actually looked pretty straightforward to merge: one of them has one entry per polygon; one of them has one entry per brush. i wrote the code to do the merge in one go, and—of course it didnt work. i stopped for lunch, and as i was washing the dishes afterwards, i facepalmed: i suddenly realised i had merged all these tables together but never written them back out to the final .mis! after fixing that oversight, all this seemed to be hunky dory! i could:
select brush face by clicking it...
Inline Image:
https://i.imgur.com/qgpXBgu.png
...and change its texture
Inline Image:
https://i.imgur.com/inPrCHC.png
so my little test map was working great! time for another stress test with two parts of the actual mission. i had high hopes.
which were immediately dashed. my little test map has only 6 non-object brushes (2 in the top, 2 in the bottom, and 2 area brushes demarcating each part), and so there were naturally 6 entries in the per-brush csg table. but my mission has over 6000 terrain brushes—9,027 brushes in total when you include object brushes. and yet the per-brush csg table in the "top half" input .mis had 9,203 entries—176 more! and the "bottom half" input .mis file had 9,377 entries in its table—350 more than expected! what in the builders name is going on?
well, i dont know for certain yet, but i have a strong suspicion i know the evil culprits causing these strange numbers:
doors.
PinkDot on 3/2/2023 at 22:26
Do you even need doors? ;)
A bit more seriously - great to see the progress! And I appreciate the detailed explanations. I'm sure I'll make a use of that knowledge at some point.
vfig on 4/2/2023 at 06:39
february 3, 2023
so the problems with doors (https://lizengland.com/blog/2014/04/the-door-problem/) are very many and quite well-studied, but the specific problem i am having with them in the dark engine is this:
they need their own special little cell so that they can turn off its portals when closed. this means the renderer doesnt even need to think about drawing anything on the other side of the door.
a door in a room, with "cell view" enabled. each different colour is the sides of a different cell:
Inline Image:
https://i.imgur.com/kU2pkF4.png
but how does a door get its own special little cell? when you tell it to portalize, dromed sneakily creates a new brush for every single door in your mission (that has the "Blocks Vision" flag on). this brush is set to the blockable type (indeed this exact use is why that brush type exists), which is really a hint to the portalizer and optimizer to not optimize away its edges; that way the cell will always be there, and be guaranteed to remain the right size for the door. once the portalize is complete, dromed sneakily deletes those extra brushes again, leaving you and me none the wiser.
when the door is open, you can see the periwinkle cell on the near side, and the olive cell on the far side, and just a glimmer of the tiny puce cell that was made by this door's blocking brush:
Inline Image:
https://i.imgur.com/WdEnABA.png
zoomed in for a better look:
Inline Image:
https://i.imgur.com/vTYCD3X.png
why is this a problem for me? well it shouldnt be, except that: there is a variable that keeps track of "what is the id to use for the next brush that gets made?". i dont know its real name, but lets call it next_brush_id_jim. whenever you add a brush, dromed adds 1 to next_brush_id_jim. when dromed sneakily adds these door brushes, it adds 1 to next_brush_id_jim for each of them. what can happen is that although your level looks like it only has 1,000 brushes, meaning next_brush_id_jim ought to be 1,001, the actual value of next_brush_id_jim can be hundreds higher than that.
but so what? its just a number, right? well the other bit that makes this a nuisance is the "csg info" tables at the end of the worldrep. these are really only there so that dromed can answer the questions of "if i click on a polygon in the 3d view, what face of which brush should be selected?" and "if i change the texture on a brush face, which polygons need to be updated?" in other words, a two-way mapping, brushes => polygons and polygons => brushes. but the brushes => polygons table is created with one entry for every brush id just up to next_brush_id_jim. so because of the door brushes, this table can end up containing hundreds of entries beyond the ones that actually belong to actual brushes.
again, so what? there are no brushes with ids corresponding to those table entries, so nobody is ever going to look at those table entries. and this is true, except i am that nobody, because i am doing shenanigans. to merge my two worldreps, i have to make sure all these tables have cromulent data in them. yet here i found myself with all this extra data that couldnt be merged together, because it didnt make sense. so its no problem for anybody except me:
if worldrep one has table entries for brushes 1, 2, 3, 4, and 5, and worldrep two has table entries for brushes 1, 2, 3, 4, 5, 6, and 7, how do i merge those together in any meaningful way, especially if only brushes 1, 2, and 3 actually exist?
thankfully by poking around in various places and doing some tests, i determined that all these extra table entries were really not going to be used. so now i figure out what next_brush_id_jim is set to (by looking at all the brushes stored in the .mis file), and once ive merged all the entries that belong to real brushes, not fake lying dromed brushes, i just say "la la la im not listening" and ignore the rest of the table entries. so the merged table would end up with entries for only 1, 2, and 3 (because i insist that the two .mis files have the exact same actual brushes; that's just fine for my purposes, and simplifies a whole lot. but more on that in a later post about editing workflow).
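a small python sketch of that "ignore the leftovers" merge, using the 1-to-5 and 1-to-7 example from above. everything about the data layout here is an assumption for illustration: each per-brush table is modelled as a dict from brush id to a list of polygon indices, the second worldrep's polygon indices are assumed to be shifted by the first worldrep's polygon count, and `merge_brush_tables()` is a hypothetical name.

```python
from typing import Dict, Iterable, List

BrushTable = Dict[int, List[int]]


def merge_brush_tables(table_a: BrushTable, table_b: BrushTable,
                       real_brush_ids: Iterable[int],
                       poly_count_a: int) -> BrushTable:
    """Merge entries for brushes that really exist; drop everything else."""
    merged: BrushTable = {}
    for brush_id in sorted(real_brush_ids):
        polys_a = table_a.get(brush_id, [])
        polys_b = table_b.get(brush_id, [])
        # B's polygons are renumbered to follow A's in the merged worldrep.
        merged[brush_id] = list(polys_a) + [p + poly_count_a for p in polys_b]
    return merged


# worldrep one has entries for brushes 1..5, worldrep two for 1..7,
# but only brushes 1, 2 and 3 actually exist in the .mis.
table_one = {1: [0], 2: [1], 3: [2], 4: [], 5: []}
table_two = {1: [0], 2: [], 3: [1], 4: [], 5: [], 6: [], 7: []}
merged = merge_brush_tables(table_one, table_two, {1, 2, 3}, poly_count_a=3)
print(merged)  # {1: [0, 3], 2: [1], 3: [2, 4]}
```

the entries for ids 4 through 7 never reach the output, which matches the idea that nothing will ever look them up.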
the important thing is that now, with all this csg data sorted out and sensible, i can use the merged .mis file in dromed without it crashing anymore! at least for the trivial test maps. does it work with the pieces of the real thing?
merged island test, upper part:
Inline Image:
https://i.imgur.com/4bauqfj.png
merged island test, lower part:
Inline Image:
https://i.imgur.com/Cd8Trhc.png
yes! you can't really tell it from the screenshots, but these two different versions of this island were in their separate .mis files, and successfully merged together here! and so far everything about them seems all in good order.
i still have a little more work to do to be sure: i have to look into flow brushes, room brushes, pathfinding, and physics stuff, at least to see if they need any work or if i can simply go about rebuilding them post-merge. and i need to iron out the workflow for when i do need to make terrain changes to this mission, but…
this success is a decisive—albeit highly constrained and not at all generalisable—
VICTORY OVER THE TYRANNY OF CSGMERGE CELL COUNTS!!