Back in January, I did a post about renders for my then-current character run-through of Baldur’s Gate (Test with Engines and Styles). I stated at the time that it was sort of a look at where things stood with the renderers I was using (in January of 2026).
Funny how much things change in a few months. First SeeDream became available with the ability to add multiple reference images, which allowed for much better character stability (still in January of 2026). It seemed to know a lot of D&D-type critters too, and was good with the photographic style I like so much. Then came the latest iteration of GPT Image, which did everything SeeDream does, better (I started using it in March 2026).
So now, in just a few months, the way I do things has dramatically changed. The big thing: with DALL-E and other such renderers, the whole game was *describing* the image you wanted. That included any and all characters, setting, and style in a set *small* number of words (what a test that always was!). Each render was a unique event (with some exceptions I’ve discussed elsewhere), which meant you could get a beautiful image with one detail wrong, and then you had to figure out how to change your description to fix the error without adding some other error (or whether to just try again and see if the AI could sort itself out).
This could often get frustrating, like when the AI wanted a character facing the wrong direction and you had to fight to get them to change their focus. There were editing options, but these were often tedious. And of course, the whole appeal of this is that I’m better with words than pictures.
Now a big thing that’s changed is that I can render several characters, monsters, and a setting individually, then put them all together for a new render. No doubt there are still pitfalls with this, and an AI does struggle with more than three characters. Most of the games I’ve been playing feature a six-party team; it is seriously tricky to get all six right even now. But this is light years better than just a few months ago.





Then I just describe the action. The AI has some ability to recognise which image is which, and I still mention some details. So I’ll say “male warrior attacks with two swords” and “short woman warrior thrusts with her spear”. It may not do *exactly* everything I say, but those sorts of mentions keep the swords and spear with the right character. Otherwise it tends to jumble those things up. Or maybe it just really thinks every fighter should have exactly one sword? Seriously, if I don’t specify, that will happen to both of these characters.
There are other errors it makes that I fight to fix.


But I see two major problems here. One, those don’t count as cave paintings! That’s more like something you’d see at a Cabela’s. So when I re-rendered it, I said “Lascaux style primitive art.”
The more obvious problem is two: that’s not Moya! Funny that it gets the point on pointed ears with me saying nothing about it. But when I re-rendered, I said “Asian half-elf”.




I considered changing the cave paintings again to “spear fishing” or something, but really this is close enough.
One thing I noticed years ago in scale modeling: every time you learn how to do something *better*, it also takes *longer*. I feel like this is happening with the AI renders I’m doing. No doubt, the renderers are better. They understand text better, they can keep characters (more) consistent, and they can handle much more complicated creatures and actions.
But establishing your image references, building your own character reference library, and layering things together all take time. I’m having a blast! It’s more fun to be doing more complex things better. But better is often time consuming. It will continue to be interesting to see where all this goes.