Back in January, I did a post about renders for my then-current character run-through of Baldur's Gate (Test with Engines and Styles). I stated at the time that it was sort of a look at where things stood with the renderers I was using (in January of 2026).

Funny how much things change in a few months. First, SeeDream became available with the ability to add multiple reference images, which allowed for much better character stability (still in January of 2026). It seemed to know a lot of D&D-type critters too, and was good with the photographic style I like so much. Then came the latest iteration of GPT Image, which did everything SeeDream does, better (I started using it in March 2026).
So it seems that in just a few months, the way I do things has dramatically changed. The big thing: with DALL-E and other such renderers, the whole game was *describing* the image you wanted. That included any/all characters, setting, and style in a set, *small* number of words (what a test that always was!). Each render was a unique event (with some exceptions I've discussed elsewhere), which meant you could get a beautiful image with one detail wrong, and then you had to figure out how to change your description to fix the error without adding some other error (or decide to just try again and see if the AI could sort itself out).
This could often get frustrating, like when the AI wanted to face somebody in the wrong direction and you had to fight to get that character to change their focus. There were editing options, but these were often tedious. And of course, the whole appeal of this is that I'm better with words than pictures.

Now, a big thing that's changed is that I can render several characters, monsters, and a setting individually, then put them all together in a new render. No doubt, there are still pitfalls with this. An AI still struggles with more than three characters. Most of the games I've been playing feature a six-character party, and it is seriously tricky to get all six right even now. But this is light years better than just a few months ago.

This image used four other rendered images as inputs.
I started with this macabre banquet hall as the setting. The AI filters fought me on this; even with the people described as "sleeping," the filters didn't like where this was heading, I guess, and rejected fully half of my attempted renders. And honestly, I worried about this too much. Looking at the image above (what I finally settled on for the post), the most troubling aspects of this setting aren't even visible.
Then I built a sort of generic Yuan-Ti. I went with a white background to reduce confusion with the setting. For this image, I described the character as a Yuan-Ti, then added a parenthetical description from the Forgotten Realms Wiki.
I have a half dozen or so renders of all my team members just to use in other images.
I suppose I should have used the white background again! But I was thinking “Icewind Dale” when I did the first batch of these.

Then I just describe the action. The AI has some ability to recognize which image is which, and I still mention some details. So I'll say "male warrior attacks with two swords" and "short woman warrior thrusts with her spear." It may not do *exactly* everything I say, but those sorts of mentions keep the swords and spear with the right characters. Otherwise it tends to jumble those things up. Or maybe it just really thinks every fighter should have exactly one sword? Seriously, if I don't specify, that's what happens to both of these characters.
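I do all of this through the chat interfaces, but for anyone who'd rather script it, here's a rough sketch of the same workflow using the OpenAI Images API (the gpt-image-1 edit endpoint accepts multiple reference images; the file names here are just placeholders for the renders described above):

```python
# Sketch: compose a new scene from separately rendered reference images,
# using the OpenAI Images API (gpt-image-1 edit endpoint).
# File names are placeholders for the setting/character renders above.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

references = [
    open("banquet_hall.png", "rb"),  # the setting
    open("yuan_ti.png", "rb"),       # generic Yuan-Ti on a white background
    open("moya.png", "rb"),          # individual character renders
    open("psyche.png", "rb"),
]

result = client.images.edit(
    model="gpt-image-1",
    image=references,
    # Mention which prop goes with which character, or the AI
    # tends to jumble them up.
    prompt=(
        "In the macabre banquet hall, the party fights the Yuan-Ti. "
        "Male warrior attacks with two swords; short woman warrior "
        "thrusts with her spear. Photographic style."
    ),
)

# The response is base64-encoded image data.
with open("scene.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```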

There are other errors it makes that I fight to fix.

Moya is often a struggle. I've learned to always add the "Asian half-elf" descriptor whenever I use her in a scene. And it often loses her green eyes.
I got this render of Moya and Psyche examining cave paintings fairly quickly. I like the framing, style, and colors.

But I see two major problems here. One, those don't count as cave paintings! That's more like something you'd see at a Cabela's. So when I re-rendered it, I said "Lascaux-style primitive art."
The more obvious problem is two: that's not Moya! Funny, it gets the point on the pointed ears with me saying nothing about it. But when I re-rendered, I said "Asian half-elf."

Which led to this. My wife looked at the image and said “wow, she got older.” Well, shoot.
So I said “make the Asian half-elf woman younger”.
I redid it again as "20-year-old Asian half-elf." Of course, now Psyche is losing some, um, Psyche-ness? Aw, she's in the background, close enough!
I feel like the AI is doing this to Psyche!

I considered changing the cave paintings again to “spear fishing” or something, but really this is close enough.

One thing I noticed years ago in scale modeling: every time you learn how to do something *better*, it also takes *longer*. I feel like this is happening with the AI renders I'm doing. No doubt, the renderers are better. They understand text better, they can keep characters (more) consistent, and they can handle much more complicated creatures and actions.
But establishing your image references, building your own character reference library, and layering things together all take time. I'm having a blast! It's more fun to be doing more complex things better. But better is often time-consuming. It will continue to be interesting to see where all this goes.
