Back in January, I did a post about renders for my then-current character run-through of Baldur’s Gate (Test with Engines and Styles). I stated at the time that it was sort of a look at where things currently stood with the renderers I was using (in January of 2026).

Funny how much things change in a few months. First SeeDream became available with the ability to add multiple reference images, which allowed for much better character stability (still in January of 2026). It seemed to know a lot of D&D-type critters too, and was good with the photographic style I like so much. Then came the latest iteration of GPT Image, which does everything SeeDream does, but better (I started using it in March 2026).
So now it seems that in just a few months the way I do things has dramatically changed. The big thing is that with DALLE and other such renderers, the whole game was *describing* the image you wanted. That included any/all characters, setting, and style in a set *small* number of words (what a test that always was!). Each render was a unique event (with some exceptions I’ve discussed elsewhere), which meant you could get a beautiful image with one detail wrong, and then you had to figure out how to change your description to fix the error without adding some other error (or whether to just try again and see if the AI could sort itself out).
This could often get frustrating, like if the AI wanted to face somebody in the wrong direction and you had to fight to get that character to change their focus. There were editing options, but these were often tedious. And of course, the whole appeal of this is that I’m better with words than pictures.

Now a big thing that’s changed is that I can render several characters, monsters and a setting individually, then put them all together for a new render. No doubt, there are still pitfalls with this. And an AI does struggle with more than three characters. Most of the games I’ve been playing feature a six-member party, and it is seriously tricky to get all six right even now. But this is light years better than just a few months ago.

This image used four other rendered images as inputs.
I started with this macabre banquet hall as the setting. The AI filters fought me on this. Even when I described the people as “sleeping,” the filters didn’t like where this was heading, I guess, and rejected fully half of my attempted renders. And honestly, I worried about this too much. Looking at the image above (what I finally settled on for the post), the most troubling aspects of this setting aren’t even visible.
Then I built a sort of generic Yuan-Ti. I went with a white background to reduce confusion with the setting. For this image, I described the character as a Yuan-Ti, then added a parenthetical description from the Forgotten Realms Wiki.
I have a half dozen or so renders of all my team members just to use in other images.
I suppose I should have used the white background again! But I was thinking “Icewind Dale” when I did the first batch of these.

Then I just describe the action. The AI has some ability to recognise which image is which, and I still mention some details. So I’ll say “male warrior attacks with two swords” and “short woman warrior thrusts with her spear”. It may not do *exactly* what I say, but those sorts of mentions keep the swords and spear with the right character. Otherwise it tends to jumble those things up. Or maybe it just really thinks every fighter should have exactly one sword? Seriously, if I don’t specify, that will happen to both of these characters.
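
For anyone who likes to script this sort of thing, here is a rough sketch of how that kind of prompt could be put together. It only builds the text; it doesn't call any renderer, and the `Reference` class, the `build_prompt` function and the "Image N" numbering are all made up for illustration, not something any particular renderer requires.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One previously rendered image used as an input (hypothetical structure)."""
    label: str        # short descriptor that ties the image to a character
    action: str = ""  # what that character does in the new scene, if anything


def build_prompt(setting: str, refs: list[Reference], style: str = "photographic") -> str:
    """Assemble one text prompt that mentions each reference image in order."""
    lines = [f"Setting (image 1): {setting}."]
    # Character references are numbered after the setting image.
    for number, ref in enumerate(refs, start=2):
        line = f"Image {number}: {ref.label}"
        if ref.action:
            line += f" {ref.action}"
        lines.append(line + ".")
    lines.append(f"Style: {style}.")
    return "\n".join(lines)


prompt = build_prompt(
    "macabre banquet hall",
    [
        Reference("male warrior", "attacks with two swords"),
        Reference("short woman warrior", "thrusts with her spear"),
        Reference("Yuan-Ti"),
    ],
)
print(prompt)
```

The point of keeping the weapon in the same sentence as the character is the same as above: it gives the AI less room to hand the spear to the wrong fighter.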

There are other errors it makes I fight to fix.

Moya is often a struggle. I’ve learned to always add the “Asian half-elf” descriptor whenever I use her in a scene. And it often loses her green eyes.
I got this render fairly quickly of Moya and Psyche examining cave paintings. I like the framing, style, colors.

But I see two major problems here. One, those don’t count as cave paintings! That’s more like something you’d see at a Cabela’s. So when I re-rendered it I said “Lascaux style primitive art.”
The more obvious problem is Two: that’s not Moya! Funny, it gets the point on pointed ears with me saying nothing about it. But when I re-rendered I said “Asian half-elf”.

Which led to this. My wife looked at the image and said “wow, she got older.” Well, shoot.
So I said “make the Asian half-elf woman younger”.
I redid it again as “20 year old Asian half-elf”. Of course now Psyche is losing some, um, Psyche-ness? Aw, she’s in the background, close enough!
I feel like the AI is doing this to Psyche!

I considered changing the cave paintings again to “spear fishing” or something, but really this is close enough.

One thing I noticed years ago in scale modeling: every time you learn how to do something *better*, it also takes *longer*. I feel like this is happening with the AI renders I’m doing. No doubt, the renderers are better. They understand text better, they can keep characters (more) consistent and they can handle much more complicated creatures and actions.
But establishing your image references, building your own character reference library, and layering things together all take time. I’m having a blast! It’s more fun to be doing more complex things better. But better is often time consuming. It will continue to be interesting to see where all this goes.

4 responses to “Diomedes: IWD Addenda and Outtakes”

  1. Zeno

    It’s interesting you had trouble with the banquet hall. I routinely ask for “dead bloody bodies” and the like out of GPT without issue. I think I’ve had maybe 10 “censored” in total in the entire time I’ve used it. Rare enough that I haven’t really dived into what exactly sets it off. Certainly nothing like DALL-E, in which one out of every four renders getting the puppy dog was par for the course. Heck, I had GPT-4o (or whatever the GPT version on Copilot is) object to “diagram of a steampunk device that looks kind of like a flare gun”.

    That “generic Yuan Ti” is awesome. Only complaint would be if I look too closely at his tail the coiling doesn’t make any sense. But that’s the sort of thing that trips up every AI.

    I like to do my “character images” either with solid black backgrounds or with a background similar to that for the composite image. I’ve found that the background behind the character often influences the composite image. So if I do a blank one I minimize that.

    I tend to solidify characters with an emphasis statement and then adding detail. And more specific than “Asian” to nail it down. So in Moya’s case I would probably do (korean woman, age 25, half-elf). Or even just (korean woman, age 25, pointy ears).

    The “cave paintings” that look like they came from a sporting goods store are hilarious!

    My biggest problems nowadays are that OpenAI is much stingier with credits, so I feel like I really want to understand what I want before I go spending credits there. And I still feel like I have more freedom with style with DALL-E. It’s like an insane genius painter. Half the time what it gives you only vaguely represents what you want, but it’s so *good* when it gets it right that I keep trying to poke it some more.

    1. atcDave

      Wow, I wonder why the reject rate was so high then? Snakes? Very strange. I did go a very long time with no rejects ever at OpenArt, but these last couple of months, as I’ve leaned on it more heavily, it’s happened more often. I often have no clue why, so it may be random oddities. But I thought I knew why on the macabre banquet; now I wonder.

      Funny, it hadn’t even occurred to me to say anything other than “Asian” on Moya. I will see if that makes a difference.

      Yeah, I’m often sweating out my last points at OpenArt at the end of each month. Now I don’t use Co-Pilot nearly as much, and when I do it’s usually MAI-Image. That looks more photographic to me, and it’s okay at component renders. Especially backgrounds and generic (non-D&D) type characters.

      1. Zeno

        I think being more specific with ethnicity helps to stabilize the image. With “Asian” there’s a lot of leeway for “winging it”. Anything from Indian to Chinese. Being more specific narrows down the match set a bit. I suggested Korean because to me that seemed most like the baseline image you have for her. I also use short descriptors like that in referring to the images (image2,5’9″,irish rogue woman) and such to solidify how I want it to interpret images.

        I would do an experiment series on the OpenAI filters… but it’s a lot harder to justify that when the credits don’t regenerate daily. At least with OpenAI it appears to refund your credits on rejections. When I post my next installment it has – among other things – Cass literally searching a bloody dead body. And I got no rejections whatsoever.

        I’ve been using MAI and GPT4o in CoPilot to do Cass’ journal sketches, and DALL-E for source image generation. I can afford to blow 100 takes getting a source image just right in DALL-E, and then let GPT 1.5 take it from there. And it appears to do very well adapting the style of the image as needed.

      2. atcDave

        It’s funny, when I first created her I used Chinese. But at some point I got away from that. But yeah, Korean is tighter, that is better.
