AI Art: A Long Way to Go
I was initially intrigued by what I heard (and saw) on social networks about AI art. I have a big project that needs an illustrator, and after several failed attempts to find an artist willing to take it on (there are various complex themes the art needs to capture, and I had difficulty conveying my vision to potential illustrators), I decided to see what could be done with the help of AI.
At first glance, AI art is interesting and seems to offer boundless opportunities. To start with, I just wanted to get a feel for how it works. My project, a novel, is about an aging Wall Street bro who has a midlife crisis. I decided to begin with some simple, plain-vanilla prompts and then, depending on the outcome, add nuance. Judging by the fantastical worlds and creatures I had seen the AI build, surely it could give me a picture of a finance bro walking in midtown. This was my first prompt: ‘finance bro in midtown Manhattan.’ This is what I got.
No sign of a finance bro. It probably doesn’t know what a finance bro is, which is fair; it’s a bit of a niche cultural thing. It also probably needs more specifics to work with: users on Reddit say that the more specific your text description, the better the resulting picture. So I then went with: ‘financial analyst in fleece vest in midtown Manhattan.’
Despite the scrambled picture, one can single out the word ‘fleece’ as the key parameter in the AI’s thinking. It thought it was winter, as you can tell from the frost covering the surfaces and the poor creature that was supposed to be my ‘financial analyst.’ Note that the algorithm also leaned on the word ‘vest’ and took it to mean an orange city worker’s vest. Ugh. This was going in totally the wrong direction.
I dropped the ‘fleece’ and tried again. ‘Financial analyst in vest in midtown Manhattan.’
Yeah, ok. Then I dropped the ‘vest’ altogether. ‘Financial analyst walks in midtown Manhattan.’ My analyst looked like a total bum now.
All right, I needed to change my approach. Maybe because most users create sci-fi images with it, the AI didn’t have enough normal, everyday things in its memory to refer to. I decided to use a stock image of a trader in front of Bloomberg terminals as a starting image. I typed ‘investment banker sits at his desk on the trading floor, Edward Hopper photorealism,’ and also changed the creation method from ‘coherent,’ which I had been using up to this point, to ‘stable.’ Here’s what I got:
I think it’s an awesome image, and I’d frame it and hang it on the wall in my office, but it’s not what I need. It looks like the AI also doesn’t know what a trading floor is. It can create beautiful otherworldly art, but it stumbles with real life. Maybe it’s good at fantastical art because realistic depiction doesn’t matter much when drawing worlds that don’t really exist: no one can complain about inconsistencies.
Does it know what a Bloomberg terminal is? Skeptical that it does, I downloaded another image of a trader sitting in front of a Bloomberg terminal and typed: ‘a trader sits in front of several Bloomberg screens.’
I got this image, and it made me wonder whether the starting images I supply for things the AI doesn’t know about are somehow stored and reused in later images. I don’t know whether they are, but I think they should be. After I taught it what a Bloomberg terminal is, does it actually remember that and recall it for other users?
I decided to move from trying to illustrate the professional life of an average midtown drone to something more approachable: city nightlife. How could the AI possibly have trouble drawing people enjoying an after-hours drink at a bar? Without using any stock photo as a crutch, I typed: ‘a woman and two men drink martini at the bar Edward Hopper photorealism.’ The AI came back with this:
Not bad, but I said ‘two men.’ Maybe the AI thought that because the dude looks a little like Matt Damon, he could count as ‘two men.’ Still, it’s progress. I played around some more: ‘happy hour crowd at Manhattan bar film noir.’ The faces are all scrambled, and the guy is wearing a wifebeater for some reason, but I think the AI got the right idea. Does it improve with each new simulation?
How about nature? My story oscillates between the busyness of the city and the serenity of the forest, and I needed some arresting forest imagery. I typed: ‘man leans on the deck rails of his mansion surrounded by deep forest sunset Caspar David Friedrich.’ The colors are awesome, but there are no deck rails, just some meager bushes for a forest, and what’s with that ridiculous hat?
Several subsequent attempts gave me richly colored phantasmagorias of the woods, worthy of a good acid trip but unsuitable for my purposes.
AI art would work splendidly if I were writing about fantastical places and non-human life-forms. But for depicting grim, everyday reality and the human condition, it’s just not the right tool. It’s back to the drawing board for me.