Exploring the limits: 4 ways to use generative AI for product photography
In the past few months, generative AI images have gotten really, really good. But can you use generative AI to create product images? Let's find out...
In late 2023, most of the popular image generation apps released major upgrades (e.g. Midjourney v6, Firefly 2) that offer stunning realism and more control over your images than ever before.
However, they still have an Achilles heel:
Generating high-quality, accurate product photography with AI, at scale, is currently impossible.
Yes, you can create insanely beautiful, creative and high-quality images that resemble product photography. The challenge for current tools is faithfully rendering the product in all its detailed glory.
The Australian Open is on at the moment, so let’s go with a tennis theme for our tests.
Below are two images of Nike tennis shoes. One was generated with AI (Midjourney v6) and the other is from the Nike website. Can you tell which is real and which is fake? Which shoe do you prefer?!
They both look convincing, but only one of these shoes is real, and people don’t buy AI shoes (other than in the metaverse, but that’s for another post).
Image generation apps struggle to nail the details for the same reason ChatGPT will often make up an answer (‘hallucinate’): the data the model was pre-trained on (the “P” in ChatGPT stands for “pre-trained”) does not include enough information about most specific products to recreate them in detail.
Midjourney knows generally what a Nike sneaker looks like. It probably even knows what a Nike tennis sneaker looks like. But it doesn’t know that the NikeCourt Air Zoom Vapor 11 is (per Nike) “Lower to the court and loaded with speed components”.
Btw, before we move on, the sneaker on the right is the real Vapor 11. They look great, and I want a pair.
There are four ways to generate product photos using AI image software:
1. Normal generation: relying on the data the model was pre-trained on (see above)
2. Fine-tuning: training the model with images you have of the product
3. Image prompts: using an image as part of a prompt to influence the composition, style, and colour
4. Background replacement: taking an existing high-quality product photo and placing the product in a new, AI-generated scene
We have covered normal generation already, so moving on to…
Fine-tuning
If you have at least 10 to 20 high-quality photos of your product, you can fine-tune an AI model on them and then generate new images of the product, limited only by your imagination.
Fine-tuning has been available for about a year and is improving all the time. Right now, the most popular approach is to fine-tune Stable Diffusion (an open-source image generation model), in particular Stability AI’s recent high-quality SDXL model.
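Fine-tuning tools differ in the input they expect, but most start from a folder of product photos paired with captions. As a minimal sketch (assuming Python and only its standard library), the helper below writes a `metadata.jsonl` caption file in the ImageFolder layout that Hugging Face `datasets` reads, which some fine-tuning scripts accept. The rare identifier token `sks` (in the style of DreamBooth captions), the class noun and the `write_metadata` helper are hypothetical choices for illustration. The training step itself is not shown, so check what input format your own fine-tuning tool expects.

```python
import json
from pathlib import Path

# Hypothetical values for illustration: a rare "identifier" token plus a class
# noun, in the style of DreamBooth-type fine-tuning captions.
INSTANCE_TOKEN = "sks"
CLASS_NOUN = "tennis shoe"
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def write_metadata(image_dir: str, min_images: int = 10) -> Path:
    """Write a metadata.jsonl caption file next to the product photos.

    Each line pairs an image file name with a caption; "file_name" is the
    column the Hugging Face `datasets` ImageFolder loader looks for.
    """
    folder = Path(image_dir)
    images = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # The post suggests at least 10 to 20 high-quality photos of the product.
    if len(images) < min_images:
        raise ValueError(f"found {len(images)} images, need at least {min_images}")

    caption = f"a photo of {INSTANCE_TOKEN} {CLASS_NOUN}"
    out_path = folder / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for image in images:
            # One JSON object per line, e.g. {"file_name": "shoe_00.png", "text": "..."}
            f.write(json.dumps({"file_name": image.name, "text": caption}) + "\n")
    return out_path
```

For example, `write_metadata("vapor11_photos")` (a hypothetical folder) would write `vapor11_photos/metadata.jsonl` with one line per image, such as `{"file_name": "shoe_00.png", "text": "a photo of sks tennis shoe"}`. A single shared caption keeps the sketch simple; per-image captions describing angle or lighting may help the model separate the product from its surroundings.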
We recently wrote an article describing a series of experiments in which we tried to recreate the Porsche 911 Dakar using a fine-tuned SDXL model. Here are some of the resulting images:
Our conclusion was that although you can achieve beautiful-looking images, the accuracy of the product details currently falls short of what commercial use requires.
Unfortunately, not all image generation apps support fine-tuning yet:
In addition to these, there are apps such as Booth.ai and Photoai.com that provide easy fine-tuning specifically for product photography. They are worth experimenting with, but in my experience the same challenges of product specificity remain.
Fine-tuning can also be used to capture a style rather than the content of an image. For example, if Nike wanted a series of lifestyle images that matched a particular photography or illustration style, it could achieve that with fine-tuning.
I think fine-tuning is the future of AI product imagery. By the end of 2024, I expect we will see big improvements in accuracy, as well as wider support from the popular image generation apps. Fine-tuning in Midjourney would be a game-changer!
I’m also particularly excited about Adobe’s plans in this space, as many large brands and retailers already store their product imagery in Adobe’s digital asset management software. In the future, a marketer at Nike will be able to type in a text prompt, press a button and generate new AI photos from a model fine-tuned automatically on their existing product library.
Image prompts
Using image prompts is much faster and easier than fine-tuning: you simply upload a ‘source’ image or include it in your prompt, and the image generation app uses it as a guide when generating outputs.
For example, here are Midjourney’s outputs before and after adding an image to the prompt:
The ‘after’ image adheres much more closely to the real product photo (the prompt image), but it is still a long way from being a true representation of the shoe.
Right now, image prompting works well for guiding the ‘style’ of an image, but not for retaining the exact product details that product photography requires.
Background replacement
Let’s imagine you are Rebel Sport and want to create an eye-catching social post featuring the NikeCourt Vapor 11s.
There are now plenty of background replacement apps, such as Flair.ai, that use AI to a) remove the background from a product photo and b) generate a new background scene.
For example, using the fictional shoe image we generated earlier, I created this new image in Flair.ai in under a minute:
Google have recently announced similar functionality will be built into Google Product Studio, and Amazon have also started rolling out features like this for retailers.
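None of these apps publish exactly how they work, and real tools typically do more (edge refinement, relighting, shadows). As a minimal sketch of just the final step, assuming a product cutout mask from step a) and a generated background from step b) of the same size are already available, the function below alpha-blends the two: each output pixel is `alpha * product + (1 - alpha) * background`. Images are plain nested lists of RGB tuples so the example runs without dependencies; the names are hypothetical, and with real image files you would more likely use a library such as Pillow, whose `Image.composite` blends two images through a mask.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B), 0-255
RGBImage = List[List[Pixel]]        # rows of pixels
Mask = List[List[float]]            # alpha per pixel, 0.0-1.0


def composite(product: RGBImage, mask: Mask, background: RGBImage) -> RGBImage:
    """Alpha-composite a product cutout over a new background.

    out = alpha * product + (1 - alpha) * background, per channel.
    alpha = 1 keeps the product pixel; alpha = 0 shows the new background.
    """
    if not (len(product) == len(mask) == len(background)):
        raise ValueError("product, mask and background must have the same height")
    result: RGBImage = []
    for prod_row, mask_row, bg_row in zip(product, mask, background):
        if not (len(prod_row) == len(mask_row) == len(bg_row)):
            raise ValueError("product, mask and background must have the same width")
        out_row = []
        for prod_px, alpha, bg_px in zip(prod_row, mask_row, bg_row):
            alpha = min(max(alpha, 0.0), 1.0)  # clamp soft-mask values into [0, 1]
            out_row.append(tuple(alpha * p + (1.0 - alpha) * b for p, b in zip(prod_px, bg_px)))
        result.append(out_row)
    return result


# Usage: a red "shoe" pixel pair over a blue generated scene; the second
# pixel sits on a soft mask edge, so it blends half and half.
product = [[(255, 0, 0), (255, 0, 0)]]
mask = [[1.0, 0.5]]
background = [[(0, 0, 255), (0, 0, 255)]]
print(composite(product, mask, background))
# [[(255.0, 0.0, 0.0), (127.5, 0.0, 127.5)]]
```

Soft (fractional) mask values matter in practice: a hard 0/1 cutout tends to leave jagged edges around laces and soles, while partial alpha lets the product blend into the new scene.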
In addition to background replacement, there are other promising tools (such as the open-source ControlNet project) that can be used to manipulate photos of existing products. For example, we’re demonstrating to a fashion retailer how ControlNet can be used creatively in their design process.
Copyright protection
This is a topic for a longer post, but in summary, it’s currently unclear whether a creator (e.g. the Nike marketer) can claim copyright in AI-generated images. This is being tested in US courts at the moment but, to our knowledge, hasn’t yet been tested in Australia.
For now, assume that any images you create with AI (e.g. the social post above) cannot be protected by copyright. Whether that matters is up to you.
Bias and diversity in AI models
Generative AI, and image generation in particular, raises many ethical questions and opinions. One topic to be aware of is bias in the training data, which can lead to a lack of diversity in the outputs.
My colleague Jason Ross made this example to demonstrate (all from Midjourney):
If you are aware of this bias, you can either use an AI model that provides more diverse output (e.g. Adobe Firefly) or use more detailed prompts to ask the image generation app for more diverse images.
Comparison
We couldn’t wrap up this post without a good ol’-fashioned AI model shoot-out. Below are images generated with the most popular generative AI image apps, aiming for a photorealistic image of a female tennis player wearing Nike sneakers.
We have a clear preference, which do you think wins?
The prompt was “female tennis player, wearing Nike tennis shoes, close up on the shoes, photo realistic, magazine advert”. Note: Firefly has strict brand protections, so for Firefly the prompt excluded the “Nike” brand reference.
There are also impressive AI post-production apps such as magnific.ai, which ‘upscale’ images and can make AI images look even more realistic (for example, by adding film grain).
Summary
Although we are getting close to having the technology to generate product photos with AI, as it stands there are very few commercial uses that work at scale.
That said, things are changing every day, and many of the leading companies in this space are investing heavily to solve the quality and accuracy challenge for commercial product photography.
For now, it takes some experimentation to achieve the results you need.
If you are a retailer, brand or manufacturer and would like to discuss our research and ideas, or run some tests with AI-generated product images, please get in touch.
Work with Time Under Tension
We help agencies, companies and brands elevate their customer experience with generative AI. Our advisory team helps you understand what is possible and how it relates to your business. We provide training so you can get the most out of generative AI apps such as ChatGPT and Midjourney. Our technical team builds bespoke tools to meet your needs. You can find us here: www.timeundertension.ai/contact
A handful of Gen AI news
Here are five of the most interesting things we have seen and read in the last week:
AI’s expected impact on the Australian labour market - research from Jarden
Walmart unveils new GenAI search tech for shoppers at CES
“Everybody shut up, ChatGPT is coming” (funny Instagram Reel)
Sam Altman interviewed by Bill Gates - highly recommended podcast
CES 2024… a glimpse into our AI-powered future (funny 4 min video)