Tuesday Nov 12, 2024

Creating a clean generative AI data set with Getty Images

At the beginning of the wave of generative AI hype, many feared that generative models would replace the jobs of creatives like artists and photographers.

With generative AI models such as Dall-E and Midjourney seemingly creating unique works of art and images, some artists found themselves at a disadvantage. Some say the generative systems took their artwork, copied it and used it to produce their own images. In some cases, the generative systems allegedly outright stole the creative work.

Two years later, artists have to some extent been reassured by the support of stock vendors like Getty Images.

Instead of trailing behind generative AI tools such as Stable Diffusion, Getty created its own image-generating tool: Generative AI by Getty Images.

Compared with other image generators, Getty has gone to great lengths to restrict its model through the data set. The stock photography company maintains what it calls a clean data set.

"A clean data set is really a training data set that a model is trained on that can lead to a commercially safe or responsible model," said Andrea Gagliano, senior director of AI and machine learning at Getty Images, on the latest episode of TechTarget Editorial's Targeting AI podcast.

Getty's clean data set does not contain brands or intellectual property products, Gagliano said. The model's data set also does not include images of well-known people or likenesses of celebrities like Taylor Swift or presidential candidates.

"We have taken the very cautious approach where our generator will not generate any known person or any celebrity," Gagliano said.

"It will not generate Donald Trump," she said, referring to the president-elect. "And it will not generate Kamala Harris," referring to the vice president and former presidential candidate.

"It has never seen a picture of Donald Trump," she continued. "The model has never seen a picture of Kamala Harris."

Gagliano added that removing this possibility also guards against those who want to misuse the technology to create deepfakes. In addition, any generated output is labeled synthetic or AI-generated.

"We don't want any situation where we start to undermine the value of a real image," Gagliano said.

Finally, the data set that Getty uses produces licensed images and ensures that creators get compensated. A portion of every dollar made by Generative AI by Getty Images goes to the creators who contributed to the data set.

"The reason for that is the more unique imagery that we bring into the training data set, the more additive it is," Gagliano said.

Getty updated its generative AI tools Tuesday. The new capabilities include Product Placement, which lets users upload their own product images and generate backgrounds, and Reference Image, which enables users to upload sample images to guide the color and composition of the AI-generated output.

Esther Ajao is a TechTarget Editorial news writer and podcast host covering artificial intelligence software and systems. Shaun Sutner is senior news director for TechTarget Editorial's information management team, driving coverage of artificial intelligence, unified communications, analytics and data management technologies. Together, they host the Targeting AI podcast series.
