We are witnessing a major shift in how images are generated. The recent influx and rapid growth of machine learning and artificial intelligence, exemplified by systems like DALL-E and Midjourney, raise questions about how creative processes evolve and develop through technology.
Systems such as DALL-E, DALL-E 2, and Midjourney are AI programs that generate images from text descriptions, trained on large datasets of text-image pairs. Their capabilities include creating anthropomorphic versions of animals and objects, combining unrelated concepts in plausible ways, and applying transformations to existing images.
DALL-E and similar systems can create plausible images for a wide variety of sentences that exercise the compositional structure of language. DALL-E has some of the capabilities of a 3D rendering engine, but the inputs differ fundamentally: a rendering engine must be given a scene specified in full detail, while DALL-E can often “fill in the blanks” from an under-specified prompt. It can also independently control the attributes of a small number of objects in a scene.
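To make this concrete, here is a minimal sketch of text-to-image generation through the OpenAI Images API, assuming the v1 Python SDK and an OPENAI_API_KEY environment variable; the model name, prompt, and size are illustrative choices, not a statement about how any of these systems work internally.

```python
# Minimal sketch: text-to-image generation via the OpenAI Images API.
# Assumes the `openai` v1 Python SDK and an OPENAI_API_KEY environment
# variable; model name, prompt, and size below are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-2",                                 # text-to-image model
    prompt="an armchair in the shape of an avocado",  # compositional prompt
    n=1,                                              # number of images
    size="512x512",                                   # output resolution
)

print(response.data[0].url)  # URL of the generated image
```

Note that the prompt leaves most of the scene unspecified (material, lighting, background); the model fills in those blanks itself rather than requiring them as explicit inputs, which is exactly where it departs from a 3D rendering pipeline.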
While these models have limitations, the field around systems like DALL-E and Midjourney is evolving at an unprecedented rate. Apple recently released GAUDI, a “neural architect” that takes the process a step further by generating 3D scenes from text prompts like “go upstairs” or “walk down the hall.” It is difficult to predict what will come next.