Cipheron on 1/12/2022 at 02:09
Quote Posted by Azaran
My bold prediction: within the next few decades, most of our entertainment will be AI generated. The tech to make AI-generated music (https://www.youtube.com/watch?v=qf6eOSJgN0Y) is out there too, and AI video tech is already starting up.
Music, movies, TV shows will be mainly AI generated within 20 years. You'll be able to feed a full movie/show script into an AI program, and it will spit out a complete film or show within a few minutes. All it will need is human revision to iron out the kinks, some editing, and voilà.
That's probably not going to happen like that.
It's possible to pump billions of images into a deep learning engine, because we do in fact have billions of sample images. But ... the search space for "movie" is vastly larger than the search space for "image". The problem is that there just aren't enough movies ever made that would allow the same trick to work. If you need 1 billion images to make an image generator work, you probably need like 100 quadrillion possible movies to make the same trick work for that. The limit of data-driven generators is that you need the data to start with.
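To put rough numbers on that (every figure below is an illustrative guess, not a real statistic):

```python
# Back-of-envelope version of the data argument. All numbers are
# illustrative guesses, not real statistics.
SECONDS = 2 * 60 * 60                  # a 2-hour movie
FPS = 24
frames_per_movie = SECONDS * FPS       # ~172,800 frames per movie
images_needed = 1_000_000_000          # sample count that makes image generators work
movies_ever_made = 500_000             # generous guess at all of film history

total_frames = movies_ever_made * frames_per_movie
print(f"{total_frames:.2e} frames available")       # ~8.64e+10
print(f"{movies_ever_made:,} movie-level samples")  # the number that actually matters
# Frames within one movie are hugely redundant, and the thing to be learned
# (a coherent 2-hour sequence) is a vastly bigger object than a single image,
# so the effective sample count is ~500k movies against a need for billions.
```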
And we already have something that solves most of the problems a movie-generator AI would need to solve: video game engines. These don't have issues like forgetting how many fingers a human has, or what was happening in the story 2 minutes ago. Anyway, every second of footage you see in a video game is already generated by "AI". Games already let you tell your own story.
What we've seen for the last 3 decades is ever-more CGI/AI-generated content in games and movies, but the costs and the amount of labor have always gone up, not down. So the most likely thing going forward is that, yes, more content is generated, but at the same time, humans will be making more than before, not less.
Pyrian on 1/12/2022 at 03:03
A Roguelike is kind of an AI generated game each time. :p
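And the "AI" in question is charmingly simple. A minimal sketch of the kind of map generation a roguelike redoes every run - a drunkard's-walk carver, with all names and numbers made up for illustration:

```python
import random

WIDTH, HEIGHT, FLOOR_TARGET = 40, 20, 300  # illustrative map size and density

def carve_dungeon(seed=None):
    """Drunkard's walk: wander randomly, turning wall tiles into floor."""
    rng = random.Random(seed)
    grid = [["#"] * WIDTH for _ in range(HEIGHT)]  # start as solid rock
    x, y = WIDTH // 2, HEIGHT // 2
    carved = 0
    while carved < FLOOR_TARGET:
        if grid[y][x] == "#":
            grid[y][x] = "."
            carved += 1
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        x = min(max(x + dx, 1), WIDTH - 2)  # stay inside the outer wall
        y = min(max(y + dy, 1), HEIGHT - 2)
    return grid

for row in carve_dungeon(seed=42):  # same seed, same dungeon; new seed, new game
    print("".join(row))
```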
Tocky on 2/12/2022 at 06:00
Let's see it do this-
https://i.imgur.com/kxylFUR.jpg
Painting by Jere Allen, father of an old friend.
It may recreate, but it can't originate. Its database is still stuck on existing work.
Qooper on 2/12/2022 at 08:20
This was a tricky puzzle chamber. To get the tetris piece, you have to get the jammer fr... Oh wait, wrong game.
Thirith on 2/12/2022 at 14:23
Quote Posted by Tocky
It may recreate, but it can't originate. Its database is still stuck on existing work.
I've seen AI do pretty remarkable remixes - such as Jodorowsky's Frasier - which I'd consider relatively original. Not wholly, obviously, because they're remixes, but then, I believe that most new art is to a large extent a remix of various influences.
Qooper on 2/12/2022 at 15:39
Quote Posted by Cipheron
That's probably not going to happen like that.
It's possible to pump billions of images into a deep learning engine, because we do in fact have billions of sample images. But ... the search space for "movie" is vastly larger than the search space for "image". The problem is that there just aren't enough movies ever made that would allow the same trick to work. If you need 1 billion images to make an image generator work, you probably need like 100 quadrillion possible movies to make the same trick work for that. The limit of data-driven generators is that you need the data to start with.
And the search space for "game" is vastly larger than that of "movie".
I think you might be right. Although I would argue that a generator doesn't always need to be data-driven in this specific way. There are many ways to use neural networks, and many ways to connect them with other mechanisms.
A movie differs from an image in that there are deeper concepts to be understood in a movie, such as plot and the whole temporal dimension. I'm not an expert on AI, but if neural networks are a mechanism for measuring concepts, and if they can be chained to measure higher-level concepts from lower-level ones, then for movies we'd need longer chains of them and thus more compute power.
Another way I'd approach this is that it wouldn't need to learn entire movies, and it wouldn't need to be one chain of neural networks but rather a set of separate engines. There could be an engine that understands space and lighting, one that understands short-term and long-term causality, one that understands human facial and body expressions, one that understands psychology, one that understands how plots work in textual form, and so on.
So in this case we'd give it movies to learn visuals, not necessarily plot. Storytelling it would learn from movie scripts combined with storyboards, so it learns how a script translates into something visual.
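In code terms, the shape I'm imagining is something like the sketch below. Every class and function here is a hypothetical placeholder for a separately trained model - none of this is a real library:

```python
from dataclasses import dataclass

@dataclass
class Scene:
    description: str
    frames: list

class PlotEngine:
    """Placeholder for a model that understands plots: script -> scene beats."""
    def plan(self, script: str) -> list[str]:
        return [line.strip() for line in script.splitlines() if line.strip()]

class StagingEngine:
    """Placeholder for a model that understands space and lighting."""
    def stage(self, beat: str) -> Scene:
        return Scene(beat, frames=[f"<footage of: {beat}>"])

class ContinuityEngine:
    """Placeholder for a model that understands short- and long-term causality."""
    def check(self, scenes: list[Scene]) -> bool:
        return all(s.frames for s in scenes)

def generate_film(script: str) -> list[Scene]:
    plot, staging, continuity = PlotEngine(), StagingEngine(), ContinuityEngine()
    scenes = [staging.stage(beat) for beat in plot.plan(script)]
    if not continuity.check(scenes):
        raise ValueError("continuity engine rejected the cut")
    return scenes

for scene in generate_film("A door creaks open.\nA shadow crosses the hall."):
    print(scene.description, scene.frames)
```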
demagogue on 2/12/2022 at 23:51
Yes, that last point is how I'd think about it too. As a technical point, the data works at different, overlapping levels of scale. For a movie, you don't need a quadrillion images; you need additional levels of scale: start at the top level with the broad plot arc (characters, motivations, setting, etc.), decompose that into a series of scenes, decompose each scene into a series of actions with their own mini-arcs, and only then create the setting and action, probably not directly via images but by rendering models in a 3D engine.
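As a toy illustration of that top-down decomposition - each function below stands in for a separately trained model, and render() stands in for a 3D engine; nothing here is a real system:

```python
def generate_arc(premise: str) -> dict:
    # Top level: the broad plot arc, with characters, motivations, setting.
    return {"premise": premise, "acts": ["setup", "confrontation", "resolution"]}

def generate_scenes(arc: dict) -> list[str]:
    # Each act decomposes into a series of scenes.
    return [f"{act} scene for '{arc['premise']}'" for act in arc["acts"]]

def generate_actions(scene: str) -> list[str]:
    # Each scene decomposes into actions with their own mini-arcs.
    return [f"{scene}, beat {i}" for i in (1, 2, 3)]

def render(action: str) -> str:
    # Only the bottom level touches pixels, by rendering models in a
    # 3D engine rather than predicting raw images.
    return f"<rendered: {action}>"

def generate_movie(premise: str) -> list[str]:
    arc = generate_arc(premise)
    return [render(action)
            for scene in generate_scenes(arc)
            for action in generate_actions(scene)]

for clip in generate_movie("a thief in a haunted city"):
    print(clip)
```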
And I do think AI will be making very original-looking art, simply because I think the conceptual space it's working with is much bigger than our conceptual space of what's "recognizable". The issue isn't the AI's capacity; it's the ability of prompt-writers to understand the model and to pick the right prompts and prompt-logic that work with it to create interesting and original-looking art. When you significantly change the model, the prompt-logic changes with it.
There's a learning curve happening right now where people who are really into this are learning this new language and getting better and better at it. One of the interesting threads in the SD Discord, I thought, was the Chinese Telephone game, where each person tries to recreate the image posted above theirs. It's like watching this community learn, in real time and at a really fine level of detail, how prompts work. What they're really doing, I think, is feeling their way around the guts of the model - how things are "organized" in there - and pulling structure out of it. It's not always intuitive in natural language at all.
So it's not only the model; the users, too, are advancing how to use this tech in leaps and bounds - at least that's how it seems to me.