Generative AI is a branch of machine learning that lets computers create new content, such as images, music, and text, that is often indistinguishable from human-created work.
The technology is built on neural networks, machine learning models that learn by analyzing vast amounts of data and identifying patterns, which they can then use to generate new content.
One of the key benefits of generative AI is its ability to automate repetitive tasks, freeing up human time and resources for more creative and strategic work.
History of Generative AI
Generative AI has made tremendous progress in recent years. A major breakthrough came in 2021 with the release of DALL-E, a transformer-based model that generates images from text prompts.
DALL-E was followed in 2022 by Midjourney and Stable Diffusion, whose release made practical, high-quality artificial intelligence art from natural-language prompts widely available. This marked a significant milestone in the development of generative AI.
The public release of ChatGPT in 2022 popularized the use of generative AI for general-purpose text-based tasks. This was a major turning point, making generative AI more accessible to a wider audience.
In March 2023, GPT-4 was released, which some scholars argued could be viewed as an early version of an artificial general intelligence (AGI) system. However, others disputed this claim, saying that generative AI is still far from reaching the benchmark of "general human intelligence".
According to a survey by SAS and Coleman Parkes Research, China leads the world in adopting generative AI: 83% of Chinese respondents report using the technology, surpassing the global average of 54% and the U.S. at 65%.
Meta released an AI model called ImageBind in 2023, which combines data from text, images, video, thermal data, 3D data, audio, and motion. This is expected to allow for more immersive generative AI content.
Technologies and Modalities
Generative AI systems can be trained on many kinds of data, including text, images, audio, and even robotic movements; the modality, or type, of the training data determines what a system can produce.
Unimodal systems take only one type of input, such as text or images, whereas multimodal systems can handle multiple inputs. For example, OpenAI's GPT-4 accepts both text and image inputs.
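To make the multimodal case concrete, here is a minimal sketch of sending text plus an image to a multimodal chat model using the official openai Python client (v1+); the model name and image URL are placeholders, not recommendations from the article.

```python
# Minimal multimodal-input sketch (our illustration): send text and an image
# to a chat model via the official `openai` Python client (v1+). Assumes an
# API key in OPENAI_API_KEY; model name and image URL are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder for any multimodal chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```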
Audio clips can be used to train generative AI systems for natural-sounding speech synthesis and text-to-speech capabilities; ElevenLabs' context-aware synthesis tools and Meta's Voicebox are examples.
Generative AI can also be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation, as described in the Actions section below.
Neural Nets (2014-2019)
In 2014, advancements in neural nets led to the creation of the first practical deep neural networks capable of learning generative models for complex data such as images.
The variational autoencoder and generative adversarial network played key roles in this breakthrough.
These deep generative models were the first to output not only class labels for images but also entire images.
The Transformer network, introduced in 2017, enabled further advances in generative models, surpassing older architectures such as long short-term memory (LSTM) networks.
The first generative pre-trained transformer, GPT-1, was introduced in 2018, marking a significant milestone in the field.
GPT-2, released in 2019, demonstrated the ability to generalize, without task-specific supervision, to many different tasks as a foundation model.
Unsupervised learning allowed for larger networks to be trained without the need for humans to manually label data, a major shift from traditional supervised learning.
Training Paradigm
ML models typically follow supervised or unsupervised learning paradigms. Supervised learning uses labeled examples, in which each input is paired with a known answer, to learn the relationship between input and output.
The training process involves adjusting model parameters to minimize a predefined loss function, which measures the disparity between predictions and actual outcomes. This is a crucial step in ensuring the model learns from its mistakes.
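To make that concrete, here is a minimal supervised-training sketch in PyTorch (our framework choice; the section names none): a linear model's parameters are repeatedly adjusted by gradient descent to minimize a mean-squared-error loss between predictions and labeled targets.

```python
# Minimal supervised-learning sketch in PyTorch (framework is our choice):
# fit y = 3x + 1 by adjusting a linear model's parameters to minimize a
# mean-squared-error loss between predictions and labeled targets.
import torch
import torch.nn as nn

x = torch.linspace(-1, 1, 100).unsqueeze(1)   # inputs
y = 3 * x + 1 + 0.1 * torch.randn_like(x)     # labeled answers, with noise

model = nn.Linear(1, 1)                       # parameters: one weight, one bias
loss_fn = nn.MSELoss()                        # the predefined loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # disparity between predictions and targets
    loss.backward()               # gradient of the loss w.r.t. the parameters
    optimizer.step()              # nudge parameters to reduce the loss

print(model.weight.item(), model.bias.item())  # should approach 3 and 1
```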
Generative AI models often rely on unsupervised or self-supervised learning approaches. These approaches allow the model to learn from data without explicit labels or feedback.
Adversarial training techniques, such as GANs, can also be used to improve the quality of generated samples. In GANs, two neural networks compete against each other to produce better results.
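The sketch below illustrates that competition on a toy one-dimensional problem (our example, in PyTorch): a generator maps random noise to samples, a discriminator scores samples as real or fake, and alternating updates push the generator's output toward the real data distribution.

```python
# Toy GAN sketch in PyTorch (our illustration): a generator maps noise to
# 1-D samples; a discriminator is trained to separate real from fake, and
# the two networks compete via alternating updates.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # "real" data drawn from N(2, 0.5)
    fake = G(torch.randn(64, 8))            # generator turns noise into samples

    # Discriminator update: real samples are labeled 1, generated samples 0.
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator update: try to make the discriminator label fakes as real.
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

# Generated samples should now cluster near 2.0.
print(G(torch.randn(5, 8)).detach().squeeze())
```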
Modalities
A generative AI system is constructed by applying unsupervised or self-supervised machine learning to a data set, and its capabilities depend on the modality, or type, of data used.
In the audio modality, for example, systems trained extensively on audio waveforms of recorded music along with text annotations can generate new musical samples from text descriptions.
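As one concrete, hedged example of the music modality, Meta's MusicGen can be driven from Python through the audiocraft library; the sketch below assumes audiocraft is installed and the published facebook/musicgen-small checkpoint is downloadable.

```python
# Hedged sketch of text-to-music generation with Meta's MusicGen via the
# audiocraft library (assumes `pip install audiocraft` and access to the
# published facebook/musicgen-small checkpoint).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)   # seconds of audio to generate

descriptions = ["lo-fi hip hop beat with mellow piano"]
wav = model.generate(descriptions)        # batch of generated waveforms

# Write the first sample to disk at the model's native sample rate.
audio_write("sample", wav[0].cpu(), model.sample_rate, strategy="loudness")
```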
Actions
Generative AI can be trained on the motions of a robotic system to generate new trajectories for motion planning or navigation.
UniPi from Google Research uses prompts like "pick up blue bowl" or "wipe plate with yellow sponge" to control movements of a robot arm.
Multimodal "vision-language-action" models such as Google's RT-2 can perform rudimentary reasoning in response to user prompts and visual input.
For example, given a table filled with toy animals and other objects and a prompt to pick up the extinct animal, such a model can direct a robot to pick up the toy dinosaur.
Input vs Output
In Machine Learning, the quality and reliability of outputs depend heavily on the input data quality and features extracted during training.
The focus lies in optimizing models for accurate results rather than generating entirely new information.
Generative AI operates differently by utilizing random noise as input to generate outputs that exhibit characteristics learned from training data.
This approach allows the model to create novel content that does not merely mirror existing inputs but is distinctive yet coherent.
Machine Learning models are optimized for accurate results, whereas Generative AI is optimized for generating new information.
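That difference fits in a few lines. In the sketch below (our illustration, in PyTorch), the discriminative model's output is a prediction tied to a concrete observed input, while the generative model's output is a new sample produced from nothing but random noise.

```python
# Input/output contrast sketch (our illustration, PyTorch): a discriminative
# model maps an observed input to a prediction about it; a generative model
# maps random noise to a new sample shaped by its training distribution.
import torch
import torch.nn as nn

classifier = nn.Linear(4, 3)   # discriminative: features -> class scores
generator = nn.Linear(8, 4)    # generative: noise vector -> synthetic features

x = torch.tensor([[0.2, -1.3, 0.7, 0.0]])   # a concrete, observed input
prediction = classifier(x).argmax(dim=1)    # output refers back to that input

z = torch.randn(1, 8)          # input is nothing but random noise
sample = generator(z)          # output is newly generated content
print(prediction, sample)
```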
Code
Large language models can be trained on programming language text, allowing them to generate source code for new computer programs.
This capability is already used in tools like OpenAI Codex, which can produce functional code from a given task description or prompt.
These models can learn from vast amounts of programming language text, enabling them to understand the syntax, semantics, and structures of various programming languages.
With this ability, developers can potentially automate the process of coding, freeing up time for more complex and creative tasks.
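As a hedged illustration of the idea (Codex itself is accessed as a hosted service, so a small open checkpoint stands in here), the sketch below generates code with Hugging Face transformers; the model name is one published code model, not an endorsement.

```python
# Hedged sketch: source-code generation with a small open code model via
# Hugging Face transformers (a stand-in for Codex-style tools; assumes
# `pip install transformers torch` and download access to the checkpoint).
from transformers import pipeline

generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

prompt = "# Python function that returns the n-th Fibonacci number\ndef fib(n):"
result = generator(prompt, max_new_tokens=64, do_sample=False)
print(result[0]["generated_text"])
```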
Software and Hardware
Generative AI models can power a wide range of products, from chatbots like ChatGPT to programming tools like GitHub Copilot and text-to-image products.
Many commercially available products have integrated generative AI features, such as Microsoft Office, Google Photos, and the Adobe Suite.
Smaller models with up to a few billion parameters can run on smartphones, embedded devices, and personal computers; a version of LLaMA with 7 billion parameters can run on a Raspberry Pi 4, and one version of Stable Diffusion can run on an iPhone 11.
Larger models with tens of billions of parameters can run on laptop or desktop computers, though achieving acceptable speed may require accelerators such as GPU chips from NVIDIA or AMD, or Apple's Neural Engine; the 65-billion-parameter version of LLaMA, for instance, can be configured to run on a desktop PC.
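As a hedged sketch of local inference, the example below uses the llama-cpp-python bindings with a quantized GGUF checkpoint; the model path is a placeholder for a file you have already downloaded, and the library choice is ours, not the article's.

```python
# Hedged sketch of local inference with the llama-cpp-python bindings
# (assumes `pip install llama-cpp-python` and a quantized GGUF model file
# on disk; the path below is a placeholder, not a real artifact).
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b.Q4_K_M.gguf", n_ctx=2048)

out = llm(
    "Q: Name one advantage of running a language model locally. A:",
    max_tokens=64,
    stop=["Q:"],
)
print(out["choices"][0]["text"])
```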
Running generative AI locally offers several advantages, including protection of privacy and intellectual property, and avoidance of rate limiting and censorship.
The subreddit r/LocalLLaMA focuses on running large language models on consumer-grade gaming graphics cards and is a popular venue for community benchmarks of these models.
Copyright and Ethics
How copyright law applies to generative AI remains unsettled: outputs may be treated as derivative works of the training material rather than original creations.
This raises open questions about ownership and authorship, since the AI is not a human creator but a machine learning model.
Generative AI models can be trained on copyrighted materials, which can lead to copyright infringement if not properly licensed or attributed.
The lack of clear regulations and guidelines for generative AI creates a gray area in terms of copyright and ethics.
Copyright of Content
Copyright of content is a complex issue, especially when it comes to AI-generated works. In the United States, the Copyright Office has ruled that works created by artificial intelligence without human input cannot be copyrighted because they lack human authorship.
The lack of human input is a crucial factor in determining copyright eligibility. The Office has also begun taking public input to determine whether these rules need to be refined for generative AI.
Misuse in Journalism
Copyright infringement is a serious issue in journalism, often resulting in costly lawsuits and damaged reputations. In one case described in the article, a news organization was sued for $1 million for using a copyrighted image without permission.
Journalists must be mindful of the rights of others, including photographers and writers. They often rely on free or low-cost images from stock photo websites, but even these can be copyrighted.
The consequences of infringement can be severe, ranging from fines to, in some cases, imprisonment; the article notes that one blogger was fined $10,000 for copyright infringement.
Journalists must also weigh the ethics of how images are used. The article cites a news organization that published a photo of a public figure without permission, which led to a public outcry and a retraction.
Sometimes the infringement is unintentional: in another example from the article, a news organization used a copyrighted image without permission, and the photographer had to contact it multiple times before the image was removed.
Frequently Asked Questions
What is the difference between generative AI and machine learning?
Generative AI creates new content, while machine learning more broadly covers systems that learn from data to make predictions or decisions. Essentially, generative AI generates, while machine learning learns.
Is generative AI a subfield of machine learning?
Yes, in the sense that generative AI systems are built with machine learning: they are models, typically deep neural networks, trained on large data sets. The distinction is one of purpose; while much of machine learning focuses on learning from data to classify or predict, generative AI uses what it learns to generate new data or content.
Sources
- 10.31235/osf.io/c4af9 (doi.org)
- the original (adweek.com)
- "Google Is Paying Publishers Five-Figure Sums to Test an Unreleased Gen AI Platform" (archive.today)
- cs.CV (arxiv.org)
- 2211.01777 (arxiv.org)
- 10.1038/s41586-024-07566-y (doi.org)
- "How Much Research Is Being Written by Large Language Models?" (stanford.edu)
- cs.DL (arxiv.org)
- 2403.16887 (arxiv.org)
- 10.18653/v1/2024.findings-acl.103 (doi.org)
- 2401.05749 (arxiv.org)
- the original (theconversation.com)
- 10.1038/s42256-020-0219-9 (doi.org)
- 10.1145/3442188.3445922 (doi.org)
- "On the Dangers of Stochastic Parrots: Can Language Models be Too Big? 🦜" (acm.org)
- 10.1109/ACCESS.2023.3300381 (doi.org)
- 2307.00691 (arxiv.org)
- "The generative A.I. software race has begun" (fortune.com)
- cs.LG (arxiv.org)
- 1910.03810 (arxiv.org)
- "Large language models are biased. Can logic help save them?" (mit.edu)
- "The Generative AI Copyright Fight Is Just Getting Started" (wired.com)
- "China says generative AI rules to apply only to products for the public" (reuters.com)
- "3 Obstacles to Regulating Generative AI" (hbr.org)
- "Detecting AI may be impossible. That's a big problem for teachers" (washingtonpost.com)
- "Adobe Adds Generative AI To Photoshop" (mediapost.com)
- "Text-to-video AI inches closer as startup Runway announces new model" (theverge.com)
- "10 Best Artificial Intelligence (AI) 3D Generators" (eweek.com)
- 2307.15818 (arxiv.org)
- "UniPi: Learning universal policies via text-guided video generation" (googleblog.com)
- "Meta in June said that it used 20,000 hours of licensed music to train MusicGen, which included 10,000 "high-quality" licensed music tracks. At the time, Meta's researchers outlined in a paper the ethical challenges that they encountered around the development of generative AI models like MusicGen" (musicbusinessworldwide.com)
- 2301.11325 (arxiv.org)
- 2306.04141 (arxiv.org)
- 2107.03374 (arxiv.org)
- 2108.07258 (arxiv.org)
- "Explainer: What is Generative AI, the technology behind OpenAI's ChatGPT?" (reuters.com)
- "A History of Generative AI: From GAN to GPT-4" (marktechpost.com)
- "China leads the world in adoption of generative AI, survey shows" (reuters.com)
- 10.1177/02683962231200411 (doi.org)
- cs.CL (arxiv.org)
- 2303.12712 (arxiv.org)
- cs.AI (arxiv.org)
- 2303.04226 (arxiv.org)
- 10.1109/5254.722362 (doi.org)
- 10.1023/A:1007469218079 (doi.org)
- 264113883 (semanticscholar.org)
- "Misinformation reloaded? Fears about the impact of generative AI on misinformation are overblown" (harvard.edu)
- "How Generative AI Can Augment Human Creativity" (hbr.org)
- 10.3386/w31161 (doi.org)
- Generative AI at Work (nber.org)
- "Google Cloud brings generative AI to developers, businesses, and governments" (google.com)
- "A Coming-Out Party for Generative A.I., Silicon Valley's New Craze" (nytimes.com)
- 2201.08239 (arxiv.org)
- "OpenAI Plans to Up the Ante in Tech's A.I. Race" (nytimes.com)
- "Generative models" (openai.com)
- 2307.15208 (arxiv.org)
- generative machine learning (geeksforgeeks.org)
- pursuing generative AI (technologyreview.com)
- generative AI systems (gartner.com)
- Generative AI vs Machine Learning: What's the Difference? (blueprism.com)
- Generative AI vs Machine Learning: Key Differences and ... (monterey.ai)
- Generative AI vs. Machine Learning: Exploring the Key ... - Lore (lore.com)