⌛️ AI companies are running out of data for new models. Here's why.

Plus more on Microsoft’s quantum breakthrough, OpenAI’s AI image editor, and how to jailbreak the latest AI models.

Image - Loop relaxing in space

Welcome to this edition of Loop!

To kick off your week, we’ve rounded up the most important technology and AI updates that you should know about.

HIGHLIGHTS

  • Why the top AI companies are running out of data

  • How to jailbreak the latest AI models

  • Microsoft’s recent advances with quantum computing

  • … and much more

Let's jump in!

Image of Loop character reading a newspaper
Image title - Top Stories

1. AI models can be “jailbroken” with dozens of examples

We start with the latest from Anthropic’s researchers, who have discovered a new jailbreak technique. They used it to override a Large Language Model’s (LLM) safety restrictions, and the model then gave them detailed instructions on how to build weapons.

They achieved this with the “many-shot” technique. This is when you give the model lots of example questions and answers, which clearly show how you want a question to be answered.

This encourages the model to answer your next question in the same way, which usually leads to better results.
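
For the curious, here is a minimal sketch of what a many-shot prompt can look like, using harmless questions. The function name, the "Q:"/"A:" layout, and the example data are all assumptions made for illustration; this is not the format Anthropic's researchers used.

```python
def build_many_shot_prompt(examples, question):
    """Join (question, answer) pairs into one prompt, then append the new question.

    Hypothetical helper: the "Q:" / "A:" layout is an illustrative assumption.
    """
    shots = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final entry leaves the answer blank, so the model fills it in
    # using the same style as the examples above it.
    return "\n\n".join(shots + [f"Q: {question}\nA:"])


# Three harmless examples; a real many-shot prompt would contain far more.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
    ("What is the capital of Italy?", "Rome"),
]

prompt = build_many_shot_prompt(examples, "What is the capital of Spain?")
print(prompt)
```

The more examples you add, the longer the prompt gets, which is why this only became practical once models could accept very long inputs.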

Surprisingly, it works just as well when you give it lots of inappropriate questions. The researchers included over 100 harmful questions, such as how to hijack a car and evade the police.

As a result, the model was much more likely to ignore its restrictions and provide the answers. Anthropic has since alerted their competitors about the issue and put their own safeguards in place.

However, this is one of the unintended consequences of allowing models to process much more information at once. The amount of text a model can handle in a single prompt is known as its context window.

While it’s really useful for businesses, since these AI models can now process huge documents and even novels, it also raises new challenges for securing them.

Image divider - Loop

2. Tech companies are running out of data for new AI models

As the top AI companies are now struggling to get enough data to train new models, some are turning to other sources.

It’s been alleged that OpenAI has transcribed over 1 million hours of YouTube videos, which were then used to train GPT-4 - despite the legal risks around using copyrighted material.

Meta are also looking at partnerships with book publishers, as they have exhausted nearly all the English-language written content that is available on the Internet.

They even discussed buying Simon & Schuster, which is a huge publisher for authors like Stephen King.

Just a few years ago, GPT-3 was trained on around 400 billion words. Since then, all the major companies have raced to collect as much data as they can.

They’ve scaled at a remarkable rate, with Databricks using over 9 trillion words to create their latest model. Yes, trillion.
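
To put those two figures side by side, here is a quick back-of-the-envelope calculation. It uses only the word counts quoted above; the variable names are just for illustration.

```python
# Word counts as quoted in this article (approximate figures).
gpt3_words = 400_000_000_000          # around 400 billion words
databricks_words = 9_000_000_000_000  # over 9 trillion words

# How many times more text the newer model was trained on.
growth = databricks_words / gpt3_words
print(f"Roughly {growth:.1f}x more words")
```

That is a jump of more than 20 times in just a few years.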

We’re reaching the point where there just isn’t enough data. Eventually, these companies will need to use synthetic data instead - which is when you use an AI model to create new data.

But even this has its challenges, since data created by the AI can reinforce and increase any biases that it already has.

Image divider - Loop

3. Samsung will spend $44 billion on a new US semiconductor factory

Previously, the company said they would spend $17 billion on the Texas factory, but that has more than doubled to $44 billion.

This money will be used to create two chip-making factories, an advanced packaging facility, and an R&D office.

They’re likely to receive billions in subsidies from the US CHIPS Act, as President Biden pushes for more domestic manufacturing.

This is being done to protect US access to the most advanced chips, most of which are manufactured in Taiwan.

The news follows on from Intel’s recent announcement, when they confirmed they’ve been awarded $8.5 billion in grants from the US Government and will spend $20 billion on building a new facility in Ohio.

Image divider - Loop

4. Big Tech companies form a new group to study AI job losses

The new AI-Enabled ICT Workforce Consortium (ITC) is being led by Cisco, with support from Google, Microsoft, IBM, and others.

They are tasked with exploring how AI will impact jobs within the tech industry and outlining what new training might be required. Their report will look at 56 job roles and will be published in the summer.

It’ll be interesting to see how detailed the final report is, as their initial comms are quite vague.

This new group has likely been created to inform politicians, who are facing the prospect that some of their workforce will be impacted by new AI tools and will need to be re-skilled.

Image divider - Loop

5. Katy Perry, Billie Eilish, and other musicians sign an open letter against irresponsible AI

A group of 200 musicians, including other artists like Imagine Dragons and Nicki Minaj, have signed an open letter urging tech companies to responsibly use AI music generation tools.

They’re arguing that the irresponsible use of AI is a threat to their livelihoods, and to those of other musicians who are less well-known.

While some companies are working on music generators that use licensed or royalty-free music, even these tools could hurt the smaller artists that make jingles for popular commercials.

But musicians aren’t alone here, as over 15,000 authors have signed a similar open letter. It doesn’t seem to be making an impact though, as the industry is pushing for access to even more data.



Image title - Closer Look

You can now edit DALL-E images, but this has much bigger implications

Gif - DALLE image editor

Thanks to OpenAI’s latest update for DALL-E, you can select areas in the generated image and ask the model to make changes.

In their example, they generated an image of a dog. To improve it further, they used the mouse to highlight the dog’s ears and typed in “Add bows”.

As you’d expect, the image stayed the same but now included two red bows.
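
To picture what “the image stayed the same” means, here is a toy sketch of a masked edit: pixels inside the selected area come from the new version, and everything else is copied from the original. This is only an illustration of the general idea, not OpenAI's implementation, and the function name and tiny number grids are made up for the example.

```python
def apply_masked_edit(original, edited, mask):
    """Return a new image: edited pixels where mask is True, original pixels elsewhere.

    Hypothetical helper; images are plain nested lists of numbers here.
    """
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


original = [[1, 1, 1],
            [1, 1, 1]]
edited = [[9, 9, 9],
          [9, 9, 9]]
# Only the top-middle pixel was selected by the user.
mask = [[False, True, False],
        [False, False, False]]

result = apply_masked_edit(original, edited, mask)
print(result)  # [[1, 9, 1], [1, 1, 1]]
```

Only the selected pixel changes, which is why the rest of the dog looks identical after the bows are added.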

While this seems minor, it actually addresses a major pain point with AI-generated images. You’ve probably generated an image that was 90% right, but looked a bit weird in some places.

Before this update, you couldn’t do much about it. You’d just have to submit the same text prompt and hope the next image is better. It’s a terrible user experience.

Now that you can edit specific areas, it’s possible to fix those AI images and make them even more useful.

This is really interesting for AI images, but we should look at the bigger picture. A few months ago, OpenAI revealed their work on a video generator called Sora. It created absolutely stunning videos and the company is now pitching it to all the major Hollywood studios.

But there are a few issues with it. The biggest issue is that it doesn’t have a perfect understanding of physics - so some elements in their videos will look really weird. In one example, the tyres on a race car were moving in completely the wrong direction.

For these scenarios, you need to be able to fix these flaws and ask the model to re-generate that section. This is where the biggest opportunities lie for businesses and creatives, as it turns AI content from “mostly there, but still a bit weird” to “almost perfect”. And that’s a big jump.

So while this looks to be a minor update for DALL-E on the surface, the potential benefits for image and (eventually) video generation could be huge.



Image title - Announcement

Microsoft and Quantinuum make advances with quantum computing

Image - Microsoft quantum

If they can be made to work reliably, quantum computers could solve complex problems much faster than the computers we have today - leading to significant advances in areas like drug discovery and cryptography.

Impressively, the two companies said that they were able to run over 14,000 experiments and didn’t encounter a single error - which is a real achievement for the industry.

But what really stood out was how they demonstrated the ability to spot and fix errors, without actually destroying the qubits.

Of course, others will need to replicate these results before they can be hailed as a step forward.

While quantum can bring some huge benefits for society, it also poses new challenges. One of the biggest is around how we secure today’s data.

If quantum computers can break our current encryption standards, they could decode valuable secrets - including those about foreign spies.

This was raised by a former CIA agent at a conference I attended last year. Governments around the world are collecting all the encrypted data they can, with the hope of decrypting it in a few years with quantum computers.

Either way, this latest research seems to have brought us even closer to unlocking the technology’s full potential.



Image title - Byte Sized Extras

🍎 Apple lays off over 600 employees, after abandoning electric car project

💻 Princeton researchers reveal SWE-Agent

🚗 Tesla slashes Model Y inventory prices by up to $7,000

✈️ Canoo spent double its annual revenue on the CEO’s private jet

🔑 Sam Altman gives up control of the OpenAI Startup Fund

💰 Pigment raises $145M in rare French tech mega-round

🤖 SiMa.ai secures $70M to develop multimodal GenAI chips

🇺🇸 US and EU commit to boosting AI safety and risk research

Image of Loop character with a cardboard box
Image title - Startup Spotlight
Gif - Robovision platform

Robovision

This is a startup based in Belgium, which has developed its own no-code platform for computer vision and robotics.

Essentially, Robovision wants to be the go-to platform for companies that are developing their own computer vision models. These can then be used in drones, robots, or even farm machinery.

Companies can easily upload their data, add their own labels, test the custom model, and then deploy it to production.

Robovision is mainly targeting the agri-tech, healthcare, and manufacturing sectors - with the aim of expanding into the US as well.

While there are plenty of other companies in this sector, I was surprised to see that their platform lets you label 3D content as well. This area is only going to grow, as more services are made for spatial computing and we increasingly create 3D scans of physical items.

Robovision have recently raised $42 million in Series A funding, which should be enough to support their plans going forward. I’ve included a link below, if you want to read more.



This Week’s Art

Image - Rio de Janeiro

Loop via Midjourney V6



Image title - End note

We’ve covered quite a lot this week, including:

  • How to jailbreak the latest AI models

  • Why the top AI companies are now running out of data

  • Samsung’s $44 billion investment in a new semiconductor factory

  • The new group that will study AI job losses

  • Open letter against irresponsible AI from musicians

  • Why DALL-E’s new image editor could have much bigger implications

  • Microsoft’s advances with quantum computing

  • And how Robovision’s platform can be used to create custom Computer Vision models

Have a good week!

Liam

Image of Loop character waving goodbye

Share with Others

If you found something interesting in this week’s edition, feel free to share this newsletter with your colleagues.

About the Author

Liam McCormick is a Senior Software Engineer and works within Kainos' Innovation team. He identifies business value in emerging technologies, implements them, and then shares these insights with others.