
🎥 OpenAI's new video generator is a huge leap forward

Plus more on Russia’s threat to US satellites, Google’s new Gemini 1.5 model, and how Perplexity wants to replace Google Search.

Image - Loop relaxing in space

Welcome to this edition of Loop!

To kick off your week, we’ve rounded up the most important technology and AI updates that you should know about.

HIGHLIGHTS

  • Meta’s new way to teach machines about the physical world

  • OpenAI’s challenger for Google Search

  • Russia’s “serious threat” to global satellites & communications

  • … and much more

Let's jump in!

Image of Loop character reading a newspaper
Image title - Top Stories

1. OpenAI is working on a web search tool to challenge Google

According to a recent report, OpenAI is planning an expansion into web search. The details aren’t clear at this stage, but it’s rumoured that the new product will be partly powered by Microsoft's Bing technology.

Microsoft, a major investor in OpenAI, integrated the GPT models into Bing last year in the hope of eating into Google’s market share.

That hasn’t happened. Bing’s market share has only crept up from around 4% to 9% - while Google remains dominant at 84%.

This move by OpenAI would certainly intensify their competition with Google, but it will also increase public interest in startups that are already building these tools - such as Perplexity, which you can read more about in this week’s Startup Spotlight section.

Image divider - Loop

2. Russia is “developing nuclear weapons for space” that can target US satellites

In more worrying news, US officials have warned of a “serious threat to national security”. For security reasons, they have been vague about the details - but this follows years of warnings from aerospace experts.

Both Russia and China have been steadily developing their military capabilities in space, as they aim to catch up with the US.

Just last year, a report suggested that Russia was developing multiple anti-satellite weapons - including a missile that was successfully tested against an old Soviet satellite.

Our society has rapidly changed over the last 30 years and has become increasingly technology-dependent.

The sudden loss of satellite communications would have a profound impact on the global economy, supply chains, emergency services, and of course military operations and surveillance.

Members of Congress have since been invited to a secure facility, where they can view the classified intelligence.

Image divider - Loop

3. Google is using satellites & AI to track methane emissions

Sticking with satellites, Google have announced a new partnership with the Environmental Defense Fund (EDF). They plan to develop AI models that will analyse satellite imagery and detect methane emissions around the world.

EDF are launching a new satellite in March, called MethaneSAT, which will orbit the Earth 15 times a day. Google’s team helped EDF develop a tool that can spot where methane is being produced and track changes over time.

Google are also doing the same to identify oil & gas infrastructure around the globe, which will allow organisations & governments to spot high emission areas and take action. The data will be freely available via Google Earth Engine later this year.

Image divider - Loop

4. Meta unveil a new way to teach machines about the physical world

The new V-JEPA model is a step towards Yann LeCun’s goal of machines that act more like humans. It’s capable of understanding how objects are interacting in a video.

For example, Meta’s video showed a piece of paper being ripped in two. The model was able to determine that the video showed “tearing something into two pieces”.

It’s important to stress that this isn’t a generative AI model. Instead, it was trained to make an abstract description of what was being shown - rather than trying to work out each pixel.

To achieve this, Meta trained it on lots of videos - but hid sections of them with a black box. They then asked the model to work out what was going on.
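Meta hasn’t shared code alongside the announcement, but the masking idea can be sketched in a few lines. In this toy example (plain NumPy, with made-up patch sizes, and a patch’s mean intensity standing in for a learned embedding), random patches of a “video” are hidden and a crude predictor is scored on how well it recovers the abstract description of the hidden regions - never the pixels themselves:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": 8 frames of 16x16 greyscale pixels, split into 4x4 patches.
video = rng.random((8, 16, 16))
patches = video.reshape(8, 4, 4, 4, 4)  # (frame, patch row, px row, patch col, px col)

# Abstract target per patch: its mean intensity (a stand-in for a learned embedding).
targets = patches.mean(axis=(2, 4))  # shape (8, 4, 4)

# Hide a random ~25% of patches, as if covered by a black box.
mask = rng.random(targets.shape) < 0.25

# A deliberately crude "predictor": guess every hidden patch from the mean of
# the visible patches in the same frame. V-JEPA learns this mapping instead.
visible = np.where(mask, np.nan, targets)
predictions = np.nanmean(visible.reshape(8, -1), axis=1)[:, None, None]

# The loss lives in the abstract space - pixels are never reconstructed.
loss = np.mean((predictions - targets)[mask] ** 2)
print(f"masked patches: {mask.sum()}, loss: {loss:.4f}")
```

The key design point survives even in this toy: the model is only ever penalised on its abstract description of the hidden content, which is what separates V-JEPA from pixel-level generative approaches.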

This is quite similar to the work that Google’s Lumiere can do, except V-JEPA aims to describe the scene rather than extend it.

It’s impressive stuff, but it’s very early days - evident from the fact that Meta showed only one example in their promotional video. Regardless, it’ll be interesting to see how far they can build on this in the future.

Image divider - Loop

5. Largest text-to-speech AI model ever shows “emergent abilities”

Amazon’s researchers wanted to test if larger text-to-speech (TTS) models would follow the same path as LLMs - as the models become bigger, they’re able to do more tasks.

To test this out, they trained the largest TTS model we’ve seen so far - using over 100,000 hours of speech for the training process.

The BASE TTS model showed significant improvements in how it could handle complex language, such as emotional expressions and foreign words.

Most other TTS models struggle with these tasks - they’ll often mispronounce such words, skip them entirely, or spend too long emphasising them.

These results confirmed that as a TTS model becomes bigger, it can perform a wider range of tasks - something that hadn’t previously been shown for text-to-speech.

Of course, large language models have gotten a lot of attention in the last year. But advances in text-to-speech could have profound benefits for those who rely on accessibility features.

It’s worth pointing out that the model hasn’t been publicly released, as there are fears over how it could be misused by bad actors, but you can read their research paper using the link below.



Image title - Closer Look

OpenAI’s Sora can create stunning videos

Gif - Results from OpenAI's Sora

After years of releasing text, image, and audio generators - OpenAI has finally released details about their video model.

You’ll often read how a new AI advancement is “ground-breaking”, “mind blowing”, or maybe even “a game changer”. We all read the same hype-filled posts on social media. It’s tiresome.

But this genuinely is different.

Sora’s results are incredible, and it does seem capable of simulating some level of physics. I say some because there are several examples where it doesn’t work perfectly.

That doesn’t really matter right now, since it’s a huge step forward compared to everything else on the market.

Gif - Results from OpenAI's Sora

Surprisingly, it’s also able to generate minute-long videos. Before we get too excited, it’s worth considering the impact this would have.

Given we’re in an election year, this could land OpenAI in hot water if it’s misused.

As a result, they’ve decided not to release it to the public yet. But it’s only a matter of time before they do.

For now though, it’s only being tested by a select group of OpenAI employees. If you want to watch the full set of videos - which I highly suggest you do - you can use the link below.


Google’s Gemini can now process 1 million tokens

Image - Gemini context window

There have been plenty of headlines this week, but one of the biggest was Google’s announcement that their new Gemini 1.5 model can process up to 1 million tokens.

For context (sorry, I had to), Anthropic’s Claude model previously held the top spot at 200k tokens. Google has blown past that, and it opens up far more use cases for companies - since they can feed much more data into the model than before.

On paper this is significant. Although, it remains to be seen how important this is in reality.

To date, we have seen LLM accuracy degrade as more of the context window is filled. Time will tell if Gemini suffers from the same issue.

That aside, the big tech companies are making significant progress with their LLM architectures and their understanding of a technology that’s still in its infancy. This is allowing them to rapidly increase the number of tokens that can be processed.

In 2020, OpenAI’s GPT-3 could only process 2k tokens. But just a few months ago, they announced that GPT-4 could now handle 128k tokens.
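For a rough sense of scale, here’s a back-of-the-envelope comparison using the common approximation of ~0.75 English words per token (this ratio varies by tokenizer and language, so treat the numbers as ballpark figures):

```python
# Rough rule of thumb: ~0.75 English words per token (varies by tokenizer).
WORDS_PER_TOKEN = 0.75

context_windows = {
    "GPT-3 (2020)": 2_000,
    "GPT-4 Turbo": 128_000,
    "Claude": 200_000,
    "Gemini 1.5": 1_000_000,
}

for model, tokens in context_windows.items():
    print(f"{model}: {tokens:>9,} tokens ≈ {int(tokens * WORDS_PER_TOKEN):,} words")
```

By that rough maths, 1 million tokens is around 750,000 words - several novels’ worth of text in a single prompt.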

Gemini 1.5 can also process 11 hours of audio or 1 hour of video, and works across several different languages.

Currently, it’s only available for a small number of developers, but this won’t be the case for very long.

Google’s AI tools are shaping up to be strong competitors to OpenAI’s - boosted by their integration with other Google services, and now by Gemini 1.5.



Image title - Byte Sized Extras

🤖 OpenAI disrupts 5 state-backed actors from maliciously using GPT

🚗 A Waymo robotaxi was burned in San Francisco

🗳️ TikTok will create in-app Election Centers for EU users, aims to tackle disinformation

💭 ChatGPT can now remember things that you told it

🧱 A Dutch startup is building a robot bricklayer

📈 OpenAI completes a deal that values it at $80 billion

Image of Loop character with a cardboard box
Image title - Startup Spotlight
Image - Perplexity web search

Perplexity

With Google Search, you often have to click through a dozen links before you find the information you need. It’s been like this for over 20 years.

Perplexity is tackling search in a different way. You still write questions, like you would in Google’s search bar, but instead it scours the web for you and uses AI to write a summary of what it finds. It also includes links to the sources that were used.
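Perplexity hasn’t published how its pipeline works, but the general “retrieve, then summarise with citations” pattern it describes can be sketched like this (the pages, URLs, and scoring below are all made-up stand-ins; a real system would fetch live web pages and use an LLM to write the summary):

```python
# Toy "search, then summarise with sources" pipeline. Retrieval here is
# crude keyword matching, and the "summary" just stitches the top results
# together with numbered citations - the same shape of output Perplexity gives.

PAGES = {
    "https://example.com/thai": "Great Thai food near the station, open late.",
    "https://example.com/pizza": "Wood-fired pizza with good reviews, cash only.",
    "https://example.com/news": "Local council approves new bike lanes.",
}

def score(query: str, text: str) -> int:
    """Count query words that appear in the page text."""
    return sum(word in text.lower() for word in set(query.lower().split()))

def answer(query: str, k: int = 2) -> str:
    """Rank pages by keyword overlap and cite the ones used."""
    ranked = sorted(PAGES.items(), key=lambda page: score(query, page[1]), reverse=True)
    top = [(url, text) for url, text in ranked[:k] if score(query, text) > 0]
    summary = [f"{text} [{i}]" for i, (_, text) in enumerate(top, start=1)]
    sources = [f"[{i}] {url}" for i, (url, _) in enumerate(top, start=1)]
    return "\n".join(summary + sources)

print(answer("good food near me"))
```

Swapping the keyword scorer for a proper search index and the stitched summary for an LLM call gives you the basic architecture of an AI answer engine - the citation step is what keeps the output checkable.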

I’ve used it myself and the results are pretty impressive. When I was looking for somewhere good to eat nearby, it returned a list of results and a map with pins showing the location for each restaurant.

When I did the same thing via Google, I had to click through multiple reviews on Tripadvisor and blogs until I learnt what I needed. Is that really necessary now, if an AI model can summarise multiple web pages? Probably not.

Of course, there is always the slight risk of the model hallucinating and making things up. But I do find it much more useful than ChatGPT’s search feature, which can rely on its inaccurate training data - rather than actually searching the web.

Perplexity raised over $73 million in funding last month and already has over 10 million monthly users. You can try their search tool for free, or upgrade to use their Copilot feature - which helps you find more relevant information.



This Week’s Art

Image - vehicle on fire in San Francisco, pixel art

Loop via Midjourney V6



Image title - End note

Some weeks have a flurry of announcements and this was certainly one of them.

OpenAI certainly dominated the week with Sora and its breathtaking visuals, but Google got a lot of attention for Gemini 1.5.

We’ve covered a lot this week:

  • OpenAI’s reported plans to challenge Google Search

  • US concern over Russia’s anti-satellite weapon

  • Google’s use of satellites to track emissions

  • Meta’s new model that learns about the physical world

  • Amazon’s development of the largest ever text-to-speech model

  • OpenAI’s new video generator and the stunning results

  • Google’s updated Gemini and huge context window

  • How Perplexity are re-inventing the way we search the web

Have a good week!

Liam



Feedback

Image of Loop character waving goodbye

Share with Others

If you found something interesting in this week’s edition, feel free to share this newsletter with your colleagues.

About the Author

Liam McCormick is a Senior Software Engineer and works within Kainos' Innovation team. He identifies business value in emerging technologies, implements them, and then shares these insights with others.