🎥 OpenAI's new video generator is a huge leap forward
Plus more on Russia’s threat to US satellites, Google’s new Gemini 1.5 model, and how Perplexity wants to replace Google Search.
Welcome to this edition of Loop!
To kick off your week, we’ve rounded up the most important technology and AI updates that you should know about.
HIGHLIGHTS
Meta’s new way to teach machines about the physical world
OpenAI’s challenger for Google Search
Russia’s “serious threat” to global satellites & communications
… and much more
Let's jump in!
1. OpenAI is working on a web search tool to challenge Google
According to a recent report, OpenAI is planning an expansion into web search. The details aren’t clear at this stage, but it’s rumoured that the new product will partly use Microsoft's Bing technology.
Microsoft, a major investor in OpenAI, integrated the GPT models into Bing last year, hoping to eat into Google’s market share.
That hasn’t really happened. Bing’s market share has only increased from around 4% to 9% - while Google remains dominant at 84%.
This move by OpenAI would certainly intensify their competition with Google, but it would also increase public interest in startups that are already building these tools - such as Perplexity, covered in this week’s Startup Spotlight section.
2. Russia is “developing nuclear weapons for space” that can target US satellites
In more worrying news, US officials have said there is a “serious threat to national security”. For security reasons, officials have been vague about the threat - but this follows on from years of warnings by aerospace experts.
Both Russia and China have been steadily developing their military capabilities in space, as they aim to catch up with the US.
Just last year, a report suggested that Russia was developing multiple anti-satellite weapons - including a missile that was successfully tested against an old Soviet satellite.
Our society has rapidly changed over the last 30 years and has become increasingly technology-dependent.
The sudden loss of satellite communications would have a profound impact on the global economy, supply chains, emergency services, and of course military operations and surveillance.
Members of Congress have since been invited to a secure facility, where they can view the classified intelligence.
3. Google is using satellites & AI to track methane emissions
Sticking with satellites, Google have announced a new partnership with the Environmental Defense Fund (EDF). They plan to develop AI models that will analyse satellite imagery and detect methane emissions around the world.
EDF are launching a new satellite in March, called MethaneSAT, which will orbit the Earth 15 times a day. Google’s team helped EDF develop a tool that can spot where methane is being produced and track the changes over time.
Google are also doing the same to identify oil & gas infrastructure around the globe, which will allow organisations & governments to spot high emission areas and take action. The data will be freely available via Google Earth Engine later this year.
4. Meta unveil a new way to teach machines about the physical world
The new V-JEPA model is a step towards Yann LeCun’s goal of machines that act more like humans. It’s capable of understanding how objects are interacting in a video.
For example, Meta’s video showed a piece of paper being ripped in two. The model was able to determine that the video showed “tearing something into two pieces”.
It’s important to stress that this isn’t a generative AI model. Instead, it was trained to produce an abstract description of what is being shown - rather than trying to predict every pixel.
To achieve this, Meta trained it on lots of videos - but hid sections of them with a black box. They then asked the model to work out what was going on.
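The masked-prediction idea above can be sketched in a few lines. This is a toy numpy illustration only - the encoder and predictor below are placeholder linear/averaging operations, not Meta's actual architecture - but it shows the key point: the model predicts the hidden region in an abstract latent space, not in pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder "encoder": a random linear map from patch features
# to a smaller latent space (an abstract representation, not pixels).
D_IN, D_LATENT = 16, 8
encoder = rng.normal(size=(D_IN, D_LATENT))

def embed(patches):
    return patches @ encoder

# A "video" as 10 patch vectors; hide a contiguous block of them,
# like the black box Meta placed over sections of each clip.
patches = rng.normal(size=(10, D_IN))
mask = np.zeros(10, dtype=bool)
mask[3:6] = True

targets = embed(patches[mask])    # what the model must predict
context = embed(patches[~mask])   # what it is allowed to see

# Placeholder "predictor": average the visible embeddings as a crude
# guess for each hidden position (a real model uses a transformer).
prediction = np.tile(context.mean(axis=0), (int(mask.sum()), 1))

# The training loss is computed in latent space, not pixel space.
loss = float(np.mean((prediction - targets) ** 2))
print(loss)
```

The design point is the loss: because it compares abstract embeddings rather than reconstructed pixels, the model is pushed towards understanding what is happening, not towards photorealistic in-filling.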
This is quite similar to the work that Google’s Lumiere can do, except V-JEPA aims to describe the scene rather than extend it.
It’s impressive stuff, but these are very early days - evident from the fact that Meta only showed one example in their promotional video. Regardless, it’ll be interesting to see how far they can build upon this in the future.
5. Largest text-to-speech AI model ever shows “emergent abilities”
Amazon’s researchers wanted to test if larger text-to-speech (TTS) models would follow the same path as LLMs - as the models become bigger, they’re able to do more tasks.
To test this out, they trained the largest TTS model we’ve seen so far - using over 100,000 hours of speech for the training process.
The BASE TTS model showed significant improvements in how it could handle complex language, such as emotional expressions and foreign words.
Most other TTS models would struggle with these tasks. Usually they will mispronounce these words, skip them entirely, or spend too long emphasising them.
These results confirmed that as a TTS model becomes bigger, it can perform a wider range of tasks - something that hadn’t been demonstrated for text-to-speech before.
Of course, large language models have gotten a lot of attention in the last year. But advances in text-to-speech could have profound benefits for those who rely on accessibility features.
It’s worth pointing out that the model hasn’t been publicly released, as there are fears over how it could be misused by bad actors. You can, however, read their research paper using the link below.
OpenAI’s Sora can create stunning videos
After years of releasing text, image, and audio generators - OpenAI has finally released details about their video model.
You’ll often read how a new AI advancement is “ground-breaking”, “mind blowing”, or maybe even “a game changer”. We all read the same hype-filled posts on social media. It’s tiresome.
But this genuinely is different.
Sora’s results are incredible, and it does seem capable of simulating some level of physics. I say some because there are several examples where it doesn’t work perfectly.
That doesn’t really matter right now, since it’s a huge step forward compared to everything else on the market.
Surprisingly, it’s also able to generate minute-long videos. Before we get too excited, it’s worth considering the impact this would have.
Given we’re in an election year, this could land OpenAI in hot water if it’s misused.
As a result, they’ve decided not to release it to the public yet. But it’s only a matter of time before they do.
For now though, it is only being tested by a select group of OpenAI employees. If you want to watch the full set of videos - and I highly recommend that you do - you can use the link below.
Google’s Gemini can now process 1 million tokens
There have been plenty of headlines this week, but one of the biggest was Google’s announcement that their new Gemini 1.5 model can process up to 1 million tokens.
For context (sorry, I had to), Anthropic’s Claude model used to hold the top spot at 200k tokens. Google has blown past that, and this opens up a lot more use cases for companies - since they can feed much more data to the model than before.
On paper this is significant, although it remains to be seen how important it is in practice.
To date, LLM accuracy has tended to degrade as more of the context window is filled. Time will tell if Gemini suffers from the same issue.
That aside, the big tech companies are making significant progress with their LLM architectures and their understanding of a technology that’s still in its infancy. This is allowing them to rapidly increase the number of tokens that can be processed.
In 2020, OpenAI’s GPT-3 could only process 2k tokens. But just a few months ago, they announced that GPT-4 could now handle 128k tokens.
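To make those numbers concrete, here is a toy Python sketch of why the window size matters, using the common rule of thumb of roughly 4 characters per English token. The heuristic and the sample "book" are illustrative only - real tokenizers give exact counts.

```python
# Rough heuristic: ~4 characters per token for English text.
# Real tokenizers (e.g. BPE-based ones) give exact counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_context(text: str, context_window: int) -> bool:
    return estimate_tokens(text) <= context_window

# ~2 MB of text, roughly 500k tokens by this estimate.
book = "word " * 400_000

print(fits_in_context(book, 128_000))    # GPT-4's 128k window -> False
print(fits_in_context(book, 1_000_000))  # Gemini 1.5's window -> True
```

In other words, a document that would previously have to be chunked and summarised in pieces can now, in principle, be handed to the model whole.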
Gemini 1.5 can also process 11 hours of audio or 1 hour of video, and it works across several different languages.
Currently, it’s only available for a small number of developers, but this won’t be the case for very long.
Google’s AI tools are shaping up to be strong competitors to OpenAI’s, boosted by their integration with other Google services - and now by Gemini 1.5.
🤖 OpenAI disrupts 5 state-backed actors from maliciously using GPT
🚗 A Waymo robotaxi was burned in San Francisco
🗳️ TikTok will create in-app Election Centers for EU users, aims to tackle disinformation
💭 ChatGPT can now remember things that you told it
🧱 A Dutch startup is building a robot bricklayer
📈 OpenAI completes a deal that values it at $80 billion
Perplexity
With Google Search, you often have to click through a dozen links before you find the information you need. It’s been like this for over 20 years.
Perplexity is tackling search in a different way. You still write questions, like you would in Google’s search bar, but instead it scours the web for you and uses AI to write a summary of what it finds. It also includes links to the sources that were used.
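The retrieve-then-summarise pattern behind tools like this can be sketched in miniature. The toy Python example below ranks made-up "pages" by word overlap with the query and stitches the top snippets together with citations - a real system crawls the live web and uses an LLM, not string joining, for the summary.

```python
# Toy retrieve-then-summarise sketch. All page data is invented
# for illustration; a real AI search tool crawls the web and
# summarises the retrieved pages with an LLM.
pages = {
    "restaurant-blog.example": "The harbour bistro serves fresh seafood.",
    "review-site.example": "Locals rate the harbour bistro highly for seafood.",
    "unrelated.example": "How to repot a houseplant in spring.",
}

def words(text):
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve(query, pages, k=2):
    """Rank pages by naive word overlap with the query; keep the top k."""
    q = words(query)
    ranked = sorted(pages.items(), key=lambda kv: -len(q & words(kv[1])))
    return ranked[:k]

def answer(query, pages):
    """Stand-in for the LLM step: join top snippets, citing each source."""
    return " ".join(f"{text} [{src}]" for src, text in retrieve(query, pages))

print(answer("good seafood restaurant nearby", pages))
```

Even in this crude form, the output carries its sources with it - which is the part that makes the summary checkable, and the part that distinguishes this approach from a chatbot answering purely from training data.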
I’ve used it myself and the results are pretty impressive. When I was looking for somewhere good to eat nearby, it returned a list of results and a map with pins showing the location for each restaurant.
When I did the same thing via Google, I had to click through multiple reviews on Tripadvisor and blogs until I learnt what I needed. Is that really necessary now, if an AI model can summarise multiple web pages? Probably not.
Of course, there is always the slight risk of the model hallucinating and making things up. But I do find it much more useful than ChatGPT’s search feature, which can rely on its inaccurate training data - rather than actually searching the web.
Perplexity raised over $73 million in funding last month and already has over 10 million monthly users. You can try their search tool for free, or upgrade to use their Copilot feature - which helps you find more relevant information.
This Week’s Art
Loop via Midjourney V6
Some weeks have a flurry of announcements and this was certainly one of them.
OpenAI dominated the week with Sora and its breathtaking visuals, but Google also got plenty of attention for Gemini 1.5.
We’ve covered a lot this week:
OpenAI’s reported plans to challenge Google Search
US concern over Russia’s anti-satellite weapon
Google’s use of satellites to track emissions
Meta’s new model that learns about the physical world
Amazon’s development of the largest ever text-to-speech model
OpenAI’s new video generator and the stunning results
Google’s updated Gemini and huge context window
How Perplexity are reinventing the way we search the web
Have a good week!
Liam
Feedback
Share with Others
If you found something interesting in this week’s edition, feel free to share this newsletter with your colleagues.