
Hello,
Welcome to this edition of Loop! We aim to keep you informed about technology advances, without making you feel overwhelmed.
To kick off your week, we've rounded up the most important technology and AI updates that you should know about.
In this edition, we'll explore:
- Microsoft's efforts in developing smaller Language Models and reducing costs
- Sam Altman's return to OpenAI
- How CarViz are using computer vision to detect car damage
- … and much more
Let's jump in!

Top Stories
1. Google's Bard AI chatbot can now answer questions about YouTube videos [Link]
Occasionally, you will see that the top result on Google Search is a transcript of a YouTube video that aims to answer your query. Bard is likely making use of the same text transcript to answer questions about a video's content, sending it to the LLM as context rather than using any genuine multi-modal functionality, since the costs of the latter are still quite large.
This is part of a gradual shift towards letting users ask questions about the content they're watching. Microsoft's MM-Vid, which is described later in this post, aims to do something very similar. Video creators can sometimes be slow to get to the point, so Google gave the example of a user asking how many eggs a recipe needs, letting you get organised while the creator is still giving their introduction.
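A rough sketch of how transcript-as-context question answering might work in practice. Everything here (the function name, the prompt wording, the truncation limit) is invented for illustration and is not Bard's actual implementation:

```python
# Hypothetical sketch: answer questions about a video using only its text
# transcript as context, so an ordinary (non-multi-modal) LLM can handle it.

def build_video_prompt(transcript: str, question: str, max_chars: int = 12000) -> str:
    """Pack a (possibly truncated) transcript and a user question into a
    single text prompt for a text-only LLM."""
    context = transcript[:max_chars]  # crude truncation to fit a context window
    return (
        "You are answering questions about a video using its transcript.\n"
        f"Transcript:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the transcript above."
    )

prompt = build_video_prompt(
    "0:01 Welcome back! Today we're baking a sponge cake. "
    "0:45 You'll need three eggs, flour, sugar and butter...",
    "How many eggs are needed for the recipe?",
)
```

The resulting string would then be sent to whichever LLM is available; the model never sees the video itself, only the words spoken in it.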
2. Inflection announce their second-generation model [Link]
Inflection AI, which is led by a DeepMind co-founder, has announced a new version of their Large Language Model (LLM). They have released benchmark results showing their model is the second most capable LLM, behind OpenAI's GPT-4.
Interestingly, Inflection 2 outperforms Google's code-optimised version of PaLM-2, despite the fact that Inflection did not focus on training it for "coding and mathematical reasoning". It's impressive work, especially considering that Inflection AI has just 50 employees, which pales in comparison to the over 750 working at OpenAI.
3. Microsoft Research unveil a small Language Model called Orca 2 [Link]
The research team at Microsoft has been tasked with making smaller language models, as the costs of running LLMs are incredibly high. It has been reported that Microsoft is losing between $20 and $60 per user every month on GitHub Copilot subscriptions. As the company integrates LLMs into Word, PowerPoint, Excel, the Windows OS and elsewhere, there's a real need to quickly reduce costs.
Orca 2 is the latest part of that work, which aims to produce cheaper models while still maintaining most of the functionality we have come to expect. The model comes in two sizes, 7 and 13 billion parameters, and Microsoft has shown that it can outperform models that are 5-10 times larger. The model's weights have been made available on Hugging Face.
4. Sam Altman has returned to OpenAI [Link]
It's been quite the week. OpenAI suddenly fired their CEO Sam Altman, which led to their co-founder Greg Brockman and several others leaving the company. A showdown ensued, as over 700 employees called for the board to reinstate Altman or they would resign.
But on Sunday night the board hired Twitch's co-founder as their new CEO, with Microsoft swiftly announcing that Sam would lead their new advanced AI research team. Eventually, OpenAI's board relented to external and internal pressure, agreeing to bring Sam Altman back as their CEO and to change how OpenAI is governed. Still with me?
It's been an astonishing week that's gripped the tech world. The constant twists and turns were entertainment for some, but companies that rely on OpenAI's GPT models worried about what it meant for them. A huge amount of money and software depends on what OpenAI have created. While the drama has died down for now, OpenAI will have some work to do in rebuilding trust with their customers.
5. Anthropic release Claude 2.1 to developers [Link]
As mentioned last week, Anthropic was founded after a group of OpenAI researchers disagreed with Sam Altman on how to safely develop AI and then left to start their own company.
Anthropic aimed to capitalise on OpenAI's internal struggles and released Claude 2.1, which has a 200k context window, "significantly" fewer hallucinations, and now supports tool use. To get a sense of how big a 200k context window really is, it can hold around 150,000 words (or roughly 500 pages of text).
It is worth noting, though, that as the context window grows, LLM responses often become less accurate. It's hoped that this can be minimised over time, but it's great progress nonetheless. OpenAI might be leading the field with GPT-4, but Anthropic and Inflection aren't far behind.
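As a rough sanity check on those numbers, assuming the common approximations of about 0.75 words per token and 300 words per page (neither figure comes from Anthropic):

```python
# Back-of-envelope check of what a 200k-token context window holds.
# 0.75 words/token and 300 words/page are common rules of thumb, not
# official Anthropic figures.
TOKENS = 200_000
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 300

words = int(TOKENS * WORDS_PER_TOKEN)   # ~150,000 words
pages = words // WORDS_PER_PAGE         # ~500 pages
```

Which lines up with the 150,000-word / 500-page claim in the announcement.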
Closer Look
Microsoft have used GPT-4 Vision to analyse TV episodes, live sports, and games


The MM-Vid project used a mixture of GPT-4 Vision - along with other computer vision, audio, and speech tools - to better analyse longer-form videos, such as TV episodes. Similar to what Google have done with Bard and YouTube videos, Microsoft's team were able to create a detailed script for the uploaded video. This script accurately describes the characters' movements, expressions, and dialogue throughout, and is then processed by an LLM and used to answer the user's questions.
The team have provided lots of examples of how it could be used, such as answering questions about a character's motivations in a TV show, having an AI system play a game of Super Mario, or showing the "most exciting moment" of an MLB baseball game.
It can describe what is happening in a scene even without any audio. This could be useful for creating audio descriptions for those with visual impairments, even where the content creator didn't add them, making a huge catalogue of content more accessible.
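The pipeline described above could be sketched, very loosely, as merging per-segment visual descriptions and speech into one timestamped script that a text-only LLM can then read. The data structures below are invented for illustration and are not MM-Vid's actual format:

```python
# Toy illustration of the MM-Vid idea: fold per-segment visual descriptions
# and speech transcripts into a single timestamped script. In the real
# system the descriptions come from GPT-4 Vision and dedicated audio tools.

def build_script(segments):
    """segments: list of dicts with 'start', 'visual', and 'speech' keys."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']}] VISUAL: {seg['visual']}")
        if seg.get("speech"):  # skip silent segments
            lines.append(f"[{seg['start']}] SPEECH: {seg['speech']}")
    return "\n".join(lines)

script = build_script([
    {"start": "00:00", "visual": "A batter steps up to the plate.", "speech": ""},
    {"start": "00:12", "visual": "The batter swings and the ball flies deep.",
     "speech": "And that one is gone! A home run!"},
])
```

A question like "what was the most exciting moment?" would then be answered by an ordinary LLM reading this script, rather than by any model watching the video directly.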
If you want to see the full list of examples, you can view them on GitHub.
Announcement
Stable Video Diffusion is now here

We've seen some substantial progress with text-to-video generators in the last few months, following announcements by Runway and Meta, and now Stability AI has joined the race. They're well known for their image model, and the video model looks to be just as impressive. The company claims it can be easily adapted for other tasks, such as creating three-dimensional views from a single image.
However, the model isn't being released to the public just yet - Stability AI are focused on further improving the video quality and safety aspects - but they "look forward to sharing the full release" at some point in the future. It will sit well alongside the company's other models, which span audio, image, 3D, and text generation.
If you want to read the full announcement, you can see it on their website.
Byte-Sized Extras
F1 is using computer vision to detect when cars are leaving the track [Link]
Generative AI startup AI21 Labs raises extra cash from investors [Link]
SpaceX plans to sell shares next month at $150B valuation [Link]
Cruise's CEO Kyle Vogt resigns, following months of turmoil [Link]
Binance to pay $4.3B in fines, while their CEO steps down from the crypto exchange and will plead guilty to anti-money laundering charges [Link]
Hyundai and Motional are to jointly manufacture an IONIQ 5 robotaxi in Singapore [Link]

Startup Spotlight

CarViz
This is a computer vision startup based in France that aims to analyse a car's condition and give users a more accurate valuation. They can detect scratches, dents, tyre condition, and other types of damage. The company also uses data from government sources, which provides information about the vehicle's specification, and compares this against the documents provided by the owner.
The final report is then sent to the user, which is often a large dealership - CarViz currently work with 6 of the major dealers in France, along with others in Spain, Germany, and the US.
If you want to read more about what they do, you can check out their website.

Analysis
We're starting to see the big tech companies explore what insights can be gathered from using LLMs to analyse video content. You can imagine a situation where we will soon be able to ask Netflix questions about a TV show as we watch it. This would be useful if you've just missed something that was said and want it explained, without having to scroll back, or if you want a quick recap of the last few episodes.
As streaming companies - such as Netflix and Disney - see their subscriber growth slow and need to justify further price rises, this could prove invaluable. Imagine being able to talk to an AI version of your favourite TV or movie character, just by using your remote - most TV remotes are already equipped with voice assistants. As competition heats up in the video streaming world, whether it's YouTube or Netflix, you can expect more features to be developed around this.
I'm excited to see what this means for live sports and other types of content going forward. Creating highlights of sports games can be quite labour-intensive and has to be done within a very short timeframe, before they're either broadcast on television or added to YouTube for fans to view.
Does this mean we could use AI models to create scripts of an event in real time, then have an LLM analyse them for what it suggests are the highlights? It could certainly speed up the process for staff, and it could also feed into the live text commentary pages that sports websites often run.
There are endless possibilities for how the tech could be used in this way - and it looks like we've only taken the first step in exploring what's possible.
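As a toy illustration of that highlight idea, here is a deliberately crude keyword-based stand-in for the LLM judgement step. An LLM reading the commentary would judge excitement far better; the keyword list and scoring below are invented purely for illustration:

```python
# Toy stand-in for LLM-based highlight selection: score timestamped
# commentary lines by "excitement" keywords and keep the top moments.

EXCITEMENT_WORDS = {"goal", "home run", "overtake", "crash", "winner", "record"}

def top_highlights(commentary, n=2):
    """commentary: list of (timestamp, text) pairs; returns the n highest
    scoring entries, most exciting first."""
    def score(item):
        text = item[1].lower()
        # substring check, so multi-word phrases like "home run" also match
        return sum(word in text for word in EXCITEMENT_WORDS)
    return sorted(commentary, key=score, reverse=True)[:n]

events = [
    ("12:03", "A quiet spell of midfield passing."),
    ("27:41", "Goal! An unstoppable strike into the top corner."),
    ("55:10", "A dramatic overtake on the final lap sets a new record."),
]
best = top_highlights(events, n=2)
```

In a real system, the scoring function is exactly where an LLM (or MM-Vid-style script analysis) would slot in.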
This Week's Art

Prompt: An ultra-realistic modern art studio scene, featuring a sleek robotic arm painting on a canvas. The studio is vibrant and filled with bright, pop art-inspired colors. The painting depicts a generic tomato soup can, styled colorfully in a manner reminiscent of pop art, avoiding any trademarked designs. The atmosphere combines retro and futuristic aesthetics, illustrating the fusion of traditional art themes with advanced technology. The scene captures the essence of pop art without any specific public figures or copyrighted designs.
Platform: DALL-E 3

End Note
While the week has been pretty dramatic with Sam Altman's firing and return to OpenAI, there are much wider stories to take note of. OpenAI's rivals have tried to capitalise on the dramatic twists, unveiling new models and offering jobs to OpenAI's dissatisfied employees.
But serious advances are being made in both GenAI video generation and video analysis. These could open the door to a huge number of new applications, which is very exciting.
This week we've looked at Inflection, Anthropic and Microsoft's new models, Stable Video Diffusion, the return of Sam Altman as CEO, Microsoft's MM-Vid, using Bard to ask questions about YouTube videos, and how CarViz are using computer vision to spot vehicle damage.
Have a good week!
Liam

Share with Others
If you found something interesting, feel free to share this newsletter with your colleagues.
About the Author
Liam McCormick is a Senior Software Engineer and works within Kainos' Innovation team.
He identifies business value in emerging technologies, implements them, and then shares these insights with others.

