
📈 OpenAI’s rivals are closing in on GPT-4

Plus more on Inflection’s new model, Claude 3, and how Athina can monitor your LLM app.

Image - Loop relaxing in space

Welcome to this edition of Loop!

To kick off your week, we’ve rounded up the most important technology and AI updates that you should know about.

ā€ā€ā€Ž ā€Ž HIGHLIGHTS ā€ā€ā€Ž ā€Ž

  • Chinaā€™s secretive ā€œDelete Americaā€ plans

  • Anthropicā€™s new model that beats GPT-4

  • Why we shouldnā€™t use OpenAIā€™s models to review job applications

  • ā€¦ and much more

Let's jump in!

Image of Loop character reading a newspaper
Image title - Top Stories

1. Russian hackers stole Microsoft source code and spied on executives

We start with Microsoft, which believes that the hackers behind the 2020 SolarWinds attack are also responsible for this breach.

It appears they gained access to an account that didn’t have two-factor authentication enabled, and used it to download source code.

The same hacker group have previously spied on Microsoft’s leadership and obtained their emails. This information was then used to compromise other systems within the company.

Although there’s currently no evidence that the group have compromised systems that are used by Microsoft customers, it’s possible that they will use this information for future attacks.

Image divider - Loop

2. China secretly plans to remove American software from core industries by 2027

Under a plan known as “Delete A”, the Chinese government intends to eliminate American technology from state-owned companies by 2027. The classified memo states that American products will be replaced with those from local companies instead.

It’s a significant push and one that we should be worried about. In the short term, this will affect all the major software providers, which have used China to rapidly expand and grow their profits.

In the longer term, it’s a clear sign of just how far relations have worsened between China and the West - with both sides preparing to cut ties.

Over in the US, there have been growing calls to restrict Chinese influence and Congress members have moved quickly to potentially ban TikTok. There are fears that the huge amounts of data it collects about Americans could be used to influence voters in future elections.

That said, it’s also an indicator of how confident China is in its own industries. The country already produces a huge amount of AI research, manufactures many of the chips needed for our devices, and its low-cost electric vehicles are now entering Western markets.

That golden era of opening up to foreign technology now looks to be over.

Image divider - Loop

3. Sam Altman rejoins OpenAI’s board after investigation into sudden firing

After he was removed from the board late last year, Sam Altman has now returned. It follows an independent investigation, which found that his behaviour didn’t warrant the dismissal.

There’s not much detail in OpenAI’s public summary of the report, but it does add that 3 new board members will join alongside Altman - who between them have experience at Facebook, the Gates Foundation, and Sony Entertainment.

It’s one of the last things OpenAI had to organise before they could move on from November’s saga, which brought the company to the brink of collapse - after 95% of employees threatened to leave if Sam Altman wasn’t reinstated.

Image divider - Loop

4. OpenAI’s models showed bias when used for job recruiting

In the last year, there has been a rush for businesses to adopt generative AI and stay ahead of the competition. That’s natural, although this mentality can cloud our thinking.

Many companies are struggling to hire staff that meet their needs. Sometimes they can have thousands and thousands of applications for just one position. It takes a lot of time and money to go through this process.

Now that we’ve seen just how incredible OpenAI’s GPT models are at summarising and answering questions, surely they can be used to filter out job applications too?

Bloomberg tested GPT-3.5 and found that it would routinely discriminate against names that seemed to be from a specific ethnic group.

Of course, that’s not unexpected behaviour - these models have been trained on millions of social media posts and books, which means the biases in that data are actually being amplified.

But it’s a reminder that we need to take a step back and have a more critical view of how this technology - and future breakthroughs - can be used in practice.
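If you want to run this kind of check on your own screening setup, here is a minimal sketch of a name-swap audit in Python. It is not Bloomberg’s methodology. The function name, the CV template, the placeholder candidate labels, and the deliberately biased `biased_scorer` are all assumptions made for illustration; in practice `score` would wrap whatever model call rates a CV.

```python
from statistics import mean
from typing import Callable, Dict, List


def name_swap_audit(
    score: Callable[[str], float],
    cv_template: str,
    names_by_group: Dict[str, List[str]],
) -> Dict[str, float]:
    """Score the same CV under different names and average per group.

    The CV text is identical for every candidate, so any gap between
    the group averages comes from the name alone.
    """
    results = {}
    for group, names in names_by_group.items():
        scores = [score(cv_template.format(name=name)) for name in names]
        results[group] = mean(scores)
    return results


# Hypothetical stand-in for a real model call, deliberately biased so
# the audit has something to detect.
def biased_scorer(cv_text: str) -> float:
    return 0.9 if "Group A" in cv_text else 0.6


template = "Name: {name}\nExperience: 5 years as a financial analyst."
groups = {
    "A": ["Candidate 1 (Group A)", "Candidate 2 (Group A)"],
    "B": ["Candidate 1 (Group B)", "Candidate 2 (Group B)"],
}
audit = name_swap_audit(biased_scorer, template, groups)
print(audit)
```

A noticeable gap between the group averages, with everything but the name held constant, is the signal to look for before letting a model anywhere near real applications.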

Image divider - Loop

5. Adobe is making it easier to create social media content with AI

Adobe has launched their Adobe Express app, which allows you to use Firefly’s AI models to create social media content more quickly.

There are several different options, such as their text-to-image generator, generative fill which uses text prompts to edit an existing image, and text effects to dynamically style the text.

This is mainly targeted at small businesses and those who make a living from social media, as there’s more flexibility in what can be created.

It’s worth adding that this is a beta app, so iOS users will have to sign up to gain access. If you’re an Android user, you can simply download the app from the Google Play store.



Anthropic’s new version of Claude has beaten GPT-4

Image - Claude benchmarks

There’s been a lot of excitement over the new Claude models, which Anthropic claims can beat GPT-4 across many of the major benchmarks - and are much more consistent than Claude 2.1.

They’ve released three different models - Haiku (small), Sonnet (medium), and Opus (large & most powerful). Based on your needs and budget, there are plenty of options available.

If you want to make a judgement about a small section of text, such as categorising an email, you’ll want to use Haiku.

Sonnet is a very capable model and will be useful for most tasks. There’s obviously the desire to use the biggest and most powerful model for every task, but the technology has advanced to a point where that’s not necessary.

For a very small number of use cases, such as analysing books or huge documents, you will want to use Opus. It offers improved performance and recall, but the cost will be quite a bit higher compared to Sonnet.
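The rule of thumb above can be sketched as a small routing function. This is only an illustration: the word-count cut-offs, the `simple_judgement` flag, and the `pick_tier` name are assumptions of mine rather than anything Anthropic publishes, and the tier values are plain labels, not real API model identifiers.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical cut-offs, chosen purely for illustration.
SHORT_TEXT_WORDS = 500
HUGE_DOCUMENT_WORDS = 50_000


def pick_tier(text: str, simple_judgement: bool = False) -> Tier:
    """Pick the smallest tier that fits the task."""
    words = len(text.split())
    if words >= HUGE_DOCUMENT_WORDS:
        # Books and huge documents: pay more for better recall.
        return Tier.OPUS
    if simple_judgement and words <= SHORT_TEXT_WORDS:
        # Quick decisions on short text, such as categorising an email.
        return Tier.HAIKU
    # Everything else defaults to the mid-sized model.
    return Tier.SONNET


print(pick_tier("Is this email a complaint or a sales enquiry?", simple_judgement=True))
```

Defaulting to the mid-sized model and only moving up or down when there is a clear reason keeps costs predictable.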

All of these models are multimodal and have a context window of 200k tokens, but a select number of customers can see this increase to 1 million tokens.

While OpenAI will regularly steal the headlines, their rivals are quickly catching up to last year’s GPT-4 model. It’s only a matter of time before the next version of GPT is released, but we now have a much greater choice available to us - beyond just one dominant company.

Anthropic are mostly targeting the enterprise sector, which is why they have placed a greater focus on context windows and consistent results.

Given their ties with Google’s Vertex AI and AWS’ Bedrock platforms, it’s very easy for businesses to integrate these new models into their processes.



Image title - Closer Look

Inflection’s powerful new model closes in on GPT-4

Image - Inflection benchmarks

Inflection, which is led by one of DeepMind’s co-founders, has unveiled its new Inflection-2.5 model.

Trying to keep up with all these models can be tricky, but Inflection is focusing on a different area to everyone else - they’re creating AI assistants that can provide you with career advice, draft business plans, or just discuss your favourite movie.

Companies like OpenAI and Anthropic are tackling very generalised tasks, while Inflection is tackling conversational AI.

Their latest model is incredibly efficient, as it almost matches GPT-4’s performance but used only 40% of the computing power to train. The model can also draw on the latest news via real-time web search, which isn’t currently possible with Anthropic’s Claude models.

I’ve used the older versions of their Pi assistant and it’s a completely different experience to what you get with ChatGPT.

When you’re chatting to ChatGPT, you’re constantly reminded that it’s a bot from the language and phrasing it uses. But Pi isn’t like this. It genuinely does feel like a real conversation most of the time.

If you haven’t tried it before, I’d highly recommend you give it a go. I’ve used it in the past to get career advice and explore different options, which just can’t be done using ChatGPT.



Image title - Byte Sized Extras

💼 OpenAI says Musk wanted to merge with Tesla and take control

🗑️ Amazon teams with a recycling robot firm to track package waste

💵 Former Twitter CEO is suing Elon Musk for unpaid severance

📔 AWS and Google now let you transfer data to other cloud providers for free

US sanctions founder of spyware company for targeting Americans

Image of Loop character with a cardboard box
Image title - Startup Spotlight
Image - Athina dashboard

Athina AI

This startup focuses on helping you evaluate and monitor your LLM application as it’s being used in production.

They’re able to give you insights into how the app is performing, what questions your users are asking it, whether the model responded accurately, the response time, and cost reports outlining the changes over time.
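To make those metrics more concrete, here is a rough sketch of what recording them yourself might look like in plain Python. This is not Athina’s API. The `LLMMonitor` class, the flat `usd_per_1k_tokens` price, the word-count token estimate, and `fake_model` are all hypothetical, and the sketch leaves out the harder part, which is judging whether each answer was accurate.

```python
import time
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class CallRecord:
    prompt: str
    response: str
    latency_s: float
    cost_usd: float


@dataclass
class LLMMonitor:
    # Hypothetical flat price; real providers usually price input and
    # output tokens separately.
    usd_per_1k_tokens: float
    records: List[CallRecord] = field(default_factory=list)

    def track(self, model: Callable[[str], str], prompt: str) -> str:
        """Call the model, then log the question, answer, latency and cost."""
        start = time.perf_counter()
        response = model(prompt)
        latency_s = time.perf_counter() - start
        # Crude assumption: one token per whitespace-separated word.
        tokens = len(prompt.split()) + len(response.split())
        cost_usd = tokens / 1000 * self.usd_per_1k_tokens
        self.records.append(CallRecord(prompt, response, latency_s, cost_usd))
        return response

    def summary(self) -> Dict[str, float]:
        """Aggregate the logged calls into a small report."""
        if not self.records:
            return {"calls": 0, "avg_latency_s": 0.0, "total_cost_usd": 0.0}
        return {
            "calls": len(self.records),
            "avg_latency_s": mean(r.latency_s for r in self.records),
            "total_cost_usd": sum(r.cost_usd for r in self.records),
        }


# Stand-in for a real model call, so the sketch runs offline.
def fake_model(prompt: str) -> str:
    return "Paris is the capital of France."


monitor = LLMMonitor(usd_per_1k_tokens=0.01)
monitor.track(fake_model, "What is the capital of France?")
print(monitor.summary())
```

Even a wrapper this small shows which questions users ask and what each call costs, which is the raw material a dedicated monitoring tool builds its reports from.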

Trying to understand how users are interacting with your LLM can be quite difficult and the major cloud providers don’t have great answers.

If you’re on AWS, they recommend you create your own custom CloudWatch logs. Azure recommends something similar, but they are working on a tool called Prompt Flow to tackle this problem.

I met with one of Athina’s co-founders last year and their dashboard looks impressive. They can even evaluate different LLMs for your task, helping you to select the best one for your project.



This Week’s Art

Image - SXSW at Austin, Texas

Loop via Midjourney V6



Image title - End note

This week was dominated by the release of Claude 3, but there are some other stories to take note of - such as Inflection’s work on v2.5.

It’s clear that OpenAI’s rivals are closing in on GPT-4, which is now a year old, and they could start taking away some of its customers.

We’ve covered a lot this week, including:

  • Russian hackers breaching Microsoft’s systems and stealing source code

  • China’s secretive memo to remove American software

  • Sam Altman rejoining OpenAI’s board

  • Why we shouldn’t use GPT to review job applications

  • Adobe’s new app for creating social media content

  • Anthropic’s new model that beats GPT-4

  • Inflection’s new conversational model which almost matches GPT-4

  • And how Athina AI can help you monitor your LLM app in production

Have a good week!

Liam

Image of Loop character waving goodbye

Share with Others

If you found something interesting in this week’s edition, feel free to share this newsletter with your colleagues.

About the Author

Liam McCormick is a Senior Software Engineer and works within Kainos' Innovation team. He identifies business value in emerging technologies, implements them, and then shares these insights with others.