
🤖 Microsoft reveals new work on AI agents

Plus DeepMind's robotics research, Stanford's robot that can clean up, and Apple's Vision Pro.

Image - Loop relaxing in space

Welcome to this edition of Loop!

To kick off your week, we've rounded up the most important technology and AI updates that you should know about.

ā€ā€ā€Ž ā€Ž HIGHLIGHTS ā€ā€ā€Ž ā€Ž

  • Stanford's work on creating a household robot

  • OpenAI's app store for custom GPTs

  • Google's new generative AI models for retailers

  • … and much more

Let's jump in!

Image of Loop character reading a newspaper
Image title - Top Stories

1. Apple Vision Pro headset will be released next month

This marks Apple's first major product launch since the Apple Watch in 2014. Apple unveiled the new headset last June, and it will launch on February 2nd - with pre-orders opening on January 19th.

At launch, it will support over a million iOS and iPadOS apps - along with apps that have been specially designed to take advantage of the headset's features. At $3,499 it won't be accessible for most people, but Apple hopes to demonstrate how spatial computing could change the way we interact with technology.

It's a significant move by the company, who have been rumoured to be developing an AR/VR headset for several years. Those who tried it at the preview event have praised the device for its visuals and superior user experience.

To control the headset, you simply look at an element and then tap your fingers to click on it - no additional joysticks are needed. But there are plenty of scenarios where you might need more fine-grained control, such as gaming or tasks at work. To accommodate these, the headset supports game controllers, keyboards, and trackpads - which opens up a lot of different use cases.

Image divider - Loop

2. DeepMind is researching how to train robots with LLMs

AutoRT allows DeepMind to deploy robots and then instruct them to gather training data in new environments. Essentially, the system acts as a manager for the robot and tells it what to do.

They achieved this using a Large Language Model (LLM), a Visual Language Model (VLM) and a Robot Transformer (RT). The VLM takes in video footage and tries to understand what the environment looks like for the robot. The LLM then gives it a task to do, with the RT finally controlling the robot and completing the task.

That was a lot of acronyms, but essentially this project looked at how we could use large AI models to control robots and train them for new tasks. The team at DeepMind has also been researching how we can safely restrict robots and prevent them from causing harm to humans.
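To make the pipeline above concrete, here is a rough illustrative sketch - not DeepMind's actual code, and all class and method names are made up - of how a VLM, an LLM, and a robot policy could be chained into one data-gathering step:

```python
# Hypothetical sketch of an AutoRT-style control loop.
# All classes and method names are illustrative stand-ins,
# not DeepMind's real APIs.

class VisualLanguageModel:
    def describe_scene(self, camera_frame):
        # Turn raw camera input into a text description of the scene
        return "a table with a cup and a sponge on it"

class LargeLanguageModel:
    def propose_task(self, scene_description):
        # Suggest a task that is feasible (and safe) in this scene
        return "pick up the sponge and wipe the table"

class RobotTransformer:
    def execute(self, task, camera_frame):
        # Map the instruction plus observation to low-level actions,
        # returning an episode that can be logged as training data
        return {"task": task, "actions": ["grasp", "wipe"]}

def autort_step(camera_frame, vlm, llm, rt):
    """One data-gathering step: see, decide, act, record."""
    scene = vlm.describe_scene(camera_frame)
    task = llm.propose_task(scene)
    episode = rt.execute(task, camera_frame)
    return episode  # stored and later used to train new skills

episode = autort_step("frame_0", VisualLanguageModel(),
                      LargeLanguageModel(), RobotTransformer())
print(episode["task"])
```

The point of the sketch is just the division of labour: perception, task selection, and control are three separate models, with the LLM acting as the "manager" described above.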

This is a key area for all the major tech companies. Google, Meta, and Microsoft have been researching new ways we can train robots for several years now - and a lot of their research has revolved around using Generative AI.

But they're not the only players here: Tesla and Agility are developing their own humanoid robots. Just recently, 1X, a company backed by OpenAI, raised $100 million for its own research in this area. It's likely we'll see a lot more announcements around this in the coming year.

Image divider - Loop

3. OpenAI release their app store for custom GPTs

OpenAI has launched the GPT Store, which allows users to access versions of ChatGPT that have been lightly customised by other people. It's only available to subscribers on their premium plan, with apps in categories such as writing, research, programming, and education.

You don't actually need any programming experience to make your own GPTs - simply type in what you want the GPT to do and then upload any files it should use as a knowledge base. All of the GPTs are free to use, but OpenAI plans to pay developers via a revenue sharing program.

This will be based on user engagement, which raises questions about how it will shape future development on the store. We've already seen what happened with social media platforms as they relentlessly prioritised engagement over everything else. OpenAI won't want to make the same mistake.

Image divider - Loop

4. Walmart hopes that AI can do your shopping for you

The company is exploring a new AI feature, which would learn from your individual shopping habits and the broader trends of other shoppers who are similar to you.

It would then anticipate your needs and automatically create a grocery list - without you having to lift a finger. At the moment, shoppers need to manually look through and select items.

While the idea is still in development, it signals how Walmart aims to broaden their offering. They're certainly one of the most innovation-focused retailers and regularly explore the art of the possible. In the last few days, Walmart have announced that they're expanding their drone delivery system to 1.8 million more homes.

Image divider - Loop

5. Google launches new generative AI tools for retailers

Google is making a push into the retail sector with its newly unveiled GenAI products, which aim to further personalise the online shopping experience and help retailers to produce more marketing content.

One of these tools is Conversational Commerce, which allows retailers to integrate GenAI-powered agents into their websites and apps. These agents will use a chat interface to speak with shoppers and offer product suggestions. If the bot needs to be customised further, the retailer can do this by providing their own data.

Additionally, retailers can use Google's Catalog and Content Enrichment tool to automatically generate product descriptions and metadata, categorise items, and even produce new product images.



Image title - Announcement

Stanford's work on household robots

Gif - robot cooking and calling the elevator

Several researchers from Stanford have released their latest work on Mobile Aloha, with the aim of creating a general purpose robot.

It is able to autonomously complete several tasks - such as cleaning up a spilled drink, pressing the button to call an elevator and then entering it, opening cabinets and placing items inside, or even pushing chairs forward.

Just 50 videos have been used to train the robot on those tasks, which shows how quickly it was able to gain these new skills. The robotic arms are attached to a stand, which can be moved around the room, and a computer is placed on top to control the robot.

It's worth noting that there are several demo videos of Mobile Aloha. Some of these showcase the autonomous features, which you can see above.

However, some people online have been confused by the other demos, which feature teleoperation - a human actually moving the robot. It seems these teleoperated demos were included to demonstrate what other use cases could be possible with the robot.

Despite some initial confusion around the demos, it's a fascinating project and gives a good insight into some of the research being done by the top universities.



Image title - Closer Look

Microsoft releases a new framework for AI agents

Image - agent conversation

TaskWeaver is an open-source framework from Microsoft, which allows you to create your own agents that specialise in data analysis.

The project is similar to Microsoft's previous work with AutoGen, which also allows you to create AI agents. However, TaskWeaver is much more focused on analysing data and complex information.

An AI agent is essentially a bot that aims to figure out all the steps needed to complete a task. Then, it will try to carry out the task and implement any code needed. These agents can be completely autonomous, or you can be more involved and guide them on how to complete the task.

Let's say that you're a business analyst like the person in the scenario above. You need to analyse some data about your company's sales, which is stored in a SQL database. With TaskWeaver, you can simply ask the assistant to identify any anomalies in the data.

The AI agent will then fetch the data you asked for, check for the anomalies you specified, and visualise the results - and this is all done through a chatbot interface.

To achieve this, TaskWeaver has three agents - a Planner, a Code Generator, and a Code Executor. These agents work together to understand your request, break it down into smaller tasks, generate the necessary code, and then execute it.
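As a simplified illustration of that three-step flow - with made-up class names and canned snippets rather than TaskWeaver's real API - the Planner breaks the request into sub-tasks, the Code Generator produces code for each one, and the Code Executor runs it in a shared workspace:

```python
# Hypothetical Planner -> Code Generator -> Code Executor pipeline,
# in the spirit of TaskWeaver. All names here are illustrative only.

class Planner:
    def plan(self, request):
        # Break the user's request into ordered sub-tasks
        return ["fetch sales data", "detect anomalies", "report results"]

class CodeGenerator:
    def generate(self, sub_task):
        # In the real system an LLM writes this code on the fly;
        # here we return a canned snippet per sub-task
        snippets = {
            "fetch sales data": "data = [100, 102, 98, 500, 101]",
            "detect anomalies": "anomalies = [x for x in data if x > 200]",
            "report results": "result = f'anomalies: {anomalies}'",
        }
        return snippets[sub_task]

class CodeExecutor:
    def __init__(self):
        self.namespace = {}  # state shared across the steps

    def run(self, code):
        # Each snippet sees the variables created by earlier snippets
        exec(code, self.namespace)

def handle_request(request):
    planner, generator, executor = Planner(), CodeGenerator(), CodeExecutor()
    for sub_task in planner.plan(request):
        executor.run(generator.generate(sub_task))
    return executor.namespace["result"]

print(handle_request("Find anomalies in our sales data"))
```

The shared namespace is the key design point: because each generated snippet runs in the same workspace, later steps can build on the data that earlier steps produced, much like cells in a notebook.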

It's an interesting tool and something that should be useful for a lot of different roles - from data and business analysts, to support staff that analyse customer data, to marketing teams that use data to improve their advertising campaigns.



Image title - Byte Sized Extras

📰 OpenAI fights back against the New York Times' copyright lawsuit

🚕 Hyundai says its electric air taxi business will take flight in 2028

🛒 Walmart debuts generative AI search feature at CES

✂️ Duolingo cuts 10% of its contractor workforce, as the company instead looks to AI

📄 OpenAI changes policy to allow military applications

✈️ Shield AI raises another $300M, scaling valuation to $2.8B

🚗 Volkswagen is bringing ChatGPT into its cars and SUVs

🤖 DeepMind says images that can trick computers can also trick humans

Image of Loop character with a cardboard box
Image title - Startup Spotlight
Gif - Luma results for Moose plush

Source: @alexcarliera on X

Luma AI

This is a Generative AI startup that's got quite a bit of attention recently. They allow users to quickly create 3D objects from simple text prompts.

While text-to-3D isn't a huge area right now, that will change with the launch of Apple's Vision Pro this year. We're going to see huge growth in the number of apps and websites that support 3D experiences, as more people adopt what Apple likes to call "spatial computing".

Luma AI have recently raised $43 million in funding, as they seek to double their team's headcount in the coming year. They've just released their new 3D generation model, called Genie, which can generate an object in under 10 seconds - and the results look pretty promising.

If you want to try it out, you can check out their post below.



This Week's Art

Image title - AI Art

Loop via Midjourney V6



Image title - End note

While OpenAI's release of the GPT Store has captured a lot of headlines, there have been plenty of other stories recently that are worth taking note of.

We've covered a lot this week:

  • Apple's plans to release the Vision Pro next month

  • DeepMind's robotics & generative AI research

  • OpenAI's app store for custom GPTs

  • How Walmart hopes that AI can do your shopping for you

  • Google's GenAI tools for retailers

  • Stanford's work on household robots

  • Microsoft's new framework for AI agents

  • And Luma's ambitions for text-to-3D models

We're increasingly seeing how the big tech companies are using LLMs to manage robots and then assign them different tasks. DeepMind's research is particularly exciting, but they have been involved in this area for some time now - it'll be interesting to see how they actually start to apply this going forward.

Have a good week!

Liam



Feedback

Image of Loop character waving goodbye

Share with Others

If you found something interesting in this week's edition, feel free to share this newsletter with your colleagues.

About the Author

Liam McCormick is a Senior Software Engineer and works within Kainos' Innovation team. He identifies business value in emerging technologies, implements them, and then shares these insights with others.