Designing Multimodal Interaction for AI-Native Products
Why it’s more important than ever before
In this post, I want to share what multimodal interactions are and why they matter when designing AI-powered apps.
What is multimodal interaction design?
To start, let’s define what multimodal interaction means. It refers to designing user interfaces that support multiple modes of communication between humans and computers, such as interacting with ChatGPT using both voice and text. Other modes include haptics, audio, gaze, and gestures, all of which create a richer, more natural user experience.
This type of interaction extends across all mediums—from mobile apps to spatial experiences. If you look at AI apps today, you’re not just chatting with them. You might speak, share photos, or ask follow-up questions. These capabilities allow users to interact with AI through the communication method of their choice, without feeling restricted to just one. For example, if you’re trying to explain a complex situation but don’t feel like typing it all out, you can simply speak to the AI instead. Modern AI models are even starting to detect emotions, which helps them better understand context and provide more thoughtful, relevant responses.
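To make that concrete, here’s a minimal SwiftUI sketch of a chat composer that offers both typing and voice as input modes. The view and its names are illustrative assumptions, and the dictation hook is left as a stub rather than a full Speech framework integration.

```swift
import SwiftUI

// A minimal sketch of a composer offering two input modes: typing and voice.
// Everything here is illustrative; `onSend` and the view itself are placeholders.
struct MultimodalComposer: View {
    @State private var draft = ""
    @State private var isListening = false
    var onSend: (String) -> Void

    var body: some View {
        HStack(spacing: 12) {
            // Text mode: the familiar chat field.
            TextField("Ask anything…", text: $draft)
                .textFieldStyle(.roundedBorder)
                .onSubmit { send() }

            // Voice mode: toggles a (stubbed) dictation session.
            // A real app would start/stop speech recognition here, e.g. with
            // Apple's Speech framework, and append the transcript to `draft`.
            Button {
                isListening.toggle()
            } label: {
                Image(systemName: isListening ? "waveform" : "mic.fill")
            }
            .accessibilityLabel(isListening ? "Stop dictation" : "Start dictation")

            Button("Send") { send() }
                .disabled(draft.isEmpty)
        }
        .padding()
    }

    private func send() {
        guard !draft.isEmpty else { return }
        onSend(draft)
        draft = ""
    }
}
```

The point isn’t the specific controls; it’s that neither mode is privileged, so users can fall back to whichever one suits the moment.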
Image credit: Apple
Why is this important now?
We can already see this shift in the products we use every day, from note-taking apps to search tools: we’re rapidly adapting to AI. It’s projected that by 2025, the number of AI users will reach around 378 million, with approximately 64 million new users adopting AI tools in that year alone (reference). That kind of growth means we need to meet users where they are and design experiences that make interacting with AI more natural, intuitive, and accessible for everyone.
How can designers explore multimodal interactions?
I’m currently working on a side project—an AI mentor iOS app—where I’m exploring how users interact with AI, especially through voice and chat. One thing I’ve noticed is that our current design tools often fall short when it comes to supporting these kinds of interaction design patterns. To bridge that gap, I’ve been combining tools like Figma, Play, and SwiftUI. Figma helps me shape the visual design and overall vibe, while Play allows for advanced prototyping—even integrating OpenAI’s capabilities and tapping into Apple’s native toolkits. For deeper explorations and shipping actual features, SwiftUI has been incredibly powerful for designers. I’ve also looked into tools like Bolt for cross-platform prototyping.
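To give a sense of how lightweight these explorations can be, here’s a rough sketch of wiring a prototype to a chat model over HTTPS. It follows OpenAI’s chat completions endpoint, but the model name and the surrounding types are assumptions I’m using for illustration, not my app’s actual implementation.

```swift
import Foundation

// A sketch of sending a prompt to a chat model from a prototype.
// The endpoint and payload follow OpenAI's chat completions API;
// the model name and types are illustrative placeholders.
struct ChatRequest: Encodable {
    let model: String
    let messages: [[String: String]]
}

func askMentor(_ prompt: String, apiKey: String) async throws -> Data {
    var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
    request.httpMethod = "POST"
    request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(
        ChatRequest(model: "gpt-4o-mini",
                    messages: [["role": "user", "content": prompt]])
    )
    let (data, _) = try await URLSession.shared.data(for: request)
    return data // decode into your own response types from here
}
```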
When designing these experiences, considering usability principles becomes even more important. Things like feedback, affordance, and intuitive flows help make AI-native products feel seamless and accessible—so users aren’t just impressed, but also empowered.
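One way I think about feedback in practice: give each mode an explicit state and reflect it back to the user immediately. Here’s a hypothetical SwiftUI sketch of that idea; the states and copy are placeholders, not my app’s real design.

```swift
import SwiftUI

// Hypothetical interaction states for a voice-and-chat AI experience.
enum MentorState {
    case idle
    case listening          // voice input in progress
    case thinking           // waiting on the model's response
    case speaking(String)   // presenting or reading back a reply
}

// A small status view that gives immediate, visible feedback for each mode.
struct MentorStatusView: View {
    let state: MentorState

    var body: some View {
        switch state {
        case .idle:
            Label("Tap the mic or start typing", systemImage: "bubble.left")
        case .listening:
            Label("Listening…", systemImage: "waveform")
        case .thinking:
            ProgressView("Thinking…")
        case .speaking(let reply):
            Label(reply, systemImage: "speaker.wave.2")
        }
    }
}
```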
How can you apply this in the apps you're designing?
We definitely don’t need to implement features just for the sake of using AI, or force it on users without offering real value, as many apps do today. Instead, ask: How are users currently interacting with my product? Could multimodal interaction make those experiences more accessible or intuitive? What types of interactions are my users already comfortable with?
By asking these kinds of questions and thoughtfully prototyping your solutions, you can apply multimodal interaction in a way that actually enhances the user experience—making your app not just smart, but truly helpful.
Thank you for reading.
Resources
If you’re a startup founder and want to chat about your AI product, feel free to reach out: john@john-rodrigues.com
If you’re a designer and want to learn from my work, check out my Gumroad library