Revolut Tackles 2M Tickets with Rasa, Uncovers AI Pitfalls

Revolut manages over 40 million accounts, and its users regularly run into problems such as blocked transactions and lost cards. In 2024, the British fintech deployed chatbots built on Rasa Open Source to handle 65% of their primary inquiries. Two years later, these bots process over 2 million tickets a month. The surprising part, however, isn't the volume of queries resolved automatically but how often the chatbot must be "recalibrated" after learning incorrect patterns from frustrated users.

Most articles about chatbots highlight metrics like response time. This tutorial covers the architecture Revolut built with Rasa, the operational issues that are rarely documented, and how to set up a similar system without accumulating technical debt. One recurring theme: an NLU model trained on millions of real conversations also absorbs the toxicity and ambiguity of users who hate talking to bots.

Why Revolut Chose Rasa Over Dialogflow or Lex

In 2023, Revolut considered three platforms: Google Dialogflow CX, Amazon Lex v2, and Rasa Open Source. Dialogflow offered native integrations with GCP and built-in sentiment analysis. Lex allowed for instant deployment on AWS with auto-scaling. Rasa lacked these features, but it provided complete control over the NLU pipeline and avoided the risk of vendor lock-in.

Revolut handles sensitive financial information across 38 countries. Each conversation could involve account numbers or legal details. Sending this data to external APIs from Google or Amazon posed a regulatory risk the team didn't want to take. With Rasa, models can be trained locally and infrastructure hosted on their own servers, ensuring total auditability of the bot without relying on third-party black boxes.

Customization of the NLU pipeline was another crucial reason. Revolut needed to recognize intents in multiple languages and understand specific financial jargon. Dialogflow required training separate intents for each language. With Rasa, it's possible to share features using multilingual transformers, reducing training time for new languages by 40%.

The Full Architecture: From Rasa Open Source to Production on Kubernetes

Revolut's chatbot stack includes:

Rasa Open Source 3.6 for NLU and dialogue management.
PostgreSQL 15 as a tracker store for conversation storage.
Redis 7 for synchronization between replicas.
RabbitMQ 3.12 for asynchronous dialogue events.
Kubernetes 1.28 for orchestration and autoscaling.
Sentry and Grafana + Prometheus for monitoring and metrics.

The basic flow starts like this: a user contacts via the mobile app, the message reaches an internal API Gateway and is routed to a Rasa pod on Kubernetes. Rasa identifies intents and entities, queries the tracker store in Postgres, executes dialogue policies, and sends the response back to the client, while also generating an event to RabbitMQ for analysis.

Tracker Store in Postgres: each conversation has a unique conversation_id. Rasa stores every conversation event (user messages, bot responses, executed actions, slot changes), which adds up to a large volume of events each month. To keep queries fast, the team partitions the data by date in Postgres and archives older data monthly to Google Cloud Storage.
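To make the date-partitioning idea concrete, here is a minimal sketch that generates the PostgreSQL DDL for one monthly range partition. The parent table name conversation_events is hypothetical (it is not Rasa's own schema), and the sketch assumes that table was created with PARTITION BY RANGE on a date or timestamp column; the article does not describe Revolut's actual schema.

```python
from datetime import date


def monthly_partition_ddl(parent_table: str, month_start: date) -> str:
    """Build the DDL for one monthly range partition of `parent_table`.

    Assumes `parent_table` is declared with PARTITION BY RANGE on a
    date/timestamp column (an assumption, not something the article states).
    """
    if month_start.day != 1:
        raise ValueError("month_start must be the first day of a month")
    # The first day of the following month is the exclusive upper bound.
    if month_start.month == 12:
        next_month = date(month_start.year + 1, 1, 1)
    else:
        next_month = date(month_start.year, month_start.month + 1, 1)
    partition = f"{parent_table}_{month_start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {partition} "
        f"PARTITION OF {parent_table} "
        f"FOR VALUES FROM ('{month_start.isoformat()}') "
        f"TO ('{next_month.isoformat()}');"
    )


if __name__ == "__main__":
    print(monthly_partition_ddl("conversation_events", date(2026, 1, 1)))
```

A scheduled job could run this ahead of each month, and an archived month can later be detached or dropped as a whole partition instead of being deleted row by row.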

Reactive Autoscaling: Revolut tuned Kubernetes autoscaling so that pods are added during traffic spikes, such as Monday mornings. With this setup, the cluster can scale from 12 to 60 pods in under 3 minutes.
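The scaling decision can be illustrated with the formula the Kubernetes Horizontal Pod Autoscaler documents: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds. The sketch below assumes the 12 and 60 pod limits from the article are the HPA minimum and maximum; the metric values are made up, and the HPA's tolerance band and stabilization windows are not modelled.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 12,  # lower bound taken from the article
    max_replicas: int = 60,  # upper bound taken from the article
) -> int:
    """Apply the HPA scaling formula and clamp the result to the bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 12 pods at 90% average CPU against a 60% target.
    print(desired_replicas(12, current_metric=90, target_metric=60))
```

Because the result is proportional to the ratio of observed to target load, a sharp spike asks for many pods at once rather than adding them one by one, which is what makes a fast ramp possible.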

The NLU Training Hell: When the Bot Learns from Angry Users

Rasa's NLU model is trained with real examples. Initially, Revolut used a manually curated dataset, but the bot fell flat in production because real users don't write formally. A message like "WTF someone stole my money fix this NOW" is far more common than a tidy, well-formed request.

Three months in, the team found that 18% of messages received a low confidence score. The solution was supervised active learning: when confidence is low, the bot escalates to a human agent and stores the message for review. Once reviewed and labeled, these examples are added to the retraining dataset.
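The active-learning loop described above can be sketched in a few lines. The input dictionary follows the shape of a Rasa NLU parse result (text plus an intent with name and confidence); the 0.6 threshold, the ReviewQueue class, and the function names are hypothetical, since the article does not say what counted as "low" or how messages are stored.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical cut-off: the article does not state the real threshold.
LOW_CONFIDENCE_THRESHOLD = 0.6


@dataclass
class ReviewQueue:
    """In-memory stand-in for wherever flagged messages are stored."""

    items: List[Dict[str, Any]] = field(default_factory=list)

    def add(self, text: str, predicted_intent: str, confidence: float) -> None:
        self.items.append(
            {"text": text, "predicted_intent": predicted_intent, "confidence": confidence}
        )


def route_message(
    nlu_result: Dict[str, Any],
    queue: ReviewQueue,
    threshold: float = LOW_CONFIDENCE_THRESHOLD,
) -> str:
    """Escalate low-confidence messages and keep them for human review."""
    intent = nlu_result["intent"]
    if intent["confidence"] < threshold:
        queue.add(nlu_result["text"], intent["name"], intent["confidence"])
        return "escalate_to_human"
    return "handle_automatically"


def build_retraining_examples(
    queue: ReviewQueue, corrected_labels: Dict[str, str]
) -> List[Dict[str, str]]:
    """Pair each reviewed message with the intent a human annotator assigned."""
    return [
        {"text": item["text"], "intent": corrected_labels[item["text"]]}
        for item in queue.items
        if item["text"] in corrected_labels
    ]
```

Only messages that a reviewer has actually labeled make it into the retraining set, which is what keeps the loop "supervised" and not self-reinforcing.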

The Data Contamination Problem: in 2025, the bot began misclassifying neutral messages as urgent. An analysis revealed that the model had learned to associate the word "help" with urgency. The team rebalanced the dataset and added explicit negative examples.

A related side effect was that aggressive users received quicker attention. This led the team to adjust dialogue policies so that sentiment no longer directly prioritized queries.

Dialogue Policies: When Rules Aren't Enough

Rasa uses rules (deterministic) and machine learning policies (predictive). Rules are useful, but can fail when the user changes topics. For these cases, Revolut employs TED Policy, a transformer-based model that manages context shifts.

A more recent addition is a context-change meta-policy. If the user brings up a different topic mid-conversation, the bot asks whether they want to pause the current inquiry. This simple question reduced frustration-driven escalations by 22%.
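The decision behind such a context-change check can be sketched as plain logic. This is not a Rasa policy implementation and not Revolut's code: the intent names, the topic mapping, and the returned step labels are all hypothetical, and in a real assistant the same decision would live in a custom policy or in rules.

```python
from typing import Optional

# Hypothetical mapping from intents to the business topic they belong to.
INTENT_TOPICS = {
    "report_lost_card": "cards",
    "freeze_card": "cards",
    "dispute_transaction": "payments",
    "check_transfer_status": "payments",
}


def next_bot_step(active_topic: Optional[str], intent_name: str) -> str:
    """Decide whether to continue or ask the user about pausing.

    Returns "ask_pause_current_inquiry" only when an inquiry is in progress
    and the new intent clearly belongs to a different known topic.
    """
    new_topic = INTENT_TOPICS.get(intent_name)
    if active_topic is None or new_topic is None or new_topic == active_topic:
        return "continue"
    return "ask_pause_current_inquiry"


if __name__ == "__main__":
    # User is disputing a payment, then suddenly reports a lost card.
    print(next_bot_step("payments", "report_lost_card"))
```

Intents with no known topic (greetings, small talk) deliberately fall through to "continue", so the bot does not interrupt the flow with the pause question more often than needed.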

Custom Actions: When the Bot Needs External APIs, Not Just Pre-made Responses

Rasa executes custom actions to interact with external systems. At Revolut, this includes verifying transactions, freezing cards, and conducting KYC checks. Each action runs as an independent service that Rasa calls over HTTP.

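The challenge is error handling. If an external API fails, the bot automatically escalates the user to a human agent. A circuit breaker keeps these failures contained: after repeated errors, calls to the failing API are skipped for a cooldown period, so the bot escalates immediately without waiting on timeouts.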
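A minimal circuit breaker for a custom action might look like the sketch below. The class, the thresholds (3 failures, 30 seconds), and the run_action wrapper are illustrative assumptions, not Revolut's implementation or part of the Rasa SDK.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cooldown in seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cooldown elapsed: allow one trial call (half-open state).
            # A single further failure reopens the circuit.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result


def run_action(breaker, api_call):
    """Call an external API through the breaker; escalate on any failure."""
    try:
        return {"status": "ok", "data": breaker.call(api_call)}
    except Exception:
        return {"status": "escalate_to_human"}
```

Using one breaker per external dependency keeps an outage in, say, the KYC provider from slowing down card-freeze requests that depend on a different API.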

Metrics That Matter (and Those That Mislead)

Revolut measures chatbot success with:

Deflection rate: in 2026, it's at 68%. But beware, a "completed" conversation doesn't always mean satisfaction. It might simply mean the user gave up.

Satisfaction-Adjusted Containment Rate: a conversation only counts as successful if the user answers "yes" to a post-conversation satisfaction question. By this measure, real containment is 54%.

Average Confidence Score: 0.78. However, the 10th percentile is 0.52, a range where the bot is effectively guessing. This bottom 10% of messages accounts for 34% of escalations.

Turn Count per Conversation: effective conversations average 6 turns. If they exceed 10 turns without resolution, the escalation probability rises to 76%.

End-to-End Latency: P50 is 420ms, but the P99 is problematic, especially when Rasa queries multiple external APIs.
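To show how the first three metrics differ, the sketch below computes them from a list of conversation records. The Conversation fields and function names are hypothetical, and the percentile uses the simple nearest-rank method; the article does not say how Revolut computes its figures.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Conversation:
    escalated: bool            # True if handed over to a human agent
    satisfied: Optional[bool]  # survey answer; None if the user did not reply


def deflection_rate(conversations: List[Conversation]) -> float:
    """Share of conversations that never reached a human agent."""
    return sum(not c.escalated for c in conversations) / len(conversations)


def adjusted_containment_rate(conversations: List[Conversation]) -> float:
    """Share that stayed with the bot AND got an explicit 'yes' in the survey."""
    contained = sum(
        (not c.escalated) and c.satisfied is True for c in conversations
    )
    return contained / len(conversations)


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=10 for the 10th percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

A conversation where the user silently gives up raises the deflection rate but not the adjusted containment rate, which is why the two numbers diverge; looking at a low percentile of confidence, and not just the mean, surfaces the messages where the bot is least certain.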

Operational Lessons You Won't Find in Rasa Documentation

1. The model needs continuous retraining: Revolut retrains its NLU model weekly with new examples. Without retraining, accuracy drops rapidly.

2. Slots are your nervous system: they store conversation context. Strict validation is crucial, because a malformed slot value carries over into every later turn and every custom action that reads it.

3. Rules are fragile: past roughly 200 rules, the system becomes unmanageable. Revolut migrated 40% of its rules to patterns managed by TED Policy.

4. The action server is your single point of failure: Revolut runs multiple replicas with load balancing and health checks to avoid downtime.

5. Human escalation must be elegant: the agent receives a complete conversation summary, avoiding starting from scratch.
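As an illustration of lesson 2, here is a minimal sketch of strict slot validation written as plain functions. The slot names (card_last4, amount) and the rules are hypothetical. Returning None mirrors the common Rasa form-validation convention of resetting a slot so the bot asks again, but the code does not depend on the Rasa SDK.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def validate_card_last4(raw: str) -> Optional[str]:
    """Accept exactly four ASCII digits; anything else is rejected."""
    candidate = raw.strip()
    return candidate if re.fullmatch(r"[0-9]{4}", candidate) else None


def validate_amount(raw: str) -> Optional[Decimal]:
    """Parse a positive, finite amount; tolerate a decimal comma."""
    try:
        amount = Decimal(raw.strip().replace(",", "."))
    except InvalidOperation:
        return None
    if not amount.is_finite() or amount <= 0:
        return None
    return amount


if __name__ == "__main__":
    print(validate_card_last4(" 4821 "))  # valid: whitespace is trimmed
    print(validate_card_last4("48a1"))    # invalid: contains a letter
    print(validate_amount("12,50"))       # valid: decimal comma accepted
```

Rejecting a value at the point of entry is cheaper than discovering several turns later that a card-freeze action was called with a half-typed number.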

Conclusion: Conversational AI is Ops-Intensive, Not Just NLU

Implementing Rasa at scale isn't just about training an NLU model. It's about creating a solid architecture that supports millions of conversations, building continuous retraining pipelines, and debugging unexpected dialogue policies. Revolut processes 2 million tickets monthly with this system, supported by a team dedicated to its maintenance. If you're considering building a chatbot with Rasa, ask yourself: do you have the team to maintain it? Or do you just want it to work and then forget it?

Editorial note: This article was generated with AI assistance and reviewed by the NewsTide editorial team to ensure accuracy and relevance.