Our SaaS and Software expert, Vijay Raina, is a specialist in enterprise SaaS technology and tools. He also provides thought leadership in software design and architecture. Today, he unpacks the intricate world of AutoML (Automated Machine Learning) and the challenges of deploying real-time AI. We’ll explore how past industry struggles have shaped new, user-friendly platforms, the nuances of competing with tech giants, and the practical steps businesses can take to harness machine learning without a dedicated data science team. The conversation also touches on the critical, yet often overlooked, work of maintaining a model’s health post-deployment and validating its performance before it impacts live operations.
Deploying real-time machine learning can be a complex process. Could you share a specific anecdote from your ad-tech background that illustrates this struggle and explain how that experience directly shaped Datomize’s end-to-end automation, from data enrichment to deployment?
Absolutely. In the ad-tech world, everything is about real-time. I remember the immense pressure at Taptica when we were trying to implement our machine learning algorithms. It felt like we were building a custom car engine from scratch for every single race. The process was just so arduous; you’d spend weeks cleaning data, then more weeks on feature engineering, then an eternity trying to find the right algorithm and tune it. Deploying it was another mountain to climb, and by the time you got it live, the data landscape had already shifted. That constant, exhausting cycle is the very pain Datomize was born from. The goal became to eliminate that entire struggle and build a platform that handles everything, from the moment you connect your data all the way to a live, updating model, with just a single line of code.
Your platform is designed for users without a data science background. Can you walk us through the user journey, from connecting a data lake to deploying a real-time model, and highlight the key decisions the platform automates versus what the user still controls via the dashboard?
The journey is designed to be incredibly intuitive. A user, let’s say a marketing manager at an e-commerce company, starts by simply connecting their data source, like a customer data lake. From that point on, the automation kicks in with a vengeance. Our system automatically sifts through the noise, cleaning the data and running a smart feature selection process to identify the most influential parameters for their business goal. The user doesn’t have to worry about the technical weeds. What they do control is the ‘what’ and the ‘why.’ Through the dashboard, they define their business objective—say, increasing cart size—and can visualize the experiments as the platform tests different algorithms. They watch the performance metrics, see the model training, and can even get explanations for the model’s predictions in plain English, all without ever writing a line of code.
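To make the automated feature selection concrete, here is a minimal sketch of the kind of step such a platform runs behind the scenes, written in plain Python with scikit-learn. The customers.csv file, its columns, and the cart_size target are hypothetical, and this is a generic illustration rather than Datomize’s actual internals.

```python
# A generic illustration of automated feature selection against a
# business target such as cart size. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Hypothetical customer table pulled from a data lake.
df = pd.read_csv("customers.csv")
X = df.drop(columns=["cart_size"])   # candidate features
y = df["cart_size"]                  # the business objective

# Rank features by how much a tree ensemble relies on them,
# then keep only those above the mean importance.
selector = SelectFromModel(RandomForestRegressor(n_estimators=200, random_state=0))
selector.fit(X, y)

influential = X.columns[selector.get_support()]
print("Most influential parameters:", list(influential))
```

The point is that ranking and pruning features against the stated business objective is mechanical enough to automate, so the user only has to name the objective.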
The AutoML market includes major players like Google and Microsoft. How does your platform’s proprietary hyperparameter-tuning and smart feature selection deliver superior accuracy and run-time, and what does it mean in practice to cover a “longer distance of automation” than competitors?
It’s a crowded space, but the differentiation is in the depth and breadth of the automation. When we talk about proprietary hyperparameter-tuning, we’re talking about an intelligent system that doesn’t blindly test combinations but learns from each experiment to zero in on the optimal settings much faster. That translates directly into shorter run-times and, in our benchmarks, often superior accuracy compared to the big players. Covering a “longer distance of automation” means we don’t stop at handing you a trained model. We automate the data prep, the feature selection, the algorithm bake-off, the deployment, and, critically, the ongoing model health management. Many platforms handle parts of this pipeline, but our vision is to streamline the entire journey, from raw data to a real-time, self-updating business solution, making it a truly hands-off experience.
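The general idea behind tuning that “learns from each experiment” is sequential, model-based search rather than a blind grid. As a hedged illustration of that technique in general (not Datomize’s proprietary tuner), here is how it looks with the open-source Optuna library:

```python
# Sequential hyperparameter search: each trial's result informs where
# the sampler looks next, unlike an exhaustive grid. Generic sketch only.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 50, 400),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 6),
    }
    model = GradientBoostingClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")  # TPE sampler by default
study.optimize(objective, n_trials=50)
print("Best settings:", study.best_params)
```

Because each trial’s score informs where the sampler probes next, good settings are typically found in far fewer experiments than an exhaustive search would need.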
You used the platform to model the spread of COVID-19 with high accuracy. Could you detail the steps involved in that project? Please explain how the system processed the data and selected the best algorithm to produce a model that proved so effective in its predictions.
That was a powerful proof-of-concept for us. We fed the system publicly available datasets on the virus’s spread in the New York area. The first thing the platform did was automatically clean and enrich that raw data, identifying the key influential parameters: things like population density, mobility data, and initial case numbers. From there, it recognized the task as a time-series problem. Our AutoML engine then ran a competition between various suitable algorithms, from classic regression models to more complex ones, each tuned on the fly for the highest predictive accuracy. The system selected, trained, and validated the model with the best performance. The incredible part was seeing its predictions for the coming weeks materialize with such accuracy, proving the platform could handle a complex, real-world scenario entirely on its own.
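A simplified stand-in for that algorithm competition might look like the sketch below. The dataset file, feature names, and candidate models are all hypothetical; the one essential detail is that validation respects time ordering, training on the past and testing on the future.

```python
# A simplified stand-in for an automated algorithm "competition" on a
# time-series problem. Data file and feature names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

df = pd.read_csv("nyc_covid_daily.csv")  # hypothetical enriched dataset
X = df[["population_density", "mobility_index", "cases_prev_week"]]
y = df["cases_next_week"]

# Respect time ordering when validating: train on the past, test on the future.
cv = TimeSeriesSplit(n_splits=5)

candidates = {
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

scores = {
    name: cross_val_score(model, X, y, cv=cv,
                          scoring="neg_mean_absolute_error").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)  # highest (least negative) error wins
print("Selected model:", best, scores)
```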
Managing a model’s health after deployment is a critical challenge. Could you explain how a user would leverage your dashboard to define rules for model updates or automatically restore a model to its last healthy state, and what kind of notifications they would receive?
This is one of the most vital, yet overlooked, parts of the machine learning lifecycle. A model isn’t static; its performance can degrade as the world changes. On our dashboard, a user can very simply set rules for this. For example, they could define a rule that says, “If the model’s prediction accuracy drops by 10% for two consecutive days, automatically retrain a new version using the latest data.” They can also set a rule for what happens in an unhealthy state, such as automatically reverting to the last known healthy version of the model to prevent business disruption. The entire process is transparent. The user would receive a notification, perhaps via email or a dashboard alert, saying, “Model performance degraded. A new version has been deployed,” or “Unhealthy state detected. Model restored to its previous version.” It’s about giving them control without forcing them to be on-call 24/7.
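That rule is simple enough to express in a few lines. Here is a minimal sketch in plain Python, assuming a baseline accuracy and a daily accuracy feed; the baseline figure and the retrain/restore hooks are hypothetical placeholders for whatever the platform wires in.

```python
# A minimal sketch of the health rule described above. The baseline
# figure and the retrain/restore hooks are hypothetical placeholders.
BASELINE_ACCURACY = 0.92
DEGRADATION_THRESHOLD = 0.10   # "drops by 10%"
CONSECUTIVE_DAYS = 2

def check_model_health(daily_accuracies):
    """Decide an action from the most recent daily accuracy scores."""
    floor = BASELINE_ACCURACY * (1 - DEGRADATION_THRESHOLD)
    recent = daily_accuracies[-CONSECUTIVE_DAYS:]
    if len(recent) == CONSECUTIVE_DAYS and all(a < floor for a in recent):
        return "retrain"   # degraded for two consecutive days
    return "healthy"

if check_model_health([0.91, 0.80, 0.79]) == "retrain":
    # In a real system this would call the platform's own retrain or
    # restore hooks and send the notification quoted above.
    print("Model performance degraded. A new version has been deployed.")
```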
Your platform is described as industry-agnostic, with use cases from e-commerce recommendations to fraud detection. Please provide a step-by-step example of how a business could use your sandbox environment to validate a new model offline before taking it live for a real-world application.
Let’s imagine a financial services company wants to build a new fraud detection model. First, they would connect their historical transaction data to the platform and define their goal: identifying fraudulent transactions. The platform would then run its automated process to build and train the most accurate model. Before deploying this model to flag real-time transactions and potentially block a customer’s card, they would use the sandbox. Inside this offline environment, they can feed the model a separate set of historical data it has never seen before. The dashboard would show them the model’s predictions on this test data, allowing them to see its accuracy, how many fraudulent cases it caught, and, just as importantly, if it incorrectly flagged any legitimate transactions. They can validate its performance and fine-tune their rules in this safe space, ensuring they are completely confident before flipping the switch to go live.
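In code terms, that sandbox run boils down to scoring the model on a held-out slice of history and reading off catches versus false alarms. Here is a minimal sketch with scikit-learn, where the transaction files and the is_fraud column are hypothetical stand-ins for the sandbox environment.

```python
# Offline validation on held-out historical transactions the model has
# never seen. File names and the is_fraud column are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

train = pd.read_csv("transactions_train.csv")
holdout = pd.read_csv("transactions_holdout.csv")

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(train.drop(columns=["is_fraud"]), train["is_fraud"])

y_true = holdout["is_fraud"]
y_pred = model.predict(holdout.drop(columns=["is_fraud"]))

# How many frauds were caught, and how many legitimate
# transactions were wrongly flagged?
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"Fraud caught: {tp}/{tp + fn}  (recall {recall_score(y_true, y_pred):.2%})")
print(f"Legitimate flagged: {fp}  (precision {precision_score(y_true, y_pred):.2%})")
```

Recall tells them how many fraud cases the model would have caught; precision tells them how often a flag would have inconvenienced a legitimate customer.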
What is your forecast for the AutoML industry?
I believe the AutoML industry is at an inflection point. Right now, we’re seeing tremendous growth, with revenues hitting $269 million in 2019, but that’s just the beginning. The future isn’t just about making machine learning easier for data scientists; it’s about making it invisible for everyone else. I forecast that AutoML will become a standard, embedded feature within all major business software, from CRMs to ERPs. The focus will shift from “building models” to “solving business problems,” where the AI is a seamless, intelligent layer working in the background. The companies that succeed won’t just be the ones with the most accurate algorithms, but those that provide the most complete, end-to-end automation, truly democratizing the power of real-time machine learning for every business, regardless of size or technical expertise.
