The days when Machine Learning (ML) models were confined to the Data Scientist’s laptop are over. Today, the true potential of ML is realised when models are deployed and integrated into the company’s processes. Multidisciplinary teams are needed to achieve this level of integration. As well as Data Scientists, these teams include Machine Learning Engineers, Backend Developers, UX Researchers, Product Managers etc.
In this blog post, we’ll take a deep dive into the multidisciplinary team working on all the Machine Learning models behind InfoJobs, one of Adevinta’s marketplaces in Spain. Get ready to discover how our team is structured, and how that structure has allowed ML models to escape from the Data Scientists’ laptops to deliver true business and product impact.
Where are we coming from?
Some years ago, ML projects were usually implemented by a standalone Data Science or Data team, acting as an agency to other product or business teams. Data Scientists would work in isolation, receiving a list of requirements from the business and implementing a solution that would run on their computer (or a company R or Python server if you were lucky).
It is hard to believe now, but we are talking about a time when Git wasn’t used by Data Scientists, code could get lost and wouldn’t be reproducible, and unit or integration tests were complete unknowns. This carried many risks in terms of consistency, reproducibility and correctness. It also meant the potential impact of Data Science work was limited to offline models used for retrospective insights, analysis or very specific applications.
How we got to where we are now
The ML field has evolved a lot over the last few years. This has meant not only the arrival of crazy new models like the ones from Generative AI, but also improvements in ways of working as the technology reaches some level of maturity. This maturity shows in teams having adopted many software development best practices (Git, testing etc.), and also in the emergence of MLOps: the field focused on taking models to production, and maintaining and monitoring them.
All this has led to:
- Greater robustness, correctness and reliability of ML models, code and processes.
- The ability to leverage ML models from new perspectives, by integrating models and predictive outputs into the company processes.
- The need to extend teams to enable complete end-to-end development and exploitation of the ML models.
Where we are now
The evolution of Machine Learning, both from an industry perspective and a company perspective, has influenced the current setup of our ML team at InfoJobs. Here’s what the team looks like:
Our Data Scientists:
- Lay the foundation by designing and training the ML models.
- Are usually close to product and business teams because they need to translate product and business needs into a predictive model solution.
- Usually hold Mathematics or Physics degrees. They have since learned software development best practices and integrated them into their day-to-day work.
Our Machine Learning Engineers:
- Are responsible for translating ML models into production-ready code, deploying the models, and maintaining and monitoring the services in production.
- Hold a relatively new role at InfoJobs and in the industry in general. This means it is sometimes hard to set the right scope for the role, and new, complex challenges frequently appear.
- Usually come from a Software Development background, but they have had to expand their knowledge into the deployment of ML models, including optimisation, serving, monitoring and more.
Our Backend Developers:
- Act as the bridge between the models deployed by the ML Engineers (MLEs) and the Backend platform of the company.
- Perform a similar role to Backend Developers in traditional product teams, but they have had to develop some familiarity with ML.
As Machine Learning Product Manager, I:
- Am responsible for bridging the gap between the technical team and the product and business strategy. I stay close to user pains, work on strategy and roadmaps, and ensure product delivery and a positive impact on our users.
- Don’t know many ML Product Managers, so I can only speak from personal experience: I worked as a Data Scientist before moving into the PM role. This meant expanding my knowledge of Product (discovery, delivery, value, strategy etc.) and gaining experience with the MLE and Backend (BE) work.
Our ML team also draws on support from other roles, including:
- Data Analysts: who help in understanding data, performing specific analyses, assessing the impact of our solutions and setting up dashboards to monitor our products.
- Data Engineers: who help with specific ETLs (Extract, Transform, Load) to generate the training datasets for our initiatives and ensure model outputs are exploitable from an analytics perspective.
- UX Researchers: who I like to call “the voice of our users”. They ensure our solutions align with user needs and expectations and help understand the impact of our solutions through qualitative data.
How do these roles work together?
The best way to understand how all these roles fit together is to go through each step of the Machine Learning Lifecycle, the typical way of structuring an ML project into phases. The set-up will vary depending on the company or the initiative at hand, but let’s look at the six phases of the ML Lifecycle:
- Product / Business understanding: understand the product or business context and define the user problem or pain point this initiative should help solve. This step is led by the Product Manager, in collaboration with the UX Researcher (to bring all the user context needed) and the Data Scientist (to start considering solutions and feasibility).
- Data understanding: understand what data is available for the given initiative. This can include some analysis from the Data Analyst (data volume, data quality, size of the opportunity for the solution etc.), together with the Data Engineer (the point of contact for the data and logic behind the analytics tables).
- Data preparation: produce the dataset that will serve as training data for the ML model. This work usually includes ETLs dealing with a big volume of data and is led by the Data Engineer.
- Modelling: preprocess the data and start training and evaluating different types of models for the task at hand. This step is led by the Data Scientist.
- Model evaluation: traditionally, model evaluation was done only by the Data Scientist with offline data, assessing performance metrics like accuracy, RMSE etc. If the true goal of the initiative is to positively impact users, these offline assessments are insufficient. Data Analysts are brought in whenever an experiment or A/B test can be run to assess impact. At the same time, UX Researchers can obtain qualitative feedback from users.
- Deployment: the final step of the process is where ML Engineers deal with the deployment of the model itself. Simultaneously, Backend Developers deal with integrating the deployed model and its predictions into the Backend platform of the company.
Note: this is a very simplified way to understand the different phases of an ML initiative, as a means to illustrate the interaction of the different roles. In an ideal scenario, teams work with an Agile mindset, which translates into thinking about quick wins and MVPs (Minimum Viable Products) to demonstrate value quickly. I talk about this in another post, “When ML meets Product: Less is often more”, in case you want to learn more!
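To make the Model evaluation phase a bit more concrete, here is a minimal Python sketch contrasting the two kinds of assessment mentioned there: offline metrics (accuracy, RMSE) and a simple A/B test readout. It is an illustration only, not the code our team runs: the function names and all the numbers are made up, and the A/B part assumes a plain two-proportion z-test on a conversion rate, which is just one common choice.

```python
import math


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels (classification)."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values (regression)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def ab_test_summary(control_conversions, control_users,
                    variant_conversions, variant_users):
    """Relative uplift and two-proportion z-score for an A/B test.

    Hypothetical helper: assumes each user either converts or not,
    and that users were randomly split between control and variant.
    """
    p_control = control_conversions / control_users
    p_variant = variant_conversions / variant_users
    uplift = p_variant / p_control - 1

    # Pooled conversion rate under the "no difference" hypothesis
    pooled = (control_conversions + variant_conversions) / (control_users + variant_users)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / variant_users))
    z_score = (p_variant - p_control) / std_error
    return uplift, z_score


# Made-up numbers, purely for illustration
print("offline accuracy:", accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print("offline RMSE:", rmse([3.0, 5.0], [1.0, 5.0]))
print("A/B uplift and z-score:", ab_test_summary(100, 1000, 120, 1000))
```

The offline metrics only say how well the model fits historical data. The A/B readout says whether users actually behaved differently. As a rule of thumb, an absolute z-score above roughly 1.96 corresponds to significance at the 5% level in a two-sided test.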
Wrapping it up
The Machine Learning team at InfoJobs is a great example of a multidisciplinary ML team. Each team member brings a unique set of skills and expertise, but all share a clear mission: to develop high-impact, high-complexity ML solutions for our company.
Because of our team setup, the value of ML in our company has multiplied: true end-to-end ML solutions can address a huge number of use cases and types of user problems. From highly accurate recommendations helping candidates find relevant offers, to normalisations helping assess the match between a CV and an offer, the InfoJobs platform uses ML solutions successfully across a broad range of tasks.
As the field of Machine Learning continues to evolve, these types of teams will become increasingly important, as they help ensure companies can unlock the full potential of ML.