Developing future risk management capabilities using statistical models
Hi Arne, please tell us something about the project you are working on.
As visitors to our company website may have seen, we offer various services to our customers, such as monthly invoicing or Afterpay. With these services, we try to reduce the risk for us, the merchants, and their customers by using statistical models based on vast amounts of data on past transactions.
I would like to illustrate this with a specific example. Let's take our payment service Afterpay. Afterpay is a payment method that lets online stores, for example, offer their customers the option to buy now and pay later. The merchants then get paid right away, while Arvato takes on the risk of the customer not paying.
Unfortunately, if you run an online store, you deal with customers who are not willing or able to pay. There are also some scammers out there who try to cheat you. For the merchant, using Afterpay as a payment solution solves this problem by forwarding the risk to us. Our job is then to distinguish payers from non-payers by analyzing the data available to us, so that we can offer delayed payment only to customers who are likely to pay.
In practice, this has to happen within milliseconds during the checkout process. If we detect, based on the data we have, that a customer is likely to pay, our Afterpay service appears alongside other common payment methods. If our product detects that the customer is unlikely to pay, our service does not appear as a payment method, and the person can pay using an alternative method such as a debit card instead.
To make the user experience as smooth as possible, we build statistical models based on previous transactions so that the decision process is invisible to the customer.
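The checkout-time decision described above can be sketched roughly as follows. This is a minimal illustration, not the actual product logic: the function `score_risk`, the threshold value, and the feature `known_customer` are all invented stand-ins for the real model and its inputs.

```python
# Hypothetical cut-off: maximum acceptable estimated probability of non-payment.
THRESHOLD = 0.05

def score_risk(transaction: dict) -> float:
    """Placeholder for the statistical model: estimated probability of non-payment.

    In reality, a trained model would produce this score within milliseconds
    from far richer transaction data.
    """
    return 0.02 if transaction.get("known_customer") else 0.10

def payment_methods(transaction: dict) -> list:
    """Return the payment methods to display during checkout."""
    methods = ["debit card", "credit card"]
    if score_risk(transaction) <= THRESHOLD:
        # Offer buy-now-pay-later only to customers who are likely to pay.
        methods.append("Afterpay")
    return methods

print(payment_methods({"known_customer": True}))   # → ['debit card', 'credit card', 'Afterpay']
print(payment_methods({"known_customer": False}))  # → ['debit card', 'credit card']
```

The key design point is that the risky customer still sees a working checkout with alternative payment methods; only the delayed-payment option is withheld.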
Can you give us an example of the data you process?
Sure. To find out what kind of person is currently in the buying process with our customer, we analyze the data we have on previous transactions. For example, it could be the composition of the shopping cart or the customer's purchase history that makes the statistical model recognize a customer as low-risk. It could also be information about which device is being used, previous transactions, and so on. Based on how these characteristics have played out for similar customers in the past, we can create statistical models that recognize payers and non-payers.
By showing the algorithms these patterns in the data, we can automatically estimate how likely a customer is to pay.
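As a hedged sketch of what learning from past transactions could look like: the snippet below trains a simple logistic regression on a handful of invented examples. The interview does not name a specific library or feature set, so scikit-learn, the feature columns, and the toy data are all assumptions for illustration only.

```python
from sklearn.linear_model import LogisticRegression

# Invented features per past transaction:
# [cart_value_eur, previous_purchases, device_is_new]
X = [
    [40.0, 12, 0],   # long purchase history, known device
    [35.0,  8, 0],
    [250.0, 0, 1],   # first-time buyer on a new device
    [300.0, 0, 1],
]
y = [1, 1, 0, 0]  # 1 = paid, 0 = did not pay

# Fit the model on the historical outcomes.
model = LogisticRegression().fit(X, y)

# Estimated probability that a new customer with a similar
# low-risk profile will pay (class 1).
prob_pay = model.predict_proba([[45.0, 10, 0]])[0][1]
```

In production, such a model would be trained on vast amounts of transaction data rather than four rows, but the principle is the same: past outcomes for similar customers drive the probability estimate for the current one.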
However, we are still in the early phases of our machine learning efforts, so for the time being, we also have a complex system of rules and filters to recognize the non-payers. Human intelligence, in other words.
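A rules-and-filters layer of this kind might look something like the sketch below. The individual rules, field names, and thresholds here are invented for illustration; the real system is described only as "complex" in the interview.

```python
# Each rule is a (name, check) pair; a check returning True flags the
# transaction as too risky for delayed payment. All rules are hypothetical.
RULES = [
    ("cart over limit",      lambda t: t["cart_value_eur"] > 1000),
    ("known non-payer",      lambda t: t["customer_id"] in {"c-123"}),  # invented blocklist
    ("mismatched addresses", lambda t: t["billing_country"] != t["shipping_country"]),
]

def failed_rules(transaction: dict) -> list:
    """Return the names of all rules the transaction violates."""
    return [name for name, check in RULES if check(transaction)]

tx = {"cart_value_eur": 1500, "customer_id": "c-999",
      "billing_country": "DE", "shipping_country": "DE"}
print(failed_rules(tx))  # → ['cart over limit']
```

Hand-written rules like these encode human domain knowledge directly and are easy to audit, which is one reason they remain useful alongside learned models.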
What stage of the process are you currently in and what does the success of the project depend on?
Right now we are in the development phase, building the infrastructure to iterate fast and make all the data available to the machine learning models for training, inference, and monitoring. The next phase, which we will soon enter, is the improvement phase, which will take up most of the project's time. We have vast amounts of data, many merchants who offer our payment solutions to their customers, and several countries in which we operate, so over time, we have many ways to improve the usefulness of our machine learning models.
The key factor in our success is having the right people on board to work on the data and build the infrastructure, programs and models we need. Finding people who really know how to work with data is not easy! That said, for people who know how to create gold from data, our data has what it takes to become gold! We have lots of historical data on transactions, we have decent data quality and a willingness within the organization to put the data to use!
What makes this project so special and how does it work?
It is the complexity of this business case that makes the project so fantastic and challenging. We work alongside the business side and try to be forward-thinking in terms of the tools we use. The infrastructure for this project is based on Azure, and our main programming language is Python. We also use Spark through Databricks, Delta Lake, SQL, MongoDB and other tools at the moment. Currently, we are in the process of figuring out which tools are best suited to solve each problem, and that means spending a lot of time understanding which problems we need to solve. And that makes this project very interesting.
For example, we haven't yet decided which tools to use for monitoring, and may need to develop our own tools for testing and deployment. We have some very interesting and challenging times ahead, if you ask me!