FIS Toronto 2024

Enhanced tech capabilities make reinforcement learning viable

L-R: David Bell and John Hull

Computing power has advanced to the point that the once-impractical process of reinforcement learning is now a viable tool for asset owners, the Fiduciary Investors Symposium has heard. 

Reinforcement learning trains software to make decisions through a process that mimics trial and error. In investment decision-making, it is used to find the strategy most likely to produce the best result.

John Hull, Maple Financial chair in derivatives and risk management at the Joseph L. Rotman School of Management, told the symposium that reinforcement learning has several advantages and outperforms simpler modelling approaches. 

“It gives you the freedom to choose your objective function – it’s a danger with some of the simpler hedging strategies and so on that you’re just assuming good outcomes are as bad as bad outcomes,” he said. 

“You can choose your time horizon, tests indicate that it’s robust… and gives good results during stress periods and there’s a big saving in transaction costs. Why are we talking about it now? Well, because computers are now fast enough to make it a viable tool.” 
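Hull's point about choosing the objective function can be illustrated with a small Python sketch. It assumes a hedger judges a strategy by the distribution of its profit and loss (P&L). The function names and the P&L numbers are hypothetical and are not the measures used in the research Hull presented. Variance is symmetric: it penalises a large gain exactly as much as an equal-sized loss. A downside measure (lower semi-variance) counts only outcomes below a threshold.

```python
import statistics

def variance_objective(pnls):
    """Symmetric risk measure: a big gain is penalised as much as a big loss."""
    return statistics.pvariance(pnls)

def downside_objective(pnls, threshold=0.0):
    """Asymmetric risk measure (lower semi-variance): only shortfalls count."""
    shortfalls = [min(p - threshold, 0.0) for p in pnls]
    return sum(s * s for s in shortfalls) / len(pnls)

# Two hypothetical P&L distributions that are mirror images of each other.
mostly_upside = [0.0, 0.0, 5.0, -1.0]
mostly_downside = [0.0, 0.0, -5.0, 1.0]

for name, pnls in [("mostly_upside", mostly_upside), ("mostly_downside", mostly_downside)]:
    print(name, variance_objective(pnls), downside_objective(pnls))
# prints:
# mostly_upside 5.5 0.25
# mostly_downside 5.5 6.25
```

Variance cannot tell the two strategies apart, while the downside measure strongly prefers the one whose large outcome is a gain. A reinforcement learning agent can be trained to optimise either of these measures, or any other.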

Hull said reinforcement learning techniques can reduce transaction costs by as much as 25 per cent compared with traditional hedging approaches. 

“It’s a way of generating a strategy for taking decisions in a changing environment – you’re not just taking one decision, but a sequence of decisions,” he said. 

“Perhaps you’re taking a decision today and then you take another decision tomorrow, and so on. Let’s suppose you’re interested in a strategy for investing in a certain stock and say what’s a good strategy for this stock – I think it’s going to work out okay, but it may not. What strategy should I use over the next three months? What do you do?” 

Hull said a stochastic process, a mathematical model of how an uncertain variable such as a stock price evolves over time, would normally be used to model the stock.

“It’s uncertain how the stock price is going to evolve and you might use a mathematical stochastic process, you might use historical data on the stock price behaviour, something like that. You have some model for how the stock price behaves,” Hull said. 
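As an illustration of such a model, the Python sketch below simulates a stock price using geometric Brownian motion (GBM), one widely used stochastic process for stock prices. Choosing GBM here is an assumption made for the example; Hull did not say which process his team uses. The function name, the 252-trading-day year and the parameter values are hypothetical.

```python
import math
import random

def simulate_gbm_path(s0, mu, sigma, days, seed=None):
    """Simulate one daily price path under geometric Brownian motion.

    s0: starting price; mu: annual drift; sigma: annual volatility.
    Assumes 252 trading days per year.
    """
    rng = random.Random(seed)
    dt = 1.0 / 252
    path = [s0]
    for _ in range(days):
        z = rng.gauss(0.0, 1.0)
        # Exact GBM step: the log-return is normal with mean (mu - sigma^2 / 2) * dt
        # and standard deviation sigma * sqrt(dt).
        growth = math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        path.append(path[-1] * growth)
    return path

# Roughly three months (63 trading days) of a hypothetical stock.
path = simulate_gbm_path(s0=100.0, mu=0.05, sigma=0.2, days=63, seed=42)
print(f"start {path[0]:.2f}, end {path[-1]:.2f}")
```

Each call with a different seed produces a different possible future. A reinforcement learning agent would learn from many such paths; alternatively, as Hull notes, the paths could be built from historical price behaviour instead.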

“Then your problem is defined by what we call states/actions/rewards.” 

Hull said the aim is quite simply to decide what action should be taken in each possible state to maximise the expected reward.  

“You’d say okay, we don’t know how this stock price is going to evolve but it will evolve in some way, and so there will be certain states we find ourselves in. We should take a certain action, and that’s what we’re trying to determine, and there will be a certain reward,” Hull said.  

“In other words, you’ll make a profit or a loss. The way I think about it, it’s just sophisticated trial and error.” 

In practice, this means starting with “no idea at all” about which action is a good one and then testing different actions against hypothetical outcomes.

“It works well or it doesn’t work well, then you try a different action and so on and then eventually you come up with what seems to be the best action to take when a particular state is encountered,” Hull said. 
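The states/actions/rewards loop Hull describes can be sketched with tabular Q-learning, a standard textbook reinforcement learning algorithm, in Python. Everything about the environment is a hypothetical toy: two observable market regimes ("up" and "down"), two actions ("cash" or "hold" the stock), made-up returns, and an 80 per cent chance that a regime persists. This is not the method or the model used in Hull's research.

```python
import random

STATES = ("up", "down")      # hypothetical market regimes the agent can observe
ACTIONS = ("cash", "hold")   # stay out of the stock, or hold one unit of it

def step(state, action, rng):
    """Toy environment: return (reward, next_state). All numbers are made up."""
    drift = 1.0 if state == "up" else -1.0
    reward = (drift + rng.gauss(0.0, 0.3)) if action == "hold" else 0.0
    # Regimes are persistent: 80% chance of staying in the same regime.
    if rng.random() < 0.8:
        next_state = state
    else:
        next_state = "down" if state == "up" else "up"
    return reward, next_state

def q_learning(steps=20_000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Start with "no idea at all": every state/action value is zero.
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = "up"
    for _ in range(steps):
        # Trial and error: usually take the best-looking action, sometimes explore.
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        reward, next_state = step(state, action, rng)
        # Nudge the estimate towards reward + discounted value of the best next action.
        target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state = next_state
    return q

def greedy_policy(q):
    """The learned strategy: the best-valued action in each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}

q = q_learning()
print(greedy_policy(q))
```

With these made-up numbers, the intended outcome is that the agent learns to hold the stock in the "up" regime and stay in cash in the "down" regime, purely from repeated trial and error. Real hedging problems have far richer states (prices, time to maturity, current holdings), which is one reason such problems are computationally demanding.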

Hull said reinforcement learning has traditionally been computationally expensive, requiring a lot of computation time, and “data hungry”, but these constraints are much less of a barrier today.

“But fortunately, the other thing that’s happened that makes this a viable tool… is that we can now generate unlimited amounts of synthetic data that’s indistinguishable from historical data,” he said. 

“You collect some historical data… maybe a couple of thousand items of historical data [and] you can generate as much synthetic data as you want to that is indistinguishable from that historical data.” 
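The Python sketch below gives a rough idea of how a few thousand historical observations can be stretched into a much larger synthetic sample. It uses a moving block bootstrap, a simple resampling technique chosen here as a stand-in. It is not necessarily what Hull's team uses, and its output is not "indistinguishable" from history in any formal sense. The "historical" returns are themselves randomly generated placeholders, and the function names are hypothetical.

```python
import random

def block_bootstrap(returns, n_out, block_len=10, seed=None):
    """Build a synthetic return series by resampling contiguous blocks
    of historical returns (a moving block bootstrap).

    Sampling whole blocks, rather than single days, keeps some of the
    short-range dependence (e.g. volatility clustering) in the original data.
    """
    if block_len > len(returns):
        raise ValueError("block_len must not exceed the length of the history")
    rng = random.Random(seed)
    out = []
    while len(out) < n_out:
        start = rng.randrange(len(returns) - block_len + 1)
        out.extend(returns[start:start + block_len])
    return out[:n_out]

def returns_to_prices(s0, returns):
    """Turn simple daily returns into a price path starting at s0."""
    prices = [s0]
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    return prices

# Placeholder for "a couple of thousand" historical daily returns.
hist_rng = random.Random(1)
history = [hist_rng.gauss(0.0003, 0.01) for _ in range(2000)]

synthetic = block_bootstrap(history, n_out=10_000, block_len=20, seed=7)
prices = returns_to_prices(100.0, synthetic)
print(len(synthetic), f"{prices[-1]:.2f}")
```

Changing the seed yields a fresh synthetic series each time. Research groups often use more sophisticated generative models for this step.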

Hull said that while his experience has mostly been in applying reinforcement learning to the hedging of derivatives, there are many other areas where it can be applied.

“Because really it can be applied in any situation where the goal is to develop a strategy for achieving a particular objective in a changing market,” he said. 

“There’s something out there that’s going to change in a way you don’t know, and you have to model that.” 

The Financial Innovation Hub, or FinHub for short, carried out the research that Hull presented to the symposium.

Hull said one of the distinctive features of FinHub is that it is not only academics within the Rotman School of Management who work on its projects, but also practitioners and members of the university’s engineering faculty.

Reinforcement learning is just one of the projects FinHub has been working on, with Hull explaining the centre has also been doing work on natural language processing, amongst other initiatives. 

“We’ve worked with the Bank of Canada on monetary policy uncertainty,” Hull said. 

“We’ve done work on modelling volatility surfaces and using natural language processing to forecast different market variables.” 
