Podcast Episode

Challenging AI Assumptions in Forecasting: Navigating Human Insight and the ML Conundrum


Transcript

Joseph Drambarean 

All right, welcome to FinTech Corner. I’m really excited about today’s episode. It’s a special edition. We’re kind of in between seasons right now in FinTech Corner. And we’ve decided to do a controversial topic, one of the first ones we’ve ever kind of delved into on this show. And I decided to invite someone who will be perfect for this particular topic. So I’m joined by Yannis, who’s on our data platform engineering team.

 

And he has a really interesting background, and the topic today specifically is forecasting. Dun dun dun. The one topic in financial analysis where there are a lot of opinions, a lot of different kinds of ways to do it, and a lot of hot takes about what is a good forecast. So, Yannis, thank you first of all for joining me for this topic. It’s funny, this is a trial by fire topic. You chose the hardest one we could possibly do. But I respect it. I respect it.

 

So I thought maybe to kick things off, tell us a little bit about your background. What have you done in the realm of forecasting and how has that kind of transitioned into your role here at Trovata? 

 

Yannis Katsaros 

Yeah, sure. Definitely. So yeah, thanks for having me, especially for this controversial topic. Great to throw me into the fire on this one. 

 

But my background: I got my undergraduate degree in petroleum engineering. So I actually started my career in the oil and gas industry, and specifically in reservoir engineering, which, you know, I’m not going to get super into the weeds on, but it involves petroleum economics. So evaluating what your wells will do performance-wise, how much they’re going to produce, and of course how much money your business, your company, is gonna make. And so that was kind of my first foray into effectively forecasting in the industry.

 

And so from there, I kind of transitioned to more of like data analytics engineering and eventually moved into a data science role at H-E-B, which is Texas’s largest grocer, and had some interesting experiences there with forecasting, and I can definitely get into some of the details there, before making my way back into, again, energy and commodities trading, where once again there is a central theme of trying to understand what’s going to happen in the future so that it will guide your decision making, before finally ending up here at Trovata in more of a software engineering role.

 

Joseph Drambarean 

So deep in forecasting, and it sounds like you have an interesting mix of software programming melded with statistical techniques, you know, to take advantage of the data sets that you’re working with, which is a unique angle to be attacking this problem from, because especially from a finance analyst perspective, typically, you know, you’d probably be working within a manual forecast model of some kind. Maybe it’s in Excel, maybe you’re doing everything by hand. It could still be leveraging,

 

you know, advanced statistics, but it might not be using automation in the form of a computer program of some kind. Like maybe it’s using scripting of some kind, but not in the way that you would approach the problem space. And what’s interesting, I guess, to kind of kick off this problem, we have delved into forecasting at Trovata for years at this point. It started with an approach that really tried to almost create a black box experience for the end user to try to simplify their approach to leveraging the concept of machine learning when approaching a forecast. 

 

From there, though, what we have found is that typically users want to get more insight into what’s going on behind the scenes, especially if the predictor is going to be actually computing values that will be in the forecast, not just a trend, but specific values that will be in the forecast. So we’ve kind of evolved from an experience perspective to be more transparent to the end user, or get closer and closer to transparency, and give the end user a lot more knobs and controls to be able to modify their values. And I think that’s where the first interesting topic that I wanted to delve into comes in: when is the right time to use the big keyword, machine learning, in this?

 

And is there ever a right time? And I’m curious to hear your take on this. Just having had the opportunity to attack this problem from a variety of different vantage points, right? The petroleum side, the big box retail side, all of those different flows of data, you had an opportunity to look at it from different angles. How has machine learning played a role, if at all? And what is your take on it? 

 

Yannis Katsaros 

Given my background, starting in petroleum and moving to grocery, which are two totally different industries, specifically with regards to forecasting and trying to understand how you model the future. In the petroleum engineering space, you’re dealing with physical systems. At the end of the day, you’re producing a physical commodity, right? Natural gas, oil, and you have physical models that you learn about how to characterize the reservoir, how much it’s going to produce. And ultimately you use those in your forecast of how much volume you’re going to produce.

 

Now, there’s a lot of uncertainty, and you’re going to hear me talk about uncertainty a lot throughout our discussion, because that’s kind of the key aspect of what I really want to hone in on. So the biggest thing that you see as a form of, as you called it earlier, maybe a hot take in forecasting is: I can sit here and look at historical data points, and I have a physical model, I have physics that I can use to describe what I expect to happen into the future.

 

Yet you and I and 10 other engineers could come up with 10 totally different forecasts that are seemingly valid. They go through all the initial data points. But then when we look at where that forecast goes into the future, we end up with wildly different outcomes. Some are way, way above, some are way below. And ultimately, when you tie that back to your economics,

 

You end up with totally different decisions. You know, should we pursue this opportunity? Should we drill this well? Should we drill this group of wells? And so, coming from a space where you really have the foundations of physical models of how to even do your forecasting, you can already tell that there is a huge amount of uncertainty. How do these things that we seemingly know how to model evolve over time? And we’re not really sure, because there’s a lot of noise of what could possibly happen. Now we take that, where we have a physical model, and I transitioned into a role in groceries, right? Where there is in essence not really a physical model that describes how consumers are supposed to behave.

 

It’s largely very empirically observed and driven on, you know, what demographics might drive customers to buy certain types of groceries and what drives particular types of trends in different areas. And it’s really hard. You don’t really have a specific type of, again, a physical driver. There’s no physics involved in how humans really behave, or at least none that we can capture yet. And so you have this total, almost like a dichotomy of these two different approaches. And yet at the center of it, you have forecasting. And not only do you have forecasting, but what the subject matter experts want is the ability to understand why a forecast is forecasting what it is. You know, why is the forecast saying you’re going to produce this much oil or gas? Why is the forecast saying we’re going to sell 30 bags of popcorn on Saturday, but 10 on a Monday?

 

And again, coming back to all of it, I think the most important part to drill into is that I’m personally of the belief that you need to use statistics and probability as a tool, as a mechanism for helping you quantify uncertainty. And to me, that’s the most important aspect. It’s not necessarily about just throwing data at a model. In fact, I would urge you not to just throw data into a model and expect that you come up with any kind of reasonable outcomes, because first of all, you’re not going to really know why this forecast is producing what it is.

 

And you hinted at this earlier that, you know, at Trovata before my time, we were more of the black box model approach, where you throw some data in and see a forecast. That could be useful definitely in certain scenarios. But I think most often you have subject matter experts wanting to understand why. Why is the model saying this? I don’t believe it. I’m an expert in this. This doesn’t make sense to me. Why is the model saying that? I don’t trust it. And so personally, I shouldn’t even say it’s a philosophy, but from what I’ve seen so far in my experience, I think the sweet spot really exists where you can have subject matter experts really have control and understanding of what their model is. The model might just be Jim in finance putting together some numbers, and he’s the expert, and that’s your model, right? That’s a model. Or your model might just be: we put our data into this Trovata thing and it spit out some numbers and we’re not sure why, but that’s our model.

 

I think the sweet spot, once again, is really where you can have probability and statistics, machine learning, whatever you want to call it, however advanced you want to get as a tool for quantifying your uncertainty, those aspects of like, where could this, where could our predictions really start to diverge? But really have something that you can tie down as the foundational kind of driver behind what the model is estimating, because then you have a better sense of why it’s making predictions.

 

And I use the example of, you know, Jim in finance, ’cause you may not necessarily agree with what he’s doing, but you might just say that, look, he knows the space better than anyone else. He has 30 years of industry experience and he is our machine learning model, right? And you really want to be able to lean into that and not throw all of his expertise out the window. But you would, in my ideal scenario, augment that and say, okay,

 

Here’s our base model of what our belief of what is going to happen is. How can we play with this forecast and understand, you know, is there any upside? Is there any downside? What’s the probability that we actually achieve this model? How does this compare to historical data that we’ve seen and observed? 
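One simple way to sketch the kind of uncertainty quantification Yannis describes is a small Monte Carlo simulation around a base assumption. Everything here is illustrative: the 2% monthly growth and the 1% noise are made-up stand-ins for whatever "Jim in finance" believes, not fitted values.

```python
import random

random.seed(42)  # deterministic illustration

# Hypothetical base assumption from the subject matter expert:
# 2% monthly growth starting from 100 units.
base_growth = 0.02
start = 100.0

def simulate_path(months=12, noise_sd=0.01):
    """One possible future: base growth plus random month-to-month noise."""
    value = start
    for _ in range(months):
        value *= 1 + base_growth + random.gauss(0, noise_sd)
    return value

# Simulate many futures and read downside / base / upside off the distribution.
outcomes = sorted(simulate_path() for _ in range(10_000))
p10 = outcomes[int(0.10 * len(outcomes))]   # downside
p50 = outcomes[int(0.50 * len(outcomes))]   # base case
p90 = outcomes[int(0.90 * len(outcomes))]   # upside
print(f"P10={p10:.1f}  P50={p50:.1f}  P90={p90:.1f}")
```

The expert's number stays at the center; the simulation only answers "is there any upside, is there any downside, and how wide is the spread?"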

 

Joseph Drambarean 

When I think about this problem, it’s always been about assumptions and variables, or I guess that’s the aperture through which I look at this problem, because…

 

The assumptions that you’re entering into the problem with will very likely drive a lot of the outcome of what you see in terms of the values. Let me give some examples. I don’t know why when I was thinking about this, I got the thought of Photoshop in my mind. And let’s say you start with a photo in Photoshop and you want that photo to look a certain way and you start to apply layers and the first layer, maybe you brighten the photo. The second layer, maybe you denoise the photo.

 

The third layer, maybe you add some contrast. You think about those layers, each one has a level of sophistication in terms of what it did. The brightening one, very basic. It just took a simple curve, which was the light, and said, go up. Go up in a linear fashion so that it’s brighter. The second layer, which was applying a denoising, that was complex because I have no idea what it did, right? It used some sort of

 

combination of statistics to look at every single pixel and say, you should be this and you should be this depending on factors. That analogy struck me because when you’re kind of approaching a forecasting problem, there are layers of assumptions that you might be introducing of which the sequence could play a huge role, right? The obvious ones that are in your control, the first initial layers are the controlled assumptions, like for example, you know, Bob or Bill or Jim, you know, in finance tells you, hey, we know from the trends of sales so far this year, we expect growth, right? That’s not a complex predictor, but it’s a high level one that should tell you that the outcome of whatever values I’m gonna look at should be going up. They shouldn’t be going down, right? 

 

Because I have this hard, assumption that has been given to me by a human, right? Let’s say we introduce another layer. Maybe the next layer is a little bit more complex. Maybe we’re going to throw a statistical analysis in the second layer. Maybe we’re going to use a regression analysis or maybe we’re going to apply a stochastic curve to this particular data set. And that has its own assumptions, right?

 

We know what a stochastic curve typically looks like when you look at a field of data. It will do a certain thing, right? It’ll be a smooth curve, right? It won’t have weirdness to it. And that’s an assumption because the sales could be very lumpy and they could be going all over the place. But now we’re applying this thing, the statistical approach that smooths that out, which means that whatever my outcome data will be has that as an assumption.

 

And then let’s say you throw one more layer on top, the black box, the I’m-going-to-take-an-off-the-shelf-open-source-library layer. And this library is good at taking random data and trying to make sense of it and create a curve out of that random data, maybe a random forest, right? And then I think when you think about that, that kind of layer set,

 

I had a couple of obvious ones, I had a statistical one, and then I had one that I have no idea how this works, right? And on the description of this model, it said it uses training data to create the outcome. So I have to input training data for it to be able to even do its thing, right? So when I look at those layers, what I kind of walk away from is that in degree of difficulty and complexity, I have kind of separated myself from the possibility of understanding that outcome, right? 
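That third, black-box layer might look something like the following sketch with an off-the-shelf random forest (scikit-learn and NumPy assumed available; the training data is synthetic). It also shows one concrete way understanding is lost: a random forest cannot extrapolate a trend beyond its training window.

```python
# Off-the-shelf "black box" layer: a random forest fit on synthetic
# training data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

X = np.arange(36).reshape(-1, 1)                  # 36 historical months
y = 100 + 2.0 * X.ravel() + rng.normal(0, 5, 36)  # upward trend plus noise

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# Inside the training window the fit looks great...
inside = model.predict([[18]])[0]
# ...but far past the training window, predictions flatten to the value
# at the edge of the data the forest saw -- the "growth" assumption from
# the first layer silently disappears.
outside = model.predict([[48]])[0]
print(f"month 18: {inside:.1f}, month 48: {outside:.1f}")
```

If Jim's first-layer assumption is "sales keep growing," this third layer quietly contradicts it, and nothing in the output warns you.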

 

And the closer that you get to those advanced techniques, the more comfortable you have to be with the idea that I will ultimately be losing control of my ability to, in a satisfactory way, understand the predictors that are at play. And this is why, from a machine learning perspective, this has always been one of the challenges in educating

 

folks that don’t understand what’s going on behind the scenes. Because a lot of these tools, frameworks, off-the-shelf open source libraries, things that might work in certain circumstances like stocks or flight data, things that are very well-sourced in terms of datasets, it might work great in that context. But in the context of your data, it might be…

 

not well suited, right? There might be a variety of reasons why it doesn’t work well. And then apply even further the assumptions that you introduced, your source data, et cetera, et cetera. Then you have effectively the Wild West, which is why I think when we talk about machine learning, understanding the assumption variables is such a key concept in everything that you’re doing. I just, I’m curious, is this something that you ever had to deal with kind of when

establishing a model, right? Because you’re in control. You’re the person that can design that model in any way that you want. And what tools would you be using in any given scenario? 

 

Yannis Katsaros 

Yeah, that’s a really great point. And I hadn’t actually thought of forecasting in that light until you made the analogy to Photoshop. And actually, I really liked that, because I was thinking about it while you were going through the analogy. And I think that that’s a really nice analogy. And I might steal that actually, because the aspect of the example you gave is

 

It’s important to illustrate that you have a pipeline, you have layers, and in some particular aspects you’re okay with kind of hand waving: I don’t really know how this works, but I can see that the outcome is good, and I don’t care how it works. So an example of this is that you said, you know, maybe I just adjust the lights and change whether the light is going up or down, fine. But then the denoising aspect, right? I may know nothing about how Photoshop is doing that denoising algorithm.

 

But when I look at the result, I’m like, this is awesome. This is great. I’m going to get this in my prediction pipeline. And so it’s all about, again, I’m going to keep saying this, quantifying and understanding the uncertainty that you have, not only in your data but also in the forecast, and what kind of uncertainty you’re okay with. And so tying this back to the original discussion we had about having subject matter experts, or like,

 

You know, at any company, you’ll have the long-lived spreadsheet that is like the God model that says, this is how we’re forecasting our sales, or this is how we’re forecasting the number of cars we’re going to sell or the widgets we’re going to produce. And it’s very important to take that seriously. I’ve made the mistake earlier in my career of, you know, getting too excited about, oh, we can just use machine learning, throw this into the model. And you very quickly see that you end up with garbage. Machine learning is great at picking up patterns, but you need to be inputting data that inherently has signal and not as much noise. So if all you’re doing is putting in noise, you’re just going to be fooled by randomness. And so you really need to understand: what are the signals that I’m looking for? What are the driving factors in my business? What are the driving factors in what’s actually producing these numbers that I’m looking at?

 

So that’s where I think the wisdom of your people, your employees, your subject matter experts comes in. Even if it’s something as simple as a linear regression that they’ve just kind of hand-tuned over time, and they kind of have this custom-made thing that outputs data, but they trust it and they think it’s reliable enough. Those are really important aspects, because then you can, going back to your analogy, start layering in aspects that you think can help you.

 

Right. You know, let’s say this model produces data that might be kind of noisy. Okay, maybe I can use an off-the-shelf denoising or smoothing algorithm, something as simple as, you know, a running mean or running median, or I can do quantiles to capture kind of an upper or lower bound. You can get more fancy, you know, Kalman filters or whatever, you can go crazy with it. And you can, depending on the type of subset of problem that you’re solving, be able to look at the results or be able to

 

compare them to what you expect to see, and very quickly understand whether you trust this or you don’t trust it, and whether you want it as part of your pipeline. And I think the ultimate forecasting workflow is where you have the tools in place. This is what I’ve seen; this is kind of how I ended up in software engineering: I was spending 90%, 95% of my time either getting the data prepared and ready, the data engineering, or building tools, the software engineering, just to be able to do the thing that I originally set out to do. And that’s where the exciting part is: just having a really good tool at your hands. Again, Photoshop is that example that you gave. Photoshop doesn’t tell you this is how your picture should look.

 

It gives you a set of tools and very fine control. And in some ways, you know, you can just tell it, make this picture look better, and just wave the magic wand. And if you’re okay with that output, maybe it’s fine for some cases. But maybe in other cases that’s not what you want to do, if you’re selling, you know, your wedding photography to someone and the results look awful and you’re like, well, Photoshop made it that way, sorry, right?

 

No one’s going to accept that. You need to have control over the tools that you use and become an expert in using them effectively. And so I think having those at your disposal can really help you craft and understand and lead you to better forecasts and decision-making ultimately, which, in my opinion, is the whole point of forecasting: how can we use this to better understand our business or to make better decisions moving forward for our business? It’s all about better decision making, in my opinion, of why forecasting matters.
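The smoothing options Yannis listed a moment ago (running mean or median, quantiles) are easy to sketch. Here is a minimal running-median filter on illustrative numbers; note how it shrugs off a one-off spike that a running mean would smear across neighboring points.

```python
import statistics

# Hypothetical noisy model output: weekly sales with a one-off spike.
raw = [102, 98, 105, 101, 340, 99, 103, 97, 104, 100]

def rolling_median(series, window=3):
    """Centered running median -- robust to outliers, unlike a running mean."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(statistics.median(series[lo:hi]))
    return out

smoothed = rolling_median(raw)
print(smoothed)  # the 340 spike is gone; the rest barely moves
```

Whether that spike is noise to remove or a real event to keep is exactly the kind of judgment the subject matter expert, not the algorithm, has to make.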

 

Joseph Drambarean 

So what’s interesting, I guess from a Trovata perspective, we’ve been talking a bunch about treasury innovation over the last few years and it’s a real focus going into this year. And one of the cool things about the podcast is that we get to talk about concepts that we believe will play a role in the innovation in treasury over the next year, few years, et cetera. And machine learning is a topic that is obviously very hot, not just in treasury, but in finance in general. And I think a lot of the reasons are driven by the emergence of obviously generative AI tools that are impressive from a language predictor perspective.

 

But maybe less so from a mathematical perspective. And there’s a lot of, I guess, intertwining of concepts when folks approach the notions of forecasting and kind of the belief that AI will solve a lot of problems. AI will be able to predict things, mathematical things, things related to words, things related to photos, et cetera. And I think what would be great in kind of our discussion is to provide some tools to our listeners and especially in the treasury space to navigate machine learning specifically when it comes to forecasting. 

 

And one of the things that I heard you say, which I thought was a great takeaway, is the importance of measurement in the whole process, right? Not trusting effectively anything, whether it’s your model or a machine learning model, but you know, this general kind of practice of having a good scientific approach to measurement and quantifying how effective was my forecast and what was the variance and why was there variance? Was there a specific part of my forecast, maybe the part that was driven by machine learning, maybe the part that was driven by my own assumptions that caused that variance or in the inverse, this was a lot closer than I even expected it to be.



And I must be onto something. There must be something going on here that is driving that. And one of the things that would be great to discuss is when using machine learning as an approach, what is it specifically that is the best kind of use of machine learning? And you already talked about a couple of examples. One that you mentioned was, well, machine learning tools are really great at identifying patterns, right?

 

And pattern analysis is excellent when kind of trying to figure out, I’m looking at a set of data and I don’t know very much about this data. And I want to just try to identify what is this data saying? Now, you also said that in the process of using these tools, they’re very good at fooling you, right? Just like AI, you know, when we were playing with ChatGPT early on and we were asking it questions about, you know, known knowledge, right? 

 

Things that you could find on the internet in an article or something like that. And because ChatGPT in some cases would respond with such authority, you would just believe, you know, that it is fact, because of the way that it phrased the answer and, you know, because of how verbosely it might have been crafted. In the same way, these tools are really good at kind of tricking you, if you will, because they’re designed ultimately to try to find a pattern.

 

Right? That’s what it is optimizing for. What most people don’t realize, and I guess this is maybe where we can segue, is that these tools effectively have built-in tool sets that help you understand how accurate they are, right? And those play a really big role in deciding whether or not that data analysis is worthwhile, whether or not you should even use it. And I’m curious, have you had any experience in that regard with regards to tuning machine learning frameworks, and when would be the right time to even use it? And what would you be looking for?

 

Yannis Katsaros 

Yeah, definitely. You gave some of the examples with LLMs, and I can’t say I have much of any experience tuning LLMs or evaluating them, but I definitely do have some examples to share from, like, just more classical machine learning or even statistical regression. And you actually mentioned this earlier. A very simple and very good test of how good my predictions are is comparing what you predicted to what you actually saw. I think that’s such a simple and underrated one. Like, if you’re gonna do one thing right, it’s have a mechanism, have a workflow, have a tool set that lets you capture: what am I predicting is gonna happen in 30 days? Save that. Don’t cheat yourself, right? Don’t revise what you said you were predicting.
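That save-then-compare discipline can be a few lines of code. The numbers below are purely illustrative; bias (mean error) tells you whether you consistently over- or under-predict, and mean absolute error tells you how far off you are on average.

```python
# Hypothetical 30-day-ahead predictions saved earlier, and the actuals
# observed once the month passed.
predicted = [120, 115, 130, 125, 118, 122]
actual    = [112, 110, 121, 118, 111, 115]

errors = [p - a for p, a in zip(predicted, actual)]

# Bias (mean error): positive means we consistently over-predict.
bias = sum(errors) / len(errors)
# Mean absolute error: average miss, ignoring direction.
mae = sum(abs(e) for e in errors) / len(errors)

print(f"bias={bias:+.1f}  MAE={mae:.1f}")
```

Here every error is positive, so bias and MAE coincide: a consistent over-predictor, which, as discussed below, is a far more usable model than one that is wrong in random directions.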

 

And then when the time has passed and you actually observe the data, compare that, and really be honest about what you’re seeing. Are you wildly off? Are things consistently above or consistently below? Or are they just all over the place? Because there’s a lot of value in doing that, in looking at how good we even are at predicting.

 

I can give you two examples of this, again tying back to my experience in the past. The first one, in oil and gas, actually had to do with a mistake that many, many different companies and operators were making, in that essentially they were looking at early-time data, and they were using these physical models and also empirical techniques to fit models to the data. And they were getting obviously fantastic fits and looking at the projections into the future.

 

And allocating capital, you know, and I’m talking billions of dollars of capital, in making some decision to go pursue a field. And then five, ten years later, they’re looking back and realizing these are not economic wells. Like, what happened? What went wrong? And only in doing what’s called a hindcast, where you’re essentially forecasting in hindsight, knowing that you have all of the data,

 

It became apparent that it’s very easy to fool yourself into over predicting. And I think that was a very important learning for the industry in that you have to learn to quantify your uncertainty and you have to be able to tie your models and your forecasts to some expectation of what reality is, of what you would expect to see given the information that you’ve learned historically.
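A hindcast like the one described can be sketched by fitting only on the early-time data and checking the projection against the later data you already have. The decline numbers and the straight-line "model" here are illustrative stand-ins for the physical and empirical models in the discussion.

```python
# Synthetic monthly production volumes: decline accelerates over time.
history = [100, 96, 93, 91, 88, 84, 80, 75, 71, 66, 62, 58]

# Hindcast: pretend we only had the first six months.
train, held_out = history[:6], history[6:]

# Naive early-time model: ordinary least-squares straight line on the
# first six points (a stand-in for a real decline model).
n = len(train)
xbar = (n - 1) / 2
ybar = sum(train) / n
slope = (sum((i - xbar) * (y - ybar) for i, y in enumerate(train))
         / sum((i - xbar) ** 2 for i in range(n)))
intercept = ybar - slope * xbar

# Project the fitted line over the held-out months and measure the miss.
forecast = [intercept + slope * i for i in range(6, 12)]
over = sum(f - a for f, a in zip(forecast, held_out)) / len(held_out)
print(f"average over-prediction: {over:+.1f}")
```

The early-time fit looks fine on the training months yet systematically over-predicts the held-out ones, which is exactly the fool-yourself pattern the industry learned about the hard way.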

 

Another example of this, where, you know, it’s being able to forecast and understand how far off you were, is an experience I had in the grocery space, right? I was essentially building regression models to try and understand the effects of different confounding variables when certain decisions were made at the store level. So for example, we have these 50 stores, and we decided to put up additional signage for salsa or something, to see how much better salsa would sell.

 

And that’s kind of tricky to really pick apart, to say that if you saw an addition in revenue, it was specifically because of the salsa signs, right? Like, you can very easily be fooled by just looking at the revenue and being like, oh, why not? Must be the salsa signs. Whereas there may be very many other confounding factors that you weren’t considering, such as, do you just have more customers come through this time of year?

 

Or is this just a higher-volume time of the year? Or are these stores just very popular with salsa to begin with, and it just happens to sell better, right? You always have to have an expectation and be able to check that against reality. And so, going back to the basics: make your predictions, be able to store them, and look at them over time and compare them to actuals.

 

And that’s actually something that you can do with Trovata specifically, right? If you have a forecast, whether it came from Trovata or not, or you brought your own forecast, I think one really good aspect of what we do is that you can compare them against actuals. Like, what is my forecast saying against my actuals? Because that gives you a really good sense of how far off the mark I am. If my model is consistently slightly above, that’s actually not a terrible thing, because you know that, okay, this model kind of consistently over-predicts.

 

Sure, I could do some other things to correct that, or maybe, you know, remove some of that over-prediction, but at least I know that I’m consistently off. Whereas if I have a model that’s like: reality went up, model said down; model said down, reality went up, and it’s just all over the place, you have no trust necessarily in what the model is telling you, and effectively, how is it helping you make better decisions? I would argue that it probably isn’t.
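If the bias really is consistent, one simple correction is to subtract the average historical over-prediction from new forecasts. Again, the numbers are illustrative.

```python
# Past forecasts and observed actuals (illustrative): consistently high.
past_predicted = [120, 115, 130, 125]
past_actual    = [112, 110, 121, 118]

# Average historical over-prediction.
bias = sum(p - a for p, a in zip(past_predicted, past_actual)) / len(past_actual)

# Debias new forecasts by subtracting that consistent offset.
new_forecast = [128, 119, 124]
corrected = [f - bias for f in new_forecast]
print(corrected)
```

This only works because the error has a direction; a model that misses randomly in both directions has no stable bias to remove, which is the "no trust" case above.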

 

So yeah, just look at historical data and see how good my model predictions are. And there’s a lot more you can do from that, but I’d say start at the basics.

 

Joseph Drambarean

Yeah, you know, it’s interesting that you mentioned two things that I think are awesome takeaways for our audience. Number one is that the environment in which you’re operating and, specifically, the tools that you’re using play a huge role in you being effective as a forecaster.

 

I guess this is kind of removing all of the religion from this topic, right? Because we could sit in a room and we could have a group of machine learning experts, data scientists that would kind of argue emphatically, this is the best way to forecast and it’s because, you know, we can use the most advanced, you know, training capabilities to get to an outcome. Then you might have a set of academics on the other side of the table and they would say,

 

You’re wrong. We have hundreds of years worth of mathematical facts to say that there are established techniques from a statistics perspective, from a grading perspective to do forecasting and here’s how you would do it. And then you have people in the middle that might say, well, I can use both. It really depends. And I’m happy to use both. The thing that would be common across all of them,

 

the dependency that they would all have is managing and maintaining their datasets, and doing that in an efficient way so that they don’t introduce error, so that they have ease of use when having to do things like grading their forecasts and evaluating whether or not, hey, was this forecast effective? What was the variance, and was it, you know, consistent or was it wildly inconsistent? Well, you know, your whole process of doing forecasting becomes infinitely more complex

 

if you have to add the workload of maintenance, right? And I think that that’s something that, if you’re kind of thinking about forecasting, and maybe, you know, you’ve been doing it by hand and you’ve been doing it in Excel for years, this is one of the big takeaways, right? You know, whether it’s Trovata or any other tool, it really doesn’t matter. It’s: do you have a tool that is giving you the ability to efficiently navigate this data and not have to overwork, right?

 

Yannis Katsaros 

And I was going to add that, like, it’s no surprise that you’re seeing this happen in, you know, the machine learning space and machine learning engineering in general, when you look across the industry, right? And this isn’t just a problem that, let’s say, non-programmers are having; no matter your experience, you’re seeing this problem across the board. You know, you’ve heard the terms machine learning engineer and MLOps pop up so much in recent times. And that’s not necessarily a problem in that we don’t have good models. Like, we have

 

incredibly good models that you can take off the shelf and use from, you know, PyTorch, Hugging Face, TensorFlow, scikit-learn. There’s an abundance of models; with a few lines of code, I can train a model and make predictions. It’s really incredible. The hardest part is all of the engineering and the workflow and the bookkeeping around that. That’s really what’s difficult. And to extend and generalize that, that’s not just for machine learning engineers, that’s for

 

any person anywhere, whether they’re writing a little bit of VBA in Excel, or just writing Excel formulas and managing different Excel workbooks with CSVs. No matter how complex or basic your workflow is, the hardest aspect of doing forecasting correctly, understanding how good your forecasts are, and not lying to yourself is that workflow and having good tools. And that’s really key. I think having good tools is a huge step up in helping you do all of this and keeping you focused on the business.
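The "few lines of code" point really is that literal. As a hedged sketch, here numpy’s `polyfit` stands in for an off-the-shelf library model, and the monthly balances are invented for illustration:

```python
# "A few lines of code to train a model and make predictions":
# fit a linear trend to monthly balances and project the next quarter.
# numpy stands in for an off-the-shelf model; the data is invented.
import numpy as np

months = np.arange(12)  # 12 months of history
balance = 50 + 3 * months + np.random.default_rng(0).normal(0, 2, 12)

slope, intercept = np.polyfit(months, balance, deg=1)  # "training"
next_quarter = slope * np.arange(12, 15) + intercept   # "predicting"

print(next_quarter.round(1))
```

The fit-and-predict part is two lines. Everything around it, where the data comes from, how it stays current, and how the predictions get graded, is the workflow problem Yannis is describing.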

 

Joseph Drambarean

Yeah, and I think the other big takeaway I heard you saying, and maybe this is your hot take, and I apologize if I’m putting words in your mouth, is: don’t trust anyone who says machine learning will solve your problems. And I guess there’s a very simple reason why, right? If there’s any software, if there’s any, you know, homepage of a website that says our machine learning AI will predict the outcome of your business.

 

Yannis Katsaros 

Yeah, I agree. 

 

Joseph Drambarean

It’s false. And the reason is what we were talking about: there’s no replacing the human operator to grade whether the thing being predicted is logical, based on the assumptions that have been introduced, right? Because a machine has no idea, right? Just looking at a data set doesn’t inform a machine about anything. All it’s being informed about is whatever that data says, to the varying degree of completeness that data has. Whereas you as a human operator, using that Photoshop example, you know what a good photo looks like, right? You have a target. If you’re the wedding photographer, you have a spec, right?

 

The client wanted their photos to look dreamy and the video to look cinematic and to have a certain effect. And you’re looking at your screen, and if the colors are wildly red and the light is blown out and everything looks crazy, that’s not right, right? You’re evaluating it based on what you see. And I think a machine maybe over time will start to learn the habits and characteristics of what a certain persona of a human would do in a certain scenario. But that is so not the case right now.

 

Yannis Katsaros 

Yeah, I agree. And one of my biggest takeaways from past experience working in data science and prediction roles is that I can spend four months in my room trying to tease out signal from noise, right? I could spend four months, five months, and maybe emerge victorious.

 

But then I would show that to an SME and they’d laugh at me and be like, dude, you could have just asked me, and I’d have told you in five minutes. And this has happened. Luckily not to me, thankfully; I avoided that because I enjoy talking to these people, and they really know how to drive you in the right direction. But I have heard of this kind of thing happening, especially in complex industries where the driving factors are very nuanced and you have to really understand what you’re modeling from a business and a fundamentals perspective before you can even put machine learning into the equation. Because, yeah, that’s really, as you called it, a hot take.

 

I think it’s kind of reasonable: you can’t expect to just throw numbers at a model and have some meaningful outcome. Don’t get me wrong, I think there’s absolutely room for machine learning and, you know, probability and statistics to help. I’ve seen that happen and have experience with that being the case, but you always need to start at the fundamentals, whether that’s a physical model or just someone who knows the industry really well.

 

And you can start there. So you couple that person with a really good set of tools, and maybe additionally a technical person, or the tool is good enough that it can hold their hand and help them with aspects they don’t really know how to do themselves. That’s the recipe for success, because I think that’s really the driving force behind good models and forecasts.



Joseph Drambarean

Yeah, so I guess to put on our own predictor hats for a moment and think about where everything is going, right? Because the use of machine learning and AI is not slowing down, right? So we’re kind of talking to our audience, and the takeaway hopefully is that you still need that human touch, but the tools are getting so much more advanced.

 

And there are use cases for machine learning. And I think some of the most exciting ones, in terms of the future, involve the degree and complexity of the data these tools can look at, data that has all kinds of variability in it, right? Because what’s awesome about infinite-capacity cloud compute married to highly complex and capable models is that you can throw vast amounts of data at these stacks.

 

And you know, you would never be able to do that as a human, right? If you sat down with an Excel workbook that had a thousand tabs, all filled with millions of rows of data, and you tried to do it by hand, you would never be able to. You wouldn’t have a laptop powerful enough. It’s just not possible. And I think that’s where the promise is: marrying that human subject matter expertise with the incredible capacity of our computer systems and our modeling capabilities.

 

It creates this marriage where the human outcome is so much higher than what could have been done on paper. And I think that’s an area that I’m excited about. I’m just curious, what have you seen in the industry? What excites you? 

 

Yannis Katsaros

Yeah, that’s a great point. And a lot of where we’re going with this is that it raises the floor, not the ceiling, right? It raises the floor in the sense that someone can get a lot further than they would have in the past with much less. And going off of that point, like you said, to put on our prediction hats: humans, as you’ve probably understood by now, are great at forecasting, but they’re also terrible at forecasting. We didn’t really talk about the flip side of this, right? And this is where I kept talking about

 

where probability and statistics are really useful in quantifying your uncertainty and really measuring what your bias is. And when I say bias, I mean that in the statistical sense, right? If I feel very strongly about some outcome, being able to compare that and see: how does this actually compare to reality? And so while there are, I think, really incredible and exciting things ahead,

 

you mentioned the large amounts of data, right? These models are increasingly data- and parameter-hungry: these 32-billion-parameter language models, these massive, massive models. And right now, of course, transformers and LLMs are kind of all the hype, and rightfully so, they’re very exciting. But there are many industries and many, many types of roles and jobs where you don’t even have a hundred data points to work with, right?

 

You could hand a team of five data scientists an Excel spreadsheet with 32 data points and be like, all right, guys, here’s the data. And you’re like, what are they supposed to do with that? You know what I mean? This is where you have to recognize where you actually have the ability to leverage machine learning. And in the case of, tying back to you talking about LLMs, we’ve taken corpora of text

 

and been able to tease out some of the relationships in how our language is constructed. And now we have tools with which we can augment our workflows in incredible ways. I can very quickly look at all the documents my company has and try to find something very, very vague, and I have tools at my disposal now that make that kind of search much faster and more effective than it was in the past. But then again, if I have 10 data points and I’m trying to build my own statistical or machine learning model, you get in trouble very quickly if you try to do that with these very data-hungry types of models.

 

And perhaps, I mean, this has been talked about for sure: looking at what the tech industry does as a whole, looking at Google and Apple and Amazon, we have to keep in mind these companies and the types of predictions they’re making. They have petabytes of daily data that they’re ingesting, and through their incredible compute and top-of-the-line systems they’re able to crunch those numbers, tease out relationships, and build these models. And not only that, they have to constantly retrain them with new data as it comes in, because these trends are just too volatile to be captured once, right? They don’t just take data for a month and say, all right, we understand all of human

 

emotion, we understand all of human capacity now, we don’t need to do anything else, we understand how humans behave. It’s quite the opposite: these things need to be tuned daily, hourly, perhaps even faster, because they’re just a prediction, obviously; an imperfect understanding of how we are going to behave and what we might do. And so

 

again, in many scenarios, businesses just don’t have that luxury. You have 10, 15, 20 data points. And you as a subject matter expert need to understand how to make the best use of that data and have good tools at your disposal to be able to start somewhere and track and see: how am I doing? How is our business doing with these forecasts that we’re making? And then start to layer potential auxiliary uses of machine learning into your workflows.
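The small-data point is worth making concrete: with 15 observations you can’t feed a data-hungry model, but classical statistics still gives you an honest uncertainty band. A sketch using a standard t-interval on the mean; the data points are invented for illustration:

```python
# With ~15 data points, quantify uncertainty instead of fitting a big model:
# a 95% t-interval around the mean monthly figure. Illustrative data.
from statistics import mean, stdev
from math import sqrt

data = [42, 38, 45, 41, 39, 44, 40, 43, 37, 46, 41, 42, 39, 44, 40]  # n = 15

n = len(data)
m = mean(data)
se = stdev(data) / sqrt(n)  # standard error of the mean
t95 = 2.145                 # t critical value, df = 14, two-sided 95%

lo, hi = m - t95 * se, m + t95 * se
print(f"mean={m:.1f}, 95% CI=({lo:.1f}, {hi:.1f})")
```

Nothing here is "machine learning", but it answers the question a subject matter expert actually has: given how little data I have, how far off could my central estimate be?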

 

Joseph Drambarean

Yannis, thank you so much. This has been one of the most fun conversations that I’ve had on Fintech Corner. Definitely the most nerdy, and it was awesome. I appreciate your time. And that’s it for Fintech Corner for this time. We’ll see you in season three, which is coming soon. See you guys later.

Hosts / Guest Speakers
Yannis Katsaros
Software Engineer, Trovata
Yannis is a Software Engineer at Trovata, primarily working on data and platform engineering. Before working at Trovata, he worked in software engineering, data engineering, and data science roles for companies across various industries including Finance, Energy, Oil & Gas, and Retail. At Trovata he is part of the team responsible for the data platform and APIs that power Trovata’s application, and is focused on leading efforts to continue improving the scalability, reliability, and speed of the platform. Yannis has a Bachelor of Science in Petroleum Engineering from the University of Tulsa, and a Master of Science in Data Science from Johns Hopkins University.
Joseph Drambarean
CPO, Trovata
Joseph Drambarean is the Chief Product Officer as well as CTO at Trovata. Joseph is one of the founding members of the company and the first engineer. Before Trovata he worked with companies like Capital One where he was at the forefront of digital transformation, leading product management as head of the innovation labs and mobile banking teams. Joseph is driving innovation around rapid deployment and customer onboarding, bank-grade security, and machine learning at Trovata, creating a more consumerized user experience for customers from small businesses to enterprises.