#26. Bridging Gaps in Data Science with Matt Dancho Episode 26


· 30:41


Sheikh Shuvo: Matt, you've had such an interesting career with lots of different things that you've excelled at. Can you take a step back and tell us about what that journey has been overall? I'm curious to hear what some of the inflection points were along the way that led to where you are now.

Matt Dancho: So just to get started, right now, when people see me, you know, I've got a company, Business Science, that I started about five or six years ago. I've got Quant Science, which is my newest company that I co-founded with a business partner. But that's kind of where I'm at now, and it's been quite a journey to get here.

So, just a little bit of background about myself. I started in data science, which is a field that actually back when I started in it, was around the 2012-2013 timeframe. And I read this book called "Moneyball" of all things.

Sheikh Shuvo: Oh, it's one of my favorites.

Matt Dancho: Yeah. Yeah. It's amazing. And you know, like within probably the first two or three chapters, once I started to get into the stats and how they started applying it to baseball, a light bulb went off in my head. I was like, "Whoa, what if I could do this for business, right?" And that's where this journey kind of kickstarted. So, to fast forward, I've been doing data science since 2013. Some of the inflection points, I think one of the biggest things as I started to go forward was understanding how to really do stuff for my company, either for my company or for my side hobbies. I call these, you know, a lot of us learn the same way: we kind of dive into a project or we dive into a problem and over time we try to solve it. So, one of the biggest things that I found through my own journey, an inflection point, was when I created the software library called TidyQuant. So, I was doing this project. It was actually a side hobby. So in 2008, I got into investing, and this was way before data science, of course. I was listening to this guy, Jim Cramer, Mad Money, if you've ever seen that show.

Sheikh Shuvo: Big personality.

Matt Dancho: Yeah, big personality. And I followed a couple of his picks and he made me some money. But then, I ended up, you know, around the 2008 financial crash, I invested in one of his picks, Washington Mutual (WaMu), and I ended up losing about $30,000, which was like my life savings at that point. So, fast forward to the 2015-2016 timeframe. I started to get a little bit more dangerous with data science. And I was doing some investing analysis and I had made most of my money back by then and some more, but what I wanted to do was...

Sheikh Shuvo: Did your opinion on Jim Cramer change at this point?

Matt Dancho: Yeah, back in 2008, when I lost all that money, I was like, "Yeah, I got to quit following this knucklehead. I got to smarten up, right. I got to get a little bit smarter about what I'm doing. Make some better decisions." And that's kind of been like the general theme through my career progression, honestly, is "How can I make better decisions?" Usually, there's pain involved, which causes me to react and reflect. And then, you know, I figured out a better solution.

So, in 2016, I started to develop this software called TidyQuant, which was... I started out as an R programmer and I still program with R to this day. And I made this R package to help me just smarten up with financial analysis. So that was kind of the key for me, and I learned so much through just building this project. It was way more than any course I'd ever taken before. It was way more than, you know, the courses I was taking, some through Johns Hopkins University and some through Coursera at that point. The courses were all right, but I wasn't learning, it was just like in one ear out the other.

So, it wasn't until I started building this software package that really made a difference. Within a few months, I think that software package took off and people started downloading it. I open-sourced it, and people started downloading it. To this day, it's racked up over a million downloads. And it's used by a bunch of finance companies, all the big companies. If they're using R, they're probably using this software called TidyQuant. And it was good for me because it was, you know, "Hey, now I'm able to analyze a lot more than I can do in Excel. I can do my financial analysis."

So that was one of the big inflection points. I think another one was starting my company, my first company, Business Science, and that was a really important point in my career, because that's where I really started to branch out beyond just the boundaries of the company I was working at. I'd been doing data science, without an official data scientist role, because my company didn't have a data scientist back then. But I was getting a lot of results for that company. And what I wanted to see was... I started to get some inquiries when I released this TidyQuant package, and companies were actually reaching out to me, saying, "Hey, can you build me an application that uses the software inside of it and does something?"

And it was mainly these small banks that just wanted to impress their customers. They wanted to be able to pull up a web app while they were trying to persuade a customer to invest money with them. So, I started to get some of these inquiries, and at first, it was just a thousand bucks or 2000 bucks here and there.

At that point, I was having so much fun, I would have happily done it for free, but it was cool to be able to charge for it. But at that point, once you start creating a business, consulting and starting to do some of that stuff, I wanted to limit my liability, and I created Business Science.

So that was a big point in my career when I started to branch out and actually make money with data science beyond just working at a company doing some data stuff for them. And then, back to my company too, one of the pivotal points, and I think this is important to understand, was the company that I had worked at, which was a company by the name of Bonney Forge. It doesn't surprise me if you've probably never heard of them, but they're actually kind of a medium to large-sized company in the oil and gas space.

What they used to do is they would manufacture small valves that go into oil refineries, and they also do fittings and all of this stuff that's in oil and gas. Well, about that timeframe in 2016, the price of oil dropped.

Sheikh Shuvo: Oh, I remember that. Yeah.

Matt Dancho: Yeah, it went from, in 2014, it was like a hundred dollars a barrel. Everyone was complaining about gas prices. And then within the span of two years, it dropped down to 25 a barrel. So, the company I was working for in oil and gas, I mean, their bread and butter basically, the revenue was very correlated to the price of oil. And the revenue started to drop. One of the cool things was that data science actually helped us to focus on the right customers at the right time. So back then, I got a huge win with that company by doing this thing called lead scoring. And it's now called lead scoring, but back then it was just like, "Hey, how do we figure out which customers to focus on?" And we were getting all of these inquiries through that sales group.

So the cool thing was, when I started to get my small group, this ragtag group because we were in the middle of that oil recession. We had downsized from like eight people down to like four people or so. So I had very few people, and it was either feast or famine. We'd get tons of requests, or we had no requests. So when we had tons of requests, we didn't have enough people to handle all of these requests. And we couldn't hire people because we just went through like our fourth round of layoffs, right? So, if I went to my boss and said, "Hey, can we hire another person?" he'd be like, "Go take a hike." So, at that point, I implemented, I got them entering some information into just an Access database at that point. Nothing too fancy. And then I started to pull that data into R and be able to crunch some numbers on these, and actually calculate hit rates and do some beginnings of machine learning on it.

And we started to score the customers, and before you know it, stuff started working. We took that small group, it was probably 2014. We did like 3 million in revenue. That's not profit, that's revenue. And we ended up taking that by the 2015-2016 timeframe to about 15 million dollars of revenue. So my company was, in the worst period of my company's history, like in the past two decades of their performance, we ended up five timesing their sales with that small group. The CEO saw this, other people in the organization saw this and that's when the promotions started coming and all that stuff.
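The hit-rate calculation described above can be sketched in a few lines. This is a minimal illustration, not the model used at Bonney Forge: the interview mentions R and an Access database, whereas this sketch uses Python with a small in-memory list, and the customer names, the `quotes` data, and the `hit_rates` and `rank_customers` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical quote log: (customer, won) pairs standing in for rows
# exported from a sales-inquiry database. All names are made up.
quotes = [
    ("Acme Refining", True),
    ("Acme Refining", True),
    ("Acme Refining", False),
    ("Delta Pipeline", False),
    ("Delta Pipeline", False),
    ("Gulf Valve Co", True),
    ("Gulf Valve Co", False),
]


def hit_rates(rows):
    """Return {customer: won quotes / total quotes} for each customer."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for customer, won in rows:
        totals[customer] += 1
        if won:
            wins[customer] += 1
    return {customer: wins[customer] / totals[customer] for customer in totals}


def rank_customers(rows):
    """Sort customers from highest to lowest historical hit rate."""
    rates = hit_rates(rows)
    return sorted(rates, key=rates.get, reverse=True)


if __name__ == "__main__":
    for customer in rank_customers(quotes):
        print(customer, round(hit_rates(quotes)[customer], 2))
```

A ranking like this is only a starting point for deciding which inquiries a small team handles first. A fitted model would replace the raw ratio once enough history has been collected.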

But, you know, it was pretty cool. So, those are kind of the inflection points, the big things that happened in my career that really, like, I would say were huge wins with data science that really kind of helped get me to the point where I am here today, you know, running two companies now and having a lot of fun.

Sheikh Shuvo: Taking that experience and leaving Bonney Forge to do your own thing, sounds like your version of Billy Beane initially rejecting the Red Sox.

Matt Dancho: So I actually had applied to companies like Facebook, or Meta. I got rejected by them. And that was like a big ego hit because I was like, "Man, you know, I just made this awesome software. It's being used everywhere. Like, it's crazy." Every company's different, but you don't need to work at like a huge tech company to be like, you know, you don't have to work at the Boston Red Sox to have an impact at your company. In fact, a lot of times the smaller companies, the bar is lower and it's easy to stand out. That's gonna help you with your career.

Sheikh Shuvo: Going back to when you first started TidyQuant, after you open-sourced that, sounds like the growth was just explosive and that's a huge win within those few months. How did you get people to start? What did the initial marketing look like to generate that traction?

Matt Dancho: I got lucky. I released it on the last day of 2016, so December 31st, 2016, I published my first library out to the world. And it was kind of like crickets right then. I was like, "Oh my God, it's out there. What are people going to think?" You know, all this rush of emotions, 'cause I'd been working on this thing a couple of months. And it was pretty raw back then. I mean, it was just basic, it's not what it is today, certainly not all the functionality. But, what was cool was, back then, this company called RStudio, you've heard of RStudio before?

Sheikh Shuvo: I'm familiar.

Matt Dancho: So, they're now called Posit. They rebranded here about a year or two ago. But, they have this big conference and it was the biggest one in the R space. And the guy, his name is Hadley Wickham, this was back in 2017. At that point in time, they had their conference in January. So the end of January, Hadley Wickham, who's like their chief data scientist, a big name in the R space, he gets up there and, during his keynote, he mentioned my TidyQuant package. And just that little mention there, because that thing was being streamed live by tons and tons of people. And, you know, that was enough to kickstart the trajectory. Yeah. So I got lucky. I mean, not going to lie. It wasn't like I went out there with a big marketing campaign, purchased a lot of Facebook and Google ads, and you know, no, wasn't like that. It was all organic, and it was just a lot of good fortune, and being in the right spot at the right time.

Sheikh Shuvo: You found the right R influencer to work with.

Matt Dancho: I didn't even know he was going to, honestly, I didn't even know he knew. It was crazy.

Sheikh Shuvo: Oh, that's awesome. Now, looking at a lot of the content that you produce, Matt, it's very much geared towards empowering data science professionals to think differently about their career, how they progress, as well as different tools to use to make things better and faster. Looking at when a data scientist is starting their career, either at a medium or big-sized company, what do you think are some of the mistakes that data scientists make early in their career?

Matt Dancho: There's really like three mistakes when I sat back and thought about this, nobody's data science journey is perfect. And I'll be the first person to say this. My data science journey, it took me probably three or four years to feel comfortable. I don't know if you felt this way, I certainly did. I suffered from this thing called imposter syndrome. So when I first started learning data science, it was always like two steps forward, one step back, maybe sometimes two or three steps back. And that was the cycle that I kind of went through. And then I've had a lot of times in my career where my ego has taken a hit, where then I start to feel confident. And then, like I mentioned earlier, you know, apply to Facebook and then they reject me. And it's like, "Oh my gosh, am I ever going to become a data scientist?" Oh, and by the way, the reason that they rejected me was because I didn't have a PhD. And that really messed with my head because, I had just finished up my MBA at that point.

So this is like the third degree that I have now. So I got my bachelor's, I got an industrial engineering degree, a master's in industrial engineering, and then an MBA now. And I'd spent so much time getting schooling and now I think, "Oh my gosh, I got to get a PhD if I want to become, my dream job, which is becoming a data scientist."

Sheikh Shuvo: It's only five to seven years of your time.

Matt Dancho: I had already spent tons and tons of time in school, so I was comfortable with that. But my wife, who also had a child on the way, you know, we had a child on the way. She was not comfortable with that; that was not an option. So, the mistakes that I ran into, and then this is probably, things that I would venture to guess a lot of people run into, the first one is not building.

So when I look back at my own path, the things that held me back were just taking courses without ever really applying them. And that was a big mistake, I think, and it took me a while to figure that out. But once I did figure it out and started building something, that's where things started to change for me and started to turn.

So for me, it was TidyQuant. I found, you know, my hobby, which was financial investing, and I really focused in on how can I just do something for me. Like, how can I make this easier? Because I was doing a lot of stuff, like translating code from different systems that are in R, one's called XTS. But I would have to then go to the Tidyverse, and use data frames, and then go back and forth all the time.

So TidyQuant kind of solved this problem for me. So I could do financial analysis faster. So that mistake was not building at the beginning, like finding a project or a problem that you're interested in, invested in. I think that was pretty important. Mistake number two:

So I'd always go on LinkedIn or I'd go on Twitter or something like that. And I see people using all of this cool stuff and doing all of these things and saying, "Hey, you know, you need to learn this." And, or "I'm learning this and it's amazing." And a lot of those would be things like deep learning.

Alright. So back then it was deep learning. Nowadays, I see it on LinkedIn, it's like LLMs, right? Everybody wants to dive into AI. And don't get me wrong, AI is super, super duper powerful. But especially when you're starting out, it's kind of like drinking out of a fire hose, right?

You're trying to go a little bit too far too fast. So back then, it was me and deep learning. I remember at one point in my career, I literally spent six months with the lead scoring model that I developed. And I just had a basic working model in, which was just a logistic regression.

Right. And I'm like, "Okay, deep learning, this thing's gotta be amazing. Right. I see people predicting cats and dogs, images, and doing all this stuff online. I'm like, oh my gosh, if it can predict cats and dogs, it's gotta be good for lead scoring."

Right. So I spent like six months and I really never got to a point where it was even better than my basic logistic regression model, which was like super simple. And I put that together in like less than a day. So I see that with a lot of newer data scientists today. My fear is that you dive too far into all this advanced stuff and you skip some of the basics, and you just need to focus on getting a working solution first. Keep it simple, and then add that complexity a little bit later on. There's one other mistake too, and this goes back to applying what you're learning. So I see a lot of people, myself included, not applying things. People love math and love the theory of everything. And don't get me wrong, that stuff's important, but at the end of the day, you get paid for getting companies results, actually applying the tools to your job and not just talking about it. And that's the differentiator.
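To make the "simple model first" point concrete, here is a minimal sketch of a logistic regression baseline written in plain Python. It assumes two made-up, pre-scaled features per lead and fits the weights with batch gradient descent. The feature names, the toy data, and the `fit_logistic` and `score` helpers are illustrative assumptions, not the lead scoring model from the interview.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    n_rows = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n_rows for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_rows
    return w, b


def score(w, b, x):
    """Return the predicted probability that a lead converts."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per lead, both scaled to 0-1:
# [past order volume, time since last contact]
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]  # 1 = lead converted, 0 = it did not

w, b = fit_logistic(X, y)

if __name__ == "__main__":
    print(round(score(w, b, [0.85, 0.15]), 3))
    print(round(score(w, b, [0.15, 0.85]), 3))
```

A baseline like this takes minutes to stand up, and any more complex model then has a clear number to beat before it earns its extra cost.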

So once I kind of flipped the switch and said, "How can I apply this?" that helped me a lot. So that's what I would say for that one. Hopefully, that...

Sheikh Shuvo: Yeah, no, that's awesome. Well, in terms of thinking about the latest and greatest, generative AI is on the top of many people's minds. And you're working on a course right now for how data science professionals can use ChatGPT, which listeners can register for, but can you talk a little bit about just how you're using Gen AI in your own workflows and what your recommendations would be for someone who hasn't started yet?

Matt Dancho: So ChatGPT is a big component of what I do. There are a couple of different ways that I'm using it, and I'll also explain some of the mistakes too because I think there are some things that beginners, when they just start trying to use ChatGPT, think that it's going to get you from point A to Z real fast and you end up getting into a bunch of trouble. At least that's what happened with me when I first started. I got really...

Sheikh Shuvo: I usually get stuck around E or F on the path to Z.

Matt Dancho: Yeah, yeah, exactly. So, ChatGPT is so great. And then, you know, taking it one step further, when you actually start integrating AI into applications, like you can actually build with OpenAI's or any one of a number of different LLMs now, but I think the starting point is just trying to improve your productivity with ChatGPT. And that's where I've seen the biggest benefit in my own daily work. So I started using it at first, just as kind of like a replacement for Google or Stack Overflow. And it's really helped out with that. A lot of times, instead of searching for the right solution, now I'm just kind of having a conversation with ChatGPT and asking it like, "Hey, I just ran into this error. What's the easiest way to resolve it? This is what my code looks like." And a lot of times it'll help out there. Some of the mistakes that I've made, though, when I first started using it, I tried to have it build just like a big web app, you know, right off the bat with machine learning and all the stuff in it. And then I spent like seven hours debugging it and that is not a good use of your time. It probably would have taken me seven hours or five hours just to build it. So I probably lost a net two hours. Right. So that's kind of the mistake. That was really frustrating when I first started because I saw the ChatGPT, as soon as you type something in, it generates all that code for you. And my eyes just got real big. I'm like, "This is the future." And then you got to run that code and nothing works. So what I found was taking it step by step through the process, and that's really important. If you start with your data science process of just doing some exploratory work at first, you know, have ChatGPT help you with that rather than just going right for the web app right off the bat. Have it just like, "How should I evaluate this dataset for outliers? What graphs do you think are appropriate for numerical data to really get the idea of whether or not my customers are buying and which are our top customers?" Those types of things.

I think you can walk it through the process. Whether it's exporting to R for data analysis, cleaning the data, pre-processing the data, machine learning, making the machine learning models, all the way up to then once you have the working kind of analysis, then moving into your web app, that's a much better strategy. And that's what I found was probably one of the biggest benefits. Nowadays, what I do is I do build web apps quite frequently. I do it a lot for my learning labs, but back in the day, I was also doing that a lot for customers.
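As an illustration of the kind of small exploratory step worth asking for first, here is a minimal outlier check using the common interquartile-range rule. This is one standard answer to the outlier question, not something prescribed in the interview. The `order_totals` data and the `iqr_outliers` helper are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical order totals for one customer segment.
order_totals = [10, 12, 11, 13, 12, 11, 95]

if __name__ == "__main__":
    print(iqr_outliers(order_totals))
```

Checking a small, verifiable step like this before moving on is the step-by-step workflow described above, as opposed to asking for a whole web app at once.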

So, I can see it for people who are interested in either consulting or internally to companies building proof of concept web apps, you can do that much faster, too. So I'm having a big benefit there again. You want to walk it through the process though. Don't just go for it right off the bat. Wait till you have a working model, and at that point you can start having ChatGPT help you build that web application much faster. So, yeah, that's kind of where I'm using it now. Oh, and by the way, the ChatGPT workshop, I do have a workshop where I actually demonstrate how to do this. That's that link that I gave you.

Sheikh Shuvo: Yeah, I'll make sure to include that in the show notes.

Matt Dancho: Yeah. So anybody who wants to check it out, see what this actually looks like in a project that I put together, I show you my full process and you get to see me build a web application with ChatGPT, which is pretty cool.

Sheikh Shuvo: Nice. And it takes less than seven hours.

Matt Dancho: It takes actually about 35 minutes. But that's because I'm explaining stuff. But in order to be able to, probably for like, if you're just starting from scratch, if you already know the data set and already have a working model, it'll go a lot faster, but yeah, figure an hour or two, if you're just doing this from scratch, you can build a basic web app and it's a big productivity enhancer.

Sheikh Shuvo: From this latest workshop you're putting together, traveling back in time a little bit, looking at the very first course you made on data science fundamentals, do you think a lot has changed since then? Are there any areas of it you would go back and revise?

Matt Dancho: In terms of the courses that I've developed, so I started off developing courses. The reason that I started developing courses was to kind of close the gap between data science and business. It was about the 2018 timeframe. I actually, it was about six months before I quit my job. So I was working at Bonney Forge. I had done a bunch of great things there. I also had this consulting business that was growing and starting to actually overtake, the money I was making from consulting was overtaking my day job. So, the problem that I started seeing though when I was working with these companies, I'd now graduated from working with kind of these smaller companies, like just, you know, mom-and-pop shops in finance to actually some Fortune 500 companies.

And, I developed the courses really because of some of the stuff that I saw there. Truth be told, the first Fortune 500 client that I had was a big manufacturer of ingredients, and they made products for any company that makes, like cereal or, you know, the raw materials that go into that stuff.

So I got this job, and it actually did not go that well. It's kind of like you learn from pain, and they're the biggest educators. So I took on this job, and my problem was, it was a lot different than working with these smaller finance companies who really knew what they wanted. But this was just like a small project. This was the first time I was working with a director and a VP, and they really didn't know what they wanted. They just wanted to dive into AI, dive into data science. First off, it took forever to get the data set from them.

It took, probably, so we had a deadline that they were working towards because they wanted to be able to show a report for their management during the next big review, which was happening in October. So it took like four weeks. We started this about a month out from the review, and we never got permission to get the data set until like the Friday before, and they had to present on Wednesday. So, they wanted the report back on Monday. And so we had to work all the way through the weekend. And I actually hired a guy to help me out because he was an expert in HR data, which I wasn't very familiar with. I was just, you know, going to run it through my normal process, but I wanted somebody to help who actually knew that type of data. So, anyways, we worked all the way through the weekend, gave them a report on Monday. I thought it was fantastic. You know, it showed lots of machine learning, showed lots of data science, and at the end of that report or the presentation, they were speechless. Right. But actually, that's not a good thing because I heard crickets at the end. And I'm like, uh, and then like the next day I was pretty much fired. And I was really upset at that time because, you know, the fact is, is that like, I knew I was a good data scientist.

But, I didn't know what they wanted, and there were a bunch of problems with that project, and it was like, you know, pushed into crunch time, and then all of a sudden, you know, I give them something and they didn't know what they wanted. They didn't know anything about AI. They didn't know anything about machine learning.

And I didn't know anything about like, what this presentation that they were giving. So I just like, tried to find some cool insights in the data. So, after this project happened, I really kind of stopped taking on some consulting gigs for a while, and I just took the next 2 to 3 weeks off and just really wanted to focus on what went wrong, what I could have done better. And that's really what created this framework that I developed. It's now called the Business Science Problem Framework, after my company, because this is the framework that we took. We took another popular framework called CRISP-DM, which, you know, a lot of people have heard of. What this did was it really took you step by step and showed you exactly what you need to do through every step, like where to get stakeholders involved, what questions to ask them, how to make sure you've got the right people. Once you identify a problem, how to cost it. We never even thought about like costing, like, is this data, is there a problem that we're trying to solve here?

Like, we didn't even have that. And that's like step one, right? So, all the way through the machine learning and then how to monitor and how to take the right steps through each of these steps. So that course that you had mentioned, the first one I started out with was actually Data Science for Business Part 2. It's now called Part 2, and then I eventually made a Part 1, which is kind of like the foundational stuff that you need before you take that. But that second course is actually really important because it teaches you how to walk through that framework. And that's really what, like, if you go into a job interview for data science and you basically can show that you can follow this process. The same thing is true in consulting; companies pay for processes. They don't want skills. They don't care that you can do all this fancy stuff. What they want is somebody who can walk them through a process and get them from where they are now to where they want to be in the future. And that's what this framework did. And in my consulting, I started implementing it on newer consulting projects and the satisfaction went up, it actually shrunk down the project times because we could do a lot of stuff faster. Now, now that they saw, you know, where to get them involved, when I need the data, when we need to have the cost information by, when we need to, you know, get the experts involved that are familiar with that process.

So, I wouldn't really change that. I think that's, you know...

Sheikh Shuvo: Sounds timeless and evergreen.

Matt Dancho: Yeah, yeah, that type of framework. I mean, it's going to, no matter what happens with AI, you're still going to have to walk companies through this type of process. And I think that's really important.

Sheikh Shuvo: Talking to people and convincing them to do stuff spans every career.

Matt Dancho: And it's tough because they don't know much about data science and you don't oftentimes know much about their particular problem because they're the experts in that. So how do you meet in that middle ground and start to make progress? And that's what the framework does. It's really whether you realize it or not, as a data scientist, you are a consultant because you are working with diverse teams and trying, and you're never going to be the expert in those domains. You're going to have to rely on people to help you understand certain things.

Sheikh Shuvo: Oh, that's awesome. That's great advice. The last question I have for you, Matt, is, so many of the things that you've done across your career have been about finding problems that you have, or others have, and solving them. Sounds like the founding story for TidyQuant, but looking at the data science world, where it is right now, what types of things that you're doing, what are some data science tools that you wish existed? What are some startup ideas and inspiration you can share for all the listeners out there?

Matt Dancho: With generative AI, it turns out I'm advising another company called Quantex. What we're doing is working on time series problems and developing tooling for, and we're actually focusing on supply chain. So, things like inventory, to chat basically with your database and get the forecast, get the things that an analyst needs in order to make better decisions.

So, with this startup, one of the things that I try to do in an advisory role, where I'm not the key technical person building the thing, is really to try and get them to understand what problems exist out there. To get outside of the walls of where they're at now and talk with customers or potential customers, talk with people who have these problems.

And, once you start to understand their pain points, you know, that's actually the opportunity. So, for us, it was focusing on supply chain, because there's a lot of supply and demand disruptions. You think COVID, you think recession, you think, you know, a supplier from one day to the next may not be as reliable as they used to be.

So, there are a lot of things that can impact this, but any type of business has problems. And the closer you can get to that, the more people you can talk to, you're going to start to uncover these things. And that's really where, what I would suggest doing is, you know, uncover opportunity and really seek to solve those pain points. Any pain points are an opportunity.

Sheikh Shuvo: Awesome. Super, super tactical, actionable advice. Matt, thank you so much for sharing about your journey there. Looking forward to seeing what you're working on next.

Matt Dancho: Awesome. Well, thanks so much, Sheikh. Really appreciate it. Thanks for having me, and I hope you guys learned a lot from my mistakes and also from some of the things that worked for me.

Sheikh Shuvo: Awesome.


Listen to Humans of AI using one of many popular podcasting apps or directories.
