Episode 32

#32. Exploring Data Engineering and Responsible AI with Vino Duraisamy

January 9, 2024 · 29:11

Sheikh Shuvo (00:00.943)
Vino, thank you so much for joining us. I'm really excited to dive into your background.

Vino Duraisamy (00:05.879)
I'm happy to chat. Excited to be here with you.

Sheikh Shuvo (00:08.764)
So, Vino, the very first question I have for you is, how would you describe your work to a five-year-old?

Vino Duraisamy (00:15.502)
It's an interesting question you ask because I do have a five-year-old niece. And every time I go there, I'm like constantly trying to explain to her what I do and everything. For months, I just, like, put the word software engineer to her. And she's like, what does an engineer do and everything. And now we've reached the phase where I'm telling her, okay, so now I work with data, like, you know, trying to narrow it down a little bit further. And then now this, right.

Sheikh Shuvo (00:19.203)
Wonderful.

Sheikh Shuvo (00:38.667)
Okay, that's progress.

Vino Duraisamy (00:44.374)
So now we have a toy shop story where it's like, you know, remember you go to a toy shop and you want the specific Mickey and Minnie toy, and then she also likes a bunch of dinosaurs and spaceships and everything. I'm like, imagine going into the shop, what if they ran out of all such toys, right? You will be sad and the babies will start to cry if they don't find their favorite toys in there. So the toy shop owner, the shopkeeper, needs to know how many toys to keep of each.

Sheikh Shuvo (00:54.051)
Hehehe

Vino Duraisamy (01:12.194)
So when the babies come to buy, he will have enough toys for everybody. So like what I do is something similar to that. You know, you collect data saying, oh my God, babies like Mickey and Minnie so much. So I need to keep more of Mickey and Minnie toys or more of spaceship toys. And you know, doing such things will make sure the babies don't go back home crying. And that's very similar to what I do at work too. And like, that's where we are at. And I feel like that's probably the abstraction

Sheikh Shuvo (01:41.135)
Ha ha ha!

Vino Duraisamy (01:42.382)
I can probably, you know, build around what I do. But yeah, that's pretty much it.

Sheikh Shuvo (01:48.064)
I like how you introduced urgency and scarcity in there to like drive fear into your child, into your niece.

Vino Duraisamy (01:56.958)
Yeah, for them, I guess she is obsessed with, you know, the Mickey and Minnie toys. And every week I go, there is a new name and there is an imaginary friend and everything. And I also tried the other way, you know, what if this imaginary friend can talk to you, and trying to build these, you know, AI assistants into toys and making up a story with it. But I don't think I got anywhere with it. But this one really helps.

Sheikh Shuvo (02:19.055)
Well, at the next birthday party, then you'll have another chance. Awesome. Cool. Looking at your career, you've done so many cool things in the data engineering world. Just taking a step back, what sparked your interest in the field?

Vino Duraisamy (02:37.25)
Oh, okay. That's a long way back, but it's super interesting. I was working as a software engineer at the time and I was with NetApp and I was an entry level person, right? I was just two years into my career, right out of my college. And there was an analytics project that came up and, like, nobody in my team was, you know, excited to take that up because it was something on top of what they were doing already. And I don't think software engineers are excited about building Tableau dashboards, you know, on one sprint.

And I, being this enthu cutlet, as you usually are when you're out of school, you're like, Oh my God, I want to do this. I want to do that. And you sign up for everything. And I did sign up for that. And that kind of changed everything for me. I mean, we were working on customer support data as to, you know, what are the customer issues that are coming in, and then categorizing them into different product modules and presenting it to the product team saying, Hey, we see a lot of customer issues from

Sheikh Shuvo (03:15.032)
Yeah.

Sheikh Shuvo (03:18.308)
Hmm.

Vino Duraisamy (03:32.918)
these specific modules and do you want to prioritize hardening of these modules before we build new features and things like that, the product team. At that time, I just took it up because, why not kind of an attitude? But then when I actually presented it, when I got the opportunity to present, I was presenting to the senior executives and the senior product leaders in the company, which I would otherwise have no reach to. Why would I randomly go present to, literally,

skip-levels that are like four levels above me and five levels above me, right? And that's kind of when it really struck me that, Oh my God, so there are certain projects that would have this kind of visibility and significance to the company and to everybody else. And it is super cool to work on that. And, you know, that's kind of, I would say, the starting point for my data journey. I'm like, okay, so data is important and this is what gets a lot of

Sheikh Shuvo (04:02.799)
Hehehehe

Vino Duraisamy (04:29.79)
attention from like literally everyone. And I, at that time I was also like trying to learn Python on the side. So that all came together like very nicely for me. And I think a couple years down the line, I decided to go for my master's, you know, focus on machine learning, data science and stuff, and that, you know, kind of took off from there, but that, that initial starting point was just a random analytics project that I just said yes to.

Sheikh Shuvo (04:36.059)
Mm-hmm.

Sheikh Shuvo (04:54.479)
So it's the first dashboard that changes the rest of your life. Yeah.

Vino Duraisamy (04:57.254)
Right. That's super interesting. Like you would not expect at all for an entry level person to be presenting to such senior folks and like seeing the value they got out of just a simple dashboard, I'm like, wow, okay, this is super interesting.

Sheikh Shuvo (05:13.968)
In that presentation to the executive team for that project, do you remember any of the questions you were asked? Any things you still think about?

Vino Duraisamy (05:23.186)
Ah, okay. I don't think I remember to that extent. I don't think so. But for me, it was like, it was just a two week effort for me. It's just like, oh, really? Is it that valuable? Like, looking at their reception to the presentation was like a big shocker to me. Because like, as a software engineer, right, you're working on sprint after sprint, trying to release new features and this, it's a long process.

By the time you see that your feature is popular, what is the adoption for a specific feature is like, it's almost at least five, six months for you to see the impact of what you did, right? And like this random two weeks of work is getting, okay, interesting. It feels more like.

Sheikh Shuvo (05:58.916)
Mm-hmm.

Sheikh Shuvo (06:04.183)
That's awesome. Yeah, well, hopefully it led to a nice bonus check that year.

Vino Duraisamy (06:13.425)
It is, of course.

Sheikh Shuvo (06:15.367)
Well, you mentioned that led to you getting your masters eventually. So during your time at ASU, I saw that you spent some time as an explainable AI researcher. Could you share a bit about what led you to that and how it might have influenced your future work?

Vino Duraisamy (06:36.826)
Yeah, I guess at ASU, I was working on a bunch of projects. One of them, like a bunch of them being like, you know, spanning different industries. One was on the supply chain optimization front. I was basically exploring the applications of different data tools that people are learning. And everything was application oriented and it was like, oh, how do we take this and put it to supply chain, how do we take this and put it into a language model and everything.

And this one, when I just looked at it, I was like, I have never really done research, research per se. And when you're learning, like formally, all these machine learning methods as well, when you go from these traditional methods like logistic regression and tree-based methods to CNNs, suddenly everything starts to become fuzzy. You don't fully understand how these CNNs work or how these neurons work. And it was difficult for me to

form a mental model of how does this CNN work? How does this fully connected layer work? Because everything else is super clear for you to see. For a linear regression, you just have a straight up algebraic equation and you solve it and you get it. Like it's super straightforward. But when you're thinking about CNNs and the advanced, you know, deep learning models, it became a bit challenging for me to understand how they do. And then...

When I came across this opportunity where one of my professors who was teaching us machine learning was working on this explainable AI problem, I was like, I would love to, sign me up, where do I sign up kind of thing. And that actually very much helped me understand, in a CNN model, for example, how do these different neurons get activated and...

What are the challenges basically when you're working with a fuzzy model, like a non-deterministic CNN model, and everything? And we explored different methods to understand these neural activations. For example, when you feed an image to a CNN model, how does it actually classify if it is a dog or a cat? We used these most popular visualization tools, like saliency maps, that would help you understand, for example, when there is an image and the model has classified this image as a...

Sheikh Shuvo (08:26.864)
Mm-hmm.

Vino Duraisamy (08:51.082)
cat, how did it classify? So in the image pixels, like, is there an importance, say you look at the eyes and you'll be like, okay, so this makes you think that this could be a cat. And, like, saliency maps actually help you to understand the importance of pixels in an image and tell you, because of these different pixels here, I think this is a cat image or a dog image.
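As a rough illustration of the saliency idea described above, here is a minimal sketch. It does not use a real CNN: the "model" is a made-up sigmoid over four pixels, and the weights, bias, and pixel values are all hypothetical. The saliency of a pixel is taken as the absolute gradient of the "cat" score with respect to that pixel, estimated here by finite differences.

```python
import math

def score(pixels, weights, bias):
    """Toy 'cat' score: sigmoid of a weighted sum of pixel intensities."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def saliency(pixels, weights, bias, eps=1e-4):
    """Absolute finite-difference gradient of the score w.r.t. each pixel."""
    result = []
    for i in range(len(pixels)):
        up = list(pixels)
        down = list(pixels)
        up[i] += eps
        down[i] -= eps
        grad = (score(up, weights, bias) - score(down, weights, bias)) / (2 * eps)
        result.append(abs(grad))
    return result

# Hypothetical 2x2 image flattened to four pixels, with made-up model parameters.
pixels = [0.9, 0.1, 0.8, 0.2]
weights = [2.0, 0.0, -1.0, 0.1]
bias = -0.5

sal = saliency(pixels, weights, bias)
# Index of the pixel whose change moves the score the most.
most_important = max(range(len(sal)), key=sal.__getitem__)
```

In a real CNN the same gradient would normally come from backpropagation through the network rather than finite differences, and the per-pixel values would be rendered as a heat map over the image.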

Sheikh Shuvo (08:51.835)
Mm-hmm.

Vino Duraisamy (09:17.918)
Exploring that really helped me understand, like more than, you know, solving a research problem or anything, it helped me form a better understanding of how a model really worked and everything. And then, like, we tried to do this reverse effect, which is also using another visualization technique called activation maximization, which is you keep feeding different images to these neurons to see which neurons get these maximally activated values for what images. And can you actually say something?

Sheikh Shuvo (09:23.34)
Mm-hmm.

Sheikh Shuvo (09:37.829)
Mm-hmm.

Vino Duraisamy (09:48.018)
as to, say, for example, if you feed in, let's say, a cat versus dog classifier, CNN model, right? Can I identify specific set of neurons in the fully connected layer and say, these neurons are identifying the eyes, or classifying the eyes. They are only looking at these eyes. And can we build these feature maps for every feature in your image so all this put together will then

actually make a reasonable classification. It's just really like, you know, peeking into the black box and trying to understand like what actually happens. It was very exciting to actually go through that research and also like more than it being a research problem. At later stages of my career, I really understood how you need to understand how a model works to actually deploy in a real world situation.
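The "feed different images and see which neurons fire the most" idea can be sketched as follows. This is only an illustration of the dataset-search variant described in the conversation (the more common form of activation maximization optimizes the input image itself by gradient ascent). The neurons, their weights, and the image patches are all hypothetical.

```python
def relu(x):
    return max(0.0, x)

def activation(neuron_weights, image):
    """Activation of one toy neuron: ReLU of a weighted sum of pixels."""
    return relu(sum(w * p for w, p in zip(neuron_weights, image)))

def top_image_per_neuron(neurons, images):
    """For each neuron, return the label of the image that activates it most."""
    result = {}
    for name, weights in neurons.items():
        result[name] = max(images, key=lambda label: activation(weights, images[label]))
    return result

# Hypothetical flattened image patches (four pixels each).
images = {
    "eyes_patch": [1.0, 1.0, 0.0, 0.0],
    "ear_patch": [0.0, 0.0, 1.0, 1.0],
    "blank": [0.0, 0.0, 0.0, 0.0],
}

# Hypothetical neurons from a fully connected layer, with made-up weights.
neurons = {
    "n0": [0.9, 0.8, -0.5, -0.5],
    "n1": [-0.3, -0.2, 0.7, 0.6],
}

top = top_image_per_neuron(neurons, images)
```

Here `n0` responds most strongly to the eyes patch and `n1` to the ear patch, which is the kind of "this neuron is looking at the eyes" statement the research was trying to make for a real network.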

Sheikh Shuvo (10:42.339)
Yeah. Can't just be a black box.

Vino Duraisamy (10:45.75)
Yeah, so all of this came really handy much later for me when I actually started with industry.

Sheikh Shuvo (10:53.027)
Oh, is that a space you still follow now? There's such importance around it, especially with the EU AI Act coming out soon, and explainability seems to be such a critical part of the regulatory scope there too.

Vino Duraisamy (11:10.038)
Mm-hmm. Yeah, I think for sure this whole, you know, safety, AI safety, responsible AI, explainable AI, these, you know, peripheral fields, along with just the AI stuff, is going to take a lot of importance in the coming years, for sure. Because when you put together data privacy laws with LLMs and the AI regulation acts.

It's just like there's tons of work that needs to be done in the space that is around AI these days, you know.

Sheikh Shuvo (11:45.315)
Yeah, absolutely. Now, in addition to your core data engineering work, you've also, over the years, been a prolific writer as well, just talking about your journey, creating tutorials and content for people. I'm wondering, where do you draw inspiration for your posts?

Vino Duraisamy (12:06.138)
Ah, okay. I think when I started writing really, for me it was more like I just needed to share my experiences with other folks coming into the US. Like if you go to my Medium and see the first few blogs, it was about how to identify the right university, how to identify the right degree and how do you prepare for your masters before you even come here and like, you know, it's mostly about trying to help the fellow folks because I did not do a lot of

Sheikh Shuvo (12:27.803)
Mm-hmm.

Vino Duraisamy (12:32.662)
preparations the right way and tried to, like, figure things out on the way. And it started out like that. And I think one of my biggest, even the most successful blog up to today, was setting up a Hadoop cluster. Like it was the big data engineering course, and we were using one of the cloud solutions for the course and everything. And I was like, I had one of my mentors tell me that

whichever tool you work with, if you don't fully understand how they work, it's always going to be hard for you to use them. And I was like, so you just click a button and I get a Hadoop cluster that worked for me, but then how does it internally, you know, like how does that Hadoop cluster work internally? How is it all connected? And you would never understand that if you don't actually install these different things within your system yourself. I did that and I was like, oh my God, this is insanely cool.

Sheikh Shuvo (13:23.983)
Hahaha

Vino Duraisamy (13:25.066)
Why do I have to do that? Then I'm like, well, I anyway did it. I might as well put this out for folks who are trying to do that as well, because it might come handy for someone else. I just put together my notes and everything. If I remember correctly, I was struggling with this entire setup for more than a week just to install a couple of things. Like it was insane. And then I started putting that blog out and I got in like...

super huge response. Like, I think even today after all the blogs and everything, that is the most popular blog. I mean, I can only imagine how hard it was, probably even till today. And for me, that became more of an inspiration because, oh my God, okay, so I went through this. I just shared my, you know, thought process and, like, journey in some way. And I see that it helps a lot of people and

Sheikh Shuvo (13:55.588)
Hmm.

Sheikh Shuvo (14:00.687)
Hehehe

Sheikh Shuvo (14:15.822)
Mm-hmm.

Vino Duraisamy (14:22.262)
Part of the reason why I wanted to write was because one of my peers in my master's program, she started writing Medium blogs and I was not even familiar with Medium at the time. And, like, it's partly thanks to her that I started out on Medium and started writing about it. I used to be a Quora person and I think I had tons of things that were going on Quora for a while, but then I feel Medium is a little more

Sheikh Shuvo (14:41.499)
Mm-hmm.

Vino Duraisamy (14:48.638)
I guess, organized, and also it has a lot of this technical audience, not as much on Quora. So it kind of helped; for the topics that I wrote, it was very relevant to write on Medium.

Sheikh Shuvo (14:53.211)
Mm-hmm.

Sheikh Shuvo (15:00.857)
Throughout all the content that you've produced there, has there been any piece that's been most fun to write? Was it the one you just mentioned?

Vino Duraisamy (15:13.046)
Ah, okay. The most fun, I think it was one of the data engineering roadmaps. To be honest, it's a big story. So when I started out, I was trying to focus, specialize in machine learning and everything, right? So I focused on these, like, computer vision models and also exploring a bit on the language models.

And I was working on both of them, you know, side by side, explainable AI focused on the object detection. And then I also had a capstone project focused on building, you know, named entity recognition with natural language models. So I was just trying to explore both of them simultaneously. And coming from the software engineering background, one thing that always, you know, stood out in my head was that full stack software engineering was quite the thing and I'm like,

Um, if the full stack ML or full stack data is going to become the next big thing too, then I need to prepare myself for that. And so I consciously took roles as data engineer and then ML engineer, and even did a bit of MLOps, trying to build this full stack ML engineering kind of an experience, and once I had that, I'm like, all right, I'm ready. So industry, like, when is this going to become the next big thing in the industry, kind of thing. But then, funnily, the industry went in a whole different direction.

Sheikh Shuvo (16:22.947)
Mm-hmm.

Vino Duraisamy (16:35.006)
And now there were, like, data analysts and business analysts, data scientists, applied scientists, machine learning engineers, like it just went in an entirely different direction. So instead of a full stack role, you know, we have 100 different roles with overlapping titles and responsibilities. And it doesn't look like it would unify anytime soon, at least. So for me, I'm like, all right, anyway, we're not going to do this full stack role anytime soon.

Sheikh Shuvo (16:40.228)
Hahaha.

Vino Duraisamy (17:02.806)
But I might as well put together my journey of, okay, so when I wanted to consciously get data engineering experience, how did I do that? Or what are the tools that you need to learn? Or when you want to get into ML engineering, what were the tools that I needed to learn, what the process was like? So one day I just thought I might as well treat them as separate entities and just write content around both of them. It took me probably like less than half an hour to just write a data engineering roadmap because

I was at the time a data and ML engineer at Apple and I saw tons of my, you know, like college juniors and my friends who wanted to get into the space asking me. And I'm like, instead of typing 10 WhatsApp messages separately, I would just put together a blog and then send it all to them. And that was, I guess, the most fun to write because we were just like pulling together resources from different places to just build a roadmap, then share it with like my closest circle, not necessarily for everyone out there.

But then that also ended up being one of the most useful blogs for people out there. So for sure, that's one too.

Sheikh Shuvo (18:07.063)
You found the personal pain and documented your own. That sounds awesome. No, that's exactly what I'm looking for. Now, in terms of looking at that progression of roles, right now you're at Snowflake and you're working as a developer advocate. Could you explain to people who might not be familiar with that term what a developer advocate does?

Vino Duraisamy (18:12.404)
Yep. Sorry for such a long-winded answer to a very small question, but that's horrible.

Vino Duraisamy (18:34.174)
Oh, okay. Um, I guess the developer advocate is more of a mix of roles. I wouldn't call it just like one specific thing. And I would say it, there are like three main pillars to developer advocacy. One is, I would say product. Second is content. And the third is community. And in, in the product space, what I mean by that is you constantly are in touch with the developers in conferences and meetups and these different forums, Reddit, Slack, or like Stack Overflow, like all the different

forums and then take the insights back to the product team about, you know, how was the user experience for this new feature or what new features are folks expecting from us and, you know, everything around the product, and in a very subtle way, helping the product team prioritize the features and helping them with their roadmap and everything. And then comes the content pillar, which I was anyway, you know, doing already. So it kind of helped augment that as well.

It's about creating content. It could be in, you know, so many different forms: writing a simple blog post as to how to do this, or an entire tutorial end to end about how to build a specific data project, or even, like, you know, updating the documentation, creating YouTube videos about different features of the product and company and everything. So all sorts of content creation, for example.

And the third is community, which is constantly listening to the community about what do they think about the product, and not even just about the product per se. Just keep a pulse on what's going on in the community, what's going on in the industry, where are people heading to, what are the new things that are happening, and just keep a pulse on what's going on in the industry and the community per se. So it's like three different aspects of it. I guess the closest I could.

you know, make up a title that would make more sense for people will be developer marketing. I guess that's easy for folks to, you know, digest and understand in one go.

Sheikh Shuvo (20:29.247)
Interesting. Yeah. With so many different tools and platforms out there, especially now with the recent AI boom this past year with so many companies trying to create and nurture open source communities there, are there any mistakes that you see some of these companies making in terms of how to genuinely create and manage a community? Things you would advise against?

Vino Duraisamy (20:58.43)
Yeah, I mean, I guess the biggest, when it comes to like these different tools, for me, I feel personally that one of the things that companies could be doing better is, when you think about data or AI or ML, there are just tons of tools every day that are coming up. And every company has their own Slack channel or Discord server where they expect you to join and like be part of the conversations that are going.

Sheikh Shuvo (21:20.225)
Mm-hmm.

Vino Duraisamy (21:25.738)
I mean, it makes sense for the company to have a community space for their users to come hang out. But when you think about it from a developer's point of view, I'm like, just because I'm playing with some tool today, doesn't mean I get, I want to be part of another Slack. Like imagine if I have 10 tools in my ML stack, you expect me to be super active in all the 10 Slack channels or 10 Discord servers. I mean, it's a big thing if I'm like super responsive to my actual work Slack, forget about another hundred.

Sheikh Shuvo (21:53.775)
Hahaha!

Vino Duraisamy (21:55.87)
impossible to expect from a developer, right? Especially in times like these where we have tons of new tools and features bombarding every day, every week. So I feel putting a lot of emphasis on their own first-party community channels. That's probably not a great expectation to have, for sure. And the other being, I feel the biggest takeaway for me, which I had personally also, was that to reach developers where they are.

Sheikh Shuvo (22:01.956)
Right.

Vino Duraisamy (22:25.058)
For example, I am constantly on the Reddit data engineering thread, like subreddit, you know, trying to like, just see what's going on in the data engineering space. What's everybody else doing? And what do people talk about? And just what is top of mind for everyone? And I think just being there where developers usually hang out, instead of expecting them to come to you, is probably going to be the biggest differentiator, and I think, yeah, for sure. And another thing is I personally like absolutely love Reddit because

it enables folks to give you, like, unabashed honest opinions about anything, thanks to the power of anonymity. I mean, it's going to be in extremes too, you know. Yeah, right. I mean, if you don't want to, but hey, you want it honestly, you're going to get it whether it's in your favor or not. But it's like super interesting because when you are talking about a new product, new feature, new company, it definitely, you know, pays to get honest opinions of your

Sheikh Shuvo (23:06.351)
gets too honest at times.

Vino Duraisamy (23:25.81)
like users or consumers. So that's one of my favorite tools too. I mean, Discord is anonymous too, but for some reason I've never been able to like the Discord UI, just, I don't know why, I don't know. Yeah, it is for sure. So like Reddit so far has been a super useful tool for me.

Sheikh Shuvo (23:38.135)
It's an acquired taste.

Sheikh Shuvo (23:45.083)
Well, along the lines of community then, Snowflake just had its big annual developer event recently. What are some of the announcements coming out of Build that you were most excited about?

Vino Duraisamy (23:58.946)
Mm-hmm. Ah, OK. So we had Build last week. And most of the demos and the outputs that we saw from Build were basically from our summit announcements. So whatever we announced at the summit, six months down the line, Build was an opportunity for us to actually put our features in public preview or actually show the working demo of the features and everything. So the developers or the builders can actually get an idea of what

like we promised earlier in the year. And for me, I was personally presenting a couple of topics around data engineering, one around Iceberg tables. And the other is what are the new features in data engineering that have come up. Like, you know, dynamic file access, external access and latest Python support and more. But the biggest highlight for me was the LLM bootcamp where we actually got hands-on experience trying to fine-tune

Sheikh Shuvo (24:42.373)
Mm-hmm.

Vino Duraisamy (24:54.85)
the Llama 7B model with Snowpark Container Services, the GPU compute and everything. It was the most fun, like super, I guess, if I have to call out like one highlight of the entire Build summit, it would be that. I would highly encourage folks to check it out too, but unfortunately it won't be a hands-on lab anymore, but you can still go watch the demo and it is super fun.

Sheikh Shuvo (25:21.103)
Oh, it sounds like a blast. Along those lines then, looking at ML ops, there are lots of companies right now that talk about evolving that into LLM ops. I'm wondering between those two, are there any key data engineering differences there, or is it largely marketing?

Vino Duraisamy (25:47.015)
I guess there are some overlaps for sure between MLOps and LLM Ops, right? The way we look at it, if you're already doing MLOps, LLM Ops is probably a specialization of it because LLMs are just another type of models that you're operating as part of your whatever workflow that you have. So in terms of the infrastructure, it's just that LLMs need GPU compute and just the scale of the LLMs are kind of...

It's not just like a few MB model that you're going to put in a joblib file and then use for inference. It's just like kind of a scaled up version of normal classical ML models. But for me, when we think about LLMOps, there's like tons of new challenges. I guess we don't even have solutions for some of them, right?

In ML, after you deploy the model in production, there is this concept that you have to keep monitoring and observing what's going on with the data drift and, like, the model drift, because the model performance does degrade over time. And there are also concepts like retraining your ML model over a certain period of time because, say, you're building a recommendation, like a marketing recommendation, a product recommendation from the marketing team, and

If a user says their personal data needs to be deleted from your system, you delete the data. And then in the next retraining of the model, the deleted user's data is not gonna be part of it. So it's all cool. When it comes to LLMs, again, because of the nature of LLMs themselves and since we cannot, so what happens, we trained your LLM with the user's data and then tomorrow the user wants to delete the data.

I'm like, yeah, but how am I going to do that within the LLM? How are you going to remove the user's data from an LLM that's already trained, which is completely fuzzy, and it's impossible to make any changes to the model itself? So there are some challenges that are not resolved at the research level. It's not even an ops problem. It's more like an LLM research problem. I guess very similar with evaluation, too. There is no one

Sheikh Shuvo (27:36.912)
Hmm.

Vino Duraisamy (28:01.014)
set of metrics. I mean, we do have HELM, like the HELM benchmark by Stanford, as the unified evaluation framework that we all use. But then the evaluation is still like an ongoing problem. Like it's not a fully solved research problem, but then how am I going to deploy my LLM models in production if the evaluation is kind of still not fully done? So there are like

Sheikh Shuvo (28:07.844)
right.

Vino Duraisamy (28:28.834)
tons of challenges that come with, you know, operationalizing LLMs. And it's just gonna be a lot of interesting days, you know, ahead of us to just see how that's gonna be. But the infrastructure level, I feel it's very similar to ML Ops, but then at the conceptual and the regulatory level, there's just gonna be tons of new things that we need to kind of realign ourselves to.
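The data drift monitoring mentioned earlier in this answer can be sketched with a simple distribution comparison. The example below is only an illustration: it uses the population stability index (PSI), one of several drift measures, over a single numeric feature, and the training and production values are made up. Larger PSI values indicate a bigger shift between the two samples.

```python
import math

def psi(expected, actual, bins=4, lo=0.0, hi=1.0):
    """Population stability index between two samples of one numeric feature."""
    def proportions(values):
        counts = [0] * bins
        width = (hi - lo) / bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical feature values seen at training time and later in production.
training = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
production = [0.7, 0.8, 0.8, 0.9, 0.9, 0.9, 0.95, 0.6]

no_drift = psi(training, training)
drift = psi(training, production)
```

A monitoring job would typically compute a measure like this on a schedule and trigger retraining or an alert when it crosses a threshold chosen for the use case.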

Sheikh Shuvo (28:50.223)
Right. Oh, well, with all the innovation and changes in what those new things are, looking ahead, are there any aspects of AI tool and platform development that you're following particularly closely?

Vino Duraisamy (29:08.546)
Well, I guess all of the LLM ones, and specifically around the responsible AI stuff, because I'm really curious to see how that is going to evolve as a field. I know we do have these regulations and everything coming up, but still, say for example, we all were familiar with BERT by Google. And for a while now, I think some 2017, 2018 was BERT.

Sheikh Shuvo (29:28.792)
Mm-hmm, of course.

Sheikh Shuvo (29:35.331)
Sounds about right.

Vino Duraisamy (29:38.678)
BERT was capable of reading an entire book of 500, 600 pages and comprehending and making sense of it. And that was a huge thing at that time. And for OpenAI to come up with a new model and just release it to everyone, literally just release it out to everybody, kind of makes you question, I'm like, okay, so there was this company who also had very similar or very advanced ML models or AI models.

But they were very cautious about what they released out to the world. Right. But then now, because of the disruption, thanks to OpenAI, there are just new models that are coming out every day, but what is okay to put out to everybody and what is not okay to put out for everybody's access, and this whole conversation about regulation and the safety of AI models, it's going to be super interesting, but it's also something that I'm constantly thinking about.

Because for me, to be working in a company, sitting in San Francisco, it's easy to think about, you know, disruption and like the advancement that comes with the AI models. But sometimes I look back and I'm like, but how is this gonna affect my parents? Because I read just a random news story about some old lady somewhere in Phoenix getting a call that was, of course, a voice generated from one of these, you know, generative AI models. Yeah, and I'm like, yeah, those parents, I mean, they are gonna...

Sheikh Shuvo (31:01.775)
the spam bots.

Vino Duraisamy (31:06.882)
They're not going to be familiar with all the fancy stuff that we do at work. And I think it's going to be easy to scam people like them. So I'm constantly thinking about how can we make sure it is safe for everyone really. So that's constantly on top of my mind for sure.

Sheikh Shuvo (31:24.587)
Absolutely. Well, Vino, the very last question I have for you is that, let's say you're just graduating school right now and you've been studying data engineering. There are so many different ways to evolve that career track and lots of cool companies to be at. What advice would you give to a new data engineer as they're evaluating where to go? What type of things should they be looking for to know that they'll have a good experience as a data engineer at a company?

Vino Duraisamy (32:01.682)
I think it should be like a two-pronged approach when they think about like getting into data engineering, for example. One is to get the foundations right, whatever it is. Say, forget about the tools and the stacks because there's going to be a new company every week that's going to solve one specific niche data engineering problem. But then foundationally, like are you good with functional programming? Can you write Python or SQL or even Scala code?

just get yourself familiar with the foundations and like be super strong in that. But then the second aspect is also keeping a tab on what's going on in the industry. For example, today as a data engineer, I'm only building data pipelines. And let's say if you are familiar, right? Like we started out with just traditional SQL and then we went on to like these NoSQL databases, document DBs, and they were quite a thing. Then there were graph databases and whatnot.

And as a data engineering person, depending on what kind of data your company is working with, you will need to get familiar with these different tools as you go. But now with LLMs, say vector databases, right? You would have to get yourself familiar with vector databases too, because it will not be too long before these companies will have their own LLM training pipelines. And as a data engineer, it will be on you to create these embeddings in like...

create this training data for your AI models. So you will have to keep a tab on the industry, what's going on, how is it relevant to me, how does it affect me, and constantly kind of upskill from there. So I guess the second part is, on this solid foundation that you've built, how can you build these extra skills that will kind of set you apart? And like, how can you, I guess.

All of us can treat ourselves as our own startups. Like you could be a one person startup, because if you run a company, you cannot just be like, oh, you know what, I'm making so much money today. I have good customer base. I'm just going to run with it for the rest of my life. It's not possible, right? You're watching the industry. Where is it going? And am I getting obsolete? Am I getting irrelevant? How can I pivot myself into what's going on today? I guess it's very similar to us, you know, even as persons too.

Sheikh Shuvo (34:05.723)
Hehehe

Sheikh Shuvo (34:18.699)
Oh, that's a perfect analogy. Well, Vino, this has been a super fun chat. Thank you for sharing about your background and your vision there. Excited to see what you're cooking up next. Thanks.

Vino Duraisamy (34:30.812)
Thanks for having me. It's always fun kind of looking back and you know, think through all those days and see where we are today.


Listen to Humans of AI using one of many popular podcasting apps or directories.
