#08. Mastering Machine Learning and Beyond with Ali Shahed

· 41:32

Sheikh Shuvo: Hi everyone. I'm Sheikh, and welcome to Humans of AI, where we meet the people behind the tech that's changing our world. Today I have a really special guest, Ali Shahed, who started his career as a telecommunications researcher, but for the past 10 years has been leading ML teams around the world. Ali, thank you so much for joining us.

Ali Shahed: Thank you for having me.

Sheikh Shuvo: The very first question I have for you, Ali, it's something I ask all of my guests, but if you had to describe your job to a five-year-old, what would you say it is?

Ali Shahed: That's a good question. My son asks me a lot, and he's not five, but he asks me a lot, and I sometimes struggle to explain what it is. If I were to explain it to a five-year-old, it's that data is everywhere around us. And if the five-year-old asks me what data means, I can tell them that whatever we do as a human society, whether we create products or services, we generate numbers, and we measure things with numbers. We invented numbers as a human society to measure things. The main reason is that we want to make sense of what's happening around us. For example, we measure the temperature outside. We do that because it's not enough for us to say that it's cold or hot outside. We want to know more accurately what the temperature is. Is it 56? Is it 66? If it's 70 something, I can go swimming, but if it's 60 something, I cannot, because it's too cold.

So, we have these numbers everywhere, from temperature to the hours in a day; we divided the day into 24 hours. We create numbers, and in recent times, maybe 70 to 80 years ago, we started saving them everywhere. We just put them there, and these numbers help us understand what's happening in the world. Sometimes we want to understand what's happening right now, sometimes what happened before, and sometimes what will happen. And this is one of the biggest things that we do. So my job is this: I look at these numbers that human society generates and try to make sense of the world. If there is a problem, I look at the numbers and figure things out from them. If something is not a number, I turn it into a number and understand it. So that's how I can describe it.

Sheikh Shuvo: That's a great answer. It's one of the more macro philosophical ones I've heard. So, it must be both a blessing and a curse to be your son, then.

Ali Shahed: Yes, because he will probably come back and say, "I didn't understand what you said."

Sheikh Shuvo: Oh, cool. Well, Ali, you've had a super diverse career, working in various industries and parts of the world. Can you tell us about your career story and how you landed where you are?

Ali Shahed: With some career trajectories, if you ask a person how they did things or how they ended up where they are, they may have a more deterministic way of telling it, and it's replicable. Some people say, "Okay, I went to this high school in New York, graduated, and then my dad was a Harvard alumnus, so I went to Harvard, then dropped out. I'm talking about Mark Zuckerberg, for example. I just dropped out, started a company, and here I am now." Mine is not like that. This is typical for people who were born outside the U.S., especially in the Middle East and such places, because they have to figure out how to do most of the things they're doing. So, I...

Sheikh Shuvo: Sounds like my life too.

Ali Shahed: Yeah, so in many periods of my career, I decided what I wanted to do. But at some stages, I had to do things that I didn't even like, but I had to do them. In the end, it all accumulated to the point that I'm here now. The reason I say that is, if somebody listens to what I did, don't try everything that I did, because some of them were not really smart decisions. But anyway, my background is in electrical engineering. I finished my bachelor's in Iran, focusing on power systems. Then I went to Finland, got accepted there for a master's in signal processing, and then pursued signal processing in telecommunications for my PhD.

Then I finished my PhD in Finland, which is a really interesting country because everything was free up to that point, no student loans and such. And even when I finished, there was this system in Finland: the Ministry of Education had this PhD pool. The idea was that you send your thesis, papers, and everything, and each year they choose 15 to 20 people from the whole of Finland and give them a one-year grant to go outside Finland and do research. You have to find a place that hosts you. So I went on a long journey, physically, because I had to travel to different countries to find a host; I talked to people at Berkeley, in Argentina, and at UCLA. In the end, I talked with my supervisor and decided to take the position at UCLA. I stayed there for about one and a half years, did some research, published some papers, and then, like 90 percent of people in academia, I had to decide whether to continue or not.

When I came to the U.S., I decided this is the place to be if you want to do something, at least for me. I looked for academic positions, but without a network in the U.S., it was hard to find a good research position. So I turned to the industry. I landed a position in a company making chips for Wi-Fi and cable modems. My work from my PhD was particularly attractive to them, and they wanted me to help build a crucial block they needed. I worked there for about two and a half years, made some contributions, then decided to move to machine learning and data science.

During my PhD in signal processing, we had courses on neural networks. At the time, it wasn't as cool as it is now, but it was really interesting to me, and memorable. Around 2013-2014, when I was looking for the next step in my career, deep learning started making noise, and I found it really interesting. A lot had changed between when I took that course during my PhD and the rise of deep learning.

Ali Shahed: So then I thought, you know what, I really like this. I did some research and decided to start looking for a job. I knew a lot of algorithms, which transfer well from signal processing, but I didn't know Python. So I started learning Python, took some online courses, and then landed a job as a really junior data scientist at the first startup I worked with. I didn't have much experience, and it was my first time working at a startup. It was pretty fast-paced and somewhat chaotic.

I continued and then moved to another startup. That startup was better for me as I had become more acclimated to the startup mentality. We worked there, but the startup couldn't raise money. So, after about a year, just when I was transitioning from a data scientist to a senior data scientist, the startup folded, and I had to find another job. Then I went to an insurance company, worked there in the research department, and after that, I moved to a company in the financial sector.

My last job was at Meta, which was related mostly to my telecommunications background but also a bit to data science because that's what they needed. And that's when everything started happening. What I've done in the last five to six years is more of an exploration for me. The common theme I always carried with me was natural language processing. If someone asks why I chose natural language processing as my passion, I don't have a really attractive answer. The main reason is that the first startup I worked with, while learning Python and machine learning, wanted me to work on natural language processing. I started doing it, and gradually, I just realized that it was really interesting. Even when I was in insurance, I always thought about natural language processing, its limitations, and that's why when I learned about GPT-2, GPT-3, and especially GPT-3.5, I tested it and realized that someone had blown a hole in the wall I was always hitting with natural language processing. Now it's time to go back and try all the things I couldn't do before because of the tools that were out there. That's what I've been working on for the past eight or nine months, focusing on my own ideas.

Sheikh Shuvo: Well, tell us more about that. My understanding is that your startup, which is still in stealth, is building consumer products using Large Language Models (LLMs) extensively. One of the questions I have about that is, with so many different, great open-source model architectures out there, what was your decision-making process in choosing which open-source model to use?

Ali Shahed: That's a really good question. At least for now. And just a disclaimer, whatever I'm saying here, don't try at home because LLMs are pretty new in general, and everyone in this area is experimenting. So don't take what I'm saying as advice because two months later, the landscape might completely change.

But for now, what I decided to do, which is much easier for me, is not to concentrate on open-source models. The reason is that I realized there's so much exploration needed in how to build something with LLMs, not in the LLM itself. What do you want to build with an LLM? That means taking it and solving some societal problem with it. I don't have time to look at open-source models or wrestle with them, questioning whether they are good or not. So, from an MVP or proof-of-concept perspective, I limit myself to a small set of closed-source models that I know perform at a top level, as a benchmark. Because of that, for now, I'm building almost everything with the ChatGPT API. And for explorations outside of it, I try Claude from Anthropic.

Probably my next step will be looking at the same models or similar ones on GCP or AWS, because now Google is coming out with Vertex AI. So at least for this stage, to build the POC, I'm working with closed models. But I'm almost 100 percent sure that in maybe six months at most, we will have models that you can efficiently host yourself.

One of the problems right now is the cost of hosting open-source models. Even small ones are extremely expensive compared to ChatGPT. I don't know how OpenAI manages to keep prices so low. Maybe it's the scale they have. But if you want to use something like Falcon or Llama 2, it gets really confusing, because each of these models comes out with a benchmark, and they say, "Okay, we are better than this model. We are better than that one." But then when you test them in practice, you realize that not all the things they claim are exactly correct. Everybody's claiming to be better than everyone else, which is really confusing. And the benchmarks we had for natural language processing don't work anymore, because we are way past those; they were built for traditional machine learning models. So far, I've found that these benchmarks don't make much sense.

For example, Llama 2 70B, the 70-billion-parameter version of Llama 2 from Meta, is a really powerful model. Its performance is good, not exactly at the level of GPT-4, more like GPT-3.5. But then you realize that when you want to host it, it's so big that even for an MVP it may not make sense if you want to build something, make an API, and connect it. It just doesn't make sense.

Because of that, I am waiting for now. For example, I know that Bedrock on AWS started hosting Llama 2. It's still expensive but kind of manageable, not equivalent to ChatGPT, but still workable.

My approach to open source is that even these large language models, although they say they're general and can do whatever you want, like write code or translate, are not all equally good at every task. For example, for code, I always go to ChatGPT, because the code is less painful to get and then actually make run. My experience with Bard from Google for code wasn't that good, but for writing and editing, it's really effective.

Some models are really good for documents. If you want to chat with a document, I always go to Claude, because they have this option where you can upload your documents and just talk with them. It's really powerful from that perspective. So that's my approach for now.
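Under the hood, the "chat with a document" pattern is usually retrieval plus prompting: find the passage most relevant to the question, then hand it to the model alongside the question. Here is a minimal, model-free sketch of the retrieval half, scoring by plain word overlap (real products use embedding similarity; the function name and sample text are illustrative only):

```python
# Toy retrieval step behind "chat with your document": score each
# passage by word overlap with the question, return the best match.

def best_passage(document, question):
    q_words = set(question.lower().split())
    passages = [p.strip() for p in document.split("\n") if p.strip()]
    # Pick the passage sharing the most words with the question.
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

doc = """Llama 2 comes in 7B, 13B and 70B parameter sizes.
Quantization shrinks a model by storing weights as small integers.
Hosting large models yourself is still expensive."""

answer_context = best_passage(doc, "what sizes does llama 2 come in")
assert "70B" in answer_context
```

In a real system, the retrieved passage would then be pasted into the LLM prompt ("Answer using only this context: ..."), which is roughly what document-upload features automate.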

But probably in the next couple of months, I will choose some of the tried-and-tested models. I'm waiting for everyone else to test them and let me know which one is best, because I don't have time to test them all. And there are some winners emerging; there's news about this company, I don't remember the name, but they have a model called Mistral. It seems really powerful and small, which is interesting. Then there are fine-tuning methods now with quantization, which means you can take a large model and make it small. Some people even say you can fit these models onto edge devices.
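The core idea behind quantization is simple: store weights as small integers plus a shared scale factor instead of full floats. A toy sketch of symmetric 8-bit quantization in plain Python (real methods like GPTQ or 4-bit quantization are far more sophisticated; the function names here are illustrative):

```python
# Toy illustration of 8-bit quantization: store weights as small
# integers plus one scale factor, then reconstruct approximate floats.

def quantize_int8(weights):
    """Map float weights to signed 8-bit ints with a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the ints and the scale."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.004, 0.33]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)

# Each weight now fits in one byte instead of two or four, which is
# why quantized models can shrink enough to run on edge devices.
assert all(-128 <= v <= 127 for v in q)
assert all(abs(a - w) <= scale / 2 + 1e-9 for a, w in zip(approx, weights))
```

The trade-off is the rounding error visible in the last assertion: each weight is only recovered to within half a quantization step, which is why heavily quantized models lose some quality.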

So, I'm waiting for this transition phase to be over because now everyone with hundreds of GPUs is training large language models. I'm waiting for the filtering to happen. In the meantime, I'm trying to figure out what I want to do with a large language model, and then I'll go from there.

Sheikh Shuvo: If only there was a waitlist function on HuggingFace, I think the lines would be gigantic.

Ali Shahed: Yes, actually, HuggingFace is also one of my go-to places when I want to... And one thing I'm not sure your audience knows about, but it's really good for them to know, is that there are lots of models there. Whoever puts a model there always includes a small demo that you can test things with. So, if you have a particular application in mind with a specific performance requirement, you can always go there and try it out. Google also did that for Vertex AI, so you can go there and write prompts to test a couple of models. Things are really consolidating, and it's becoming easier for people to decide which models to use, but it's still too expensive.

Hosting an LLM with reasonable performance is quite costly.

Sheikh Shuvo: To stay up to date on the latest and greatest things, are there any particular sources you like to follow, like blogs, papers, or anything along those lines?

Ali Shahed: Yeah, I think you need to be like a Hoover, like a vacuum cleaner. Currently I don't have just one resource. I have a couple of podcasts that I follow. One of them is from The New York Times; I think it's Hard Fork. I just like it. These guys, New York Times journalists, cover tech, and every week they have this really interesting episode, mostly dominated by LLMs and AI, because that's what's happening.

And there are a couple of other, more in-depth podcasts that I go to from time to time. I browse the topics, and if something interesting comes up, I listen. The other thing I follow regularly is Twitter. I use it a lot for AI and ML. I have a list there for AI and ML, and I check it every day to see what's happening. I recommend people go on Twitter and find a couple of famous people, like Hinton and others. From there, you can find other well-known people and gradually make your own list.

Then you realize some of these people are theorists, they like to talk about theory, and gradually you find builders. Builders are really interesting because they are early adopters, on the cutting edge. So when they say something, it means you're probably the second person who actually knows about it, because they just built it.

Twitter is really effective for that. And then, of course, I also subscribe to some Substack newsletters. From time to time, I get emails about what they wrote, and I read those. So these are my ways of keeping up with what's happening. I used to look at arXiv, but it's overwhelming. It's so much, and not always good for builders, because what you want is to take something and build with it. Nothing there is peer-reviewed, so you don't know whether what you're seeing is actually correct or not, unless you spend a significant amount of time reading through it, or you have really good knowledge of that particular area. But from time to time, I read it. For example, when Mistral came out, they said, "Oh, we have this model, which is, I think, 7 billion or 13 billion parameters, but it is as good as something like Llama, which is 70 billion." I actually saw it on Twitter; someone from that team mentioned it and provided a link to the arXiv paper. So I went to arXiv to see what they did, because how is it possible for a really small model to work like that?

Then I went there and realized, okay, this sounds really interesting, especially the attention method they had. Of course, I didn't go deep, I just skimmed it. The good thing is, if you follow famous researchers on Twitter, there's a good chance that when an arXiv paper comes out, within a matter of weeks one of those experts will read it and write a short blog post about it. Then you can go there and understand what happened, because when you read the paper yourself, you might not get it, but when someone explains it, it becomes clear.

And YouTube, of course, there are a couple of YouTube channels I follow. Mostly they are channels that skim through all the interesting things that happened, they take a couple of papers, and then explain them in a simple way so you can understand. If I'm really interested in that one, then I go and read the paper.

Sheikh Shuvo: Those are great tips. I really like the analogy of needing to be a vacuum, just sucking it all in. But the challenge of being a vacuum is that it also relies on you having a really good filter; otherwise you just get way too stuffed. So, with that perspective, the last question I have for you: for someone who's just graduating college right now and wants to get involved in the world of AI, there are so many things out there and so many different voices. What advice would you give them, or even someone mid-career looking to get into AI? What are some ways to filter job opportunities and figure out where to start?

Ali Shahed: Okay, there are two different pieces of advice because some people are graduating from a master's program or they're coming out of college with a major in data science and machine learning. So, they probably have a better idea of what to do. Maybe I'm not the best person to give them advice because I wasn't in that situation. When I entered machine learning and data science, I had some ideas from an algorithmic perspective, not the job market.

One thing I can say in general is, first of all, don't commit to any technology. That means don't say, "Okay, this is what I want to do with this particular tool, and I want to do it for the next 15 years." It's not going to happen. When I say don't commit to tools, I mean things like this: when I started, everybody was using scikit-learn. Then TensorFlow came, and then there was this rival of it, I forgot the name. These tools come and go, but the most important thing is that there are some basics you need to know.

If you know the concept of classification, if you know what regression is, then there are many tools out there you can use to build things. The same goes for knowing what Random Forest is, or even XGBoost, which is a bit different because XGBoost is just a tool while the method is the same. Tools are not important; the knowledge is important, from that perspective. So if you know the concept, you can find tools to build with it. Even though Python is king now, I'm not sure how long that will last, because things are coming down the pipeline. Every six months you see some company making something and saying it will dethrone Python. It hasn't happened yet, but let's see what happens. This is from the tools perspective.
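The point that the concept outlives the tool can be made concrete: here is a working classifier in plain Python, with no scikit-learn or XGBoost, just the bare idea of classification itself, assigning a point to the class whose examples it sits closest to (a nearest-centroid rule; all names and the toy data are illustrative):

```python
# A classifier with no ML library at all: the underlying concept is the
# same one that scikit-learn or XGBoost implements at much larger scale.

def fit_centroids(points, labels):
    """Compute one mean point (centroid) per class label."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {c: (sx / counts[c], sy / counts[c]) for c, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Assign the point to the class with the nearest centroid."""
    x, y = point
    return min(centroids,
               key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2)

# Two toy classes: "cold" and "hot" days by temperature and humidity.
points = [(40, 0.7), (45, 0.8), (80, 0.3), (85, 0.2)]
labels = ["cold", "cold", "hot", "hot"]
centroids = fit_centroids(points, labels)

assert predict(centroids, (42, 0.75)) == "cold"
assert predict(centroids, (82, 0.25)) == "hot"
```

A library would swap in a more powerful method and handle many dimensions, but the mental model, learn a summary of each class, then assign new points to the closest one, carries over unchanged.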

The second part is doing some exploration. As I said, data is everywhere. You can take data from anywhere and build things with machine learning and data science. Find a topic or numbers you care about. For example, if you're interested in education, explore what you can do with data science tools and machine learning in education, or if it’s healthcare, then focus on that. Don’t be disappointed if you go into an area and realize it's not what you want because that's exploration. Do some exploration first for maybe a couple of years, and then gradually you'll figure out what you want. That means you will find the area you really care about. Some people already know what they want to do when they come out, and that's fine. This is more for people who are completely fresh and have no idea.

And the other thing is being ready for the dark side of data science and engineering. At the beginning, there's this idea that data science is a sexy job, but when you get into it, you realize there are lots of challenging tasks. Most of the time, you are dealing with real-world data. From outside, it might look like 80% of the time is spent building models and 20% on data, but it's exactly the opposite: 80 percent of the time, even in big companies, is spent on acquiring data, cleaning it, and putting it in a form you can get something out of. Because if you have bad data, your model will be bad, right? The output decision will be bad, so you need to work on that. That's where 80 percent of a data scientist's time actually goes: acquiring, cleaning, and formatting data so they can build a model with it. So don't get disappointed when you go somewhere and realize that you have to spend six months building the dataset, and then the model is actually just four lines of code.
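That 80/20 split shows up even in a toy pipeline: most of the code below wrangles messy records (missing values, numbers stored as strings, junk rows), while the "model" at the end really is about four lines. The field names and data are invented for illustration:

```python
# Most of this script is data cleaning; the "model" is the last few lines.

raw_records = [
    {"sqft": "850", "price": "200000"},
    {"sqft": "1200", "price": "310,000"},   # stray comma in the number
    {"sqft": "", "price": "150000"},        # missing square footage
    {"sqft": "950", "price": None},         # missing price
    {"sqft": "1100", "price": "275000"},
]

def clean(records):
    """Drop unusable rows and coerce strings to numbers."""
    rows = []
    for r in records:
        sqft, price = r.get("sqft"), r.get("price")
        if not sqft or not price:
            continue                        # can't recover a missing field
        try:
            rows.append((float(sqft), float(str(price).replace(",", ""))))
        except ValueError:
            continue                        # unparseable junk row
    return rows

data = clean(raw_records)

# The "model": a simple least-squares line, price = a * sqft + b.
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in data) / sum((x - mean_x) ** 2 for x, _ in data)
b = mean_y - a * mean_x
```

In practice the cleaning step grows with every new data source while the modeling step barely changes, which is exactly the imbalance described above.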

The interesting part is that this was before LLMs. After LLMs, I don't even know if those four lines are necessary, because now you have things like ChatGPT. You can take your data, and when open-source models become more manageable, you can host them on-prem or in the cloud yourself. You can give all your data to an LLM and say, "Okay, now build a classifier for me." But you need good data; that 80 percent is always there. Building models is becoming easier and easier, to the point that maybe you don't have to be a machine learning expert to build one. Of course, you need to test whether what the LLM produces is correct, but we are getting there.

Actually, I did one example a couple of days ago as a test. I uploaded a dataset and said, "Make a regression model for me to predict something," and it did. I didn't go through and test it properly, but it built it. The code was there, it trained it, and even gave me the quality parameters.

Sheikh Shuvo: That concept of blood and tears in data science sounds like another great podcast episode right there.

Ali Shahed: Yeah, that's a name. 'Data Science with Blood and Tears.'

Sheikh Shuvo: Well, Ali, thanks so much for taking the time. Lots of great advice from your life story for our listeners. If anyone wants to get in touch with you, what would be the best way to get in front of you?

Ali Shahed: Oh, I have a Twitter account. My Twitter account is MLHobbyist, like machine learning hobbyist. I mostly use it in Persian, but I have an English account, which is mlhobbiesen. And of course, they can send me an email. I would be happy to respond. My email is alishive@gmail.com.

Sheikh Shuvo: Yeah, I'll make sure to include the link to your Twitter there. Well, that's all I got. Thanks again, Ali. Much appreciated.

Ali Shahed: Thank you very much for having me.
