The Evolution of Linguistics in AI with Karin Golde
Sheikh Shuvo: Hi, everyone. I'm Sheikh. Welcome to Humans of AI, where we meet the people that build the tech that's changing our world. Today, we're diving into the world of language models and linguistics with Karin Golde, Founder of West Valley AI. Thanks for joining us, Karin.
Karin Golde: Thanks. It's great to be here, Sheikh.
Sheikh Shuvo: Absolutely. Now, you have such a cool background, and I'm going to dive into that shortly. But before we start, Karin, if you had to describe your job and what you do, to a five-year-old, what would you say?
Karin Golde: Yeah, that's a good question. I think I do two things. So one thing is that I find ways for humans to communicate with computers using everyday language. But in order to do that, I have to do something that's also very challenging, which is find ways for humans to communicate with each other about how to find ways to communicate with computers using everyday language. That's probably a much bigger problem to solve. Yeah, definitely.
Sheikh Shuvo: Oh, awesome. I love that.
Now, you started your career as a linguist. Can you share your career story and how you got to where you are, managing large language models?
Karin Golde: Uh, yeah. So, I got my undergraduate degree in linguistics at the University of California at Santa Cruz, and then went on to get my graduate degree at Ohio State in theoretical syntax and semantics. And all of that really stemmed from a deep interest in language that started early on. I just love the fact that language is such a deep part of our humanity. It's our wisdom, it's our personality, it's our identity, it's how we connect. So that's really what it all was about.
And then when I was at Ohio State, I was working with a theoretical framework developed by my advisor, called HPSG. It was the kind of framework that really lent itself to being implemented in computational systems. There was a startup in Mountain View called YY Software that was using an HPSG-based parser to automate responses to emails, and my advisor recommended me for a job there. It turned out I was employee number five. I had to sort of figure everything out on my own, with the help of a really great team of engineers and entrepreneurs. So that was how I started, and that got me hooked on startups.
That's how I came to see myself: as someone who could just pick things up and run with them. I went on to join four more startups after that, and there's a common thread there of building software driven by natural language processing. Along the way I worked my way up to executive roles, which added another dimension, bringing me closer to the business side and to understanding customer needs. I really started to enjoy the challenge of being the bridge across a range of teams, that communication angle I mentioned: teams with various degrees of technical and business depth, helping them understand each other.
So, just to close out the story, from there I went on to my most recent role at Amazon. I was there for a couple of years, leading a group of about 20 linguists on the AI data team at AWS. It was a great team, responsible for providing the training and evaluation datasets for all the various language technology programs that you can use with AWS.
And it really is this data side of things. Getting the data right is really, I think, at the heart of any AI operation. It determines what the models learn. It determines how we decide whether they're working. So, again, like that was a really fascinating and rewarding position.
Sheikh Shuvo: And I saw that you recently founded West Valley AI. What are you up to there?
Karin Golde: Yeah. So West Valley AI is my strategic consulting firm. What I want to do is help companies who want to adopt AI in a way that's clearly aligned with their business values, and their business objectives.
So, the services I provide are really focused on language data because I'm a linguist. And as I mentioned, these projects really begin and end with the data. So I can describe to you a little bit more about exactly what I mean by all of that. So, part of it is about evaluating whether it's possible to achieve your goals with the data you have.
Suppose you're trying to create some kind of bespoke sentiment analysis and you want to do it on product reviews, but all you have is Twitter data. Is that going to be good enough? Or are you going to have to get a different mix in there? How can you leverage the data you have and identify and overcome any kind of risk that might be involved with it not being quite the right data? That's a really common scenario.
The other thing is designing the workflows for data labeling and human evaluation. There's really no way to avoid humans in the loop. I know these large language models seem almost human already, so aren’t we done with humans having to evaluate things? But honestly, even with the most advanced generative AI, you need to have someone who understands the goals and can do that kind of evaluation. There are, however, a lot of really interesting techniques to automate and scale up these workflows, including using large language models as one of the ways of evaluating whether your model's output is correct. But I think there's a lot of really interesting science behind all of that.
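As a rough illustration of that pattern, here is a minimal sketch of using a language model as one automated evaluator in the loop; the call_llm function is a placeholder for whatever model API is actually in use, and the rubric, labels, and triage step are hypothetical rather than a standard recipe.

```python
# Minimal sketch of an "LLM as evaluator" step in a human-in-the-loop workflow.
# call_llm() is a placeholder for whatever model API you actually use; the
# rubric and labels here are illustrative, not a standard.

import json

JUDGE_PROMPT = """You are grading a summary against its source text.
Return JSON like {{"label": "correct" | "incorrect", "reason": "..."}}.

Source:
{source}

Summary:
{summary}
"""

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your model of choice and return its text."""
    raise NotImplementedError

def judge_summary(source: str, summary: str) -> dict:
    # Ask the model to grade the output, then parse its JSON verdict.
    raw = call_llm(JUDGE_PROMPT.format(source=source, summary=summary))
    return json.loads(raw)

def triage(examples: list[dict]) -> list[dict]:
    # Send only the items the automatic judge flags on to human reviewers.
    return [ex for ex in examples
            if judge_summary(ex["source"], ex["summary"])["label"] != "correct"]
```

In a setup like this, human reviewers would still adjudicate the flagged items and periodically spot-check the automated judge itself.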
And then also just defining the metrics you need to calculate if your project is on track. You might think, well, once my product's in production, I can see, based on how people use it, how well it's working, but you want to know ahead of time whether it's on the right track or not. So what metrics do you need to measure things like accuracy or appropriateness of responses, and how do you make sure they're well understood by all the stakeholders? I think it's important to have some kind of dashboard or regular reporting that keeps the business and technical teams aligned. Continuous alignment across teams is just so critical. The most common issue I see is surprises cropping up late in a project because there was some kind of misunderstanding about what we mean by accuracy or something really fundamental like that.
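As a toy illustration of the kind of numbers such reporting might track, here is a short sketch assuming scikit-learn is available; the labels, and the choice of accuracy plus Cohen's kappa for annotator agreement, are examples rather than a prescribed metric set.

```python
# A minimal sketch of numbers a shared dashboard might surface:
# model accuracy against adjudicated labels, and agreement between annotators.
# The data here is illustrative; assumes scikit-learn is installed.

from sklearn.metrics import accuracy_score, cohen_kappa_score

gold_labels  = ["positive", "negative", "neutral", "negative"]  # human-adjudicated labels
model_labels = ["positive", "neutral",  "neutral", "negative"]  # model predictions
annotator_a  = ["positive", "negative", "neutral", "negative"]  # two independent annotators
annotator_b  = ["positive", "negative", "positive", "negative"]

print("model accuracy:", accuracy_score(gold_labels, model_labels))
print("annotator agreement (Cohen's kappa):", cohen_kappa_score(annotator_a, annotator_b))
```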
Sheikh Shuvo: That's a great point there, along those lines of keeping teams on track and making sure that everyone's communicating. You mentioned different types of dashboards to unify the metrics. Are there any particular types of tools or frameworks that you'd recommend that teams use to accomplish that?
Karin Golde: Yeah, as far as dashboards and things like that, anything like Tableau will work fine for you. I think there are a lot of really interesting companies out there right now working on things like democratizing access to the data or enabling subject matter experts to get closer to the data. I personally use something simple, like a data labeling platform, and there are a lot of them out there. I've used Appen a lot. There are others, like Datasaur, and I'm not advocating for one over the other; they're all good in their own way. The point is having the subject matter experts get exposed to the data, do their own labeling, and then see the error analysis on where the models are going wrong. Any kind of reporting or visualization that makes sure what's going on is well understood by everybody will do the job.
Sheikh Shuvo: Makes sense, makes sense. Taking a step back a bit, it's impossible to learn about what's going on in the world of AI and not come across all the concerns around it from an ethical perspective. And a term that I see a lot is responsible AI. As you're advising different companies on what their strategies should be and what the limits of data are, how do you evaluate and develop product roadmaps to tackle biases that might be inherent in language models and the underlying data sources?
Karin Golde: Yeah, so it depends a lot on the product itself, but I think a lot of the recent concerns have been around large language models in particular. I will say these kinds of biases have existed in all kinds of AI systems for quite a while now. We've seen this in facial recognition, for example, where Black people are more likely to be falsely arrested on the basis of facial recognition matches because their faces are not represented as much in the training datasets. This kind of thing spans a lot of areas. But I think it has been even more of an issue with large language models because they do just sort of vacuum up the Internet and internalize whatever kinds of biases are out there.
So, when you're thinking about a product and product roadmap, you need to think about what kind of data your models have been trained on. If you're using large language models, what sorts of biases do they have that could affect your workflows or your customers?
Let me give a simple example, or one aspect of this. Statistically speaking, doctors are more likely to be men and nurses are more likely to be women, at least in the United States. So, if you take a generative model, like the ones used for ChatGPT, it deals in probabilities. If you ask it to make up stories or scenarios involving doctors and nurses, it will go with the most probable scenario of making the doctors men and the nurses women. That might not be a problem for any given story; it reflects an aspect of our life. There is bias in the real world. But in aggregate, it means that you're not going to get the kind of diversity in your output that you're looking for. So if you think about that kind of bias repeated across all different kinds of areas, it's going to amplify and reinforce whatever is out there and converge to something monolithic in a way that the world really is not.
And there's another effect that this bias has, which I think is really interesting, which is on accuracy. Suppose you have some text where the doctor is female and the nurse is male, and you ask a large language model to summarize that story for you. Summarization is a very common use case for using language models like this. That story will likely contain male and female pronouns referring to those people.
But because of the model's preference for understanding doctors to be male and nurses to be female, it will be more likely to have accuracy issues there and get confused about which pronoun refers to whom and what actually happened in the story. So, when you think about how biases insert themselves, there are all kinds of repercussions that you have to think about. There are datasets that have been developed to probe for things like gender bias, and you can apply them off the shelf to see whether your model exhibits those biases.
Another thing that's come up in the past, and I'm not going to attribute this to any particular model that's out there right now, is that because certain ethnicities are attributed certain characteristics in the world, the model starts to learn that there are more positive connotations for some and not for others. This has led to systems in the past, for example, that do sentiment analysis where you have what seems like a neutral sentence like, "I ate Mexican food for dinner last night," and you compare that to "I ate Italian food for dinner last night," and the Mexican food example is going to get a lower sentiment score. It'll be more likely to be interpreted as negative.
Yeah, so, like I said, people are aware of these kinds of things, and they've developed datasets to try to probe for them. But you also need to think about what's going on in your particular use case, and you'll probably want to develop some custom evaluation metrics for detecting bias like that.
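One way such a custom probe might look in practice is to score minimally different sentence pairs and flag large gaps. The sketch below assumes the Hugging Face transformers library with its default sentiment-analysis pipeline; the sentence pairs, label names, and flagging threshold are illustrative only.

```python
# Sketch of a simple bias probe: score minimally different sentence pairs and
# flag suspiciously large gaps. Assumes the Hugging Face `transformers` library
# and its default sentiment-analysis pipeline; pairs and threshold are examples.

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

pairs = [
    ("I ate Mexican food for dinner last night.",
     "I ate Italian food for dinner last night."),
]

def signed_score(result: dict) -> float:
    # Collapse the pipeline's label/confidence into one signed score.
    # Label names depend on the underlying model; the default uses POSITIVE/NEGATIVE.
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

for a, b in pairs:
    score_a = signed_score(sentiment(a)[0])
    score_b = signed_score(sentiment(b)[0])
    if abs(score_a - score_b) > 0.2:  # arbitrary threshold for a "large gap"
        print(f"Possible bias: {a!r} -> {score_a:.2f} vs {b!r} -> {score_b:.2f}")
```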
Sheikh Shuvo: Interesting, interesting. You mentioned a dataset to help identify gender bias, but are there any other industry-accepted tools out there, say a bank of prompts and unit tests, that help to identify bias? Or is it very much as you're saying, developing your own tests?
Karin Golde: I believe there's a lot out there now. I can't name all of them off the top of my head, but it's certainly been a very active area of research, and the tools are constantly evolving. I will say, though, as with anything, it's really important to think about how your use case differs. Back to my example of product reviews versus Twitter data: things that seem at a very high level to be the same thing might not really be the same thing. So you need to leverage what you can off the shelf, but think about how it might differ from what you're actually working with.
Sheikh Shuvo: Nice. Awesome. Shifting gears a bit, looking at the field of AI as a whole, it seems like there are two groundbreaking announcements every single week, and the pace of changes is only accelerating. Since linguistics is such a core component of it, how do you see the field of linguistics as a whole evolving as AI advances?
Karin Golde: Yeah, it's really interesting, because there's a long history here; it hasn't always been called AI, right? That goes in and out of fashion. It's a fashionable marketing term, but when you say AI, I interpret that as meaning machine learning in general. That's how it's usually applied right now. And I will say that in some ways, linguistics as I learned it in grad school, theoretical syntax and semantics, is a pretty broad-ranging field, but it tends to be focused on symbolic representations of structure and meaning.
So you're trying to develop a system that predicts whether a given sequence or structure is possible and what meaning that sentence might have. Machine learning techniques, on the other hand, take more of a brute force approach, looking for statistical patterns and learning from the data itself. It's a probabilistic approach, and there's been a lot of pushback from a certain segment of linguists on that, saying it's just statistics and doesn't tell us anything about linguistics.
But large language models are making people reevaluate what those limits might be. I think there is a balance where a lot of linguists agree that there is some element of statistical probability in how humans learn language. We do try to predict the next word, or where things are going. We learn from probabilistic patterns ourselves. At the same time, the kinds of errors that machine learning models make, like the pronoun confusion I mentioned before, reveal the ways they're not really encoding meaning in the same way that humans are. They're not evaluating truth conditions and saying, "This sentence has this meaning because these are its components.
And when I build them up in a certain way, this is the result of what it means." So, I think there are strengths and weaknesses in both approaches. And if we're looking to make natural language processing work in practice, most AI systems in production today do incorporate a combination of machine learning and rule-based systems. You'll have some symbolic systems to do what they do best and some machine learning to do what it does best, and you combine those into a hybrid system. That's really what's going on under the hood for most of the systems you're using today.
And regarding how the field of linguistics is evolving, even if you're a really staunch advocate of symbolic systems, machine learning still offers a lot of ways to process and understand language data that can help bootstrap theoretical work or provide new insights into those fields of study.
Sheikh Shuvo: More reason to collaborate.
Karin Golde: Yes.
Sheikh Shuvo: The last question I have for you is, let's say I'm a linguist just starting my career, and I'm interested in getting involved in the world of AI. What are the roles I should be looking for, and how would you evaluate different jobs?
Karin Golde: Yeah, I think one basic thing to think about is whether you're more interested in research or in applied science. I personally have always been more interested in application. Grad school felt so hypothetical, armchair, and on paper, and it was kind of a relief when I finished. At the end of the day, I was like, what did I accomplish? I put another paper out into the world, and that's just me. What I want to do is go out and build things that people use; even if they don't work perfectly from a theoretical perspective, they serve a purpose.
So, figure out where your inclinations are and start figuring out what direction you want to go in from there. Even if you're looking at more applied science and you want to join a company where you're really building products, different companies work at different paces. So think about what kind of pace you want to work at, look at who your colleagues will be, who you want to collaborate with, who you want to learn from, and what kind of mark you want to make on the world. AI has so much potential right now to go in so many different directions, and linguists are a really important part of that conversation and development. So think about what your contribution will be to the kind of world that you want to see.
Sheikh Shuvo: What are the types of job titles and things like that, that usually signify having value for a linguist?
Karin Golde: Yeah, so if you're going a more technical route into computational linguistics, which is a really hot field right now and has been for a little while, you could be an applied scientist or a research scientist. Those are a couple of standard titles. If you get even more technical, you could be a machine learning engineer.
I'm going to get a little water here.
Sheikh Shuvo: Of course.
Karin Golde: At Amazon, the role that I had and the people who reported to me, we were language engineers. As I said, those are the people who construct and evaluate datasets. It's a great title. At Google, I believe the same people are called analytical linguists. So, there's a range of titles. I think you want to search on skill sets; the titles are a little bit hard to nail down.
Sheikh Shuvo: Makes sense. Makes sense. Cool. Well, Karin, if any listeners want to connect with you online, what's generally the best way to get ahold of you?
Karin Golde: Yeah, find me on LinkedIn. I respond to direct messages. So, yeah, DM me and I'll respond. I'm in the process of getting my website up. Hopefully, it'll be up by the time your listeners hear this, but you should also be able to check things out there. I also do speaking engagements and I'm on various podcasts. So just keep an eye out for more of that.
Sheikh Shuvo: Awesome. Well, thanks again for joining today and sharing more about your world. Thanks, everyone.
Karin Golde: Thank you, Sheikh.