Behind the Scenes of AI-Driven Manufacturing with Zeeshan Zia
Sheikh Shuvo: Hi everyone, I'm Sheikh and welcome to Humans of AI, where we meet the people building the magic. Today we have a very special episode. I'm joined by Zeeshan Zia, co-founder and CEO of RetroCausal, a really interesting AI company in the manufacturing space. Thank you for joining us.
Zeeshan Zia: Thanks for having me, Sheikh. I really appreciate it.
Sheikh Shuvo: Zeeshan, one of the first questions I have for you before we dive into the details is, if you had to describe your work to a five-year-old, what would you say?
Zeeshan Zia: We make computers see. We build algorithms that allow a camera and a computer connected to it to understand procedural human activity, like building a Lego toy or other physical activities. Our camera understands what you're doing. If you make a mistake, the system guides you and offers alerts. For example, if you're building a rocket ship with Legos and put the wrong piece, the camera looks at what you're building, and you'll get an alert saying you're building it wrong. AI can also help you come up with different rocket ship designs based on what you're building. It's like an extra pair of trained eyes that guide and assist you through a process.
Sheikh Shuvo: You should start selling that to parents. I think I could use that with my two daughters when we're doing projects.
Zeeshan Zia: My PhD was partly funded by Qualcomm. At that time, Qualcomm had an augmented reality business unit called Vuforia. I contributed a lot to the Vuforia product, especially our work with Lego. There's a product called Lego Fusion, and it's similar to what we do: you take out your smartphone, the system checks whether you're building the right thing, and it guides you or gives you alerts.
Sheikh Shuvo: Does this mean you have some pretty cool Lego kits at your house?
Zeeshan Zia: I do have an 8-year-old and a 5-year-old. I love playing with Legos, especially building Lego Mindstorms, robots with Legos. I enjoy playing around with their augmented reality features and other stuff as well.
Sheikh Shuvo: That's super fun. I have a six-year-old and an eight-year-old, so I imagine we face a lot of similar challenges. Okay, you mentioned your PhD work in computer vision. It seems like you've been working in the field since long before it was cool. Since then, you've worked on a very diverse range of teams. Can you tell us more about your career journey and how you got to where you are?
Zeeshan Zia: Absolutely. I did a master's in electrical engineering at the Munich University of Technology. I spent two years at a robot vision lab, building systems for collaborative robotics, where the robot can see what you are doing and build something with you. Specifically, we were working in kitchen environments, trying to see if robots can collaborate in making things like pancakes, just for fun. From there, I went on to do a PhD focused on computer vision, human activity recognition, 3D object pose estimation problems. This was before the days of deep learning, from 2007 to 2014. We were building a lot of handcrafted rules and statistical models. During my PhD, I worked with Qualcomm, won awards from Microsoft Research, Best Paper Awards, and such. Some of this work was applied to self-driving cars, predicting human behavior, both the driver and pedestrians. I then did a postdoc at Imperial College London, UK, extending this work to real-time systems that could operate at the edge with very low latency. After about a decade in Europe, I moved to the U.S., first in the Bay Area working at an industrial research lab, and then at Microsoft in the HoloLens division. HoloLens is a head-mounted display device, an augmented reality device. Afterwards, I founded RetroCausal.
Sheikh Shuvo: What was the inspiration to go from Microsoft and HoloLens to founding RetroCausal?
Zeeshan Zia: At Microsoft, I had the opportunity to work on the HoloLens. It was, in my opinion, a revolutionary device and the pinnacle of computer vision technology at the time. We saw it as the next generation of computing. My role involved contributing to the science of HoloLens, building new computer vision algorithms, and also working with early enterprise partners. I interacted with customers, identifying their problems and informing the research team on how we could improve HoloLens or plan for the next two to five years. Insights from these customer conversations led to the founding of RetroCausal. I convinced my colleagues at Microsoft to join me, and off we went.
Sheikh Shuvo: What types of customers are you serving at RetroCausal?
Zeeshan Zia: At RetroCausal, we use cameras, computer vision, and generative AI to assist manual assembly processes, optimizing them. A camera observes an assembly process, like in a Honda factory where a worker is tightening a bolt or putting a cover on top. If an operator makes a mistake, we offer an alert in real time to help them fix it. This eliminates defects and helps untrained workers get up to speed with sophisticated build processes. We also use generative AI alongside computer vision to help industrial engineers optimize these processes, making them more comfortable for workers and optimizing head count and quality. Our focus is on high volume, low mix manufacturing, such as in automotive, medical devices, and hardware sectors.
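The real-time alerting described here can be pictured with a small sketch. It assumes an upstream vision model has already turned the video into a stream of step labels, and it only shows the checking logic that compares those labels with the expected build sequence. The class name `StepMonitor`, the step names, and the alert wording are all hypothetical illustrations, not RetroCausal's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StepMonitor:
    """Tracks progress through an expected assembly sequence (hypothetical)."""

    expected_steps: List[str]
    position: int = 0
    alerts: List[str] = field(default_factory=list)

    def observe(self, detected_step: str) -> Optional[str]:
        """Consume one detected step; return an alert message, or None if it is correct."""
        if self.position >= len(self.expected_steps):
            alert = f"unexpected step '{detected_step}' after the build was complete"
        elif detected_step == self.expected_steps[self.position]:
            # Correct step: advance to the next expected step, no alert.
            self.position += 1
            return None
        else:
            expected = self.expected_steps[self.position]
            alert = f"expected '{expected}' but saw '{detected_step}'"
        self.alerts.append(alert)
        return alert

    @property
    def complete(self) -> bool:
        """True once every expected step has been observed in order."""
        return self.position == len(self.expected_steps)


# Example: the operator tightens the bolt before fitting the cover.
monitor = StepMonitor(["place gasket", "fit cover", "tighten bolt"])
for step in ["place gasket", "tighten bolt", "fit cover", "tighten bolt"]:
    message = monitor.observe(step)
    if message is not None:
        print("ALERT:", message)
print("build complete:", monitor.complete)
```

In this toy run the out-of-order bolt tightening raises one alert, and the build still completes once the operator performs the remaining steps in order. A production system would also have to handle uncertain detections and steps that may legitimately occur in more than one order, which this sketch ignores.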
Sheikh Shuvo: Interesting. Are most of your projects on the edge, and have you encountered challenges with the size of the models, especially for the generative AI work?
Zeeshan Zia: Our platform is hybrid. We perform tasks on the edge, especially for very fast builds, like completing an engine assembly in 35-40 seconds, where real-time feedback in under a second is crucial. This requires edge devices. However, when an industrial engineer or a production supervisor needs to understand the process, gain insights, or use generative AI to improve the process, they don’t need real-time insights. The live piece collects data and understands what's happening, while the generative AI piece is served through a web portal. We serve generative AI models through the cloud and even conduct edge model trainings there, but the live video processing happens at the edge.
Sheikh Shuvo: Makes sense. With such a global customer base and new EU AI regulations, has that impacted your work?
Zeeshan Zia: We are working on our GDPR certification and are close to getting it. We haven't faced problems so far, and I don’t foresee any. When training our generative AI models, we are careful to use data we have rights to, like expired copyrights or public sector data. We curate our own internal datasets, allowing us to comply with regulations.
Sheikh Shuvo: As a state-of-the-art company, are there any particular areas of ML research that you follow closely?
Zeeshan Zia: We focus on multimodal foundation models and unsupervised machine learning. My co-founder, Dr. Quoc-Huy Tran, regularly publishes papers at major conferences. He leads a research team in our company contributing to unsupervised machine learning and generative AI, particularly multimodal aspects, combining video and language data. We actively follow and contribute to this field, publishing papers at top peer-reviewed conferences like CVPR and NeurIPS. Those are the broad themes we keep an eye on.
Sheikh Shuvo: I assume that the core vision models you have are developed by yourselves, but for the generative AI components, are you using models from big companies and fine-tuning them? What's the split between using big tech models and your own?
Zeeshan Zia: Yes, we utilize resources like Meta's Llama 2, which comes with a license that permits commercial use. We take Llama 2 and pre-train it for the language piece using open-source data with proper permissions. Then we modify it, using only its text data processing head, and attach our own video and sensor backbones to it. We have a few trainable layers in between. This combined model is then applied to our own datasets focused on factory process improvement. So, we benefit from Llama 2's extensive training and investment but also add our own industrial engineering-focused text data and multimodal capabilities.
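One common way to put "a few trainable layers in between" a video backbone and a language model is a learned projection that maps each video feature vector into the language model's embedding space, so the projected vectors can sit in the input sequence next to ordinary text token embeddings. The sketch below illustrates only that idea, in plain Python with toy dimensions. The function names, the single linear layer, and the sizes are assumptions made for illustration; the interview does not specify how the layers are actually built.

```python
import random
from typing import List

Vector = List[float]


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Create a small random weight matrix (out_dim x in_dim); stands in for trainable weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: List[Vector], feature: Vector) -> Vector:
    """Apply the linear projection: one dot product per output dimension."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]


def build_input_sequence(
    video_features: List[Vector],
    text_embeddings: List[Vector],
    weights: List[Vector],
) -> List[Vector]:
    """Project video features into the text embedding space and prepend them to the text tokens."""
    video_tokens = [project(weights, f) for f in video_features]
    return video_tokens + text_embeddings


# Toy sizes (assumptions): 4-dim video features, 6-dim language model embeddings.
VIDEO_DIM, LLM_DIM = 4, 6
adapter = make_projection(VIDEO_DIM, LLM_DIM)
frames = [[0.5] * VIDEO_DIM for _ in range(3)]   # stand-in for video backbone outputs
prompt = [[0.0] * LLM_DIM for _ in range(2)]     # stand-in for text token embeddings
sequence = build_input_sequence(frames, prompt, adapter)
print(len(sequence), "tokens of dimension", len(sequence[0]))
```

In a setup like this, typically only the adapter weights are updated on the domain data at first, while the pretrained language model and the backbones start from their existing weights. Whether RetroCausal freezes or fine-tunes those parts is not stated in the conversation.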
Sheikh Shuvo: Interesting. If I'm a recent college graduate focusing on computer vision and interviewing with different companies, what questions should I ask to know if it's a good place to do computer vision work?
Zeeshan Zia: That's a great question. You should understand the maturity level of their existing AI technology and their product roadmap. Often, companies build products around technology that's a decade old. While these can be valuable, they may not leverage recent breakthroughs. You might end up working on wrapper software rather than contributing to AI development. For instance, in our space, some companies use open-source models for person tracking and build wrappers around them. But if you're modeling an assembly process, you need to understand not just how the human body is moving but also how it interacts with objects and the assembly itself.
Zeeshan Zia: In assembly processes, the scene can change and is not standardized. You can't just assume that a certain hand position or pixel boundary in an image indicates a specific action. It's crucial to discover new objects in the scene, which are different from common objects like cats, dogs, or cars found in public academic datasets. Understanding the interaction between the human, the tool, and the assembly is key. The real solution to industrial problems is often more high-tech than just picking the lowest hanging fruit. Businesses might not like this approach, as it goes beyond just addressing the immediate customer problem. However, embracing a paradigm shift is often necessary to optimally solve these problems. Like Henry Ford's insight about faster horses, it's about seeing beyond the immediate request. Engineers should understand the broader vision of the company they're joining, its product roadmap, and whether it's focused on simple solutions or building something truly revolutionary.
Sheikh Shuvo: Cool, Zeeshan, this has been a super fun conversation. Thank you for sharing more about your world.
Zeeshan Zia: Thank you, Sheikh. It's been a pleasure to be on your podcast.
Sheikh Shuvo: Cool.