Episode 35
Jeff Bier
Edge AI and Vision Alliance

 

Jeff Bier, Founder of the Edge AI and Vision Alliance and President of BDTI, joins TECH B2B Marketing’s Winn Hardin and Jimmy Carroll on the Manufacturing Matters podcast to discuss the latest in all things related to edge artificial intelligence and computer vision. Topics include emerging applications and key technology trends, such as vision-language models, which use language to improve visual perception, and multi-modal models, which meld inputs from multiple sensor types. Jeff also shares highlights from the Alliance’s recent survey of vision application developers and provides a preview of the annual Embedded Vision Summit.

Jimmy:
Hello, everybody, and welcome to this live broadcast of the Manufacturing Matters podcast. My name is Jimmy Carroll. I have the pleasure of being joined with my colleague Winn Hardin and Jeff Bier, the founder of the Edge AI and Vision Alliance and president of BDTI. Jeff, first, I want to thank you for taking the time to spend with us today. I really appreciate it.

Jeff:
Oh, thanks for having me, Jimmy. It's good to be here.

Jimmy:
Thanks. So, for those out there who don't know, I guess it'd be helpful if you could give us a little bit of background on the Edge AI and Vision Alliance and what you're most excited about, and then maybe BDTI as well.

Jeff:
Sure. So very briefly, the Edge AI and Vision Alliance is an industry partnership founded about 13, 14 years ago, focused on accelerating and facilitating the use of embedded computer vision and other forms of machine perception to solve real-world problems in industry. And as a practical matter, we do that through a lot of educational programs, including an annual conference and trade show called the Embedded Vision Summit. This industry alliance is an outgrowth of an engineering consulting firm called BDTI that I've been running for many years. And for the past 10 years or so, BDTI has also been focused on embedded computer vision. We provide contract engineering services to help companies who want to incorporate embedded computer vision into their products. We especially work in industrial and consumer applications.

Winn:
You know, one of the things that's been most exciting to me about the Embedded Vision Alliance all these years is that Jimmy and I generally come from more of the industrial space: industrial vision, AI, motion control, robotics, etc. But you and your team and your segment represent kind of the spreading of traditional technologies into consumer-based products, into new markets that are way beyond the plant floor. So I've always looked at you guys as being one of the vectors for really driving these technologies into new spaces.

Jeff:
Yeah, it's super interesting, right? If you go back 20, 30 years, manufacturing inspection was really, I would say, the most important commercial application of computer vision, and it was where a lot of the innovation in the technology was happening because it was the most important commercial application. And there really weren't very many other commercial applications at that time because the technology was limited in its capabilities. It was expensive, big, complicated. So, yeah, if you're manufacturing, you know, a million dollars of something a day on a production line and you could improve your yield by 5% with a computer vision system, you would do that. But for most applications where computer vision could have provided benefit, it was just out of reach. The technology was too complicated, too limited, too expensive, power hungry, big and bulky, and so on. Now that's changed dramatically in the last 30 years, especially in the last 10 years. And the pace of innovation is accelerating. So now things have flipped. And now, although of course manufacturing inspection continues to be a really important application of computer vision, the center of mass of innovation has moved elsewhere. Who would have thought, right? Consumer electronics companies like Apple are ahead of manufacturing companies that have been at this for decades because they have such scale that they can afford to have 1,000 computer vision engineers, or actually many thousands of computer vision engineers, and invest in developing the technology.

Jeff:
And so you could look at that as, oh, isn't it a shame, other sectors are ahead of industrial in terms of their innovation and their adoption of computer vision, but I prefer to look at it as an opportunity. Now, if you're in industrial, it's not all on your industry to figure out all the technologies in advance. You can look and say, Oh, look over there. That thing they did in the iPhone, who would have thought we could adapt that and use it in manufacturing? Oh, look over there. The thing they did in that car for driver safety, we can adopt in manufacturing and adapt to our needs. So it's actually a fantastic . . . it's a little bit of an inversion. Instead of homegrown innovation, it's an opportunity to adopt innovation that's being done in other sectors of the economy. And I think it's actually a tremendous opportunity because so many billions of dollars and so many brilliant people are being invested in advancing this technology that the pace of innovation, like I said, is accelerating almost beyond imagination. And so companies that are smart about plucking the right bits from other industries, from other kinds of applications, have a tremendous opportunity to really leapfrog and advance what's possible in industrial.

Jeff:
I'll give you one concrete example, because this is all pretty abstract so far. You know, everybody's heard, of course, about this really scary incident with the Alaska Airlines Boeing plane where the plug door blew out in flight. And now it looks like the reason was that some bolts were not properly installed when the plug door had been removed to enable some repairs on other parts of the aircraft and then reinstalled. Now, if we were talking about a printed circuit board going down an assembly line, you would expect to have machine vision inspection such that if bolts were missing or chips were missing or whatever, that would have been immediately flagged and that piece would have been put aside for rework or just discarded. But for this kind of manual task, where these bolts are put in by hand, of course we don't have machine vision inspection systems, because it's been too hard. It's been beyond the standard technology. This is not looking at still images under very carefully controlled conditions. This is looking at video to understand sequences of operations. And those operations are being done by humans who move in infinitely varied ways, wear different-shaped, different-colored uniforms, have different-shaped bodies, and so on.

Jeff:
And so to have a computer vision system that can monitor and oversee that manual inspection process and say, Hey, wait a minute, guys, we forgot some important steps here. This aircraft is not ready to move on. That is now within reach. Not primarily because of work that people have been doing in manufacturing inspection but primarily because of work that people have been doing in other fields, like sports analytics. When you watch pro sports events, they have real-time computer vision understanding, for example, for soccer, of which team has more possession, the percentage of passes that get intercepted, and so on. It's the same kind of technology. It's very messy human movement under kind of varied and challenging conditions. And it's video. You can't understand this from images. You have to understand the sequence of images. So that capability is coming into maturity now, and it's going to come back into manufacturing such that hopefully mishaps like this Boeing issue with the missing bolts don't happen because, yes, humans will always make mistakes. Having the machine supervise is, I think, an important way to catch those mistakes so they can be corrected early.

Winn:
So what you're saying is Vegas should be giving some kickbacks to the rest of the industry, right? Because, you know, this last Super Bowl was, what, $165 million in betting I think went through Vegas on that particular day, which is all dependent on stats and online betting, to a large extent. So, Vegas, we're waiting on the phone call whenever that comes. You know, that makes me also think, and it's not just in operational maintenance modes where you're completely uncontrolled. We've got friends over in Israel, Kitov, for example, who are using AI with standard inspection to be able to look at a server stack with literally hundreds of different wires connecting all the different ports and everything and validate that these very organic elements, which are going to be lying all kinds of different ways — from the connector to the connection itself, the cable connection itself — are present in the right location. I mean, literally hundreds of inspection points that we would have never thought about, which could be seen in so many different ways from different perspectives.

Jeff:
Yeah, and that's because of deep learning. That's really the essential AI breakthrough that has enabled us to create these visual perception systems that can deal with these very messy, unstructured, complicated environments, like you were talking about with a nest of cable connections, and understand: Yeah, okay, all the cables are present and they're connected properly, or not. Another great example of that, something that really wouldn't have been practical 10 or 20 years ago but now is becoming increasingly deployed, is agricultural applications out in the fields. So, for example, Blue River Technology, which is now part of John Deere, has this precision spraying technology for those giant agricultural machines with the 100-foot booms that are spraying herbicide or pesticide or fertilizer. Instead of blanketing the entire field and having 90 or 95% of the sprayed material wasted, they use cameras in what's basically a friend-or-foe situation: Is that a weed or is that a crop seedling? And then they precision spray whatever it is, the herbicide, pesticide, or fertilizer, only where it needs to go. And so they're able to reduce by an order of magnitude the amount of herbicides and pesticides and fertilizer that they're using. This never would have been possible with classical computer vision techniques, because differentiating a weed seedling from a crop seedling is just too hard. There's infinite variation there. But thanks to deep learning, now we can do that reliably.

Jeff:
So, yeah, deep learning has really powered a huge advance in what's possible now with visual perception. And what's very exciting to me is, as amazing as that has been over the last, let's say, six or eight years that deep learning has been coming into widespread use in industry, what's coming in the next six to eight years, starting now, is going to be even more amazing. There's this positive feedback cycle where a few giant companies and universities and research labs have been putting so much effort into advancing the state of the art that the machine perception capabilities are advancing at an incredible speed. So, for example, I was mentioning being able to interpret video, what's going on in a video versus still images. I would say that is largely becoming practical today. And you see it deployed in certain applications, including automotive safety, largely due to this just massive investment in academia and a few very large corporate research labs over the last five or 10 years, which literally is billions and billions of dollars and thousands of brilliant people working on this. And so it's a little bit hard to contemplate. You know, things have changed dramatically in the last few years. They're going to change even more dramatically in the next few years.

Winn:
Ag-tech is one of the most exciting things to me, so I hope we can suss out through some of the rest of our conversation some of those future-looking applications that you're talking about, so we can go into more detail. Jimmy, your turn brother. I keep dominating.

Jimmy:
Yeah, no worries. Thanks, Jeff. You mentioned the last six or eight years in terms of AI and the growing adoption over that time period. I think it's fascinating because at least on the industrial side, like at the Vision Stuttgart show I think it was eight years ago, the marketing hype cycle kind of came out around deep learning, and it was being promised as this tool that's going to change everything. And initially it came out and it seemed as if, at least on the factory floor, the manufacturing side, people weren't really sure what to do with it or how it would add value. And certainly it was not going to replace traditional machine vision or computer vision. But it does seem to me, and Jeff, correct me if I'm wrong here, but it does seem to me that in the last couple of years, there's been just a growing number of cases where people seem to really have found a way, like a really novel and useful way to have these deep learning technologies augment existing computer vision, machine vision systems to completely create a new set of opportunities and capabilities. Like you're saying, both on the factory floor and beyond. Is that kind of your perception of it too?

Jeff:
Yeah, I think so. I mean, six, eight years ago people in industrial and manufacturing applications were saying things to me like, Oh yeah, deep learning is cool, but that'll never work for manufacturing inspection because these algorithms are only 95% accurate and we need 99.999% accurate. And I can understand where those people were coming from. But the thing you gotta keep in mind is, don't bet against deep learning. And the reason is, traditional computer vision, or for that matter any traditional algorithmic field that you want to name — compression, cryptography, you name it — is powered by humans, and there's a limit to what humans can do. There's a limit to individual human ingenuity and productivity. And there's a limit to how many humans can effectively collaborate to create one thing. These limitations do not exist for deep learning. Deep learning thrives on data, and as long as you can keep aggregating data and keep throwing compute at training the networks with those data, it can get better and better and better. So a great example of this is, if I have some strange, difficult-to-diagnose health situation, and I go in to see my local primary care doctor, that primary care doctor probably won't be able to get a correct diagnosis because she's probably never seen this condition before. So she might refer me to a specialist in the nearest big city.

Jeff:
I travel a few hours. I go see the specialist. Now I have a better chance of getting a correct diagnosis. But let's say my situation is very unusual for some reason and that specialist is also stumped. Well, maybe now I have to fly across the country to the really top, top specialist. Well, the world we're getting to in deep learning is where my local doctor is going to be able to put my symptoms, my labs, and so on into an AI system that is going to aggregate the expertise of not one but all of the top experts in the field. And it's going to very quickly come back with a proposed diagnosis. It still needs to be validated by an expert. But the point I'm trying to make here is that deep learning allows us to aggregate knowledge because of being able to aggregate data and train these networks. And so I would never bet against deep learning in terms of being able to do better than humans at any kind of algorithmic problem. It may take a while. And one of the things that's been a key limiting factor has been the availability of data. So, for example, in manufacturing inspection, I've talked to a number of companies who say, Yeah, there's a problem here because we make these vision inspection systems, but many of our customers will not allow an image to leave their factory.

Jeff:
They're like contract manufacturers for high-profile brands. And those places are locked down like a maximum-security federal prison. Nothing can go out. So how are you going to train your deep neural network with no data? Well, the answer is you're not. However, part of this coming wave of visual AI or perceptual AI that's starting to emerge out of research labs and come into industry is offering ways to circumvent the data bottleneck. And I know one of the things on many people's minds these days in connection with AI is generative AI, because of how ChatGPT burst onto the scene and has become such a phenomenon. So let me talk about that, because that's part of what's reshuffling the deck here. So people look at ChatGPT, a form of generative AI, a large language model, and if you're in computer vision you think, Wow, that's amazing, but it doesn't really have anything to do with computer vision. That's what I initially thought too. But digging in deeper and talking with real experts in this area, what I've come to understand is that large language models are actually going to fundamentally change how we do computer vision. And one of the reasons why is that language really embeds conceptual understanding of the world in a way that images and computer vision don't.

Jeff:
So let me make this concrete with an example. Let's imagine that I've given you the assignment of creating an AI system that recognizes airplanes flying in the sky. Well today what you would need to do is collect a whole bunch of images of airplanes. And depending on my specifications, you might have to collect a lot of images, different airplanes from different angles and different lighting conditions, different liveries from different airlines, etc. You might have to collect thousands, tens of thousands, hundreds of thousands of images in order to train a current-generation deep neural network like a YOLO or something like that to reasonably accurately recognize airplanes. And if you did a diligent job, you'd be able to get very good results, probably 99 point something percent accurate recognition under the conditions that I specified. But what if you don't have access to those images or it's just prohibitively expensive to get them? In fact, what if you live in a parallel universe where you've never seen a flying machine at all? You don't know the concept of an airplane. Okay, now you have a real problem in terms of training a classical deep neural network. And this is in fact the problem that occurs often in manufacturing inspection applications because, for example, we're looking for scratches on the case of this laptop.

Jeff:
Well, what does a scratch look like? Infinitely varied, right? But going back to the airplane example, if we merge models that understand language with models that understand vision, you have the ability now to essentially say to the model, Well, you've never seen an airplane, but imagine a metal thing in the sky with something that looks like outstretched arms. Two axes. Exactly. See what you can find. And because of its understanding of language, it's not merely recognizing and matching words; it actually understands the concepts. The model probably will be able to identify airplanes. Now, it probably won't be perfect on the first try. Like, it'll accidentally classify some birds or some utility poles as airplanes. And you'll say, Ah, no. If it's more vertical than horizontal and it's attached to the ground, it's probably not an airplane. If it's flapping its wings — well, now we're getting to video understanding. But okay, airplanes tend to have linear shapes; birds tend to have more curved, organic shapes. And so you kind of have this dialogue the way you do now with ChatGPT. If you use ChatGPT as I do, often on the first try, it doesn't get me what I want. But I go back and I say, That's not quite what I meant.

Jeff:
I meant this, I meant that, and after two or three tries, often I get what I want. So one of the things that is super exciting about this: this merging of language understanding and image understanding I think will open up possibilities to create application-specific visual AI that doesn't have to be trained with tens of thousands or hundreds of thousands of images, but that has been trained with millions of general images of the world. And so it understands, for example, what an airplane is. And now you can say, Well, I'm looking for a commercial passenger airplane that is flying, and it'll have a pretty good idea of what you mean on the first try, and then you'll have a dialogue with it and you'll refine it. And so I think that's super exciting because it offers a hope of breaking this training data bottleneck. One of the activities of my industry group, the Edge AI and Vision Alliance, is that every year we survey developers who are creating these visual AI systems. And one of the things we ask them about is, What's hard for you? What's making your life difficult in terms of developing these systems? And very consistently, issues related to training data are at the top of the list: obtaining the training data, doing quality assurance on the training data, doing labeling on the training data.
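
To make the zero-shot idea above concrete, here is a minimal sketch of classifying an image purely from text prompts with an off-the-shelf vision-language model. It assumes the Hugging Face transformers CLIP API and a hypothetical image file name; it is an illustration of the concept Jeff describes, not any specific product's implementation.

# Minimal sketch: "describe the class in language" instead of collecting
# thousands of labeled training images. Illustration only.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Candidate descriptions written in plain language.
prompts = [
    "a metal flying machine in the sky with straight, outstretched wings",
    "a bird flying in the sky",
    "a utility pole attached to the ground",
]

image = Image.open("frame_0001.jpg")  # hypothetical camera frame
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Higher probability = better match between the image and that text prompt.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for prompt, p in zip(prompts, probs):
    print(f"{p.item():.2f}  {prompt}")

Refining the prompts ("if it's attached to the ground, it's probably not an airplane") plays the same role as the back-and-forth dialogue Jeff describes.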

Winn:
I thought you were going to bring it back to tagging, actually, when you were talking about the interaction between the LLM or the language models and inspection, because in the initial baby steps, you have to have that tagging element. You have to have consensus among different trainers and everything. But you're definitely talking the next level. I wanted to ask whether the competitive advantage of leveraging AI and deep learning models in today's operations is also pushing these companies to be less proprietary about their datasets, maybe more willing to share, realizing that the benefits actually outweigh the potential liability risks, and therefore maybe we're seeing more sharing? But it doesn't sound like that's necessarily the case.

Jeff:
I haven't seen it. I think there's a lot of paranoia. For example, another industry where this is a big problem is medical.

Jeff:
I've literally had people where I've been trying to get them to come give presentations on some amazing innovations they're doing in computer vision for medical applications, and they're like, Well, I might be able to do it in a private, invitation-only event where there's no recording allowed and you don't distribute my slides, because they're really locked down because of privacy rules, and it's appropriate. But sharing data with a vendor so they can train a network? Really hard. On the other hand, the flip side is if you're the company that has one of these flywheel business models where you have a natural way of capturing the data, you're in a great position. So one example I really like is this startup called Hayden AI. I don't know if you guys have spent much time in Manhattan or other big cities like that, but Manhattan in particular I'm familiar with. And one of the things that's so frustrating about Manhattan is that they have bus lanes, but taking the bus is slower than walking because New Yorkers do not pay attention to the bus lane rules, and people will drive in the bus lanes, they'll park in the bus lanes, they'll park their truck in the bus lane and unload it. And the chances of getting a ticket are very low. So they'll take that gamble. Well, Hayden has a system that goes in every city bus. It has a forward-looking camera, and if you park in the bus lane, you're now going to get a ticket, because the camera is going to recognize that your vehicle's parked in the bus lane, it's going to read your license plate, and you're automatically going to get a ticket, same as if you went through a toll bridge without the correct tag.

Jeff:
So now the probability of getting a ticket from abusing the bus lane goes from near zero to near 100%. So one would expect that is going to significantly drive behavior change because people are going to rack up thousands of dollars in fines, they're going to get their licenses revoked, and so on. And so now these guys have the potential to completely unblock the bus lanes in Manhattan and make riding the bus actually a thing that works. It's faster than walking, which then causes fewer people to drive and has this fantastic effect potentially on the economy and the environment. But the point that's really so relevant here is every mile those buses drive, Hayden AI collects thousands of frames of data of images that they can then use to improve their algorithms. And when something goes wrong, let's say somebody gets a ticket that shouldn't have because there was some weird situation where a delivery truck was carrying a mirror and it looked like this vehicle was in the bus lane but actually it was in the next lane and the license plate was reflected or something bizarre like that, well they get that feedback and they're like, Ah, okay, now we know that that's a thing and we can adjust our algorithms to watch out for that case. So companies that manage to create a business model and get some initial deployment such that they can continuously collect more data and use that to improve their algorithms, that's really the ideal situation, but often challenging in manufacturing-type applications, medical-type applications because of concerns about protecting privacy in the case of medical applications and proprietary . . .

Winn:
Investors . . .

Winn:
HIPAA requirements or investor concerns, I'm sure J&J and Merck and Pfizer don't really want people going out there sharing the next billion-dollar idea on a regular basis.

Winn:
Right.

Winn:
And yet, I've worked with a number of companies during COVID, for example, that were applying deep learning models to PCR, large-scale screening machines and getting much better results as a result. And we all know the mammography deep learning models that are helping to differentiate micro-calcifications from cysts and really help to not scare people with false diagnoses but make radiologists so much more effective. Well, I'm just hoping that over time that risk–reward equation balances out a little bit more because that will just continue the acceleration that you talked about early on. I mean, data acquisition is so difficult in every space.

Jeff:
Yeah. And I think this next coming generation of models that brings language understanding and merges it with vision is also going to change the game, because, like I mentioned, it's going to enable creating effective models without having massive quantities of training data. In a sense, it's more like the human experience, where if you had grown up on a planet where there were no airplanes, but I said to you, Well, it's kind of like a bird, but it's made of metal. It's about the size of a school bus or two and very linear in its design. Oh, and by the way, it tends to make a roaring or whining noise when it passes overhead. You'd immediately recognize the first airplane you saw. And that's where this next generation of AI models is going to take us. And by the way, the addition of the audio aspect brings us to what's often called multimodal models. You know, humans, we have multiple senses for a reason, right? Nature evolved eyes and ears and so on because they complement each other. Vision is incredibly powerful for humans. But we have a limited field of view. So if a predator is sneaking up behind us, we're much more likely to hear that before we see it. And then we use the audio cues to direct our vision towards the thing to better understand, Oh, that was just a branch falling, or, oh my gosh, it's a tiger. Time to split. We haven't seen many implementations yet out in the commercial world of AI systems that effectively use multiple sensors together. Probably automotive safety is the best example of that today.

Winn:
Yeah.

Jeff:
Even ADAS in mainstream cars today, where typically you'll have radar plus vision working together: the radar is more reliable at detecting, Yeah, there's actually a solid object there, and it's exactly this distance away, and it's moving at this speed. Radar can do that better than vision. On the other hand, what the object is is hard for radar and typically much easier for vision. So there are a few examples like that. But those are the exceptions that prove the rule. There isn't a lot out there in terms of commercially deployed systems that are using multiple sensor types together to improve machine perception. But in a loosely similar way to how I'm talking about this merging of language understanding with image understanding, we're now seeing more and more models that merge multiple diverse sensor types. So it could be camera, radar, lidar, time-of-flight, audio . . .

Winn:
Um.

Jeff:
. . . ultrasound are able to use all of the available data to improve the quality of perception. And I think where you're going to see this in more and more industrial applications is, well, many places. One is safety, similar to the automotive safety thing where you have machines moving around near people, whether they're robots or material-moving machines or whatnot. And you have some potentially dangerous situations, and you need to keep the people safe. It's hard to do that totally reliably with just one type of sensor. And the problem is, if you're too conservative and you have a lot of false positives, false alarms, and the system shuts down unnecessarily, people get frustrated and they bypass the system. Obviously, in the other direction, if you're too liberal and you allow dangerous situations to pass, then people may get hurt or killed. That's even worse. So the accuracy is really important, and having multiple types of sensors is often going to be the way that we achieve that in many different kinds of safety-critical environments. The other place where I think it's going to pay dividends is in inspection systems where having multiple ways of looking at a thing . . . for example, some people will use hyperspectral imaging for looking at produce. And damage to a piece of produce that is not really visible in the visible spectrum is often completely obvious.

Jeff:
You know, with a hyperspectral sensor, I think we're going to see more and more of that where it's visible spectrum, hyperspectral, it's ultrasound, it's radar, it's lidar, it's even audio. There was a paper published a few years ago that was fascinating, where researchers took a microphone inside a car. And just from the signal from that microphone, they could tell if the road was wet. Think about that. Sure, of course they could. I can. I know the difference in the sound of my car tires in contact with the road and also the spray flying off the tires. Okay, but the fact that you can now have a machine do those kinds of things, to me it's suggestive that there's a whole lot of latent data available that we haven't been using because we haven't had a mechanism to harvest it. Like, what else could we learn by listening to things? And of course, in factory and other industrial environments today, people are increasingly listening to machines, either audio or vibration, to say, Oh, you know what? Something's not quite right here. Actually, last time we heard this pattern of sound the bearing was about to fail. So we better shut this machine down and call in for maintenance.
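
As a rough illustration of the machine-listening idea Jeff mentions, here is a minimal sketch of audio/vibration anomaly detection: summarize each recording as frequency-band energies and fit an anomaly detector on recordings captured while the machine is known to be healthy. The sampling rate and clip data are placeholders; only the scipy and scikit-learn calls are real, and a production condition-monitoring system would be considerably more involved.

# Minimal sketch: flag recordings whose spectral signature does not look like
# normal operation. Placeholder data stands in for real accelerometer/microphone clips.
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import IsolationForest

SAMPLE_RATE = 25_000  # Hz, hypothetical sensor rate

def band_energies(signal, fs=SAMPLE_RATE, n_bands=16):
    # Summarize a clip as log power in n_bands equal-width frequency bands.
    _, psd = welch(signal, fs=fs, nperseg=2048)
    bands = np.array_split(psd, n_bands)
    return np.log10(np.array([b.sum() for b in bands]) + 1e-12)

# Placeholder "healthy" recordings; in practice, clips captured during normal operation.
rng = np.random.default_rng(0)
healthy_clips = [rng.standard_normal(SAMPLE_RATE) for _ in range(50)]
X_train = np.stack([band_energies(c) for c in healthy_clips])

detector = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

new_clip = rng.standard_normal(SAMPLE_RATE)  # placeholder for a fresh recording
score = detector.decision_function([band_energies(new_clip)])[0]
if score < 0:
    print("Unusual sound/vibration signature -- flag for a maintenance check.")
else:
    print("Signature looks like normal operation.")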

Winn:
Traditionally that's been the way you evaluate motor health. I mean, back in the old days, it was actually taking the screwdriver and putting it to your face so you could feel the vibration, which you couldn't hear in a noisy environment. Those guys have pretty much retired at this point, so we definitely need new mechanisms. And it's interesting: when we talk about the multi-modals, most of the time we're talking about different parts of the electromagnetic spectrum. If we're talking about audio, that's actually a physical force. But do you think that combining that information with the LLM capability will — I think you've already answered the question for me — that it's going to help to train models for different applications that haven't even really been envisioned yet? I mean, if it has a large enough dataset, DL or an AI system is going to be able to extrapolate the possibilities, whether it's micro behavior in cellular structures or other things where we have a dearth of visual data to be able to see every potential variable or condition that could exist in the cellular environment. But maybe by leveraging millimeter wave, ultrasound, infrared, and different parts of the spectrum, the computer can tell us eventually what to look for, without us having any real clue at all.

Jeff:
Yeah. I think a place where people are working on this quite extensively already is in medicine, because in medicine, a lot of the data comes in textual form. Patient reports symptoms, doctor makes notes, lab tests come back, including interpretation of lab tests. And this all has to be integrated along with things like X-rays and ultrasound images and so on. Also, going back to this idea of when you go see your doctor and you have some difficult-to-diagnose problem, you'd like your doctor to have read every recent research paper related to your ailment.

Winn:
I'd settle for my medical record, frankly.

Jeff:
And your medical record, sure. But no doctor can do that, right? No individual human can do that. But AI can do that. AI can do that and say, You know, there were 17 papers published in the last three years on this topic, and three of them suggest that people who show this symptom may have that condition. And therefore the following tests are indicated. In an industrial setting, you could see this same idea being very powerful. For example, if you read that NTSB report on the Boeing plug door issue, one of the key clues in reconstructing the sequence of events was text messages among the Boeing workers, where they were talking about what happened, what was happening, and what needed to happen. So imagine the visual inspection system that's supervising the manual assembly task of reinstalling this door, but it also has access to messages among the team that might be written or might be spoken. That's going to be additional valuable context that, if properly used, I think will make those systems that much more effective. So I think we will see systems, and we already are starting to see systems, that take advantage of written language as well as images and other kinds of sensor data. And the written language can arrive as written language, like doctor's notes, or it can arrive as speech.

Jeff:
If I'm interacting with a system, I can speak to it. And one of the applications I think we'll see pretty quickly with LLMs is car manuals, owner's manuals for cars. My daughter just got her first car, and I like to think I know some things about cars, but this car is a hybrid and it has so much technology. I'm completely baffled. And of course the owner's manual is impenetrable, as they always are. So what I need to be able to do — oh, there's no spare tire, so that was the first thing that threw me. So I need to be able to have a dialogue with the car and say, Car, I have a flat tire. What do I do? And have it tell me. Or, Car, I notice this thing happening, this vibration. What does that mean? And the car contains a large language model that's trained on the owner's manual for the car and a bunch of related data like frequently asked questions and mechanic reports and so on. And so it's not up to me to sort of interface with the car through this very clumsy mechanism of a 600-page poorly written user manual. It's up to the car to understand what I'm really asking, which by the way could also include images. Like, through my phone: Hey, Car, is this where I put in oil?

Jeff:
No, no, that's where you put in coolant. You know, look for the yellow cap or the orange cap for oil. So I think we're quickly heading to a new world. We've been in a world now for a while where AI models tend to rely on just a single type of data: it's text, or it's images, or it's whatever. I think we're very quickly moving to this multi-modal world where AI models are going to be able to incorporate multiple types of data and use them to do a better job at whatever it is they're tasked with doing, like helping me manage my car or keeping workers safe around dangerous equipment in underground mines or diagnosing medical conditions or what have you.
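
The "talk to your car's manual" idea is essentially retrieval plus a language model. Below is a minimal sketch, assuming the sentence-transformers library for the retrieval step; the manual excerpts are invented, and the final call to a language model is only indicated, not implemented.

# Minimal sketch: find the manual passages most relevant to a question, then
# hand them to an LLM as context. Illustration only.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# In practice these would be chunks of the real owner's manual, FAQs, mechanic notes, etc.
manual_chunks = [
    "This vehicle is not equipped with a spare tire. Use the tire repair kit in the trunk.",
    "The engine oil filler cap is yellow and located at the front of the engine bay.",
    "The coolant reservoir has an orange cap. Never open it while the engine is hot.",
]
chunk_vecs = embedder.encode(manual_chunks, convert_to_tensor=True)

question = "I have a flat tire. What do I do?"
q_vec = embedder.encode(question, convert_to_tensor=True)

# Pick the top passages by cosine similarity and build a prompt around them.
hits = util.semantic_search(q_vec, chunk_vecs, top_k=2)[0]
context = "\n".join(manual_chunks[h["corpus_id"]] for h in hits)
prompt = f"Answer using only this excerpt from the owner's manual:\n{context}\n\nQuestion: {question}"
print(prompt)  # in a real system, this prompt would be sent to a language model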

Jimmy:
Yeah, Jeff, you've provided a lot of really good examples of the ways that AI can kind of benefit society. And I want to circle back to that before we close up here. But one thing I did want to ask before we get toward the end is kind of a fun one. I had somebody reach out to me yesterday looking for advice on their master's thesis. He said his interests are largely in AI and machine learning and deep learning and computer vision and embedded systems. And he said, I'm looking for a research topic that might align with current trends and challenges in embedded systems and AI. And I said, Well, I don't really feel qualified to make such a suggestion. I'm not an engineer. I'm just someone who happens to be part of this industry through marketing or, previously, through being an editor at a magazine. But, Jeff, you're qualified to answer such a question; that doesn't mean you want to, but I guess I would pose that to you. Are there certain interesting trends or challenges that they might base their thesis on?

Winn:
Yeah.

Jeff:
Well, I love helping students, and I would invite that student to reach out to me through LinkedIn because I'd like to know more about that student's interests to really give a good answer that's oriented toward their interests. Like are they really a deep math person or are they more interested in applications or implementation? There's so many different angles here. It's hard to give an all-purpose answer without knowing more about the individual student's interests, maybe their career aspirations. So I'd be happy to get a message from that student and spend a few minutes thinking about an answer tuned to that individual's interests. But to the extent I can give a general answer, I would say, gosh, there are so many possibilities here. And I think I would be guided by what you're interested in. Again, if you're interested more in the fundamental math, there's a lot to look at there. If you're interested more in implementation, like how do we put these systems together and make them reliable in the real world, plenty to work on there. If you're interested in applications, plenty to work on there. And I think how hard you want to push into this sort of leading edge of technology should be governed by how much you want to invest in the project and kind of what your aptitudes are.

Jimmy:
That's fair. Okay. Well, I'll be happy to facilitate that conversation. A couple more questions before I close out with a fun one for you. But I did want to ask about that survey I think you mentioned earlier. What are some key takeaways there? I know that people can go and access the survey on the website and find out more, obviously. But from your perspective, what's changing in the world of computer vision and AI, at least in terms of these survey results? What's most interesting?

Jeff:
Yeah, to me the most interesting things are . . . well there are many. But first just the way that computer vision is proliferating into so many new applications. That's really kind of amazing and wonderful. In many cases things that just wouldn't have been feasible a few years ago. That's very exciting. Another important trend is this idea of multi-modal perception. We're not just using images. We're combining images with other types of sensor data, which people have been doing for a while, but it's been very challenging. This sensor fusion, as it's called, has been a bit of a black art. It's required very smart people working for long, long years to kind of figure out a solution tuned to a very specific problem with a very specific set of sensors. And so that was hard to scale. Now we are starting to have AI models that can basically learn the sensor fusion for a given application and set of sensors. And so that's opening up very exciting possibilities. Another trend that's very exciting is there's been so much improvement in the hardware, especially the processors.

Winn:
Yeah.

Jeff:
If I'd said to you 10 years ago, Oh, yeah, I'm a kitchen appliance manufacturer and I'm going to start putting computer vision into kitchen appliances to monitor the progress of cooking in an oven or washing dishes or whatever, you would have said, correctly, You're nuts. Nobody will pay for that. But now people are doing it. There are successful kitchen appliances down to the $200 retail level that have computer vision. The computer vision is kind of invisible to the user, as it should be. The user doesn't care that it's computer vision. The thing just has this valuable capability. It's a smarter version than the one I had before, and that's enabled largely because of improvements in the processors, enabling us to run deep learning algorithms in some cases on a one- or two-dollar processor, which would have been really inconceivable a few years ago. So the reduction in cost and also power consumption of the processors is huge. Power consumption enables things like a doorbell camera that can realistically run for six months on a battery charge. It doesn't have to have AC power connected to it, and it can do real-time object recognition, detect that people are approaching, that sort of thing. And then I think the final trend that's still in the very early stages but is very exciting is . . . this technology is very powerful, but it's very, very complicated . . .

Jeff:
and for many companies that would like to use this technology, it's still just not feasible because they can't build up an engineering team that's big enough and has enough expertise. So how do we make it easier to use this technology? And there's been a fair bit of progress in the last two, three years in terms of creating better development tools and platforms so that you don't necessarily have to know all the inner workings of how all these things work in order to effectively use them. Still early days, but I think that's maybe the single most important trend, because if you have to have an AI team to use this technology, that's going to be the limiting factor for many companies and groups within companies. It's just not going to be feasible. It's kind of like with the advent of spreadsheets. Before spreadsheets, you either did the math manually in a business setting or you had a team of programmers to write the application to do whatever number crunching you needed. Now with spreadsheets, as long as you understand what you're trying to do, almost anybody can automate almost any kind of number-crunching task. Hopefully we'll get there with computer vision and other forms of machine perception. It's going to be a while, but you can see the glimmers of hope there on the horizon.

Winn:
I was smiling earlier, imagining a Douglas Adams version of my next Samsung refrigerator that does body shaming when I'm walking up to it and reaching for the chocolate milk or the Reese's cups that you have to keep in the freezer.

Jeff:
The one I really want is the refrigerator that knows what's in the refrigerator and how long the items have been in there, so it prevents me from eating things I shouldn't be eating because they're old. But even more importantly, I can say to it, Refrigerator, what can I make with what I have in my refrigerator? And it gives me, like, three or four options. Oh, that sounds good. Show me the recipe. That's my dream.

Winn:
That's probably not far off at all, to tell you the truth.

Jimmy:
That's now my dream too. I love it.

Winn:
That's way better than the body shaming refrigerator, for the record.

Jimmy:
Jeff, a lot of the questions I wanted to ask you we've already covered, and you've given us some really great examples, but I do want to go back to some of the examples you've mentioned before we close out. I want to be respectful of your time. You've probably had this conversation a lot with people, when they come to you, somebody that you don't know, and you explain to them what you do, and the subject of AI comes up, and they get scared of it. But there are just so many good examples of how it can better society. And you touched on this earlier, but AI is only as good as the data that it's provided by people. So for these conversations, which I'm sure you've had, how do you tell people, Hey, don't be afraid of AI. It's actually really good for us?

Jeff:
Well, actually, my answer is slightly different. I think that AI is an incredibly powerful technology, maybe the most powerful technology to emerge in our lifetimes. The only things I can immediately think of that will have a comparable magnitude of total impact would be the introduction of electricity for industrial and household use, and the internet. So think about when electricity or the internet was being introduced. Is it going to be a force for good or for evil? Yes, both. It's an incredibly powerful technology. People are going to use it for bad purposes because people do that. Whatever technology is available, they'll use it to scam or whatever. So to me that's the reality. There are amazing, compelling applications that are already emerging for the good of humanity. And there's one that I want to mention in just a minute that's kind of mind blowing, but there's also going to be some really scary stuff that happens, and we can't prevent that. But what we can do is be alert to the possibility and take reasonable measures to protect ourselves against it. And so I think that's kind of a realistic outlook.

Winn:
Well, we had to make circuit breakers. We had to make OCPD devices for electrical. We have to have anti-malware and other tools to protect us on the internet.

Winn:
And we're probably going to use AI to create the tools that will protect us from bad actors in deep learning.

Jeff:
And none of those protection technologies are perfect. But in aggregate they do a very good job. And I think most people would say, Well, in aggregate, we're glad we have electricity and we're glad we have the internet, even though, yeah, sometimes people get hurt by them for sure.

Jeff:
So the mind-blowing application I want to mention, which brings us back to manufacturing, is up in New Hampshire, where a group called ARMI is working on building out a manufacturing industry to manufacture human tissues and organs, to take processes that are pretty well established in research labs on a very small scale and industrialize them so that there are literally factories that can churn out replacement organs and tissues to serve people who are waiting for transplants — for example, soldiers who are injured, etc. That will not be possible without AI. How do you look at an organ that's growing in a chamber and determine whether it's on track or not? By the way, it has to remain sterile. Well, that's going to require computer vision and maybe other sensor modalities monitoring that tissue as it's forming and being part of a feedback control loop to optimize the process. When I first heard about what these guys were doing, it sounded like science fiction. But now that I've spent some time with them, it's real and it's mind blowing. And so to me, that's a manufacturing application. It's early days, but it's very promising, and it's eventually going to save, I think, thousands of lives. And it will not be possible without this kind of perceptual AI that we're talking about today.

Winn:
Without question.

Winn:
A little bit more complex than growing chicken breasts in the lab, which is currently a commercial development area.

Winn:
Yeah.

Jeff:
And for people who want to learn more about applications of computer vision, we have some great resources on our website. It's edge-ai-vision.com. There's a section on there with lots of short videos of cool commercial applications. Most of them are commercial applications; a few of them are proofs of concept. And also the conference and trade show that we mentioned at the beginning of the podcast, the Embedded Vision Summit, which takes place this year May 21 through 23 in Silicon Valley, is a great place to come and hear some of the people who have developed some of the leading-edge commercial applications of computer vision share a little bit about their journey, what they've done, and what they learned along the way. So those are a couple of resources for people who want to learn more about applications.

Winn:
I appreciate the plug for the conference. Go ahead Jimmy. Go ahead brother.

Winn:
Yeah.

Jimmy:
I was just going to say that in the interest of not flooding your inbox, Jeff, I encourage everyone to check out those websites, and if they have any specific questions for Jeff or the alliance, I'd be happy to take those and pass them along. You can visit us at manufacturing-matters.com or reach out on LinkedIn. But aside from that, I certainly would encourage everybody, if you're within that space, to take a look at the alliance and how they might be able to help you, and definitely check out the event.

Winn:
And as someone who's been to the conference multiple times, one of the things that's always excited me most about it is its focus on applications. I mean, you're just seeing use case after use case of people developing very targeted systems, leveraging multiple sensor modes, different datasets and everything to solve real-world questions. It's not just academia, math, balancing models, and things of that nature, although certainly a lot of that discussion is going on there.

Jeff:
And it's very cross-industry. Like I was saying earlier, a lot of the most compelling innovation is happening these days not in the manufacturing or industrial sector for computer vision. So come and see what people are doing in medical, in consumer, in automotive, and figure out what of that you want to bring in to your application to leapfrog the current capabilities.

Winn:
So I know we're down to just a couple of minutes left. We had a couple of questions from the audience, including, Do you have any successful use cases in synthetic data and quality inspection? So basically, maybe applying those LLM modalities you were talking about to grow datasets for industrial applications. So another one: What are the pros and cons of synthetic data, AI vision systems, and quality inspection? Kind of related. I don't know if you have any thoughts on that, Jeff, about the current state.

Jeff:
Yeah, I can't think of any use cases that I'm allowed to talk about in terms of commercial use cases. So I'll answer the second question, which is the pros and cons of synthetic data for quality inspection. And I think. . .

Winn:
For the record, that means, yes, that it's being done.

Jeff:
The pros are . . . well, for those who are not familiar, the idea of synthetic data is that you create images using the same kinds of techniques that video games use, for example, or that movie special effects use. They are artificial images, but they can be extremely realistic looking. And there are now whole companies and platforms for creating these synthetic images. And the idea is that, well, let's say you're making an automotive safety application and you want your system to be able to recognize all kinds of animals that might be in the road. But to actually capture images of all kinds of animals in the road, that would be hard. That would be expensive.

Winn:
From every angle and every weather condition.

Jeff:
Exactly. But if you have basically a video game platform that can create a photorealistic rendering of that kind of scenario in all different angles and weather conditions and so on, that's not difficult. So it's a way to get images that would otherwise be difficult to get in the real world. And by the way, you don't have to then manually label those images and go in and say, that's a raccoon and that's a deer, because they were machine generated. They were generated programmatically, so you know ahead of time that's a deer, that's a raccoon, and exactly where they are and so on. So it's a way to get data for training neural networks that might be difficult or expensive to otherwise get. And it comes pre-labeled, which saves a whole bunch of work that has to happen with natural images. So those are the key pros, which are pretty compelling in some applications. The cons are that it's not free, and it typically takes a fair amount of work to actually enumerate all the variations and cases that you need, generate the data, and then validate that, yes, we got what we need. And it relies on human imagination to think of the variations.

Jeff:
So in a manufacturing environment, it's on you to basically specify what kinds of defects you want to be present in the synthetic image. And a defect in a manufacturing process is harder to get right the first time in a synthetic image versus if I say, Oh, show me three people standing on a curb getting ready to cross a street. That's a more everyday sort of thing that you would expect. Computer graphics understands what that is. But if I say, Show me a mis-threaded fastener, it's really on me to be very detailed about what that means. And in the end, it may wind up being a lot of work to get the synthetic images that I want. So this is obviously an oversimplification, but that's what I can offer in a two-minute answer.
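
To illustrate why synthetic data arrives pre-labeled, here is a toy sketch: the same code that draws a synthetic "scratch" onto a plain background also writes out its bounding box, so no human annotation step is needed. The crude PIL drawing stands in for a real rendering or simulation platform, and the file names are hypothetical.

# Toy sketch: generate synthetic "scratch" images and their labels together.
import json
import random
from PIL import Image, ImageDraw

def make_synthetic_sample(index, size=(640, 480)):
    img = Image.new("RGB", size, color=(190, 190, 195))  # plain "laptop case" background
    draw = ImageDraw.Draw(img)

    # Place a scratch-like line at a random position and angle.
    x0, y0 = random.randint(130, 500), random.randint(70, 380)
    x1, y1 = x0 + random.randint(-120, 120), y0 + random.randint(-60, 60)
    draw.line([(x0, y0), (x1, y1)], fill=(90, 90, 90), width=2)

    # The label is known by construction -- no manual annotation needed.
    label = {
        "image": f"synthetic_{index:05d}.png",
        "class": "scratch",
        "bbox": [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)],
    }
    img.save(label["image"])
    return label

labels = [make_synthetic_sample(i) for i in range(100)]
with open("synthetic_labels.json", "w") as f:
    json.dump(labels, f, indent=2)

The hard part Jeff describes is exactly what this toy glosses over: specifying, in enough detail, what a realistic defect actually looks like.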

Winn:
Yeah, we need a whole other discussion just simply on that, because I've got a whole bunch of questions about complexity versus what you got to throw at it. But I know we're out of time today.

Jimmy:
Yeah. And Jeff, thank you so much for taking the time. Again, it's edge-ai-vision.com. And if you have any general questions or comments, either for us or for Jeff, we would be happy to pass them along. Reach out to us at manufacturing-matters.com. Thanks, everyone. Thanks, Jeff.

Winn:
Thank you guys.

Sonix is the world’s most advanced automated transcription, translation, and subtitling platform. Fast, accurate, and affordable.

Automatically convert your mp3 files to text (txt file), Microsoft Word (docx file), and SubRip Subtitle (srt file) in minutes.

Sonix has many features that you’d love including automatic transcription software, collaboration tools, upload many different filetypes, advanced search, and easily transcribe your Zoom meetings. Try Sonix for free today.

Jimmy: [00:00:06] Hello, everybody, and welcome to this live broadcast of the Manufacturing Matters podcast. My name is Jimmy Carroll. I have the pleasure of being joined with my colleague Winn Hardin and Jeff Bier, the founder of the Edge AI and Vision Alliance and president of BDTI. Jeff, first, I want to thank you for taking the time to spend with us today. I really appreciate it.

Jeff: [00:00:25] Oh, thanks for having me, Jimmy. It’s good to be here.

Jimmy: [00:00:28] Thanks so for those who don’t know out there, I guess it’d be helpful if you could give us a little bit of a background on the Edge AI and Vision Alliance and what you’re most excited about, and then maybe BDTI as well.

Jeff: [00:00:40] Sure. So very briefly, the Edge AI and Vision Alliance is an industry partnership founded about 13, 14 years ago, focused on accelerating and facilitating the use of embedded computer vision and other forms of machine perception to solve real-world problems in industry. And as a practical matter, we do that through a lot of educational programs, including an annual conference and trade show called the Embedded Vision Summit. This industry alliance is an outgrowth of an engineering consulting firm called BDTI that I’ve been running for many years. And for the past 10 years or so, BDTI has also been focused on embedded computer vision. We provide contract engineering services to help companies who want to incorporate embedded computer vision into their products. We especially work in industrial and consumer applications.

Winn: [00:01:42] You know, one of the things that’s most exciting to me about the Embedded Vision Alliance all these years is that Jimmy and I generally come from more of the industrial space, the industrial vision, AI, motion control, robotics, etc. But you and your team and your segment represents kind of the spreading of traditional technologies into consumer-based products, into new markets that are way beyond the plant floor. So I’ve always looked at you guys as being one of the vectors for really driving these technologies beyond into new spaces.

Jeff: [00:02:15] Yeah, it’s super interesting, right? If you go back 20, 30 years, manufacturing inspection was really I would say the most important commercial application of computer vision and where a lot of the innovation was happening in the technology because it was the most important commercial application. And there really weren’t very many other commercial applications at that time because the technology was limited in its capabilities. It was expensive, big, complicated. So, yeah, if you’re manufacturing, you know, a million dollars of something a day on a production line and you could improve your your yield by 5% with a computer vision system, you would do that. But for most applications where computer vision would have provided benefit, could have provided benefit, it was just out of reach. The technology was too complicated, too limited, too expensive, power hungry, big and bulky, and so on. Now that’s changed dramatically in the last 30 years, especially in the last 10 years. And the pace of innovation is accelerating. So now things have flipped. And now, although of course manufacturing inspection continues to be a really important application of computer vision, the center of mass of innovation has moved elsewhere. Who would have thought, right, consumer electronics companies like Apple are ahead of manufacturing companies that have been at this for decades because they have such scale that they can afford to have 1,000 computer vision engineers, or actually many thousands of computer vision engineers, and invest in developing the technology.

Jeff: [00:03:54] And so you could look at that as, oh, isn’t it a shame, other sectors are ahead of industrial in terms of their innovation and their adoption of computer vision, but I prefer to look at it as an opportunity. Now, if you’re in industrial, it’s not all on your industry to figure out all the technologies in advance. You can look and say, Oh, look over there. That thing they did in the iPhone, who would have thought we could adapt that and use it in manufacturing? Oh, look over there. The thing they did in that car for driver safety, we can adopt in manufacturing and adapt to our needs. So it’s actually a fantastic . . . it’s a little bit of an inversion. Instead of homegrown innovation, it’s an opportunity to adopt innovation that’s being done in other sectors of the economy. And I think it’s actually a tremendous opportunity because so many billions of dollars and so many brilliant people are being invested in advancing this technology that the pace of innovation, like I said, is accelerating almost beyond imagination. And so companies that are smart about plucking the right bits from other industries, from other kinds of applications, have a tremendous opportunity to really leapfrog and advance what’s possible in industrial.

Jeff: [00:05:14] I’ll give you one concrete example, because this is all pretty abstract so far. You know everybody’s heard, of course, about this really scary incident with the Alaska Airlines Boeing plane where the plug door blew out in flight. And now it looks like the reason was some bolts were not properly installed when the plug door had been removed to enable some repairs on other parts of the aircraft and then reinstalled. Now, if we were talking about a printed circuit board going down an assembly line, you would expect to have machine vision inspection such that if bolts were missing or chips were missing or whatever, that would have been immediately flagged and that piece would have been put aside for rework or just discarded. But for this kind of manual task, these bolts are put in by hand, of course we don’t have machine vision inspection systems because it’s been too hard. It’s been beyond the standard technology. This is not looking at still images under very carefully controlled conditions. This is looking at video to understand sequences of operations. And those operations are being done by humans who move in infinitely varied ways, wear different-shaped, different-colored uniforms, have different-shaped bodies, and so on.

Jeff: [00:06:31] And so to have a computer vision system that can monitor and oversee that manual inspection process and say, Hey, wait a minute, guys, we forgot some important steps here. This aircraft is not ready to move on. That is now within reach. Not primarily because of work that people have been doing in manufacturing inspection but primarily because of work that people have been doing in other fields, like sports analytics. When you see how the pro sports events, they have real-time computer vision understanding, for example, for soccer, which team has more possession, percentage of passes that get intercepted, and so on. It’s the same kind of technology. It’s very messy human movement under kind of varied and challenging conditions. And it’s video. You can’t understand this from images. You have to understand the sequence of images. So that is coming into maturity now, that capability, and it’s going to come back into manufacturing such that hopefully mishaps like this Boeing issue with the missing bolts don’t happen because, yes, humans will always make mistakes. Having the machine supervise is I think an important way to catch those mistakes so they can be corrected early.
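
To make the sequence-monitoring idea concrete, here is a minimal Python sketch of the kind of check such a system could run once a video model has recognized individual work steps. The step names and the assumption that a detector emits an ordered list of step labels are illustrative only, not a description of any real aerospace inspection system.

```python
# Hypothetical sketch: hold a unit when video analytics has not verified every
# required assembly step, in the required order. The step labels are invented,
# and recognizing steps from video is assumed to happen upstream.
REQUIRED_STEPS = [
    "remove_plug_door",
    "complete_repair",
    "reinstall_plug_door",
    "install_retaining_bolts",
    "torque_retaining_bolts",
]

def steps_complete(observed):
    """observed: ordered step labels emitted by a (hypothetical) video model.
    Returns (ok, missing_steps)."""
    seen = [s for s in observed if s in REQUIRED_STEPS]
    missing = [s for s in REQUIRED_STEPS if s not in seen]
    in_order = seen == [s for s in REQUIRED_STEPS if s in seen]
    return (not missing) and in_order, missing

ok, missing = steps_complete(
    ["remove_plug_door", "complete_repair", "reinstall_plug_door"]
)
if not ok:
    print("Hold for rework; steps not verified:", missing)
```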

Winn: [00:07:47] So what you’re saying is Vegas should be giving some kickbacks to the rest of the industry right? because, you know, this last Super Bowl was, what, $165 million in betting I think went through Vegas on that particular day, which is all dependent on stats, you know, and online betting largely, to a large extent. So Vegas, we’re waiting on the phone call whenever that comes. You know that makes me also think, and it’s not just in operational maintenance modes where you’re completely uncontrolled. We’ve got friends over in Israel, Kitov, for example, who are using AI with standard inspection to be able to look at a server stack with literally hundreds of different wires connecting all the different ports and everything and validate that these very organic elements, which are going to be laying all kinds of different ways — from the connector to the connection itself, the cable connection itself — are present in the right location. I mean literally hundreds of inspection points that we would have never thought about, which could be seen in so many different ways from different perspectives.

Jeff: [00:08:44] Yeah and that’s because of deep learning. That’s really the essential AI breakthrough that has enabled us to create these visual perception systems that can deal with these very messy, unstructured, complicated environments, like you were talking about a nest of cable connections, and understand: Yeah, okay, all the cables are present and they’re connected properly or not. Another great example of that, something that really wouldn’t have been practical 10 or 20 years ago but now is becoming increasingly deployed, is agricultural applications out in the fields. So, for example, Blue River Technology, which is now part of John Deere, they have this precision spraying technology where those giant agricultural machines with the giant, 100-foot booms that are spraying herbicide or pesticide or fertilizer, instead of blanketing the entire field and having 90 or 95% of the sprayed material wasted, they use cameras to say it’s basically a friend or foe situation. Is that a weed or is that a crop seedling? And then they precision spray whatever it is, the herbicide, pesticide, fertilizer, only where it needs to go. And so they’re able to by an order of magnitude reduce the amount of herbicides and pesticides and fertilizer that they’re using. This never would have been possible with classical computer vision techniques because differentiating like a weed seedling from a crop seedling is just too hard. There’s infinite variation there. But thanks to deep learning, now we can do that reliably.
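
As a rough, hypothetical illustration of the pattern Jeff describes (not Blue River’s actual implementation), a per-patch classifier can gate each spray nozzle. The backbone, class names, and dummy input below are placeholders; a real system would load weights trained on labeled field imagery.

```python
# Illustrative sketch: classify each camera patch as crop vs. weed and only open
# the nozzle for weeds. The ResNet backbone here is untrained and stands in for
# a model trained offline on field images.
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)      # stand-in backbone
model.fc = torch.nn.Linear(model.fc.in_features, 2)    # two classes: crop, weed
model.eval()

CLASSES = ["crop", "weed"]

def spray_decision(patch):
    """patch: 3x224x224 float tensor from the boom camera; True means open the nozzle."""
    with torch.no_grad():
        logits = model(patch.unsqueeze(0))
    return CLASSES[int(logits.argmax(dim=1))] == "weed"

dummy_patch = torch.rand(3, 224, 224)                  # placeholder image patch
print("open nozzle:", spray_decision(dummy_patch))
```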

Jeff: [00:10:26] So, yeah, deep learning has really powered a huge advance in what’s possible now with visual perception. And what’s very exciting to me is, as amazing as the last six or eight years have been as deep learning has come into widespread use in industry, what’s coming, starting now, in the next six to eight years is going to be even more amazing. There’s this positive feedback cycle where a few giant companies and universities and research labs have been putting so much effort into advancing the state of the art that machine perception capabilities are advancing at an incredible speed. So, for example, I was mentioning being able to interpret video, what’s going on in a video versus still images. I would say that largely is becoming practical today. And you see it deployed in certain applications, including automotive safety, largely due to this just massive investment in academia and a few very large corporate research labs over the last five or 10 years, which literally is billions and billions of dollars and thousands of brilliant people working on this. And so it’s a little bit hard to contemplate. You know, things have changed dramatically in the last few years. They’re going to change even more dramatically in the next few years.

Winn: [00:12:00] Ag-tech is one of the most exciting things to me, so I hope we can suss out through some of the rest of our conversation some of those future-looking applications that you’re talking about, so we can go into more detail. Jimmy, your turn brother. I keep dominating.

Jimmy: [00:12:12] Yeah, no worries. Thanks, Jeff. You mentioned the last six or eight years in terms of AI and the growing adoption over that time period. I think it’s fascinating because at least on the industrial side, like at the Vision Stuttgart show I think it was eight years ago, the marketing hype cycle kind of came out around deep learning, and it was being promised as this tool that’s going to change everything. And initially it came out and it seemed as if, at least on the factory floor, the manufacturing side, people weren’t really sure what to do with it or how it would add value. And certainly it was not going to replace traditional machine vision or computer vision. But it does seem to me, and Jeff, correct me if I’m wrong here, but it does seem to me that in the last couple of years, there’s been just a growing number of cases where people seem to really have found a way, like a really novel and useful way to have these deep learning technologies augment existing computer vision, machine vision systems to completely create a new set of opportunities and capabilities. Like you’re saying, both on the factory floor and beyond. Is that kind of your perception of it too?

Jeff: [00:13:23] Yeah, I think so. I mean, six, eight years ago people in industrial and manufacturing applications were saying things to me like, Oh yeah, deep learning is cool, but that’ll never work for manufacturing inspection because these algorithms are only 95% accurate and we need 99.999% accurate. And I can understand where those people were coming from. But the thing you gotta keep in mind is, don’t bet against deep learning. And the reason is, traditional computer vision or for that matter any kind of traditional algorithm field that you want to name — compression, cryptography, you name it — that is powered by humans, there’s a limit to what humans can do. There’s a limit to individual human ingenuity and productivity. And there’s a limit to how many humans can effectively collaborate to create one thing. These limitations do not exist for deep learning. Deep learning thrives on data, and as long as you can keep aggregating data and keep throwing compute at training the networks with those data, it can get better and better and better. So a great example of this is, if I have some strange, difficult-to-diagnose health situation, and I go in to see my local primary care doctor, that primary care doctor probably won’t be able to get a correct diagnosis because she’s probably never seen this condition before. So she might refer me to a specialist in the nearest big city.

Jeff: [00:14:55] I travel a few hours. I go see the specialist. Now I have a better chance of getting a correct diagnosis. But let’s say my situation is very unusual for some reason and that specialist is also stumped. Well, maybe now I have to fly across the country to the really top, top specialist. Well, the world we’re getting to in deep learning is where my local doctor is going to be able to put my symptoms, my labs, and so on into an AI system that is going to aggregate the expertise of not one but all of the top experts in the field. And it’s going to very quickly come back with a proposed diagnosis. It still needs to be validated by an expert. But the point I’m trying to make here is that deep learning allows us to aggregate knowledge because of being able to aggregate data and train these networks. And so I would never bet against deep learning in terms of being able to do better than humans at any kind of algorithmic problem. It may take a while. And one of the things that’s been a key limiting factor has been the availability of data. So, for example, in manufacturing inspection, I’ve talked to a number of companies who say, Yeah, there’s a problem here because we make these vision inspection systems, but many of our customers will not allow an image to leave their factory.

Jeff: [00:16:18] They’re like contract manufacturers for high-profile brands. And the brands, those places are locked down like a maximum security federal prison. Nothing can go out. So how are you going to train your deep neural network with no data? Well, the answer is you’re not. However, part of this coming wave of visual AI or perceptual AI that’s starting to emerge out of research labs and come into industry is offering ways to circumvent the data bottleneck. And I know one of the things on many people’s minds these days in connection with AI is generative AI because of how ChatGPT burst onto the scene and has become such a phenomenon. So let me talk about that, because that’s part of what’s reshuffling the deck here. So people look at ChatGPT, a form of generative AI, a large language model, and if you’re in computer vision you think, Wow, that’s amazing, but it doesn’t really have anything to do with computer vision. That’s what I initially thought too. But digging in deeper and talking with real experts in this area, what I’ve come to understand is actually large language models are going to fundamentally change how we do computer vision. And one of the reasons why they’re going to do that is that language really embeds conceptual understanding of the world in a way that images and computer vision don’t.

Jeff: [00:17:49] So let me make this concrete with an example. Let’s imagine that I’ve given you the assignment of creating an AI system that recognizes airplanes flying in the sky. Well today what you would need to do is collect a whole bunch of images of airplanes. And depending on my specifications, you might have to collect a lot of images, different airplanes from different angles and different lighting conditions, different liveries from different airlines, etc. You might have to collect thousands, tens of thousands, hundreds of thousands of images in order to train a current-generation deep neural network like a YOLO or something like that to reasonably accurately recognize airplanes. And if you did a diligent job, you’d be able to get very good results, probably 99 point something percent accurate recognition under the conditions that I specified. But what if you don’t have access to those images or it’s just prohibitively expensive to get them? In fact, what if you live in a parallel universe where you’ve never seen a flying machine at all? You don’t know the concept of an airplane. Okay, now you have a real problem in terms of training a classical deep neural network. And this is in fact the problem that occurs often in manufacturing inspection applications because, for example, we’re looking for scratches on the case of this laptop.

Jeff: [00:19:15] Well, what does a scratch look like? Infinitely varied right? But going back to the airplane example, if we merge models that understand language with models that understand vision, you have the ability now to essentially say to the model, Well, you’ve never seen an airplane, but imagine a metal thing in the sky with something that looks like outstretched arms. Two axes. Exactly. See what you can find. And because due to its understanding of language, it’s not merely recognizing words and matching words, it actually understands the concepts. The model probably will be able to identify airplanes. Now it probably won’t do perfect on the first try. Like it’ll accidentally classify some birds as airplanes or some utility poles. And you’ll say, Ah, no. If it’s more vertical than horizontal, it’s attached to the ground, it’s probably not an airplane. If it’s flapping its wings — well now we’re getting to video understanding. But okay, airplanes tend to have linear shapes; birds tend to have more curved, organic shapes. And so you kind of have this dialogue the way you do now with ChatGPT. If you use ChatGPT as I do, often on the first try, it doesn’t get me what I want. But I go back and I say, That’s not quite what I meant.

Jeff: [00:20:37] I meant this, I meant that, and after two or three tries, often I get what I want. So one of the things that is super exciting about this: Okay, this merging of language understanding and image understanding I think will open up possibilities to create application-specific visual AI that doesn’t have to be trained with tens of thousands or hundreds of thousands of images, but it has been trained with millions of general images of the world. And so it understands, for example, what an airplane is. And now you can say, Well, I’m looking for a commercial passenger airplane that is flying, and it’ll have a pretty good idea of what you mean on the first try, and then you’ll have a dialogue with it and you’ll refine it. And so I think that’s super exciting because that offers a hope of breaking this training data bottleneck. One of the activities of my industry group, the Edge AI and Vision Alliance, is we survey developers who are creating these visual AI systems every year. And one of the things we asked them about is, What’s hard for you? What’s making your life difficult in terms of developing these systems? And very consistently, issues related to training data are at the top of the list. Obtaining the training data, doing quality assurance on the training data, doing labeling on the training data.
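
The airplane example maps closely onto what open-vocabulary, language-plus-vision models already make possible. Below is a minimal sketch using a publicly available CLIP model to score an image against plain-language descriptions, with no task-specific training images; the prompts and the image path are illustrative, and refining the prompts plays the role of the back-and-forth dialogue Jeff describes.

```python
# Minimal zero-shot sketch: describe the target in words and let a
# vision-language model score the image against each description.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompts = [
    "a photo of an airplane flying in the sky",
    "a photo of a bird flying in the sky",
    "a photo of a utility pole against the sky",
]

image = Image.open("frame.jpg")  # placeholder path to a camera frame
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.2f}  {prompt}")
# If birds keep scoring as airplanes, refine the prompts ("rigid, straight wings,
# not flapping") instead of collecting thousands of new training images.
```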

Winn: [00:21:59] I thought you were going to bring it back to tagging, actually, when you were talking about the interaction between the LLM or the language models and inspection, because in the initial baby steps, you have to have that tagging element. You have to have consensus among different trainers and everything. But you’re definitely talking about the next level. I wanted to ask whether the competitive advantage of leveraging AI and deep learning models in today’s operations is also pushing these companies to be less proprietary about their datasets, maybe more willing to share, realizing that the benefits actually outweigh the potential liability risks, and therefore maybe we’re seeing more sharing? But it doesn’t sound like that’s necessarily the case.

Jeff: [00:22:44] I haven’t seen it. I think there’s a lot of paranoia. For example, another industry where this is a big problem is medical.

Jeff: [00:22:52] I’ve literally had people I’ve been trying to get to come give presentations on some amazing innovations they’re doing in computer vision for medical applications, and they’re like, Well, I might be able to do it in a private invitation-only event where there’s no recording allowed and you don’t distribute my slides, because they’re really locked down because of privacy rules, and it’s appropriate. But sharing data with a vendor so they can train a network, that’s really hard. On the other hand, the flip side is if you’re the company that has one of these flywheel business models where you have a natural way of capturing the data, you’re in a great position. So one example I really like is there’s this startup called Hayden AI. I don’t know if you guys have spent much time in Manhattan or other big cities like that, but Manhattan in particular I’m familiar with. And one of the things that’s so frustrating about Manhattan is the bus lanes: taking the bus is slower than walking because New Yorkers do not pay attention to the bus lane rules, and people will drive in the bus lanes, they’ll park in the bus lanes, they’ll park their truck in the bus lane and unload it. And the chances of getting a ticket are very low. So they’ll take that gamble. Well, Hayden has a system that goes in every city bus. It has a forward-looking camera, and if you park in the bus lane, you’re now going to get a ticket because the camera is going to recognize your vehicle’s parked in the bus lane, it’s going to read your license plate, and you’re automatically going to get a ticket, same as if you went through a toll bridge without the correct tag.

Jeff: [00:24:24] So now the probability of getting a ticket from abusing the bus lane goes from near zero to near 100%. So one would expect that is going to significantly drive behavior change because people are going to rack up thousands of dollars in fines, they’re going to get their licenses revoked, and so on. And so now these guys have the potential to completely unblock the bus lanes in Manhattan and make riding the bus actually a thing that works. It’s faster than walking, which then causes fewer people to drive and has this fantastic effect potentially on the economy and the environment. But the point that’s really so relevant here is every mile those buses drive, Hayden AI collects thousands of frames of data of images that they can then use to improve their algorithms. And when something goes wrong, let’s say somebody gets a ticket that shouldn’t have because there was some weird situation where a delivery truck was carrying a mirror and it looked like this vehicle was in the bus lane but actually it was in the next lane and the license plate was reflected or something bizarre like that, well they get that feedback and they’re like, Ah, okay, now we know that that’s a thing and we can adjust our algorithms to watch out for that case. So companies that manage to create a business model and get some initial deployment such that they can continuously collect more data and use that to improve their algorithms, that’s really the ideal situation, but often challenging in manufacturing-type applications, medical-type applications because of concerns about protecting privacy in the case of medical applications and proprietary . . .
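
Purely as a hypothetical sketch of the kind of pipeline Jeff describes (not Hayden AI’s actual code), the per-frame logic might look like the following. The vehicle detector and plate reader are stubs standing in for real models, and the lane region is an invented placeholder.

```python
# Hypothetical per-frame logic: detect vehicles, check whether one is stopped in
# the mapped bus-lane region of the image, and read its plate. The detector and
# plate reader below are stubs, not real models.

def detect_vehicles(frame):
    """Stub for a real object detector; returns boxes in normalized image coords."""
    return [{"box": (0.45, 0.55, 0.62, 0.95), "speed_mph": 0.0}]

def read_license_plate(frame, box):
    """Stub for a real plate-recognition (OCR) model."""
    return "ABC1234"

BUS_LANE_X = (0.30, 0.70)   # illustrative: the lane occupies this horizontal band
BUS_LANE_Y = (0.50, 1.00)   # in the lower half of the camera image

def in_bus_lane(box):
    x1, y1, x2, y2 = box
    cx, bottom = (x1 + x2) / 2, y2                      # bottom-center of the vehicle
    return BUS_LANE_X[0] <= cx <= BUS_LANE_X[1] and BUS_LANE_Y[0] <= bottom <= BUS_LANE_Y[1]

def process_frame(frame):
    plates = []
    for v in detect_vehicles(frame):
        if v["speed_mph"] < 1.0 and in_bus_lane(v["box"]):
            plates.append(read_license_plate(frame, v["box"]))
    return plates

print(process_frame(frame=None))   # -> ['ABC1234'] with the stubbed detector
```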

Winn: [00:25:57] Investors . . . HIPAA requirements or investor concerns. I’m sure J&J and Merck and Pfizer don’t really want people going out there sharing the next billion-dollar idea on a regular basis.

Winn: [00:26:07] Right.

Winn: [00:26:08] And yet, I’ve worked with a number of companies during COVID, for example, that were applying deep learning models to PCR, large-scale screening machines and getting much better results as a result. And we all know the mammography deep learning models that are helping to differentiate micro-calcifications from cysts and really help to not scare people with false diagnoses but make radiologists so much more effective. Well, I’m just hoping that over time that risk–reward equation balances out a little bit more because that will just continue the acceleration that you talked about early on. I mean, data acquisition is so difficult in every space.

Jeff: [00:26:53] Yeah. And I think this coming generation of models that merges language understanding with vision is also going to change the game because I think, like I mentioned, it’s going to enable creating effective models without having massive quantities of training data. In a sense, it’s more like the human experience where if you had grown up on a planet where there were no airplanes, but I said to you, Well, it’s kind of like a bird, but it’s made of metal. It’s about the size of a school bus or two and very linear in its design. Oh and by the way it tends to make a roaring or whining noise when it passes overhead. You’d immediately recognize the first airplane you saw. And that’s where this next generation of AI models is going to take us. And by the way, the addition of the audio aspect brings us to what’s often called multimodal models. You know, humans, we have multiple senses for a reason, right? Nature evolved eyes and ears and so on because they complement each other. Vision is incredibly powerful for humans. But we have a limited field of view. So if a predator is sneaking up behind us, we’re much more likely to hear that before we see it. And then we use the audio cues to then direct our vision towards the thing to better understand, Oh, that was just a branch falling or oh my gosh, it’s a tiger. Time to split. We haven’t seen many implementations yet out in the commercial world of AI systems that effectively use multiple sensors together. Probably automotive safety is the best example of that today.

Winn: [00:28:36] Yeah.

Jeff: [00:28:36] Even ADAS in mainstream cars today, where typically you’ll have radar plus vision working together, the radar is more reliable at detecting, Yeah, there’s actually a solid object there, and it’s exactly this distance away, and it’s moving at this speed. Radar can do that better than vision. On the other hand, what the object is is hard for radar and typically much easier for vision. So there are a few examples like that. But those are the exceptions that prove the rule. There isn’t a lot out there in terms of commercially deployed systems that are using multiple sensor types together to improve machine perception. But in a loosely similar way to how I’m talking about this merging of language understanding with image understanding, we’re now seeing more and more models that merge multiple diverse sensor types. So it could be camera, radar, lidar, time-of-flight audio. . . 

Winn: [00:29:39] Um.

Jeff: [00:29:39] . . . ultrasound are able to use all of the available data to improve the quality of perception. And I think where you’re going to see this in more and more industrial applications is, well, many places. One is safety, similar to the automotive safety thing where you have machines moving around near people, whether they’re robots or material-moving machines or whatnot. And you have some potentially dangerous situations, and you need to keep the people safe. It’s hard to do that totally reliably with just one type of sensor. And the problem is, if you’re too conservative and you have a lot of false positives, false alarms, and the system shuts down unnecessarily, people get frustrated and they bypass the system. Obviously, in the other direction, if you’re too liberal and you allow dangerous situations to pass, then people may get hurt or killed. That’s even worse. So the accuracy is really important, and having multiple types of sensors is often going to be the way that we achieve that in many different kinds of safety-critical environments. The other place where I think it’s going to pay dividends is in inspection systems where having multiple ways of looking at a thing . . . for example, some people will use hyperspectral imaging for looking at produce. And damage to a piece of produce that is not really visible in the visible spectrum is often completely obvious.

Jeff: [00:31:13] You know, with a hyperspectral sensor, I think we’re going to see more and more of that where it’s visible spectrum, hyperspectral, it’s ultrasound, it’s radar, it’s lidar, it’s even audio. There was a paper published a few years ago that was fascinating, where researchers took a microphone inside a car. And just from the signal from that microphone, they could tell if the road was wet. Think about that. Sure, of course they could. I can. I know the difference in the sound of my car tires in contact with the road and also the spray flying off the tires. Okay, but the fact that you can now have a machine do those kinds of things, to me it’s suggestive that there’s a whole lot of latent data available that we haven’t been using because we haven’t had a mechanism to harvest it. Like, what else could we learn by listening to things? And of course, in factory and other industrial environments today, people are increasingly listening to machines, either audio or vibration, to say, Oh, you know what? Something’s not quite right here. Actually, last time we heard this pattern of sound the bearing was about to fail. So we better shut this machine down and call in for maintenance. 
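
The listen-to-the-bearing idea can be illustrated with a simple spectral comparison: record a baseline spectrum while the machine is healthy and flag large shifts in band energy afterward. This is a toy sketch; the sample rate, injected tone, and alarm threshold are arbitrary, and a deployed system would use a learned model rather than a fixed threshold.

```python
# Toy condition-monitoring sketch: compare current band energies against a
# baseline recorded when the machine sounded healthy.
import numpy as np

def band_energies(signal, n_bands=16):
    """Energy in n_bands equal slices of the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    return np.array([float(np.sum(b ** 2)) for b in np.array_split(spectrum, n_bands)])

def anomaly_score(signal, baseline):
    """Worst-case relative deviation from the healthy baseline."""
    current = band_energies(signal)
    return float(np.max(np.abs(current - baseline) / (baseline + 1e-9)))

SAMPLE_RATE = 48_000
rng = np.random.default_rng(0)
healthy = rng.normal(size=SAMPLE_RATE)                  # one second of "healthy" sound
baseline = band_energies(healthy)

t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
faulty = healthy + 0.5 * np.sin(2 * np.pi * 7_300 * t)  # new tone, e.g. a failing bearing

score = anomaly_score(faulty, baseline)
print(f"anomaly score {score:.2f}:", "flag for maintenance" if score > 0.5 else "ok")
```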

Winn: [00:32:28] Traditionally that’s been the way you evaluate motor health. I mean, back in the old day, it was actually taking the screwdriver and putting it to your face so you could feel the vibration, which you couldn’t hear in a noisy environment. Those guys have pretty much retired at this point, so we definitely need new mechanisms. And it’s interesting: when we talk about multi-modal, most of the time we’re talking about different parts of the electromagnetic spectrum. If we’re talking about audio, that’s actually a physical force. But do you think that combining that information with the LLM capability — I think you’ve already answered the question for me — is going to help train models for applications that haven’t even really been envisioned yet? I mean, if it has a large enough dataset, a deep learning or AI system is going to be able to extrapolate the possibilities, whether it’s micro behavior or cellular structures or other things where we have a dearth of visual data and can’t see every potential variable or condition that could exist in the cellular environment. But maybe by leveraging millimeter wave, ultrasound, infrared, and different parts of the spectrum, the computer can tell us eventually what to look for, without us having any real clue at all.

Jeff: [00:33:46] Yeah. I think a place where people are working on this quite extensively already is in medicine, because in medicine, a lot of the data comes in textual form. Patient reports symptoms, doctor makes notes, lab tests come back, including interpretation of lab tests. And this all has to be integrated along with things like X-rays and ultrasound images and so on. Also, going back to this idea of when you go see your doctor and you have some difficult-to-diagnose problem, you’d like your doctor to have read every recent research paper related to your ailment. 

Winn: [00:34:35] I’d settle for my medical record, frankly.

Jeff: [00:34:39] And your medical record, sure. But no doctor can do that, right? No individual human can do that. But AI can do that. AI can do that and say, You know, there were 17 papers published in the last three years on this topic, and three of them suggest that people who show this symptom may have that condition. And therefore the following tests are indicated. In an industrial setting, you could see this same idea being very powerful. For example, if you read that NTSB report on the Boeing plug door issue, one of the key clues in reconstructing the sequence of events was text messages among the Boeing workers, where they were talking about what had happened, what was happening, and what needed to happen. So imagine the visual inspection system that’s supervising the manual assembly task of reinstalling this door also has access to messages among the team, which might be written or might be spoken. That’s going to be additional valuable context that, if properly used, I think will make those systems that much more effective. So I think we will see, and we already are starting to see, systems that take advantage of written language as well as images and other kinds of sensor data. And the written language can arrive as written language, like doctor’s notes, or can arrive as speech.

Jeff: [00:36:14] If I’m interacting with a system, I can speak to it. And one of the applications I think we’ll see pretty quickly with LLMs is car manuals, owner’s manuals for cars. My daughter just got her first car, and I like to think I know some things about cars, but this car is a hybrid and it has so much technology. I’m completely baffled. And of course the owner’s manual is impenetrable, as they always are. So what I need to be able to do — oh, there’s no spare tire, so that was the first thing that threw me. So I need to be able to have a dialogue with the car and say, Car, I have a flat tire. What do I do? And have it tell me. Or, Car, I notice this thing happening, this vibration. What does that mean? And the car contains a large language model that’s trained on the owner’s manual for the car and a bunch of related data like frequently asked questions and mechanic reports and so on. And so it’s not up to me to sort of interface with the car through this very clumsy mechanism of a 600-page poorly written user manual. It’s up to the car to understand what I’m really asking, which by the way could also include images. Like, through my phone, Hey Car, is this where I put in oil?

Jeff: [00:37:37] No, no, that’s where you put in coolant. You know, look for the yellow cap or the orange cap for oil. So I think we’re quickly heading to that. We’ve been in a world now for a while where AI models tend to rely on just a single type of data. It’s text or it’s images or it’s whatever. I think we’re very quickly moving to this multi-modal world where AI models are going to be able to incorporate multiple types of data and use them to do a better job at whatever it is they’re tasked with doing, like helping me manage my car or keeping workers safe around dangerous equipment in underground mines or diagnosing medical conditions or what have you.
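
The owner’s-manual assistant Jeff imagines is essentially retrieval plus a language model: find the relevant passages, then have the model answer in plain language. Here is a minimal sketch of the retrieval half using TF-IDF from scikit-learn over made-up manual snippets; a production system would more likely use learned embeddings and would pass the retrieved passages to an LLM to compose the answer.

```python
# Minimal retrieval sketch for an "ask the owner's manual" assistant.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

manual_sections = [
    "Tire repair kit: this vehicle has no spare tire. Use the sealant kit under the cargo floor.",
    "Engine oil: remove the cap marked with the yellow oil-can symbol to add oil.",
    "Coolant: the orange-capped reservoir is for engine coolant only.",
    "Hybrid battery warnings: a flashing turtle icon indicates reduced power.",
]  # invented snippets, not from any real manual

question = "I have a flat tire. What do I do?"

vectorizer = TfidfVectorizer().fit(manual_sections + [question])
scores = cosine_similarity(
    vectorizer.transform([question]), vectorizer.transform(manual_sections)
)[0]

print("Most relevant passage:", manual_sections[scores.argmax()])
# An LLM prompt would then be built from the question plus the retrieved passages.
```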

Jimmy: [00:38:30] Yeah, Jeff, you’ve provided a lot of really good examples of the ways that AI can kind of benefit society. And I want to circle back to that before we close up here. But one thing I did want to ask before we get toward the end is kind of a fun one. I had somebody reach out to me yesterday looking for advice on their master’s thesis. So he said his interests are largely in AI and machine learning and deep learning and computer vision and embedded systems. And he said, I’m looking for a research topic that might align with current trends and challenges in embedded systems and AI. And I said, Well, I don’t really feel qualified to make such a suggestion. I’m not an engineer. I’m just someone who happens to be part of this industry through marketing or previously through being an editor at a magazine. But, Jeff, you’re qualified to answer such a question; that doesn’t mean you want to, but I guess I would pose it to you. Are there certain interesting trends or challenges that they might base their thesis on?

Winn: [00:39:33] Yeah.

Jeff: [00:39:34] Well, I love helping students, and I would invite that student to reach out to me through LinkedIn because I’d like to know more about that student’s interests to really give a good answer that’s oriented toward their interests. Like are they really a deep math person or are they more interested in applications or implementation? There’s so many different angles here. It’s hard to give an all-purpose answer without knowing more about the individual student’s interests, maybe their career aspirations. So I’d be happy to get a message from that student and spend a few minutes thinking about an answer tuned to that individual’s interests. But to the extent I can give a general answer, I would say, gosh, there are so many possibilities here. And I think I would be guided by what you’re interested in. Again, if you’re interested more in the fundamental math, there’s a lot to look at there. If you’re interested more in implementation, like how do we put these systems together and make them reliable in the real world, plenty to work on there. If you’re interested in applications, plenty to work on there. And I think how hard you want to push into this sort of leading edge of technology should be governed by how much you want to invest in the project and kind of what your aptitudes are.

Jimmy: [00:40:59] That’s fair. Okay. Well, I’ll be happy to facilitate that conversation. A couple more questions before I close out with a fun one for you. But I did want to ask about that survey I think you mentioned earlier. What are some key takeaways there? I know that people can go and access the survey on the website and find out more, obviously. But from your perspective, what’s changing in the world of computer vision and AI, at least in terms of these survey results? What’s most interesting?

Jeff: [00:41:25] Yeah, to me the most interesting things are . . . well there are many. But first just the way that computer vision is proliferating into so many new applications. That’s really kind of amazing and wonderful. In many cases things that just wouldn’t have been feasible a few years ago. That’s very exciting. Another important trend is this idea of multi-modal perception. We’re not just using images. We’re combining images with other types of sensor data, which people have been doing for a while, but it’s been very challenging. This sensor fusion, as it’s called, has been a bit of a black art. It’s required very smart people working for long, long years to kind of figure out a solution tuned to a very specific problem with a very specific set of sensors. And so that was hard to scale. Now we are starting to have AI models that can basically learn the sensor fusion for a given application and set of sensors. And so that’s opening up very exciting possibilities. Another trend that’s very exciting is there’s been so much improvement in the hardware, especially the processors.

Winn: [00:42:48] Yeah.

Jeff: [00:42:49] If I’d said to you 10 years ago, Oh, yeah, I’m a kitchen appliance manufacturer and I’m going to start putting computer vision into kitchen appliances to monitor the progress of cooking in an oven or washing dishes or whatever, you would have said, correctly, You’re nuts. Nobody will pay for that. But now, people are doing it. There are successful kitchen appliances down to the $200 retail level that have computer vision. The computer vision is kind of invisible to the user, as it should be. The user doesn’t care that it’s computer vision. The thing just has this valuable capability. It’s a smarter version than the one I had before, and that’s enabled largely by improvements in the processors, enabling us to run deep learning algorithms in some cases on a one- or two-dollar processor, which would have been really inconceivable a few years ago. So the reduction in cost and also power consumption of the processors is huge. Lower power consumption enables things like a doorbell camera that can realistically run for six months on a battery charge. It doesn’t have to have AC power connected to it and can do real-time object recognition, detect that people are approaching, that sort of thing. And then I think the final trend that’s still in the very early stages but is very exciting is . . . this technology is very powerful, but it’s very, very complicated . . .

Jeff: [00:44:17] and for many companies that would like to use this technology, it’s still just not feasible because they can’t build up an engineering team that’s big enough and has enough expertise. So how do we make it easier to use this technology? And there’s been a fair bit of progress in the last two, three years in terms of creating better development tools and platforms so that you don’t necessarily have to know all the inner workings of how all these things work in order to effectively use them. Still early days, but I think that’s maybe the single most important trend, because if you have to have an AI team to use this technology, that’s going to be the limiting factor for many companies and groups within companies. It’s just not going to be feasible. It’s kind of like, with the advent of spreadsheets. Before spreadsheets, you either did the math manually in a business setting or you hired, you had a team of programmers to write the application to do whatever number crunching you needed. Now with spreadsheets, as long as you understand what you’re trying to do, almost anybody can automate almost any kind of number-crunching task. Hopefully we’ll get there with computer vision and other forms of machine perception. It’s going to be a while, but you can see the glimmers of hope there on the horizon.

Winn: [00:45:38] I was smiling earlier, imagining a Douglas Adams version of my next Samsung refrigerator that does body shaming when I’m walking up to it and reaching for the chocolate milk or the Reese’s cups that you have to keep in the freezer.

Jeff: [00:45:52] The one I really want is the refrigerator that knows what’s in the refrigerator and how long the items have been in there, so it prevents me from eating things I shouldn’t be eating because they’re old. But even more importantly, I can say to it, Refrigerator, what can I make with what I have in my refrigerator? And it gets like three or four options. Oh, that sounds good. Show me the recipe. That’s my dream.

Winn: [00:46:14] That’s probably not far off at all, tell you the truth.

Jimmy: [00:46:17] That’s now my dream too. I love it.

Winn: [00:46:20] That’s way better than the body shaming refrigerator, for the record.

Jimmy: [00:46:24] Jeff, a lot of the questions I wanted to ask you we’ve already covered, and you’ve given us some really great examples, but I do want to go back to some of the examples you’ve mentioned before we close out. I want to be respectful of your time. You’ve probably had this conversation a lot: somebody you don’t know comes to you, you explain to them what you do, the subject of AI comes up, and they get scared of it. But there are just so many good examples of how it can better society. And you touched on this earlier, but AI is only as good as the data that it’s provided by people. So for these conversations, which I’m sure you’ve had, how do you tell people, Hey, don’t be afraid of AI. It’s actually really good for us?

Jeff: [00:47:16] Well, actually, my answer is slightly different. I think that AI is an incredibly powerful technology, maybe the most powerful technology to emerge in our lifetimes. The only things I can immediately think of that will have comparable magnitude of total impact would be the introduction of electricity for industrial use and household use and the internet. So think about electricity or the internet being introduced. Is it going to be a force for good or for evil? Yes, both. It’s an incredibly powerful technology. People are going to use it for bad purposes because people do that. Whatever technology is available, they’ll use it to scam or whatever. So to me that’s the reality. So there are amazing, compelling applications that are already emerging for the good of humanity. And there’s one that I want to mention in just a minute that’s kind of mind blowing, but there’s also going to be some really scary stuff that happens, and we can’t prevent that. But what we can do is be alert to it, to the possibility, and take reasonable measures to protect ourselves against it. And so I think that’s kind of a realistic outlook.

Winn: [00:48:39] Well, we had to make circuit breakers. We had to make OCPD devices for electrical. We have to have anti-malware and other tools to protect us on the internet.

Winn: [00:48:46] And we’re probably going to use AI to create the tools that will protect us from bad actors in deep learning.

Jeff: [00:48:53] And none of those protection technologies are perfect. But in aggregate they do a very good job. And I think most people would say, Well, in aggregate, we’re glad we have electricity and we’re glad we have the internet, even though, yeah, sometimes people get hurt by them for sure. 

Jeff: [00:49:11] So the mind-blowing application I want to mention, which brings us back to manufacturing is up in New Hampshire, a group called ARMI is working on building out a manufacturing industry to manufacture human tissues and organs, to take processes that are pretty well established in research labs on a very small scale and industrialize them so that there are literally factories that can churn out replacement organs and tissues to serve people who are waiting for transplants — for example, soldiers who are injured, etc. That will not be possible without AI. How do you look at an organ that’s growing in a chamber and determine whether it’s on track or not? By the way, it has to remain sterile. Well, that’s going to require computer vision and maybe other sensor modalities monitoring that tissue as it’s forming and being part of a feedback control loop to optimize the process. When I first heard about what these guys were doing, it sounded like science fiction. But now that I’ve spent some time with them, it’s real and it’s mind blowing. And so to me, that’s a manufacturing application. It’s early days, but it’s very promising and it’s going to save eventually I think thousands of lives. And it will not be possible without this kind of perceptual AI that we’re talking about today.

Winn: [00:50:55] Without question.

Winn: [00:50:57] A little bit more complex than growing chicken breasts in the lab, which is currently a commercial development area.

Winn: [00:51:05] Yeah.

Jeff: [00:51:06] And for people who want to learn more about applications of computer vision, we have some great resources on our website. It’s edge dash AI dash vision.com. There’s a section on there with lots of short videos of cool commercial applications. Most of them are commercial applications, a few of them are proofs of concept, and also the conference and trade show that we mentioned at the beginning of the podcast, the Embedded Vision Summit, which takes place this year May 21 through 23 in Silicon Valley, is a great place to come and hear some of the people who have developed some of the leading-edge commercial applications of computer vision share a little bit about their journey, what they’ve done, and what they learned along the way. So those are a couple of resources for people who want to learn more about applications.

Winn: [00:51:54] I appreciate the plug for the conference. Go ahead Jimmy. Go ahead brother.

Winn: [00:51:57] Yeah.

Jimmy: [00:51:58] I was just going to say that in the interest of not flooding your inbox, Jeff, I encourage everyone to check out those websites and if they have any specific questions for Jeff or the alliance, I’d be happy to take those and pass them along. You can visit us at manufacturing dash Matters.com or reach out on LinkedIn. But aside from that, I certainly would encourage everybody, if you’re within that space take a look at the alliance and how they might be able to help you and definitely check out the event.

Winn: [00:52:31] And as someone who’s been to the conference multiple times, one of the things that’s always excited me most about it is its focus on applications. I mean, you’re just seeing use case after use case of people developing very targeted systems, leveraging multiple sensor modes, different datasets and everything to solve real-world questions. It’s not just academia, math, balancing models, and things of that nature, although certainly a lot of that discussion is going on there.

Jeff: [00:52:55] And it’s very cross-industry. Like I was saying earlier, a lot of the most compelling innovation is happening these days not in the manufacturing or industrial sector for computer vision. So come and see what people are doing in medical, in consumer, in automotive, and figure out what of that you want to bring in to your application to leapfrog the current capabilities.

Winn: [00:53:25] So I know we’re down to just a couple of minutes left. We had a couple of questions from the audience, including, Do you have any successful use cases in synthetic data and quality inspection? So basically, maybe applying those LLM modalities you were talking about to grow datasets for industrial applications. So another one: What are the pros and cons of synthetic data, AI vision systems, and quality inspection? Kind of related. I don’t know if you have any thoughts on that, Jeff, about the current state.

Jeff: [00:53:51] Yeah, I can’t think of any use cases that I’m allowed to talk about in terms of commercial use cases. So I’ll answer the second question, which is the pros and cons of synthetic data for quality inspection. And I think. . . 

Winn: [00:54:06] For the record, that means, yes, that it’s being done.

Jeff: [00:54:09] The pros are that with synthetic data, and for those who are not familiar, the idea of synthetic data is you create images using the same kinds of techniques that video games use, for example, or movie special effects use. They are artificial images, but they can be extremely realistic looking. And there are now whole companies and platforms for creating these synthetic images. And the idea is that, well, let’s say you’re making an automotive safety application and you want your system to be able to recognize all kinds of animals that might be in the road. But to actually capture images of all kinds of animals in the road, that would be hard. That would be expensive.

Winn: [00:54:48] From every angle and every weather condition.

Jeff: [00:54:49] Exactly. But if you have basically a video game platform that can create a photorealistic rendering of that kind of scenario in all different angles and weather conditions and so on, that’s not difficult. So it’s a way to get images that would otherwise in the real world be difficult to get. And by the way, you don’t have to then manually label those images and go in and say, that’s a raccoon and that’s a deer, because they were machine generated. So they were generated programmatically. So you know ahead of time, that’s a deer, that’s a raccoon, and exactly where they are and so on. So it’s a way to get data for training neural networks that might be difficult or expensive to otherwise get. And it comes pre-labeled, which saves a whole bunch of work that has to happen with natural images. So those are the key pros, which are pretty compelling in some applications. The cons are that it’s not free, and it typically takes a fair amount of work to actually enumerate all the variations and cases that you need, generate the data, and then validate that, yes, we got what we need. And it relies on human imagination to think of the variations.

Jeff: [00:56:04] So in a manufacturing environment, it’s on you to basically specify what kinds of defects you want to be present in the synthetic image. And a defect in a manufacturing process is harder to get right the first time in a synthetic image versus if I say, Oh, show me three people standing on a curb getting ready to cross a street. That’s a more everyday sort of thing that you would expect. Computer graphics understands what that is. But if I say, Show me a mis-threaded fastener, it’s really on me to be very detailed about what that means. And in the end, it may wind up being a lot of work to get the synthetic images that I want. So this is obviously an oversimplification, but that’s what I can offer in a two-minute answer.
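
As a toy illustration of why synthetic data comes pre-labeled: when you draw (or render) the defect yourself, you know exactly where it is. Real pipelines use photorealistic 3D rendering and far richer variation than the crude 2D sketch below, but the principle, labels for free by construction, is the same.

```python
# Toy synthetic-data generator: render a plain "part" and sometimes a scratch,
# recording the defect label and location at generation time.
import random
from PIL import Image, ImageDraw

def synth_part_image(size=(256, 256), defect_rate=0.5):
    img = Image.new("RGB", size, (190, 190, 195))        # blank metal-grey part
    draw = ImageDraw.Draw(img)
    label = {"defect": False, "scratch_box": None}
    if random.random() < defect_rate:
        x1, y1 = random.randint(20, 180), random.randint(20, 180)
        x2, y2 = x1 + random.randint(20, 60), y1 + random.randint(-10, 10)
        draw.line([(x1, y1), (x2, y2)], fill=(90, 90, 95), width=2)   # the "scratch"
        label = {"defect": True, "scratch_box": (x1, min(y1, y2), x2, max(y1, y2))}
    return img, label

random.seed(0)
dataset = [synth_part_image() for _ in range(100)]        # 100 images with exact labels
print(sum(lbl["defect"] for _, lbl in dataset), "defective images, labels known for free")
```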

Winn: [00:56:56] Yeah, we need a whole other discussion just simply on that, because I’ve got a whole bunch of questions about complexity versus what you got to throw at it. But I know we’re out of time today.

Jimmy: [00:57:05] Yeah. And Jeff, thank you so much for taking the time. Again, it’s edge dash AI dash vision.com. And if you have any general questions or comments, either for us or for Jeff, we would be happy to pass them along. Reach out to us at manufacturing dash matters.com. Thanks everyone. Thanks Jeff.

Winn: [00:57:19] Thank you guys.