She’s an important figure behind today’s artificial intelligence boom, but not all computer scientists thought Fei-Fei Li was on the right track when she came up with the idea for a giant visual database called ImageNet that took years to build.
Li, now a founding director of Stanford University’s Institute for Human-Centered Artificial Intelligence, is out with a new memoir that recounts her pioneering work in curating the data set that accelerated the computer vision branch of AI.
The book, The Worlds I See, also portrays her formative years that abruptly shifted from China to New Jersey, and follows her through academia, Silicon Valley and the halls of Congress as growing commercialisation of AI technology brought public attention and a backlash. She spoke with The Associated Press about the book and the current AI moment. The interview has been edited for length and clarity.
Q: Your book describes how you envisioned ImageNet as more than just a huge data set. Can you explain?
A: ImageNet really is the quintessential story of identifying the North Star of an AI problem and then finding a way to get there. The North Star, for me, was to really rethink how we can solve the problem of visual intelligence. One of the most fundamental problems in visual intelligence is understanding or seeing objects, because the world is made of objects. Human vision is grounded in our understanding of objects. And there are many, many, many of them. ImageNet is really an attempt to define the problem of object recognition and also to provide a path to solve it, which is the big data path.
Q: If I could time travel back 15 years, when you’re hard at work on ImageNet, and tell you about DALL-E, Stable Diffusion, Google Gemini and ChatGPT — what would most surprise you?
A: What does not surprise me is that everything you mention — DALL-E, ChatGPT, Gemini — is large-data based. They are pretrained on a large amount of data. That’s exactly what I was hoping for. What surprised me is, we got to generative AI faster than most of us thought. Generation for humans is actually not that easy. Most of us are not natural artists. The easiest generation for humans are words because speaking is generative, but drawing and painting is not generative for normal humans. We need the Van Goghs of the world.
Q: What do you think most people want from intelligent machines, and is that aligned with what scientists and tech companies are building?
A: I think, fundamentally, people want dignity and a good life. That’s almost the founding principle of our country. Machines and tech should be aligned with universal human values — dignity and a better life, including freedom and all of those things. Sometimes when we talk about tech or sometimes when we build tech, whether it’s intended or unintended, we don’t talk enough about that. When I say ‘we’ it includes technologists, it includes businesses, but also includes journalists. It’s our collective responsibility.
Q: What are the biggest misconceptions about AI?
A: The biggest misconception of AI in journalism is when journalists use the subject AI and a verb and put humans in the object. Human agency is very, very important. We create technology, we deploy technology, and we govern technology. The media and the public discourse, but heavily influenced by media, is talking about AI without the proper respect to human agency. We have so many articles, so many discussions, that start with ‘AI brings blah, blah, blah; AI does blah, blah, blah; AI delivers blah, blah, blah; AI destroys blah, blah, blah’. And I think we need to recognise this.
Q: Having studied neuroscience before you got into computer vision, how different, or similar, are AI processes to human intelligence?
A: Because I’ve scratched the surface of neuroscience, I respect even more how different they are. We don’t really know the intricate details of how our brains think. We have some inkling of lower-level visual tasks like seeing colours and shapes. But we don’t know how humans write Shakespeare, how we come to love someone, how we designed the Golden Gate Bridge. There’s just so much complexity in human brain science that is still a mystery. We don’t know how we do that in under 30 watts, the energy the brain uses. How come we’re so terrible at math while we are so fast at seeing and navigating and manipulating the physical world? The brain is the infinite source of inspiration for what artificial intelligence should be and should do. Its neural architecture — (Nobel Prize-winning neurophysiologists) Hubel and Wiesel were really the discoverers of that — was the beginning of artificial neural network inspiration. We borrowed that architecture, even though mathematically it doesn’t fully replicate what the brain does. There is a lot of intertwined inspiration. But we also have to respect that there’s a lot of unknowns, so it’s hard to answer how much they are similar.
AP