Image: Facial recognition technology for an industry worker to access machine control. Copyright biancoblue via Stock Market.
Delve through decades of development and innovations surrounding visual AI and where it’s headed in the future.
Visual artificial intelligence (or visual AI, for short) is a field of computer science that allows systems and applications to draw inferences, take action, or offer solutions based on visual input such as images or video.
Dr. Porter, the Chief Science Officer for SparkCognition, answered questions from Stephen Gold, the Chief Marketing Officer, and explained in depth what makes visual AI especially exciting in the coming years.
Artificial intelligence has advanced rapidly over the years. Dr. Porter explained that “AI is a very mature field now.” After decades of testing, innovation, and refinement, AI has progressed from simply storing knowledge to making inferences, and eventually to retaining information and reasoning. The clearest examples of this advancement are language-processing tools such as ChatGPT and image-generation tools such as Dall-E and Midjourney. These examples show how, through machine learning, artificial intelligence has become better at performing tasks automatically, without explicit step-by-step commands. Now that the data is available, AI can quickly see, predict, detect, and eventually navigate its environment. Advances in robotics are a strong testament to AI’s ability to mimic human interaction.
In its early days, artificial intelligence set out to imitate human cognition. It started with simple games like checkers in the 1960s, and it is now enabling autonomous vehicles, with Google planning to launch its own around 2030. Dr. Porter explained, “Some of the first disruptive applications of AI were in the field of games.” Now AI has become the backbone of many games played across multiple platforms by multiple users in real time.
Machine learning is one of the fundamental building blocks driving the advancements in AI. So what is machine learning?
It is the science behind applications like ChatGPT and Dall-E, and behind industrial uses such as predictive maintenance. Dr. Porter described machine learning as “a revolutionary technology; it’s not only responsible for the successes we’re seeing in computer vision, it’s also responsible for the successes we’re seeing in a wide variety of other tools like ChatGPT and Dall-E. At SparkCognition, we’re applying machine learning to applications for predictive maintenance and more.”
According to Dr. Porter, four defining technologies apply to AI in general and to visual AI in particular: connectivity, computation, algorithms, and data. Connectivity and computation are the backbone of nearly all tech; for visual AI, they specifically mean integrating CCTV cameras and surveillance systems and giving them the computing power and GPUs to read and analyze visual data. But the advancements in algorithms and big data are more significant to visual AI. Machine learning is the basis for visual AI, but data is the building block of everything. It is the foundation on which visual AI thrives.
Visual AI is especially exciting because, in the last decade, extensive open-source data sets have become available to feed neural networks. For example, these data sets can be used to train models for handwriting recognition through supervised learning: images showing hundreds of ways each letter of the alphabet can be written are labeled and fed to the neural net. There are many other applications, and the possibilities increase as the data increases.
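To make the handwriting example concrete, here is a minimal sketch of supervised learning on labeled images. It is an illustration under stated assumptions, not the method described in the interview: the 3x3 bitmaps are made-up toy data, and a simple nearest-centroid classifier stands in for the neural net so the example fits in a few lines.

```python
# Toy supervised learning: labeled glyph images -> nearest-centroid classifier.
# The 3x3 bitmaps and labels are invented for illustration only, and a
# nearest-centroid rule is used here in place of a neural network.

def glyph(rows):
    """Flatten rows such as '#..' into a flat list of 0/1 pixel values."""
    return [1 if ch == "#" else 0 for row in rows for ch in row]

def train(examples):
    """Average the pixel vectors of each label into one centroid per label."""
    sums, counts = {}, {}
    for pixels, label in examples:
        acc = sums.setdefault(label, [0.0] * len(pixels))
        for i, value in enumerate(pixels):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, pixels):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], pixels))
    return min(centroids, key=distance)

# Each letter appears in more than one "handwriting" variant.
TRAINING = [
    (glyph([".#.", ".#.", ".#."]), "I"),
    (glyph([".#.", ".#.", "..."]), "I"),  # shorter stroke
    (glyph(["#..", "#..", "###"]), "L"),
    (glyph(["#..", "#..", "##."]), "L"),  # shorter foot
    (glyph(["###", ".#.", ".#."]), "T"),
    (glyph(["###", ".#.", "..."]), "T"),  # shorter stem
]

centroids = train(TRAINING)
unseen = glyph(["#..", "##.", "###"])  # an L drawn with a thicker corner
prediction = predict(centroids, unseen)
```

The more labeled variants each letter has, the better the averaged centroid represents how that letter is usually drawn, which is the same reason larger data sets help real models.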
COCO is one of these open-source data sets and a significant contribution: it contains about 1.5 million labeled object instances across 80 object categories, shown in their everyday surroundings in roughly 330,000 images. Supervised learning with large data sets is a much richer and more comprehensive way to teach an algorithm to recognize objects. MultiTHUMOS is another large data set, with close to 40,000 labeled annotations of people performing actions like sitting, walking, or even fishing in video. It is these data sets that then build use cases.
Neural networks and deep learning, created around the 1970s, are maturing today, and in the last decade they have taken off in commercial applications. Neural nets are deeply tied to machine learning and to AI’s ability to predict and detect. Inspired by the biology of the human visual cortex, the neural net is the digital counterpart of the neurons and axons in the human body. Where the human nervous system operates on biological data and electrochemical energy, the neural net has activations that are weighted at nodes.
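The idea of activations weighted at nodes can be sketched in a few lines. This is a minimal illustration, assuming a sigmoid activation and hand-picked, hypothetical weights; it does not represent any particular production network.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs plus a bias,
    then passes the result through the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the activations of each layer into the next one."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical weights chosen only for illustration.
LAYERS = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer with 2 nodes
    ([[1.0, -1.0]], [0.0]),                    # output layer with 1 node
]

output = network_forward([1.0, 0.0], LAYERS)
```

Each inner list of weights belongs to one node, so adding a node means adding a row, and adding a layer means appending another `(weights, biases)` pair.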
So, for the AI to learn, it has to find the correct weights and the right pathway, which is the heart of the neural network. Training acts as a system that rewards and punishes the AI until it eventually finds the correct answer. Once it does, the learned weights retain that answer.
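A minimal sketch of this search for the correct weights is the classic perceptron learning rule, used here as a stand-in for the far larger training procedures behind real visual AI. The task (learning logical AND), the learning rate, and the epoch limit are assumptions chosen to keep the example small.

```python
def predict(weights, bias, inputs):
    """Fire (1) when the weighted sum of the inputs exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, learning_rate=1, max_epochs=20):
    """Nudge the weights after every wrong answer until all answers are right."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                # The "punishment": shift each weight against the mistake.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with no corrections: training is done
            break
    return weights, bias

# Illustrative task: output 1 only when both inputs are 1 (logical AND).
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
answers = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
```

After training, the weights are fixed, so the same inputs always produce the same answers without any further correction.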
Much of this training now happens through supervised learning, which is where the large data sets described above come in handy. By interpreting and testing this data, the algorithm sees whether it can identify and detect outcomes, becoming more accurate as training progresses. The more layers there are between the input and output, the more complex the neural net becomes; with many layers, it is called a deep neural net. But the most significant thing for a neural network is trust, which is why transparency is essential. Visual AI, however, is less limited in this respect than other perceptive networks: the image or video requires little explanation, because the thing being queried can either be seen or not.
What started with object detection has now moved on to situational identification, and the example Dr. Porter used to explain visual AI’s ability is something as simple as crossing the road. It is a great example, because visual AI can check whether the crosswalk is empty by examining the image or video. It can then respond to queries about whether someone is jaywalking. It can observe, detect, and report without outside help. Visual AI is now smart enough to summarize whether the traffic is light, whether it’s rush hour, or whether there are any security concerns or violations. It seems relatively simple when we say ‘summarize,’ but this results from hours of processing and machine learning. Visual AI can now surveil a street and find partially obscured license plates or the name of a taxi driver by reading the visible markers on the vehicle.
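The crosswalk query can be sketched as a simple check on top of object detections. This is an assumption-laden illustration: the `Detection` and `Box` types, the labels, the coordinates, and the traffic threshold are all hypothetical, and a real system would get its detections from a trained model rather than a hand-written list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical layout)."""
    left: float
    top: float
    right: float
    bottom: float

    def overlaps(self, other):
        """True when the two rectangles share any area."""
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)

@dataclass(frozen=True)
class Detection:
    """One object reported by an upstream detector (assumed to exist)."""
    label: str
    box: Box

def crosswalk_is_empty(detections, crosswalk):
    """The crosswalk is empty when no detected person overlaps its region."""
    return not any(d.label == "person" and d.box.overlaps(crosswalk)
                   for d in detections)

def summarize_traffic(detections, light_threshold=3):
    """Very rough summary: count vehicles and compare with an assumed threshold."""
    vehicles = sum(1 for d in detections if d.label in ("car", "truck", "bus"))
    return "light" if vehicles <= light_threshold else "heavy"

# Made-up detections for a single video frame.
CROSSWALK = Box(40, 60, 80, 100)
FRAME = [
    Detection("person", Box(50, 70, 60, 95)),   # standing on the crosswalk
    Detection("car", Box(0, 0, 30, 20)),
    Detection("person", Box(90, 10, 100, 40)),  # on the sidewalk
]

is_empty = crosswalk_is_empty(FRAME, CROSSWALK)
traffic = summarize_traffic(FRAME)
```

Questions such as jaywalking follow the same pattern: define the region of interest, then test which detections fall inside it.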
Building on the ability of this technology to see, observe, and detect, visual AI can now enhance images. Dr. Porter explained: “So if you think back to those cop movies or those countless hours of watching CSI, you’d remember how you could zoom in on a busy street camera and find the suspect. The movies had it easy until today. Visual AI is now advancing towards super-resolution, and visual AI tools can augment image data to predict facial features and provide accurate reports.” So how is computer vision, or visual AI, transforming industry? It enables the following capabilities:
- capturing and interpreting media content
- classifying objects
- training machines to understand the content
- building new capabilities on top of object detection
- tracking activity
- performing inspections
Computer vision is also changing the way we do business. By building on existing systems, AI is augmenting human interaction to preempt, react, and alert users according to real-time data. And because it doesn’t require a complete deployment of new equipment, it can be up and running in a few days. Dr. Porter explained that SparkCognition’s computer vision system now does precisely that. SparkCognition’s Visual AI Advisor is adapted for various fields, including health and safety, security, productivity, inspections, and situational awareness. It works in nearly every industry, including construction, manufacturing, schools, and more.
Visual AI is now a hardened technology: it is proven and deployed. Even third-party integrations can remain secure and private, because there is no need to record feeds or keep data stored when systems like Visual AI Advisor can check everything in real time. It can even help blur images where needed for enhanced privacy.
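As a rough illustration of blurring for privacy, the sketch below hides one rectangle of a grayscale frame by replacing it with its average brightness, a crude form of pixelation. The frame values and the region are made up, and this is not a description of how Visual AI Advisor implements blurring.

```python
def blur_region(image, top, left, bottom, right):
    """Return a copy of a grayscale image with one rectangle flattened
    to its mean brightness. The image is a list of rows of 0-255 ints;
    bottom and right are exclusive bounds."""
    blurred = [row[:] for row in image]  # leave the original frame untouched
    region = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    if not region:  # empty rectangle: nothing to hide
        return blurred
    mean = round(sum(region) / len(region))
    for r in range(top, bottom):
        for c in range(left, right):
            blurred[r][c] = mean
    return blurred

# Made-up 4x4 frame; the bright centre stands in for a face.
FRAME = [
    [10, 10, 10, 10],
    [10, 200, 40, 10],
    [10, 60, 100, 10],
    [10, 10, 10, 10],
]

private = blur_region(FRAME, 1, 1, 3, 3)
```

Because the function works on a copy and keeps nothing else, the detail inside the rectangle is gone from the output while the rest of the frame stays usable.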
With potential like this, it’s no wonder Dr. Porter says, “My prediction is that the 2020s will be a decade of computer vision.”