Computer Vision in the Era of Large Language Models

One of the things that distinguishes IN2 from other companies in the field is that we dedicate years, even decades, to a technology before we call it world-changing. Computer vision (CV) is a case in point. While everyone is busy being mesmerized by Large Language Models like GPT-4 and entertained by fancy text bots, computer vision has been quietly transforming industries and people's lives. We have stayed with it because it is far from just another buzzword: it is a game-changer, and it was one long before LLMs started hogging the limelight. From the first photo similarity algorithm we developed back in the early 2000s to the latest projects we're working on, CV has remained one of the strongest bets in our tech portfolio.

In this post, we’re cutting through the noise to highlight the computer vision innovations that are changing the game, whether the tech world is paying attention or not.

Industry Applications

Manufacturing: Enhancing Quality Control and Automation

Precision is key in manufacturing, and computer vision systems are great at inspecting products. Manufacturers can leverage well-established models, such as the successive generations of the YOLO (You Only Look Once) algorithm, to detect defects in products like printed circuit boards (PCBs). This is possible thanks to transfer learning: the model is first trained on millions of very diverse images, then specializes in a particular task (like detecting defects in PCBs) by being fine-tuned on a smaller, manually annotated set of images of faulty PCBs. The model is then deployed on computers that receive images captured by IoT cameras mounted on the production line. Finally, PCBs that show defects such as a missing component or a corroded copper trace are removed. The process runs continuously with a very high detection rate and without human intervention, saving the company time and money in the long term.

PCB defect detection
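
To make the last step of that pipeline concrete, here is a minimal Python sketch of the accept-or-reject decision. It assumes a fine-tuned detector has already produced a list of detections for one board; the `Detection` structure, the class names, and the 0.5 confidence threshold are illustrative assumptions, not part of any particular YOLO release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One defect predicted by the detector for a single PCB image."""
    label: str          # e.g. "missing_component" (hypothetical class name)
    confidence: float   # detector score in [0, 1]
    box: tuple          # (x_min, y_min, x_max, y_max) in pixels


def defects_above_threshold(detections, threshold=0.5):
    """Keep only detections the model is reasonably sure about."""
    return [d for d in detections if d.confidence >= threshold]


def should_reject(detections, threshold=0.5):
    """A board is pulled from the line if any confident defect remains."""
    return len(defects_above_threshold(detections, threshold)) > 0


# Hypothetical detector output for one board (values are made up).
board = [
    Detection("missing_component", 0.91, (120, 40, 160, 80)),
    Detection("corroded_trace", 0.32, (300, 210, 340, 230)),
]
print(should_reject(board))
```

In practice the threshold is a trade-off: lowering it catches more faulty boards but also sends more good ones to manual re-inspection.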

Healthcare: Revolutionizing Patient Care

Computer vision has been advancing healthcare through applications such as tumor detection, organ segmentation, and precision surgery. MedSAM, an adaptation of Meta AI's Segment Anything Model (SAM) fine-tuned on medical images, performs extremely well in medical imaging tasks. While SAM handles general object segmentation very well, regardless of the nature of the image, MedSAM specializes in medical image segmentation and handles tasks like segmenting organs or detecting tumors with impressive accuracy. Tasks that were previously done manually are now completed faster and more accurately, so doctors can focus on their patients and act sooner when a problem is detected.

Automatic organ segmentation
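
How accurate a segmentation is can be made measurable. A common choice in medical imaging is the Dice coefficient, which compares the model's mask with one drawn by an expert: twice the overlap divided by the combined size of both masks. The sketch below is a plain-Python illustration on tiny made-up masks; it is not taken from the MedSAM code base, and the function name is our own.

```python
def dice_coefficient(predicted, reference):
    """Dice score between two binary masks given as nested lists of 0/1."""
    pred = [p for row in predicted for p in row]
    ref = [r for row in reference for r in row]
    if len(pred) != len(ref):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(p * r for p, r in zip(pred, ref))
    total = sum(pred) + sum(ref)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Made-up 3x4 masks: 1 marks a pixel labelled as "organ".
model_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
expert_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(dice_coefficient(model_mask, expert_mask), 3))
```

A score of 1.0 means the two masks are identical, while 0.0 means they do not overlap at all.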

Societal Benefits

Public Safety: Advanced Surveillance and Disaster Response

Computer vision can also have an important impact on society by improving surveillance and disaster response systems, making traditional security cameras smarter. Thanks to its real-time detection capabilities, a classic architecture like YOLO can be used here too, for example to monitor traffic, which helps in anticipating or identifying accidents and taking preventive measures faster. Beyond traffic management, computer vision can automatically analyze live video feeds to identify unusual activities or security threats, set off alarms, and help first responders. This reduces the need for constant human monitoring, which in turn streamlines and speeds up the security response.

It is important to keep in mind that these applications raise privacy issues, which must be addressed carefully to comply with the GDPR.

An AI traffic control system using computer vision

Accessibility: Empowering the Visually Impaired

The Segment Anything Model (SAM) and many other computer vision tools can be used to make the lives of visually impaired people easier. For instance, AI-powered smartphone apps can segment and classify objects in the environment accurately, so that users can identify everyday objects like food or clothing, or navigate through the streets. Also, with technologies like optical character recognition (OCR), apps can read aloud the text on signs, books, and other printed documents, so users do not have to depend on someone else when, say, they visit a nearby bookstore.

Agricultural Advancements

Crop Monitoring: Optimizing Farming with Precision Agriculture

For modern farmers, computer vision is an important asset that can transform how they care for their crops. A prime example is the use of drones equipped with high-precision cameras to take detailed images of fields, capturing everything from crop health and soil conditions down to pests and diseases. The data is then used to make informed decisions on irrigation, fertilization, and pest control, which helps optimize resources.
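
One widely used way to turn such images into a crop health signal is the Normalized Difference Vegetation Index (NDVI), computed per pixel as (NIR - Red) / (NIR + Red). The sketch below assumes the drone carries a multispectral camera that records near-infrared and red reflectance, which not every drone does; the reflectance values and the 0.4 "stressed" threshold are made up for illustration.

```python
def ndvi(nir, red):
    """NDVI for one pixel from near-infrared and red reflectance."""
    denominator = nir + red
    # Guard against division by zero for pixels with no signal.
    return 0.0 if denominator == 0 else (nir - red) / denominator


def ndvi_map(nir_band, red_band):
    """Apply NDVI pixel by pixel to two equally sized 2D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


def stressed_fraction(index_map, threshold=0.4):
    """Share of pixels whose NDVI falls below the chosen threshold."""
    values = [v for row in index_map for v in row]
    return sum(v < threshold for v in values) / len(values)


# Made-up reflectance values for a 2x3 patch of a field.
nir_band = [[0.60, 0.55, 0.30], [0.62, 0.25, 0.58]]
red_band = [[0.10, 0.12, 0.20], [0.08, 0.20, 0.10]]

patch = ndvi_map(nir_band, red_band)
print(stressed_fraction(patch))
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so higher NDVI values generally point to denser, healthier plants. A map like this can guide where to irrigate or fertilize first.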

Harvesting: Boosting Efficiency with Autonomous Machines

Thanks to computer vision, harvesting machines can operate independently: they identify ripe crops and their position so that the machine can pick them, which significantly reduces labor costs in the long term. Precision matters here, because produce should be picked exactly when it is at its prime. The technology improves the quality of the harvested product while also reducing waste and saving producers money.

Earth Observation

Addressing Problems from Above

A niche but fascinating subdomain of computer vision is Earth observation, the field that relies on satellites capturing high-resolution images of Earth. Missions like Sentinel give us precious information about Earth’s surface, capturing images in the visible spectrum and infrared, as well as synthetic-aperture radar (SAR) images. Spatial resolution in satellite imagery has advanced immensely, reaching a standard of 10 m for recent missions like Sentinel-2, whose first two satellites launched in 2015 and 2017.
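
To get a feel for what a 10 m resolution means on the ground, the short sketch below does the arithmetic: each pixel covers a 10 m by 10 m square, so a field of a given size maps to a predictable number of pixels. The function names and the 5-hectare field are our own illustrative choices.

```python
import math


def pixel_area_m2(resolution_m):
    """Ground area covered by one square pixel, in square metres."""
    return resolution_m ** 2


def pixels_for_area(area_hectares, resolution_m):
    """Approximate number of pixels needed to cover an area."""
    area_m2 = area_hectares * 10_000  # 1 hectare = 10,000 m^2
    return math.ceil(area_m2 / pixel_area_m2(resolution_m))


print(pixel_area_m2(10))        # 100 m^2 per pixel
print(pixels_for_area(5, 10))   # 500 pixels for a 5-hectare field
```

A few hundred pixels per field is enough to track vegetation trends over a season, but not to pick out individual plants.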

The space industry has developed a lot in recent years thanks to advances in satellite imagery, but just as important is that data is now more accessible than ever. With the help of space agencies like NASA and ESA, many small projects are also possible using open data and the frameworks built around it. Platforms like the Copernicus Browser make satellite imagery available to study and use in projects of all sizes, with many applications: environmental monitoring, urban planning, or agriculture.

Conclusion

While Large Language Models like GPT-4 are making headlines, computer vision is quietly revolutionizing various sectors with profound real-world impacts. From enhancing quality control in manufacturing and transforming healthcare diagnostics to optimizing agriculture and improving public safety, computer vision technologies are leading innovations that directly affect daily life.