
Lessons Learned

How to create a change detection system

We did not set out to create a unique change detection system, but one evolved into a major component of our end product.

What users really want in a caption

We originally thought a brief caption containing all the relevant information would be the best approach. After conducting multiple user interviews, we found that the details of the caption mattered less than which objects were detected and the timestamp of the event.

A transformer model would have been better for the Auto-Caption

With the limited resources we had for this project, we could not properly train a transformer auto-caption model. Transformers require more compute than we originally anticipated.

Next Steps

The Edge

  • Develop two-factor authentication. This would allow Argos to work seamlessly with existing cameras in households and eliminate the need for additional hardware.
     

  • Work to reduce the time needed to process the objects in a frame down to the time it takes to start recording a video segment. This is currently ~1 second, which equates to 25 frames of lost data.
     

  • Develop a plug-and-play rollout for edge device configuration: a system that detects an Argos server on the LAN and automatically registers the camera in the cloud (a minimal discovery sketch follows this list).
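
A minimal discovery sketch in Python; the UDP port and JSON message format are assumptions for illustration, not part of the current Argos build. The idea is that the camera broadcasts a hello packet on the LAN and any Argos server replies with the registration endpoint the camera should use.

    import json
    import socket

    DISCOVERY_PORT = 9999          # assumed port, not part of the current Argos build

    def find_argos_server(timeout=3.0):
        """Broadcast a discovery request and return the first Argos server that replies."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(b'{"argos": "discover"}', ("255.255.255.255", DISCOVERY_PORT))
        try:
            data, addr = sock.recvfrom(1024)
            reply = json.loads(data)           # e.g. {"register_url": "https://.../register"}
            return addr[0], reply.get("register_url")
        except socket.timeout:
            return None                        # no Argos server found on this LAN
        finally:
            sock.close()

On receiving a reply, the edge device would send its camera details to the returned URL, completing registration without manual configuration.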

The Cloud

  • Create communication from edge to cloud for selecting objects of interest. The current version only filters out objects in the cloud.
     

  • Increase website security by using a PHP framework such as CodeIgniter. This would help prevent SQL injection and improve code organization by following an MVC structure.
     

  • Use CUDA and a GPU in the cloud to increase image-processing speed (see the sketch after this list).
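
A minimal sketch, assuming the cloud detector is the stock YOLOv5 model loaded through torch.hub (the deployed service may load custom weights differently); the main change is moving the model, and therefore inference, onto a CUDA device when one is available.

    import torch

    # Fall back to CPU so the same code still runs on machines without a GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = torch.hub.load("ultralytics/yolov5", "yolov5s")   # or custom-trained weights
    model.to(device)

    results = model("frame.jpg")   # YOLOv5 runs the image on the model's device
    results.print()                # summary of detected objects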

Models

  • Add more training images for our YOLOv5 model to improve accuracy.
     

  • Use a transformer-based model for video captioning instead of an LSTM (a sketch follows this list).
     

  • Use other pre-trained models to extract features, especially ones built for video understanding such as I3D.
     

  • Use pre-trained embeddings such as GloVe so that the model understands sentences better.
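
A minimal sketch combining the last two bullets, with placeholder sizes and a hypothetical GloVe file path: the LSTM decoder is replaced by a PyTorch TransformerDecoder that attends over pre-extracted video features (for example, I3D clips) and whose word embeddings are initialized from GloVe.

    import torch
    import torch.nn as nn

    def load_glove(path, vocab, dim=300):
        """Build an embedding matrix for `vocab` from a GloVe text file (e.g. glove.6B.300d.txt)."""
        weights = torch.randn(len(vocab), dim) * 0.1             # random init for words missing from GloVe
        word_to_idx = {w: i for i, w in enumerate(vocab)}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *vec = line.rstrip().split(" ")
                if word in word_to_idx:
                    weights[word_to_idx[word]] = torch.tensor([float(v) for v in vec])
        return weights

    class TransformerCaptioner(nn.Module):
        """Caption decoder that attends over per-clip video features instead of an LSTM state."""
        def __init__(self, glove_weights, feat_dim=1024, d_model=300, nhead=6, num_layers=3):
            super().__init__()
            self.embed = nn.Embedding.from_pretrained(glove_weights, freeze=False)
            self.feat_proj = nn.Linear(feat_dim, d_model)         # project video features to d_model
            layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers)
            self.out = nn.Linear(d_model, glove_weights.size(0))  # logits over the vocabulary

        def forward(self, video_feats, token_ids):
            # video_feats: (batch, clips, feat_dim); token_ids: (batch, seq_len)
            memory = self.feat_proj(video_feats)
            tgt = self.embed(token_ids)
            seq_len = token_ids.size(1)
            causal = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                           device=token_ids.device), diagonal=1)
            return self.out(self.decoder(tgt, memory, tgt_mask=causal))

Training such a decoder is still compute-hungry, which is why it remains a next step rather than part of the current build.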

User Feedback

The Department of Defense, Special Operations, and Airport Security provided us with their thoughts on Argos:

"I especially like the portability factor.  We have installed cameras as well as temporary setups that each require the same amount of man hours to review and analyze.  The fact this can work with our existing streaming cameras as well as shorter duration setups is incredible"

"What matters most to my team is focusing on the footage with things that matter most to us.  We waste a lot of time scrubbing through video for highlights and, in all honestly, probably lose a lot of valuable information in that process.  That said, the caption service doesn't help that much.  We only care about being able to snap to the next time or frame of interest and then being able to analyze that clip"

"It would be nice to have pre/post roll for each segment so we can gain some context about the events."

"Where's the audio?"

"I really like how I can select the objects that matter to me for each camera.  Not all setups are the same.  How can we add more objects as needed?  For instance, if we have a report of a new type of potential explosive container, how can we tell the system to BOLO  'Be On the Look Out' for that item?"
