Block Six Analytics’ Media Analysis Platform Outperforms Google and Amazon Computer Vision Products

BY JOSHUA L. HERZBERG AND ALEXANDER CORDOVER

Block Six Analytics’ (B6A) Media Analysis Platform (MAP) analyzes sports video broadcasts to compute how long in-stadium signage appears on screen. However, other computer vision products like Amazon’s Rekognition and Google’s Cloud Vision can also perform this task. Both offer text localization (finding text in a video frame) and optical character recognition (reading the text) services akin to what MAP offers. But which product is most accurate?

To evaluate and compare the products, we conducted experiments on one 3.5-hour game video, which amounts to 50,350 analyzed frames. We tested each algorithm’s ability to identify how often Delta Air Lines and Verizon Communications signage appeared on screen. The video contained static and visual signage for both companies. The Delta signage appeared on screen for 9 minutes and 22 seconds, while the Verizon signage appeared for 14 minutes and 18 seconds.
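The frame count and broadcast length above imply a sampling rate of roughly four frames per second. The article does not state this rate directly, so it is an inference. The sketch below shows one way to convert between sampled frames and on-screen time under that assumption. The names `frames_to_clock` and `FRAMES_PER_SECOND` are illustrative, not part of MAP.

```python
# Figures stated in the article.
TOTAL_FRAMES = 50_350
BROADCAST_SECONDS = 3.5 * 60 * 60  # 3.5 hours expressed in seconds

# Implied sampling rate (an inference; the article does not state it).
FRAMES_PER_SECOND = TOTAL_FRAMES / BROADCAST_SECONDS


def frames_to_clock(frames: int, fps: float = FRAMES_PER_SECOND) -> str:
    """Convert a count of sampled frames to an M:SS string."""
    total_seconds = round(frames / fps)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


if __name__ == "__main__":
    print(f"Implied rate: {FRAMES_PER_SECOND:.2f} frames per second")
    print(frames_to_clock(574))
    print(frames_to_clock(1652))
```

Under this assumed rate, 574 frames correspond to 2:24 and 1652 frames to 6:53, which agrees with the times reported alongside those frame counts in the results.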

To evaluate the products’ algorithms, we compare each algorithm’s prediction for every video frame to the ground truth, that is, whether signage is actually present in the frame. If the algorithm correctly finds signage, a True Positive has occurred. When the algorithm correctly finds no signage, a True Negative has occurred. On the other hand, when the algorithm incorrectly finds signage where none exists, a False Positive has occurred. When the algorithm incorrectly finds no signage but signage exists, a False Negative has occurred.
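The four outcomes can be expressed as a small per-frame classification. The following is a minimal sketch, assuming each frame reduces to one boolean prediction and one boolean ground-truth label. The function names and the toy data are hypothetical and are not drawn from MAP or from the experiment.

```python
from collections import Counter
from typing import Sequence


def classify_frame(predicted: bool, actual: bool) -> str:
    """Label one frame as TP, TN, FP, or FN."""
    if predicted and actual:
        return "TP"  # signage found, and it is there
    if not predicted and not actual:
        return "TN"  # no signage found, and none is there
    if predicted and not actual:
        return "FP"  # signage found, but none is there
    return "FN"      # signage missed


def tally(predictions: Sequence[bool], ground_truth: Sequence[bool]) -> Counter:
    """Count each outcome across a sequence of frames."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be the same length")
    return Counter(
        classify_frame(p, a) for p, a in zip(predictions, ground_truth)
    )


if __name__ == "__main__":
    # Made-up four-frame example, for illustration only.
    predictions = [True, False, True, False]
    ground_truth = [True, False, False, True]
    print(tally(predictions, ground_truth))
```

Summing these labels over every frame of a broadcast yields frame counts of the kind reported in the results.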

Each algorithm processed every frame in the 3.5-hour broadcast.

DELTA

| Product | True Positive Frames (Percent of Correct Time) | False Negative Frames (Minutes:Seconds) | False Positive Frames (Minutes:Seconds) | F1 Score |
| --- | --- | --- | --- | --- |
| B6A MAP | 1675 (74.5) | 574 (2:24) | 0 (0:00) | .854 |
| Google Cloud Vision | 951 (42.3) | 1298 (5:25) | 9 (0:02) | .593 |
| Amazon Rekognition | 597 (26.5) | 1652 (6:53) | 7 (0:02) | .419 |

VERIZON

| Product | True Positive Frames (Percent of Correct Time) | False Negative Frames (Minutes:Seconds) | False Positive Frames (Minutes:Seconds) | F1 Score |
| --- | --- | --- | --- | --- |
| B6A MAP | 3063 (89.2) | 370 (1:33) | 0 (0:00) | .943 |
| Google Cloud Vision | 1132 (33.0) | 2301 (9:36) | 10 (0:03) | .495 |
| Amazon Rekognition | 1798 (52.4) | 1635 (6:49) | 0 (0:00) | .687 |
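For readers who want to check the F1 Score column, the sketch below applies the standard definition of F1 (the harmonic mean of precision and recall) to the frame counts reported above. The function name `f1_score` and the `RESULTS` layout are illustrative. How MAP computes the score internally is not described in the article, so this is only a cross-check of the published figures.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (true positives, false positives, false negatives) in frames, from the results.
RESULTS = {
    ("Delta", "B6A MAP"): (1675, 0, 574),
    ("Delta", "Google Cloud Vision"): (951, 9, 1298),
    ("Delta", "Amazon Rekognition"): (597, 7, 1652),
    ("Verizon", "B6A MAP"): (3063, 0, 370),
    ("Verizon", "Google Cloud Vision"): (1132, 10, 2301),
    ("Verizon", "Amazon Rekognition"): (1798, 0, 1635),
}

if __name__ == "__main__":
    for (brand, product), (tp, fp, fn) in RESULTS.items():
        recall_pct = 100 * tp / (tp + fn)
        print(f"{brand:8s} {product:20s} "
              f"recall={recall_pct:.1f}% F1={f1_score(tp, fp, fn):.3f}")
```

The recall printed here corresponds to the "Percent of Correct Time" figure: the share of frames with signage that the algorithm actually detected.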

False Positives and False Negatives are both important to teams, brands, and agencies, but MAP distinguishes itself by minimizing False Negatives. In the Verizon test, MAP missed less than one fourth as much screen time as either Cloud Vision or Rekognition; in the Delta test, it missed less than half as much. Users of Google’s or Amazon’s tools must manually review each frame of video to adjust for the gap in accuracy. This entirely squanders the efficiency of computational analysis and incurs additional financial costs. For example, to adjust for the gap in Google’s Delta results, a human must watch the entire game to find the missing 5:25 of screen time.

If clients want more accurate analysis, they must choose algorithms with fewer False Negatives. MAP’s neural networks and value calculations were built and trained specifically to analyze sports broadcasts. MAP provides more accurate results than these leading providers, while also avoiding extraneous manual review.