Block Six Analytics’ Media Analysis Platform Outperforms Google and Amazon Computer Vision Products

BY JOSHUA L. HERZBERG AND ALEXANDER CORDOVER

Block Six Analytics’ (B6A) Media Analysis Platform (MAP) analyzes sports video broadcasts to compute how long in-stadium signage appears on screen. However, other computer vision products like Amazon’s Rekognition and Google’s Cloud Vision can also perform this task. Both offer text localization (finding text in a video frame) and optical character recognition (reading the text) services akin to what MAP offers. But which product is most accurate?

To evaluate and compare the products, we conducted experiments on one 3.5-hour game video, which amounts to analyzing 50,350 frames. We tested each algorithm’s ability to identify how often Delta Air Lines and Verizon Communications signage appeared on screen. The frames contained static and visual signage for both companies. The Delta signage appeared on screen for 9 minutes and 22 seconds, while the Verizon signage appeared for 14 minutes and 18 seconds.

To evaluate each product’s algorithm, we compare its prediction for every video frame to the ground truth, that is, the actual presence of signage in that frame. If the algorithm correctly finds signage, a True Positive has occurred. When the algorithm correctly finds no signage, a True Negative has occurred. On the other hand, when the algorithm finds signage where none exists, a False Positive has occurred. When the algorithm finds no signage even though signage exists, a False Negative has occurred.
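The four outcomes above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the evaluation code used for the experiment; the function names and the five-frame toy data are hypothetical.

```python
from collections import Counter


def classify_frame(predicted: bool, actual: bool) -> str:
    """Label one frame by comparing the prediction with the ground truth."""
    if predicted and actual:
        return "TP"  # signage found, and it is really there
    if not predicted and not actual:
        return "TN"  # no signage found, and none is there
    if predicted and not actual:
        return "FP"  # signage found where none exists
    return "FN"      # signage exists but was missed


def tally(predictions, ground_truth) -> Counter:
    """Count the four outcomes over a sequence of frames."""
    if len(predictions) != len(ground_truth):
        raise ValueError("one prediction is required per ground-truth frame")
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for predicted, actual in zip(predictions, ground_truth):
        counts[classify_frame(predicted, actual)] += 1
    return counts


# Hypothetical toy data: True means "signage present in this frame".
predictions = [True, False, True, False, False]
ground_truth = [True, False, False, True, False]
counts = tally(predictions, ground_truth)
print(dict(counts))
```

Summing these per-frame labels over a whole broadcast yields frame counts of the kind reported in the results tables.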

Each algorithm processed every frame in the 3.5-hour broadcast.
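The tables report both frame counts and the corresponding screen time. A minimal sketch of that conversion follows, assuming the 50,350 frames were sampled uniformly across the 3.5-hour broadcast (roughly four frames per second). The article does not state the sampling rate, so the rate derived here and the helper name `frames_to_mmss` are assumptions made for illustration.

```python
TOTAL_FRAMES = 50_350            # frames analyzed, as stated in the article
BROADCAST_SECONDS = 3.5 * 3600   # 3.5-hour game video

# Assumption: uniform sampling, which works out to about 4 frames per second.
FRAMES_PER_SECOND = TOTAL_FRAMES / BROADCAST_SECONDS


def frames_to_mmss(frames: int) -> str:
    """Convert a frame count to a minutes:seconds string."""
    seconds = round(frames / FRAMES_PER_SECOND)
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Frame counts taken from the False Negatives column of the tables.
for frames in (574, 1652, 2301):
    print(frames, "frames ->", frames_to_mmss(frames))
```

Under this assumption, the converted values agree with the minutes and seconds shown in the tables for these frame counts.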

  
DELTA	True Positive Frames (Percent of Actual Screen Time)	False Negative Frames (Minutes:Seconds)	False Positive Frames (Minutes:Seconds)	F1 Score¹
B6A MAP	1675 (74.5)	574 (2:24)	0 (0:00)	0.854
Google Cloud Vision	951 (42.3)	1298 (5:25)	9 (0:02)	0.593
Amazon Rekognition	597 (26.5)	1652 (6:53)	7 (0:02)	0.419

  
VERIZON	True Positive Frames (Percent of Actual Screen Time)	False Negative Frames (Minutes:Seconds)	False Positive Frames (Minutes:Seconds)	F1 Score¹
B6A MAP	3063 (89.2)	370 (1:33)	0 (0:00)	0.943
Google Cloud Vision	1132 (33.0)	2301 (9:36)	10 (0:03)	0.495
Amazon Rekognition	1798 (52.4)	1635 (6:49)	0 (0:00)	0.687

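¹F1 Score combines False Positives and False Negatives into a single number between 0 and 1. It is the harmonic mean of precision and recall, or equivalently 2·TP / (2·TP + FP + FN), where TP, FP, and FN are the True Positive, False Positive, and False Negative counts. Higher is better.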
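As a check on the footnote, the F1 scores can be recomputed from the frame counts in the tables using the standard definition, F1 = 2·TP / (2·TP + FP + FN). The sketch below is illustrative only; the function name and the layout of the `results` dictionary are assumptions, while the counts themselves come from the tables.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# (true positives, false positives, false negatives), copied from the tables.
results = {
    ("Delta", "B6A MAP"): (1675, 0, 574),
    ("Delta", "Google Cloud Vision"): (951, 9, 1298),
    ("Delta", "Amazon Rekognition"): (597, 7, 1652),
    ("Verizon", "B6A MAP"): (3063, 0, 370),
    ("Verizon", "Google Cloud Vision"): (1132, 10, 2301),
    ("Verizon", "Amazon Rekognition"): (1798, 0, 1635),
}

for (brand, product), (tp, fp, fn) in results.items():
    print(f"{brand:8s} {product:20s} F1 = {f1_score(tp, fp, fn):.3f}")
```

Because True Negatives do not enter the formula, the score is not inflated by the many frames in which no signage appears.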

False Positives and False Negatives both matter to teams, brands, and agencies, but MAP distinguishes itself by minimizing False Negatives. In both tests, MAP misses less than half as much signage time as Cloud Vision or Rekognition; in the Verizon test, it misses less than one fourth as much. Users of Google’s or Amazon’s tools must manually review the video to close this gap in accuracy, which squanders the efficiency of computational analysis and incurs additional costs. For example, to adjust for the gap in Google’s Delta results, a human must watch the entire game to find the missing 5:25 of screen time.

If clients want more accurate analysis, they must choose algorithms with fewer False Negatives. MAP’s neural networks and value calculations were built and trained specifically to analyze sports broadcasts. In this test, MAP provided more accurate results than these leading providers while also avoiding extra manual review.