Computer Vision

Block Six Analytics’ Media Analysis Platform Outperforms Google and Amazon Computer Vision Products

BY JOSHUA L. HERZBERG AND ALEXANDER CORDOVER

Block Six Analytics’ (B6A) Media Analysis Platform (MAP) analyzes sports video broadcasts to compute how long in-stadium signage appears on screen. Other computer vision products, such as Amazon’s Rekognition and Google’s Cloud Vision, can also perform this task. Both offer text localization (finding text in a video frame) and optical character recognition (reading the text) services akin to what MAP offers. But which product is most accurate?

To evaluate and compare the products, we conducted experiments on a single 3.5-hour game video, which amounts to analyzing 50,350 frames. We tested each algorithm’s ability to identify how often Delta Airlines and Verizon Communications signage appeared on screen. The broadcast contained static and virtual signage for both companies. The Delta signage appeared on screen for 9 minutes and 22 seconds, while the Verizon signage appeared for 14 minutes and 18 seconds.

To evaluate the products’ algorithms, we compared each predicted result to the ground truth: the actual presence of signage in the frame. If the algorithm correctly finds signage, a True Positive has occurred. When the algorithm correctly finds no signage, a True Negative has occurred. When the algorithm incorrectly finds signage where none exists, a False Positive has occurred. And when the algorithm finds no signage where signage exists, a False Negative has occurred. Each algorithm processed every frame in the 3.5-hour broadcast.
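The four outcomes above amount to a simple per-frame tally. A minimal sketch in Python (the toy labels are illustrative, not the real broadcast data; the ~4 fps effective sampling rate is an assumption inferred from 50,350 frames over 3.5 hours):

```python
def tally_outcomes(predicted, actual):
    """Count True/False Positives/Negatives over per-frame booleans.

    predicted[i] is True if the algorithm found signage in frame i;
    actual[i] is the ground-truth label for the same frame.
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for pred, truth in zip(predicted, actual):
        if pred and truth:
            counts["TP"] += 1
        elif not pred and not truth:
            counts["TN"] += 1
        elif pred and not truth:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts

# Illustrative toy labels, not the real broadcast data:
predicted = [True, True, False, False, True]
actual    = [True, False, False, True, True]
print(tally_outcomes(predicted, actual))  # {'TP': 2, 'TN': 1, 'FP': 1, 'FN': 1}

# At the broadcast's effective sampling rate (assumed here to be
# 50,350 frames / 12,600 s, i.e. roughly 4 fps), a frame count
# converts to screen time as:
def frames_to_mmss(frames, fps=50350 / (3.5 * 3600)):
    seconds = round(frames / fps)
    return f"{seconds // 60}:{seconds % 60:02d}"

print(frames_to_mmss(574))  # ≈ "2:24" of missed screen time
```

This conversion reproduces the minutes-and-seconds figures reported alongside the frame counts in the results tables.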
| DELTA | True Positive Frames (Percent of Correct Time) | False Negatives (Minutes:Seconds) | False Positives (Minutes:Seconds) | F1 Score¹ |
|---|---|---|---|---|
| B6A MAP | 1675 (74.5%) | 574 (2:24) | 0 (0:00) | .854 |
| Google Cloud Vision | 951 (42.3%) | 1298 (5:25) | 9 (0:02) | .593 |
| Amazon Rekognition | 597 (26.5%) | 1652 (6:53) | 7 (0:02) | .419 |
| VERIZON | True Positive Frames (Percent of Correct Time) | False Negatives (Minutes:Seconds) | False Positives (Minutes:Seconds) | F1 Score¹ |
|---|---|---|---|---|
| B6A MAP | 3063 (89.2%) | 370 (1:33) | 0 (0:00) | .943 |
| Google Cloud Vision | 1132 (33.0%) | 2301 (9:36) | 10 (0:03) | .495 |
| Amazon Rekognition | 1798 (52.4%) | 1635 (6:49) | 0 (0:00) | .687 |
¹ F1 Score is the harmonic mean of precision and recall; it combines False Positives and False Negatives into a single number.

False Positives and False Negatives are both important to teams, brands, and agencies, but MAP distinguishes itself by minimizing False Negatives. In every test, MAP misses less than half of the signage time that Cloud Vision and Rekognition miss; on the Verizon signage, it misses less than a quarter. Users of Google’s or Amazon’s tools must manually review the video to make up for this gap in accuracy, which squanders the efficiency of computational analysis and incurs additional financial cost. For example, to adjust for the gap in Google’s Delta results, a human must watch the entire game to find the missing 5:25 of screen time.

If clients want more accurate analysis, they must choose algorithms with fewer False Negatives. MAP’s neural networks and value calculations were built and trained specifically to analyze sports broadcasts. As a result, MAP provides more accurate results than these leading providers while avoiding extraneous manual review.
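The F1 scores in the tables can be checked directly from the frame counts using the standard formula F1 = 2·TP / (2·TP + FP + FN). A quick sketch, using the Delta frame counts reported above:

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

# (True Positive, False Positive, False Negative) frame counts
# from the Delta results table:
delta = {
    "B6A MAP":             (1675, 0, 574),
    "Google Cloud Vision": (951, 9, 1298),
    "Amazon Rekognition":  (597, 7, 1652),
}

for name, (tp, fp, fn) in delta.items():
    print(f"{name}: {f1_score(tp, fp, fn):.3f}")
# B6A MAP: 0.854
# Google Cloud Vision: 0.593
# Amazon Rekognition: 0.419
```

The same formula reproduces the Verizon column as well (e.g., 2·3063 / (2·3063 + 0 + 370) ≈ .943 for MAP).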
