[macOS] FlutterSurfaceManager should not return surfaces that are in use (#52082)

Fixes https://github.com/flutter/flutter/issues/138936

Currently the engine hold a single backbuffer per flutter view layer,
which gets swapped with front buffer. This however is too optimistic, as
the front buffer in some cases might not get immediately picked up by
the compositor. When that happens, during next frame the cache may
return a backbuffer that is in fact still being used by the compositor.
Rendering to such surface will result in artifacts seen in the video.
This seems more likely to happen with busy platform thread. It is also
more likely to happen with the presence of "heavy" platform views such
as `WKWebView`.

IOSurface is able to report when it is being used by the compositor
(`IOSurfaceIsInUse`). This PR ensures that the backing store cache never
returns surface that is being used by compositor. This may result in
transiently keeping more surfaces than before, as the compositor
sometimes seems to holds on to the surface longer than it is displayed,
but that's preferable outcome to visual glitches.

This PR adds `age` field to `FlutterSurface`. When returning buffers to
the surface cache (during compositor commit), the age of each surface
currently present in cache increases by one, while the newly returned
surfaces have their age set to 0.

When returning surfaces from cache, a surface with lowest age that is
not in use by compositor is returned.

Surfaces with age that reaches 30 are evicted from the pool. Reaching
this age means one of two things:
- When surface is still in use at age 30 it means the compositor is
holding on to it much longer than we expect. In this case just forget
about the surface, we can't really do anything about it.
- When surface is not in use at age 30, it means it hasn't been removed
from cache for 29 subsequent frames. That means the cache is holding more
surfaces than needed.

Removing all surfaces from cache after idle period remains unchanged.

Before: 


https://github.com/flutter/engine/assets/96958/ba2bfc43-525e-4a88-b37c-61842994d3bc

After:


https://github.com/flutter/engine/assets/96958/8b5d393e-3031-46b1-b3f0-cb7f63f8d960


*If you had to change anything in the [flutter/tests] repo, include a
link to the migration guide as per the [breaking change policy].*

## Pre-launch Checklist

- [x] I read the [Contributor Guide] and followed the process outlined
there for submitting PRs.
- [x] I read the [Tree Hygiene] wiki page, which explains my
responsibilities.
- [x] I read and followed the [Flutter Style Guide] and the [C++,
Objective-C, Java style guides].
- [x] I listed at least one issue that this PR fixes in the description
above.
- [x] I added new tests to check the change I am making or feature I am
adding, or the PR is [test-exempt]. See [testing the engine] for
instructions on writing and running engine tests.
- [x] I updated/added relevant documentation (doc comments with `///`).
- [x] I signed the [CLA].
- [x] All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel
on [Discord].

<!-- Links -->
[Contributor Guide]:
https://github.com/flutter/flutter/wiki/Tree-hygiene#overview
[Tree Hygiene]: https://github.com/flutter/flutter/wiki/Tree-hygiene
[test-exempt]:
https://github.com/flutter/flutter/wiki/Tree-hygiene#tests
[Flutter Style Guide]:
https://github.com/flutter/flutter/wiki/Style-guide-for-Flutter-repo
[C++, Objective-C, Java style guides]:
https://github.com/flutter/engine/blob/main/CONTRIBUTING.md#style
[testing the engine]:
https://github.com/flutter/flutter/wiki/Testing-the-engine
[CLA]: https://cla.developers.google.com/
[flutter/tests]: https://github.com/flutter/tests
[breaking change policy]:
https://github.com/flutter/flutter/wiki/Tree-hygiene#handling-breaking-changes
[Discord]: https://github.com/flutter/flutter/wiki/Chat
diff --git a/shell/platform/darwin/macos/framework/Source/FlutterSurface.h b/shell/platform/darwin/macos/framework/Source/FlutterSurface.h
index b566e68..50f6a2a 100644
--- a/shell/platform/darwin/macos/framework/Source/FlutterSurface.h
+++ b/shell/platform/darwin/macos/framework/Source/FlutterSurface.h
@@ -32,7 +32,13 @@
 @property(readonly, nonatomic, nonnull) IOSurfaceRef ioSurface;
 @property(readonly, nonatomic) CGSize size;
 @property(readonly, nonatomic) int64_t textureId;
+// Whether the surface is currently in use by the compositor.
+@property(readonly, nonatomic) BOOL isInUse;
 
 @end
 
+@interface FlutterSurface (Testing)
+@property(readwrite, nonatomic) BOOL isInUseOverride;
+@end
+
 #endif  // FLUTTER_SHELL_PLATFORM_DARWIN_MACOS_FRAMEWORK_SOURCE_FLUTTERSURFACE_H_
diff --git a/shell/platform/darwin/macos/framework/Source/FlutterSurface.mm b/shell/platform/darwin/macos/framework/Source/FlutterSurface.mm
index 4b65f25..f61f81c 100644
--- a/shell/platform/darwin/macos/framework/Source/FlutterSurface.mm
+++ b/shell/platform/darwin/macos/framework/Source/FlutterSurface.mm
@@ -10,6 +10,8 @@
   CGSize _size;
   IOSurfaceRef _ioSurface;
   id<MTLTexture> _texture;
+  // Used for testing.
+  BOOL _isInUseOverride;
 }
 @end
 
@@ -27,6 +29,18 @@
   return reinterpret_cast<int64_t>(_texture);
 }
 
+- (BOOL)isInUse {
+  return _isInUseOverride || IOSurfaceIsInUse(_ioSurface);
+}
+
+- (BOOL)isInUseOverride {
+  return _isInUseOverride;
+}
+
+- (void)setIsInUseOverride:(BOOL)isInUseOverride {
+  _isInUseOverride = isInUseOverride;
+}
+
 - (instancetype)initWithSize:(CGSize)size device:(id<MTLDevice>)device {
   if (self = [super init]) {
     self->_size = size;
diff --git a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.h b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.h
index 279d580..d515741 100644
--- a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.h
+++ b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.h
@@ -88,7 +88,7 @@
 /**
  * Removes all cached surfaces replacing them with new ones.
  */
-- (void)replaceSurfaces:(nonnull NSArray<FlutterSurface*>*)surfaces;
+- (void)returnSurfaces:(nonnull NSArray<FlutterSurface*>*)surfaces;
 
 /**
  * Returns number of surfaces currently in cache. Used for tests.
diff --git a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.mm b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.mm
index bed88bf..d87aed4 100644
--- a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.mm
+++ b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManager.mm
@@ -153,7 +153,7 @@
   FML_DCHECK([NSThread isMainThread]);
 
   // Release all unused back buffer surfaces and replace them with front surfaces.
-  [_backBufferCache replaceSurfaces:_frontSurfaces];
+  [_backBufferCache returnSurfaces:_frontSurfaces];
 
   // Front surfaces will be replaced by currently presented surfaces.
   [_frontSurfaces removeAllObjects];
@@ -266,9 +266,15 @@
 
 // Cached back buffers will be released after kIdleDelay if there is no activity.
 static const double kIdleDelay = 1.0;
+// Once surfaces reach kEvictionAge, they will be evicted from the cache.
+// The age of 30 has been chosen to reduce potential surface allocation churn.
+// For unused surface 30 frames means only half a second at 60fps, and there is
+// idle timeout of 1 second where all surfaces are evicted.
+static const int kSurfaceEvictionAge = 30;
 
 @interface FlutterBackBufferCache () {
   NSMutableArray<FlutterSurface*>* _surfaces;
+  NSMapTable<FlutterSurface*, NSNumber*>* _surfaceAge;
 }
 
 @end
@@ -278,28 +284,65 @@
 - (instancetype)init {
   if (self = [super init]) {
     self->_surfaces = [[NSMutableArray alloc] init];
+    self->_surfaceAge = [NSMapTable weakToStrongObjectsMapTable];
   }
   return self;
 }
 
+- (int)ageForSurface:(FlutterSurface*)surface {
+  NSNumber* age = [_surfaceAge objectForKey:surface];
+  return age != nil ? age.intValue : 0;
+}
+
+- (void)setAge:(int)age forSurface:(FlutterSurface*)surface {
+  [_surfaceAge setObject:@(age) forKey:surface];
+}
+
 - (nullable FlutterSurface*)removeSurfaceForSize:(CGSize)size {
   @synchronized(self) {
+    // Purge all cached surfaces if the size has changed.
+    if (_surfaces.firstObject != nil && !CGSizeEqualToSize(_surfaces.firstObject.size, size)) {
+      [_surfaces removeAllObjects];
+    }
+
+    FlutterSurface* res;
+
+    // Returns youngest surface that is not in use. Returning youngest surface ensures
+    // that the cache doesn't keep more surfaces than it needs to, as the unused surfaces
+    // kept in cache will have their age kept increasing until purged (inside [returnSurfaces:]).
     for (FlutterSurface* surface in _surfaces) {
-      if (CGSizeEqualToSize(surface.size, size)) {
-        // By default ARC doesn't retain enumeration iteration variables.
-        FlutterSurface* res = surface;
-        [_surfaces removeObject:surface];
-        return res;
+      if (!surface.isInUse &&
+          (res == nil || [self ageForSurface:res] > [self ageForSurface:surface])) {
+        res = surface;
       }
     }
-    return nil;
+    if (res != nil) {
+      [_surfaces removeObject:res];
+    }
+    return res;
   }
 }
 
-- (void)replaceSurfaces:(nonnull NSArray<FlutterSurface*>*)surfaces {
+- (void)returnSurfaces:(nonnull NSArray<FlutterSurface*>*)returnedSurfaces {
   @synchronized(self) {
-    [_surfaces removeAllObjects];
-    [_surfaces addObjectsFromArray:surfaces];
+    for (FlutterSurface* surface in returnedSurfaces) {
+      [self setAge:0 forSurface:surface];
+    }
+    for (FlutterSurface* surface in _surfaces) {
+      [self setAge:[self ageForSurface:surface] + 1 forSurface:surface];
+    }
+
+    [_surfaces addObjectsFromArray:returnedSurfaces];
+
+    // Purge all surface with age = kSurfaceEvictionAge. Reaching this age can mean two things:
+    // - Surface is still in use and we can't return it. This can happen in some edge
+    //   cases where the compositor holds on to the surface for much longer than expected.
+    // - Surface is not in use but it hasn't been requested from the cache for a while.
+    //   This means there are too many surfaces in the cache.
+    [_surfaces filterUsingPredicate:[NSPredicate predicateWithBlock:^BOOL(FlutterSurface* surface,
+                                                                          NSDictionary* bindings) {
+                 return [self ageForSurface:surface] < kSurfaceEvictionAge;
+               }]];
   }
 
   // performSelector:withObject:afterDelay needs to be performed on RunLoop thread
diff --git a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManagerTest.mm b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManagerTest.mm
index 04471eb..8d69962 100644
--- a/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManagerTest.mm
+++ b/shell/platform/darwin/macos/framework/Source/FlutterSurfaceManagerTest.mm
@@ -117,8 +117,13 @@
   auto surfaceFromCache = [surfaceManager surfaceForSize:CGSizeMake(110, 110)];
   EXPECT_EQ(surfaceFromCache, surface2);
 
-  [surfaceManager presentSurfaces:@[] atTime:0 notify:nil];
-  EXPECT_EQ(surfaceManager.backBufferCache.count, 1ul);
+  // Submit empty surfaces until the one in cache gets to age >= kSurfaceEvictionAge, in which case
+  // it should be removed.
+
+  for (int i = 0; i < 30 /* kSurfaceEvictionAge */; ++i) {
+    [surfaceManager presentSurfaces:@[] atTime:0 notify:nil];
+    EXPECT_EQ(surfaceManager.backBufferCache.count, 1ul);
+  }
 
   [surfaceManager presentSurfaces:@[] atTime:0 notify:nil];
   EXPECT_EQ(surfaceManager.backBufferCache.count, 0ul);
@@ -163,6 +168,33 @@
   EXPECT_EQ(surface3, surface1);
 }
 
+TEST(FlutterSurfaceManager, BackingStoreCacheSurfaceStuckInUse) {
+  TestView* testView = [[TestView alloc] init];
+  FlutterSurfaceManager* surfaceManager = CreateSurfaceManager(testView);
+
+  auto surface1 = [surfaceManager surfaceForSize:CGSizeMake(100, 100)];
+
+  [surfaceManager presentSurfaces:@[ CreatePresentInfo(surface1) ] atTime:0 notify:nil];
+  // Pretend that compositor is holding on to the surface. The surface will be kept
+  // in cache until the age of kSurfaceEvictionAge is reached, and then evicted.
+  surface1.isInUseOverride = YES;
+
+  auto surface2 = [surfaceManager surfaceForSize:CGSizeMake(100, 100)];
+  [surfaceManager presentSurfaces:@[ CreatePresentInfo(surface2) ] atTime:0 notify:nil];
+  EXPECT_EQ(surfaceManager.backBufferCache.count, 1ul);
+
+  for (int i = 0; i < 30 /* kSurfaceEvictionAge */ - 1; ++i) {
+    auto surface3 = [surfaceManager surfaceForSize:CGSizeMake(100, 100)];
+    [surfaceManager presentSurfaces:@[ CreatePresentInfo(surface3) ] atTime:0 notify:nil];
+    EXPECT_EQ(surfaceManager.backBufferCache.count, 2ul);
+  }
+
+  auto surface4 = [surfaceManager surfaceForSize:CGSizeMake(100, 100)];
+  [surfaceManager presentSurfaces:@[ CreatePresentInfo(surface4) ] atTime:0 notify:nil];
+  // Surface in use should bet old enough at this point to be evicted.
+  EXPECT_EQ(surfaceManager.backBufferCache.count, 1ul);
+}
+
 inline bool operator==(const CGRect& lhs, const CGRect& rhs) {
   return CGRectEqualToRect(lhs, rhs);
 }