Vacuum stale In Progress tasks more frequently (#3699)

https://github.com/flutter/flutter/issues/122117 is still happening frequently. The cron job currently runs only every 12 hours, so when the issue occurs, large numbers of columns can stay stuck on the dashboard for up to 12 hours. (When it happens, it sometimes seems to affect an entire batch of backfill scheduling, which is currently 50.)

Looking at the history of this task:
- Originally it ran every 10 minutes and was much more aggressive about which tasks it reset. That caused https://github.com/flutter/flutter/issues/121989, and the job was subsequently disabled in https://github.com/flutter/cocoon/pull/2515
- It was re-enabled in https://github.com/flutter/cocoon/pull/3354, more narrowly targeted so it wouldn't reset jobs that had been assigned builds. However, it was re-landed with a much more conservative 12-hour cycle. That version *replaced* the time-based check instead of adding to it, which means it's subject to a race when it runs. The justification for removing the time-based check was that, since the job frequency was low, the race wouldn't be an issue (which is only somewhat true).

This re-adds a time-based check to avoid races and increases the frequency to once per hour. That may still cause some tree delays; once it has been running in this mode for a while, we can evaluate whether to run it more often and make the timestamp check less conservative. It also removes the reset of the creation timestamp to 0 so that the timestamp check will work; that reset appears to be a legacy of the earliest version of this job, which was trying to clean up a slightly different case involving null creation timestamps.
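
As a rough illustration of the combined check described above (no build assigned *and* older than the timeout), here is a minimal standalone sketch. `StubTask`, `isStale`, and the status strings are hypothetical stand-ins for illustration, not the real Cocoon `Task` model; the 3-hour limit mirrors `kTimeoutLimit`.

```dart
// Hypothetical, simplified stand-in for the real Task model (assumption).
class StubTask {
  StubTask({required this.status, this.buildNumber, this.createTimestamp});

  String status;
  final int? buildNumber;
  // Milliseconds since epoch; may be null for legacy entries.
  final int? createTimestamp;
}

// Mirrors kTimeoutLimit from the handler.
const Duration timeoutLimit = Duration(hours: 3);

// Assumed status strings, used only for this sketch.
const String statusInProgress = 'In Progress';
const String statusNew = 'New';

/// Returns true when [task] is In Progress, has no build assigned, and was
/// created more than [timeoutLimit] before [now].
bool isStale(StubTask task, DateTime now) {
  if (task.status != statusInProgress || task.buildNumber != null) {
    return false;
  }
  // A null timestamp is treated as the epoch, so such a task counts as stale.
  final DateTime created =
      DateTime.fromMillisecondsSinceEpoch(task.createTimestamp ?? 0);
  return now.difference(created) > timeoutLimit;
}

void main() {
  final DateTime now = DateTime.now();
  int msAgo(Duration d) => now.subtract(d).millisecondsSinceEpoch;

  final StubTask fresh = StubTask(
    status: statusInProgress,
    createTimestamp: msAgo(const Duration(minutes: 5)),
  );
  final StubTask old = StubTask(
    status: statusInProgress,
    createTimestamp: msAgo(const Duration(hours: 4)),
  );
  final StubTask oldWithBuild = StubTask(
    status: statusInProgress,
    buildNumber: 123,
    createTimestamp: msAgo(const Duration(hours: 4)),
  );

  print(isStale(fresh, now)); // false: not yet past the timeout
  print(isStale(old, now)); // true: no build and older than 3 hours
  print(isStale(oldWithBuild, now)); // false: already assigned a build
}
```

Because the creation timestamp is no longer zeroed on reset, a task that was just reset keeps its real age and the check above continues to be meaningful on later runs.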

Mitigation for https://github.com/flutter/flutter/issues/122117

diff --git a/app_dart/lib/src/request_handlers/scheduler/vacuum_stale_tasks.dart b/app_dart/lib/src/request_handlers/scheduler/vacuum_stale_tasks.dart
index 5bfc680..d6b0f5b 100644
--- a/app_dart/lib/src/request_handlers/scheduler/vacuum_stale_tasks.dart
+++ b/app_dart/lib/src/request_handlers/scheduler/vacuum_stale_tasks.dart
@@ -32,7 +32,8 @@
   /// For testing, can be used to inject a deterministic time.
   final DateTime? nowValue;
 
-  /// Tasks that are in progress for this duration will be reset.
+  /// Tasks that are in progress without a build for this duration will be
+  /// reset.
   static const Duration kTimeoutLimit = Duration(hours: 3);
 
   @override
@@ -54,14 +55,25 @@
   Future<void> _vacuumRepository(gh.RepositorySlug slug) async {
     final DatastoreService datastore = datastoreProvider(config.db);
 
-    final List<FullTask> tasks = await datastore.queryRecentTasks(slug: slug).toList();
+    // Use the same commit limit as the backfill scheduler, since the primary
+    // purpose of fixing stuck tasks is to prevent the backfiller from being
+    // stuck on one of these tasks.
+    final List<FullTask> tasks =
+        await datastore.queryRecentTasks(slug: slug, commitLimit: config.backfillerCommitLimit).toList();
     final List<Task> tasksToBeReset = <Task>[];
+    final DateTime now = DateTime.now();
     for (FullTask fullTask in tasks) {
       final Task task = fullTask.task;
       if (task.status == Task.statusInProgress && task.buildNumber == null) {
-        task.status = Task.statusNew;
-        task.createTimestamp = 0;
-        tasksToBeReset.add(task);
+        // If the task hasn't been assigned a build, see if it's been waiting
+        // longer than the timeout, and if so reset it back to New as a
+        // mitigation for https://github.com/flutter/flutter/issues/122117 until
+        // the root cause is determined and fixed.
+        final DateTime creationTime = DateTime.fromMillisecondsSinceEpoch(task.createTimestamp ?? 0);
+        if (now.difference(creationTime) > kTimeoutLimit) {
+          task.status = Task.statusNew;
+          tasksToBeReset.add(task);
+        }
       }
     }
     log.info('Vacuuming stale tasks: $tasksToBeReset');
diff --git a/app_dart/test/request_handlers/scheduler/vacuum_stale_tasks_test.dart b/app_dart/test/request_handlers/scheduler/vacuum_stale_tasks_test.dart
index 592a3d5..dd9bf34 100644
--- a/app_dart/test/request_handlers/scheduler/vacuum_stale_tasks_test.dart
+++ b/app_dart/test/request_handlers/scheduler/vacuum_stale_tasks_test.dart
@@ -38,7 +38,7 @@
       );
     });
 
-    test('skips when no tasks are stale', () async {
+    test('skips when tasks have a build number', () async {
       final List<Task> originalTasks = <Task>[
         generateTask(
           1,
@@ -55,6 +55,30 @@
       expect(tasks[0].status, Task.statusInProgress);
     });
 
+    test('skips when tasks are not yet old enough to be considered stale', () async {
+      when(
+        mockFirestoreService.writeViaTransaction(
+          captureAny,
+        ),
+      ).thenAnswer((Invocation invocation) {
+        return Future<CommitResponse>.value(CommitResponse());
+      });
+      final List<Task> originalTasks = <Task>[
+        generateTask(
+          1,
+          status: Task.statusInProgress,
+          parent: commit,
+          created: DateTime.now().subtract(const Duration(minutes: 5)),
+        ),
+      ];
+      await config.db.commit(inserts: originalTasks);
+
+      await tester.get(handler);
+
+      final List<Task> tasks = config.db.values.values.whereType<Task>().toList();
+      expect(tasks[0].status, Task.statusInProgress);
+    });
+
     test('resets stale task', () async {
       when(
         mockFirestoreService.writeViaTransaction(
@@ -79,6 +103,7 @@
           3,
           status: Task.statusInProgress,
           parent: commit,
+          created: DateTime.now().subtract(const Duration(hours: 4)),
         ),
       ];
       final DatastoreService datastore = DatastoreService(config.db, 5);
@@ -87,9 +112,7 @@
       await tester.get(handler);
 
       final List<Task> tasks = config.db.values.values.whereType<Task>().toList();
-      expect(tasks[0].createTimestamp, 0);
       expect(tasks[0].status, Task.statusNew);
-      expect(tasks[2].createTimestamp, 0);
       expect(tasks[2].status, Task.statusNew);
 
       final List<dynamic> captured = verify(mockFirestoreService.writeViaTransaction(captureAny)).captured;
diff --git a/cron.yaml b/cron.yaml
index 3d9e976..2d8acc0 100644
--- a/cron.yaml
+++ b/cron.yaml
@@ -11,7 +11,7 @@
 # https://github.com/flutter/flutter/issues/120395#issuecomment-1444810718
 - description: vacuum stale tasks
   url: /api/scheduler/vacuum-stale-tasks
-  schedule: every 12 hours
+  schedule: every 1 hours
 
 - description: backfills builds
   url: /api/v2/scheduler/batch-backfiller