DARPA

Overview
The DARPA Bio Attribution Challenge is a virtual competition designed to spur innovation in identifying, characterizing, and attributing modified threat sequence data (protein, gene, and genome) within a simulated environment by probing the frontier of possible methodologies that can provide several orders of magnitude improvements over current state-of-the-art (SOTA) tools. The competition aims to engage diverse participants from government, academia, and the private sector to advance massive-scale data analysis combined with deep homology detection, pushing the boundaries of sensitivity, specificity, speed, and scale.
Participants must demonstrate innovative methods, approaches, and models for achieving precise identification of natural or modified threat agents in near real time in petabyte-scale data sets. Performance will be measured on accuracy (correct identification and attribution), speed (time to achieve identification and attribution), innovation (novelty of methods), and data efficiency (attribution with minimal data).
The Challenge will be executed in two phases:
Round 1: Detection - Identification & Characterization: Accurately and precisely (i.e. with statistically significant reproducibility) identify and characterize pathogens of concern for human, crop, or livestock health.
Round 2: Determination - Attribution: Accurately and precisely identify unique anomalies, defined as non-routine, high-consequence, or security-relevant sequence- or metadata-derived signatures associated with engineered or otherwise manipulated organisms, intentional pathogen release, or accidental release/containment failure.
Awards:
The award structure includes round-specific monetary prizes recognizing the top 3 performers in each round. An overall “Best in Show” award and other special (non-monetary) recognitions will also be given to acknowledge specific achievements and insights.
____________________
A Purely Computational Challenge
This is a purely computational challenge; no actual pathogens or biological materials will be used. All data is deliberately curated and developed by Lawrence Livermore National Laboratory (LLNL) to mimic realistic and complex scenarios without disclosing sensitive information. Participant software will be run in a secure, government-controlled environment.
Timeline
Awards/Prizes
Round-Specific Awards ($180K): Recognizing the top 3 performers in each round for their achievement within that specific stage and the benchmarks established around accuracy, speed, innovation, and data efficiency.
Total Monetary Award Amount: $180K
Non-Monetary Awards:
Highest Precision – most consistent and reproducible measurements
Highest Accuracy – measurements closest to actual values
Judging/Evaluation Metrics
Round 1: Detection-Identification & Characterization (2 Months)
Focus: This phase focuses on accurately and precisely (i.e. with statistically significant reproducibility) identifying and characterizing pathogens of concern for human, crop, or livestock health.
Round 1:
A maximum of 8 hours of compute will be available for each confirmed participant’s software to run.
Macro F1 is calculated as follows:
For each target organism, calculate the F1 score across all samples: F1 = 2 * (Precision * Recall) / (Precision + Recall), where Precision = TP / (TP + FP) and Recall = TP / (TP + FN). TP, FP, and FN are the counts of true positives, false positives, and false negatives for that organism.
Macro F1 = arithmetic mean of all per-organism F1 scores
Each organism is weighted equally, regardless of its prevalence in the dataset
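The Macro F1 calculation described above can be sketched in a few lines of Python. This is an illustration only, not the official scoring code. It assumes each sample is represented as a set of organism labels, and it assumes that an undefined precision, recall, or F1 (a zero denominator) is scored as 0; the Challenge text does not state how those cases are handled. The names `f1_from_counts` and `macro_f1` are hypothetical.

```python
from typing import List, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 for one organism; zero denominators are scored as 0.0 (an assumption)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


def macro_f1(truth: List[Set[str]], predicted: List[Set[str]],
             organisms: List[str]) -> float:
    """Arithmetic mean of per-organism F1 scores, each organism weighted equally."""
    scores = []
    for org in organisms:
        pairs = list(zip(truth, predicted))
        tp = sum(1 for t, p in pairs if org in t and org in p)
        fp = sum(1 for t, p in pairs if org not in t and org in p)
        fn = sum(1 for t, p in pairs if org in t and org not in p)
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy data: four samples, two hypothetical target organisms "A" and "B".
truth = [{"A"}, {"A", "B"}, {"B"}, set()]
predicted = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
print(round(macro_f1(truth, predicted, ["A", "B"]), 3))  # 0.65
```

In the toy data, organism A has TP = 2, FP = 1, FN = 0 (F1 = 0.8) and organism B has TP = 1, FP = 1, FN = 1 (F1 = 0.5), so the macro average is 0.65 even though the two organisms appear with different frequencies.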
Round 2: Determination-Attribution (1.5 Months)
Focus: This phase focuses on accurately and precisely identifying anomalies, defined as non-routine, high-consequence, or security-relevant sequence- or metadata-derived signatures associated with engineered or otherwise manipulated organisms, intentional pathogen release, or accidental release/containment failure.
Software will process approximately 850 TB – 1 PB of data
A maximum of 8 hours of compute will be available for each confirmed participant’s software to run.
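To get a feel for the scale these two figures imply, the short calculation below estimates the sustained read rate needed to touch every byte once. It rests on assumptions the Challenge text does not confirm: that the full dataset must be read within the compute budget, that the 8 hours are wall-clock time, and that sizes use decimal units (1 TB = 1000 GB). The function name `required_throughput_gb_per_s` is hypothetical.

```python
SECONDS_PER_HOUR = 3600


def required_throughput_gb_per_s(dataset_tb: float, budget_hours: float) -> float:
    """Sustained GB/s needed to read dataset_tb once within budget_hours.

    Assumes decimal units (1 TB = 1000 GB) and a single full pass over the data.
    """
    return dataset_tb * 1000 / (budget_hours * SECONDS_PER_HOUR)


# Lower and upper bounds of the stated dataset size (850 TB and 1 PB = 1000 TB).
for size_tb in (850, 1000):
    rate = required_throughput_gb_per_s(size_tb, 8)
    print(f"{size_tb} TB in 8 h -> {rate:.1f} GB/s")
```

Under these assumptions the required rate is roughly 29.5 to 34.7 GB/s, which suggests that approaches avoiding a full scan (indexing, sketching, or sampling) may matter as much as raw classification speed.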
How to Enter
Maximum size of deliverables is 500 GB
Details on the specific submission endpoint will be provided to confirmed participants
A maximum of 20 participants will be accepted into the Challenge for Round 1; a down-select will then determine which participants move on to Round 2.
Data Safeguards and Participant Information
Data Privacy & Security
Eligibility Requirements
Registration Eligibility
The DARPA Bio Attribution Challenge is open to individuals and team members of all nationalities and of all ages, with the following exceptions:
Award Eligibility
Challenge Terms and Conditions
Representations:
Upon submission, the Participant hereby represents and warrants that:
The Participant will not attempt to undermine the legitimate operation of the Challenge and understands that doing so may result in disqualification and forfeiture of any prizes. Prohibited conduct includes, but is not limited to: providing false information during registration or regarding eligibility; failing to comply with the Challenge Terms, Guidelines, or Rules; tampering with or interfering with the administration of the Challenge or the participation of other competitors; submitting content that violates third-party rights or applicable law, or that is lewd, obscene, discriminatory, or otherwise inappropriate; violating the U.S. Department of Defense Social Media User Agreement (https://www.dodig.mil/Disclaimers/Social-Media-User-Agreement/) when using Challenge-related discussion forums; and threatening, harassing, or engaging in abusive conduct toward other participants or representatives of DARPA, Tech Grove, NAWCTSD, or the U.S. Government. The Government reserves the right to disqualify any individual or organization whose conduct or affiliations bring discredit to the Government or could reasonably be expected to do so.
Data Rights and Marking
Relationship of the Parties
Participant Liability and Insurance
Disputes
Governing Law
Additional Links & Resources
FASTQ:
Metagenomics:
Coding Guidelines
Note: Submission files may be encrypted for added security before upload to the official submission endpoint. If submitting an encrypted file, please email the passphrase separately to TechGrove@ucf.edu.
© Copyright Central Florida Tech Grove. All Rights Reserved