Recovering MAGs from Antarctic Soils

Welcome to this reproducible project, where we follow the MerenLab tutorial guide to recover, refine, and characterize metagenome-assembled genomes (MAGs) from two Antarctic lake catchments (Byers Peninsula).


1. Preprocessing and Quality Filtering

In this step, we performed quality filtering of raw reads from Limnopolar and Chica soil samples using illumina-utils.

The input data are the raw paired-end reads (*.fastq.gz) stored in data/raw/.

Defining metagenomic sets

We will be working with 16 metagenomic sets, 8 for each basin, and 48 samples in total. Each sample name consists of the corresponding prefix from ‘sets.txt’, followed by a letter indicating the plot it comes from (A, B or C).

The contents of the sets file should look like this:

Quick note: adjust the paths as you wish.

with open("sets.txt") as f:
    print(f.read())
ChiD1T
ChiD1D
ChiD2T
ChiD2D
ChiW1T
ChiW1D
ChiW2T
ChiW2D
LimnD1T
LimnD1D
LimnD2T
LimnD2D
LimnW1T
LimnW1D
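Under this naming scheme, the full sample list can be generated programmatically. The decomposition of each prefix into basin and condition letters below is an assumption inferred from the prefixes themselves, shown only to illustrate the 16 × 3 = 48 combinatorics:

```python
# assumed decomposition of set prefixes (e.g. ChiD1T) into basin + condition
# letter + replicate digit + final letter; purely illustrative
sets = [f"{basin}{a}{rep}{b}"
        for basin in ("Chi", "Limn")
        for a in ("D", "W")
        for rep in ("1", "2")
        for b in ("T", "D")]
samples = [f"{s}_{plot}" for s in sets for plot in "ABC"]
print(len(sets), len(samples))  # 16 48
print(sets[:4])                 # ['ChiD1T', 'ChiD1D', 'ChiD2T', 'ChiD2D']
```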

A samples.txt file will also be required. This file contains the sample name, followed by comma-separated lists of the compressed R1 and R2 read files:

with open("samples.txt") as f:
    print(f.read())
Sample  r1  r2
ChiD1T  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1T_C_2.fastq.gz
ChiD1D  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD1D_C_2.fastq.gz
ChiD2T  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2T_C_2.fastq.gz
ChiD2D  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiD2D_C_2.fastq.gz
ChiW1T  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1T_C_2.fastq.gz
ChiW1D  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW1D_C_2.fastq.gz
ChiW2T  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2T_C_2.fastq.gz
ChiW2D  /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_C_1.fastq.gz /home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/ChiW2D_C_2.fastq.gz
LimnD1T /home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_C_1.fastq.gz  /home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD1T_C_2.fastq.gz
LimnD2D /home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_A_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_B_1.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_C_1.fastq.gz  /home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_A_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_B_2.fastq.gz,/home/meridian/MERIDIAN_2024/Samples_HN00251955/LimnD2D_C_2.fastq.gz
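Mismatched R1/R2 lists in samples.txt are a common source of silent mapping errors, so a quick consistency check pays off before moving on. A minimal sketch (the two rows below are made up; point it at your real file instead):

```python
def check_samples(text):
    """Return a list of problems: rows whose R1 and R2 lists differ in length."""
    problems = []
    rows = [line.split() for line in text.strip().splitlines()]
    for name, r1, r2 in rows[1:]:  # rows[0] is the 'Sample r1 r2' header
        n1, n2 = len(r1.split(",")), len(r2.split(","))
        if n1 != n2:
            problems.append(f"{name}: {n1} R1 files vs {n2} R2 files")
    return problems

text = """Sample r1 r2
ChiD1T a_1.fastq.gz,b_1.fastq.gz a_2.fastq.gz,b_2.fastq.gz
ChiD1D c_1.fastq.gz c_2.fastq.gz,d_2.fastq.gz
"""
print(check_samples(text))  # ['ChiD1D: 1 R1 files vs 2 R2 files']
```

An empty list means every row pairs up; anything else points to the offending sample.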

Quality filtering of raw reads

We will run the program iu-gen-configs on the samples.txt file, which generates one configuration (.ini) file per sample:

iu-gen-configs samples.txt

The program iu-filter-quality-minoche performs quality filtering of raw Illumina reads based on the method described by Minoche et al. (2011).

Briefly, it removes reads and base pairs that are likely to contain sequencing errors or low-quality regions. This helps to reduce noise and improve the accuracy of downstream analyses (assembly and binning).

for sample in `awk '{print $1}' samples.txt`
do
    # skip the header row of samples.txt
    if [ "$sample" == "Sample" ]; then continue; fi
    iu-filter-quality-minoche $sample.ini --ignore-deflines
done

This step generates files that summarize the quality filtering results for each sample, as well as the FASTQ files for each sample that contain quality-filtered reads:

with open("ChiD1T-STATS.txt") as f:
    print(f.read())

Removal of unwanted reads (Bowtie2)

Before we move on with the co-assemblies, we are going to get rid of possible contaminating reads originating from the human genome, since human contamination is very frequent. We also want to eliminate reads from the phiX174 genome, the first genome ever sequenced (Sanger, F., Air, G., Barrell, B. et al. Nucleotide sequence of bacteriophage φX174 DNA. Nature 265, 687–695 (1977)). Since then, this viral genome has been included as a spike-in in Illumina kits to control the quality of the sequencing process, and some contaminating reads may be left in the data.

To perform short-read alignment we are going to use [Bowtie2](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml). There are other popular aligners such as BWA, but Bowtie2’s parameters are easy to understand and modify. In particular, since we are not really interested in the aligned reads (those that mapped to the human or phiX174 genomes), Bowtie2 gives us an option to extract the unaligned reads easily. We will use the FASTQ files generated in the last step: *-QUALITY_PASSED_R*.fastq.gz

I recommend using conda as your installer and environment manager:

conda create -n bowtie2 -c bioconda bowtie2 -y
Download the human coding sequences from the NCBI database and decompress them. We can then build the index that Bowtie2 requires:

wget https://ftp.ncbi.nlm.nih.gov/genomes/refseq/vertebrate_mammalian/Homo_sapiens/latest_assembly_versions/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_cds_from_genomic.fna.gz
gunzip GCF_000001405.40_GRCh38.p14_cds_from_genomic.fna.gz
bowtie2-build GCF_000001405.40_GRCh38.p14_cds_from_genomic.fna human_cds

Aligning reads against human cds

# We only keep the unaligned pairs (--un-conc), so the SAM output can be discarded
for SET in `cat sets.txt`
do
    bowtie2 -x human_cds \
            -1 ${SET}*-QUALITY_PASSED_R1.fastq.gz \
            -2 ${SET}*-QUALITY_PASSED_R2.fastq.gz \
            --un-conc ${SET}_nohuman_R%.fastq \
            -S /dev/null \
            -p 28
done

PhiX174 decontamination

# Download from NCBI and decompress
wget https://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/Sinsheimervirus_phiX174/latest_assembly_versions/GCF_000819615.1_ViralProj14015/GCF_000819615.1_ViralProj14015_genomic.fna.gz
gunzip GCF_000819615.1_ViralProj14015_genomic.fna.gz

# Building Bowtie2 index
bowtie2-build GCF_000819615.1_ViralProj14015_genomic.fna phix

# Aligning human decontaminated reads against PhiX174 index
for SET in `cat sets.txt`
do
    bowtie2 -x phix \
        -1 ${SET}_nohuman_R1.fastq \
        -2 ${SET}_nohuman_R2.fastq \
        --un-conc ${SET}_nohuman_noPhiX_R%.fastq \
        -S /dev/null \
        -p 28
done

# Count decontaminated reads
for SET in `cat sets.txt`
do
    echo ">>> $SET"
    wc -l ${SET}_nohuman_noPhiX_R*.fastq | awk '{print $1/4}'
done
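The same arithmetic in Python, for scripting contexts: a FASTQ record occupies exactly four lines, so the read count is the line count divided by four (toy in-memory example; use gzip.open for .gz files):

```python
import io

def count_reads(handle):
    # every FASTQ record occupies exactly four lines
    return sum(1 for _ in handle) // 4

fastq = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n"
print(count_reads(io.StringIO(fastq)))  # 2
```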

# We simplify file names
for SET in `cat sets.txt`
do
    mv ${SET}_nohuman_noPhiX_R1.fastq ${SET}_clean_R1.fastq
    mv ${SET}_nohuman_noPhiX_R2.fastq ${SET}_clean_R2.fastq
done

Now we have clean, quality-trimmed reads, free of human and phiX174 contamination. Hurray!

2. Co-assembly of metagenomic sets

Co-assembly, what is it?

“Co-assembly” is the computational process of assembling DNA sequencing reads from multiple biological samples into a single, combined reference assembly, as opposed to performing an independent assembly for each sample. This approach brings many benefits:

  1. Improved Genome Reconstruction: co-assembly can recover more complete microbial genomes from complex communities (metagenomes) than assembling each sample individually (Riley et al., 2023).
  2. Higher Read Depth: by combining reads from multiple samples, the overall depth of coverage for shared genomic regions increases, which helps in the assembly of low-abundance organisms.
  3. Enhanced Gene Discovery: a more complete and robust assembly leads to more accurate gene prediction and broader detection of genes present in the microbial community (Erkorkmaz et al., 2025).
  4. Comparative Analysis: A single reference coassembly provides a common framework for comparing gene content and microbial community structure across different samples (see figure above).
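Point 2 can be made concrete with back-of-the-envelope coverage arithmetic; all numbers below are invented for illustration:

```python
def coverage(n_reads, read_len=150, genome_size=4_000_000):
    """Expected per-base coverage: total sequenced bases / genome size."""
    return n_reads * read_len / genome_size

reads_per_sample = 200_000  # hypothetical reads from one genome in one sample
print(coverage(reads_per_sample))      # 7.5x in a single-sample assembly
print(coverage(3 * reads_per_sample))  # 22.5x when three replicates are co-assembled
```

Tripling the reads from a shared genome triples its expected depth, which is exactly what pushes low-abundance organisms above the assembler's coverage floor.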

Should we co-assemble?

As Happy Belly Bioinformatics points out, co-assembly might seem like a sensible default, but it won’t be ideal in all circumstances. From my personal experience, the golden rule in bioinformatics is that there is no golden rule for when it is better to co-assemble multiple samples together versus running an individual assembly on each. The outcome can depend on many factors, such as variation among samples, community diversity, the assembler you choose, the biological questions you are addressing, and many more. In some cases, a co-assembly may actually result in a lower-quality assembly than individual assemblies. For this reason, it’s important to evaluate each dataset on its own. If you have the time and resources, testing both approaches and comparing the results is always good practice.

In our case, since samples belong to the same study site but are subjected to different environmental conditions (humidity, organic matter (%), depth), they are more likely to share a portion of their sequence space. By performing co-assemblies across the three biological replicates of each set, the mean coverage of microbial genomes can be increased, which in turn improves the completeness and resolution of the recovered genomes (MAGs). This strategy takes advantage of the overlap between replicates while still capturing condition-specific diversity, providing a more robust genomic representation of the microbial communities.

Co-assembling

Since we are going to work with a large dataset, I prefer to use MEGAHIT (Li et al., 2015) as our assembler, since it handles large amounts of data better than SPAdes. The syntax is as follows:

for SET in `cat sets.txt`
do
    # note: -o is an output *directory*; MEGAHIT writes the contigs
    # to final.contigs.fa inside it
    megahit -1 ${SET}_clean_R1.fastq \
            -2 ${SET}_clean_R2.fastq \
            --min-contig-len 1000 \
            -o $SET-RAW.fa \
            --num-cpu-threads 28
done

A good practice is to rename contigs to simplify the header names of all scaffolds. We want to rename the headers with the name of the set they belong to. This will make downstream analyses much easier to follow and understand. Anvi’o also likes simple names ;). We can use the command anvi-script-reformat-fasta as follows:

for SET in $(cat sets.txt)
do
    DIR="$SET-RAW.fa"                # MEGAHIT output directory
    FILE="$DIR/final.contigs.fa"    # MEGAHIT's default contigs file name

    if [[ -f "$FILE" ]]; then
        anvi-script-reformat-fasta "$FILE" \
                                   --simplify-names \
                                   -o "$DIR/$SET-RAW.fa" \
                                   --prefix "$SET"
    else
        echo "$FILE not found"
    fi
done

I recommend creating a folder named “assemblies-1000nt” and adding symbolic links to all the renamed assembly files. For that, I created a bash script:

You can also move the files directly to a new folder. I just prefer to keep files in their original location.

ASSEMBLY_DIR="/media/meridian/Elements/MERIDIAN_2024/02_assembly"
LINK_DIR="$ASSEMBLY_DIR/assemblies-1000nt"
mkdir -p "$LINK_DIR"

for dir in "$ASSEMBLY_DIR"/*.fa; do
    if [ -d "$dir" ]; then
        base=$(basename "$dir")  
        target="$dir/$base"       
        if [ -f "$target" ]; then
            ln -s "$target" "$LINK_DIR/$base"
            echo "Link created: $LINK_DIR/$base -> $target"
        fi
    fi
done

3. Generating contigs databases

Both the tetra-nucleotide frequency and the differential mean coverage of scaffolds across metagenomes are instrumental for accurate binning of scaffolds into metagenomic bins. To prepare for the binning step, we first removed scaffolds too short to produce a reliable tetra-nucleotide frequency using the program anvi-script-reformat-fasta, and then used the program anvi-gen-contigs-database with default parameters to profile all scaffolds and generate an anvi’o contigs database that stores the nucleic acid sequence, tetra-nucleotide frequency, GC-content, and the coordinates of the genes identified by Prodigal v2.6.3 (Hyatt et al., 2010):

for SET in `cat sets.txt`; do
    anvi-script-reformat-fasta \
        .../assemblies-1000nt/$SET-RAW.fa \
        --min-len 2500 \
        --simplify-names \
        -o $SET.fa

    anvi-gen-contigs-database \
        -f $SET.fa \
        -o $SET-CONTIGS.db \
        -T 28
done
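The composition signal mentioned above boils down to counting overlapping 4-mers along each contig; a toy sketch of the idea (anvi'o computes this internally when building the contigs database, with additional details such as merging reverse complements):

```python
from collections import Counter

def tetranucleotide_freq(seq, k=4):
    """Frequencies of overlapping k-mers along a sequence."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

freqs = tetranucleotide_freq("ATGCGATGCGAT")  # a made-up mini "contig"
print(round(sum(freqs.values()), 6))          # frequencies sum to 1.0
```

Contigs from the same genome tend to have similar 4-mer profiles, which is why this signal (together with coverage) can separate them into bins.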

Single-copy core genes

Now that we have our contigs databases, we can add some basic information about our sequences by using HMMER to scan for a commonly used set of single-copy core genes (SCGs). Most microbial genomes contain exactly one copy of each SCG in the set of known SCGs for their taxonomic domain, so the number of HMM hits to a known SCG in a metagenome can serve as an estimate of the number of genomes in that sample (for a given taxonomic domain). Furthermore, this step will help us estimate genome completeness/redundancy in real time as we work on binning our contigs:

for SET in `cat sets.txt`
do
    anvi-run-hmms -c $SET-CONTIGS.db --num-threads 28
done
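To see why SCG hit counts estimate genome numbers, consider a toy tally (the gene names and counts below are invented): if each genome carries one copy of each SCG, the typical hit count across the SCG set approximates how many genomes of that domain contributed contigs.

```python
from statistics import median

# hypothetical HMM hit counts for four bacterial single-copy core genes
scg_hits = {"Ribosomal_L2": 12, "Ribosomal_S8": 11, "RecA": 12, "GyrB": 13}
print(median(scg_hits.values()))  # ~12 genomes of that domain in the assembly
```

The median is robust to the occasional SCG that is missed or hit twice, which is why a central tendency rather than a single gene's count is used.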

Taxonomy

We can import taxonomy into anvi’o from pretty much any software, although anvi’o has specific parsers to make sense of the output of Kaiju and Centrifuge. We will use Centrifuge, assigning taxonomy to the open reading frames Prodigal predicted. First, we will use the program anvi-get-sequences-for-gene-calls to export the nucleic acid sequences corresponding to the gene calls in each anvi’o contigs database. Then we will run Centrifuge with an index of NCBI’s nr database to assess the taxonomy of the genes, and finally use the program anvi-import-taxonomy-for-genes to import this information back into the anvi’o contigs databases.

for SET in `cat sets.txt`
do
    anvi-get-sequences-for-gene-calls -c $SET-CONTIGS.db -o $SET-gene_calls.fa
    centrifuge -f -x /path/to/nr_index/nr $SET-gene_calls.fa -S $SET-centrifuge_hits.tsv -p 28
done

Then we can import them into our contigs databases:

for SET in `cat sets.txt`
do
    anvi-import-taxonomy-for-genes -c $SET-CONTIGS.db -i $SET-centrifuge_hits.tsv -p centrifuge
done

Functional annotation

Now is a good time to populate our contigs databases with functional annotation of the open reading frames Prodigal predicted. My experience has taught me that using a variety of databases is the best practice, so that we are not overly sensitive to the gaps of any single one. For this reason, we will use NCBI COGs and KEGG for functional annotation. First, we need to set up the COG and KEGG data for anvi’o:

# COGs
anvi-setup-ncbi-cogs -T 20 --just-do-it
for SET in `cat sets.txt`
do
    anvi-run-ncbi-cogs -c $SET-CONTIGS.db -T 20
done
# KEGGs
anvi-setup-kegg-data -T 20 --just-do-it
for SET in `cat sets.txt`
do      
    anvi-run-kegg-kofams -c $SET-CONTIGS.db -T 20
done

Generating contigs statistics

I am a guy who really likes numbers and statistics. Lucky for me, contigs statistics are crucial in genome assembly because they give us important information about the quality and structure of the assembly. Metrics such as the total number of contigs, N50, L50, maximum contig length, and length distribution help assess whether the assembly is sufficiently contiguous to support reliable binning. They also reveal the proportion of short versus long contigs, which is important since binning algorithms rely on sequence composition and coverage signals that are more meaningful in longer contigs. Without this assessment, you risk using assemblies that are too fragmented, introducing noise, or setting inappropriate minimum contig length thresholds, all of which can compromise the quality and interpretability of the recovered genomes (MAGs).
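As a refresher on how two of these metrics are defined, here is a minimal computation on a toy list of contig lengths:

```python
def n50_l50(lengths):
    """N50: the contig length at which the cumulative sum (longest to
    shortest) first reaches half the total assembly length.
    L50: the number of contigs needed to get there."""
    lengths = sorted(lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for i, length in enumerate(lengths, start=1):
        running += length
        if running >= half:
            return length, i

print(n50_l50([100, 200, 300, 400, 500]))  # (400, 2): 500 + 400 >= 750
```

A higher N50 with a lower L50 means the assembly packs its length into fewer, longer contigs, which is exactly what binning benefits from.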

We can obtain contigs statistics by using the program anvi-display-contigs-stats:

anvi-display-contigs-stats 03_contigs_dbs/*.db \
                          --report-as-text \
                          --as-markdown \
                          -o all_contigs_statistics.md

Which will show something like:

| contigs_db | ChiD1T | ChiW1T | IZBWet | LimnW1T |
|------------|--------|--------|--------|---------|
| Total Length | 657351660 | 376227450 | 142891251 | 708536870 |
| Num Contigs | 114608 | 72268 | 28763 | 129590 |
| Num Contigs > 100 kb | 114 | 14 | 0 | 94 |
| Num Contigs > 50 kb | 643 | 145 | 36 | 501 |
| Num Contigs > 20 kb | 3247 | 1338 | 459 | 3081 |
| Num Contigs > 10 kb | 10529 | 5588 | 1987 | 10667 |
| Num Contigs > 5 kb | 34805 | 20727 | 7542 | 37217 |
| Num Contigs > 2.5 kb | 114608 | 72268 | 28763 | 129590 |
| Longest Contig | 441616 | 201055 | 95413 | 336007 |
| Shortest Contig | 2500 | 2500 | 2500 | 2500 |
| Num Genes (prodigal) | 724637 | 420527 | 155248 | 791048 |
| L50 | 23973 | 17874 | 7446 | 29016 |
| L75 | 59451 | 40000 | 16307 | 69315 |
| L90 | 90041 | 58085 | 23342 | 102965 |
| N50 | 6200 | 5426 | 5031 | 5742 |
| N75 | 3656 | 3477 | 3353 | 3541 |
| N90 | 2875 | 2826 | 2787 | 2844 |
| Archaea_76 | 5087 | 3351 | 798 | 6226 |
| Bacteria_71 | 10902 | 6992 | 1567 | 13103 |
| Protista_83 | 777 | 490 | 187 | 875 |
| bacteria (Bacteria_71) | 147 | 98 | 22 | 152 |
| eukarya (Protista_83) | 0 | 0 | 0 | 0 |
| archaea (Archaea_76) | 0 | 1 | 0 | 1 |

We can then use pandoc to convert the markdown tables to PDF or HTML.

pandoc -V geometry:landscape \
       all_contigs_statistics.md \
       -o OUTPUT_FILE_NAME.pdf

This is just an example. Since we have a high number of metagenomic sets, the table will be difficult to fit on a PDF page; HTML seems like the reasonable choice for us.

4. Let’s recruit some reads

Build indexes

for SET in `cat sets.txt`
do
    bowtie2-build $SET.fa $SET
done

Mapping against Contigs

for sample in `awk '{print $1}' samples.txt`
do
    if [ "$sample" == "Sample" ]; then continue; fi

    # determine which metagenomic set the $sample belongs to:
    SET=`echo $sample | awk 'BEGIN {FS="_"}{print $1}'`

    # do the bowtie mapping to get the SAM file:
    bowtie2 --threads 24 \
            -x $SET \
            -1 ${sample}_clean_R1.fastq \
            -2 ${sample}_clean_R2.fastq \
            --no-unal \
            -S $sample.sam

    # convert the resulting SAM file to a BAM file:
    samtools view -F 4 -bS $sample.sam > $sample-RAW.bam

    # sort and index the BAM file:
    samtools sort $sample-RAW.bam -o $sample.bam
    samtools index $sample.bam

    # remove temporary files:
    rm $sample.sam $sample-RAW.bam
done

Profiling the mapping results

for sample in `awk '{print $1}' samples.txt`
do
    if [ "$sample" == "Sample" ]; then continue; fi

    # determine which metagenomic set the $sample belongs to:
    SET=`echo $sample | awk 'BEGIN {FS="_"}{print $1}'`

    anvi-profile -c $SET-CONTIGS.db \
                 -i $sample.bam \
                 --num-threads 16 \
                 -o $sample

done

5. Binning begins!

First, we generate a contig depth table from the sorted BAM files using the program jgi_summarize_bam_contig_depths, which ships with MetaBAT:

for SET in $(cat sets.txt); do
    path/to/jgi_summarize_bam_contig_depths \
    --outputDepth ${SET}_metabat.txt \
    /media/meridian/Elements/MERIDIAN_2024/04_read_recruitment/${SET}.bam
done
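The depth file is a tab-separated table; in my experience its first three columns are contigName, contigLen and totalAvgDepth, but verify that against your own output. A sketch (with an invented two-contig table) that lists contigs above a minimum mean depth:

```python
import csv
import io

# synthetic stand-in for a ${SET}_metabat.txt depth table
depth_txt = (
    "contigName\tcontigLen\ttotalAvgDepth\tChiD1T.bam\tChiD1T.bam-var\n"
    "ChiD1T_000000000001\t35000\t12.4\t12.4\t3.1\n"
    "ChiD1T_000000000002\t2600\t0.8\t0.8\t0.2\n"
)

def contigs_above(text, min_depth=1.0):
    """Names of contigs whose mean depth meets the threshold."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [r["contigName"] for r in rows
            if float(r["totalAvgDepth"]) >= min_depth]

print(contigs_above(depth_txt))  # ['ChiD1T_000000000001']
```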

Notes

  • All analyses are implemented with Snakemake for reproducibility and reported with Quarto.
  • Recommended environment: conda/mamba or containerized (Docker/Singularity).

References

Erkorkmaz, B. A., Zeevi, D., & Rudich, Y. (2025). Metagenomic co-assembly uncovers mobile antibiotic resistance genes in airborne microbiomes. Communications Earth & Environment, 6(1), 397. https://doi.org/10.1038/s43247-025-02381-3
Hyatt, D., Chen, G.-L., LoCascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11(1), 119. https://doi.org/10.1186/1471-2105-11-119
Riley, R., Bowers, R. M., Camargo, A. P., Campbell, A., Egan, R., Eloe-Fadrosh, E. A., Foster, B., Hofmeyr, S., Huntemann, M., Kellom, M., Kimbrel, J. A., Oliker, L., Yelick, K., Pett-Ridge, J., Salamov, A., Varghese, N. J., & Clum, A. (2023). Terabase-scale coassembly of a tropical soil microbiome. Microbiology Spectrum, 11(4), e00200–23. https://doi.org/10.1128/spectrum.00200-23