Genetic discrimination of wild versus farmed gilthead sea bream Sparus aurata using microsatellite markers associated with candidate genes

– Farm escapees and their offspring impose a signi ﬁ cant impact on the environment and may therefore alter the future evolutionary trajectories of wild populations. To date, there is no management plan in place in Mediterranean countries to prevent ﬁ sh escapes. Here, we investigate microsatellite length variations in three candidate genes, including prolactin (PRL), growth hormone (GH), and the receptor activity modifying protein 3 gene (RAMP3), to study the genetic structure of the main ﬁ sh species farmed in the Mediterranean, gilthead seabream ( Sparus aurata ). We also evaluate the performance of microsatellites in discriminating ﬁ sh origin (wild or farmed). Results from 298 individuals, including farmed, wild adult and juvenile ﬁ sh were compared with results from 19 neutral markers used in a previous study. All loci were polymorphic, selectively neutral, and had the statistical power to detect signi ﬁ cant population differentiation. Global FST was similar to that estimated using 19 loci (0.019 and 0.023, respectively), while pairwise comparisons identi ﬁ ed farmed populations as the main drivers of genetic divergence, with a much higher magnitude of overall genetic differentiation within farmed populations (0.076) than that estimated using the 19 neutral microsatellite loci (0.041). Bayesian structural analysis showed that the PRL, GH, and RAMP3 markers were able to distinguish farmed from wild populations, but were not able to distinguish different wild groups as 19 neutral microsatellite markers did. Farmed populations of different origins were assigned to a separate cluster with a high individual assignment score ( > 88%). It appears that the candidate markers are more in ﬂ uenced by arti ﬁ cial selection compared to neutral markers. Further validation of their ef ﬁ ciency in discriminating wild, farmed, and mixed ﬁ sh origins using a more robust sample size is needed to ensure their potential use in an escaped ﬁ sh monitoring programme.


Introduction
Environmental sustainability and increased use of marine habitats have been described as the greatest challenges facing the aquaculture industry today (Le Féon et al., 2021). Aquaculture has become the world's fastest growing food production sectors, though this growth comes with serious concerns for the environment in which it operates. These include pollution from nutrients and chemicals, the spread of diseases, habitat alteration, and interbreeding with wild conspecifics (Buschmann et al., 2006;Ford and Myers, 2008).
Fish escapes from cage farms are common due to construction or operational failures (Jackson et al., 2015).
The contribution of escaped fish to natural populations varies widely among farmed species. A recent meta-analysis indicated that one-third of the world's marine ecoregions are threatened to some degree by escaped farmed fish (Atalah and Sanchez-Jerez, 2020). Namely, escaped fish can have a variety of ecological and genetic impacts on native populations through interbreeding and competition for food and habitat (Jonsson and Jonsson, 2006;Šegvić-Bubić et al., 2011aŠegvić-Bubić et al., , 2011bŠegvić-Bubić et al., , 2014Šegvić-Bubić et al., , 2017. Genetic introgression of escapees can lead to reduced overall fitness of wild populations (Fleming et al., 2000;Glover et al., 2012). Farmed fish typically originate from a small number of parental broodstock characterised by a lower effective population size compared to the wild population. Such a scenario has previously been observed in sea bass and sea bream farming (Brown et al., 2015;Šegvić-Bubić et al., 2017;Žužul et al., 2019;Polovina et al., 2020;Toledo-Guedes et al., 2021). Due to the Ryman-Laikre effect, genetic exchange between captive and wild populations could lead to a reduction in effective population size and genetic diversity of local wild populations (Waples et al., 2016). Depending on the magnitude and frequency of escapes, changes in allele frequency and the introduction of new genetic material may have synergistic effects on certain physiological processes and behaviours of natural populations (Solberg et al., 2020).
Aquaculture of marine fish in Croatia is carried out exclusively in floating cages in coastal or semi-coastal locations. Production of European seabass (Dicentrarchus labrax) and gilthead seabream (Sparus aurata) amounted to 12,000 tonnes in 2019 (FEAP, 2020). Since there is only one national hatchery, most fry are still imported from France and Italy. Unfortunately, many Mediterranean countries, including Croatia, do not have an action plan to prevent and manage escapes (Arechavala-Lopez et al., 2018). Recently, a large number of gilthead sea bream from the eastern Adriatic Sea, consisting of fish of different origins (wild vs. farmed vs. wild farm-associated) and ontogenetic stages (adult vs. juvenile) in 22 populations, were studied using a panel of 19 neutral microsatellite loci in several consecutive years. Hybrids and farmed sea bream were detected at an intermediate level in the natural environment. The total number of escapes was nearly 10%, and more than half were captured in the proximity of aquaculture sites (Žužul et al., 2019). A similar scenario was observed for farmed European seabass from the eastern Adriatic and from the waters of Cyprus, where the proportion of escaped fish in wild samples ranged from 14-15% (Brown et al., 2015;Šegvić-Bubić et al., 2017). This situation indicates the urgent need for a more accurate and cost-effective tool for fishery operators to control farm fish origin and recapture in the case of escape.
Prediction of genes associated with biological traits plays an important role in selection and breeding programmes in aquaculture, where molecular markers have been used to generate linkage maps for important economic phenotypic traits such as growth, sex determination, pathogen resistance, and environmental adaptation (Yue, 2014). Single-nucleotide polymorphisms (SNPs) have become the marker of choice in aquaculture and have proven successful in detecting and managing escaped Atlantic salmon . Microsatellite length variability in candidate genes such as growth hormone (GH) and prolactin (PRL) have the capacity for revealing populations under natural selection (Chaoui et al., 2012). Since these markers have never been tested on gilthead sea bream subjected to artificial selection, the objective of this study was to investigate the potential utility of microsatellites located in candidate genes to identify the origin of farmed fish. Samples were sourced from populations with known genetic backgrounds tested using 19 neutral loci (Žužul et al., 2019) and consisted of farmed, wild adult and juvenile individuals.

Sampling and microsatellite genotyping
A total of 1586 gilthead seabream were collected in the waters of the eastern Adriatic Sea between July 2015 and January 2017. The dataset included wild juvenile and adult fish, wild farm-associated and adult farmed fish of different broodstock origins (Western and Eastern Adriatic, Ionian Sea and Atlantic Ocean) and farmed in Croatian waters. The farms studied have been working with sourced fingerlings from French, Italian or national hatcheries for more than 20 years and each has an annual production capacity of 100 tonnes. Based on their production, these are important farms in the region. The wild farm-associated fish are samples collected in close proximity to the two largest tuna farms, where they are present year-round and feed on tuna baitfish loss.
Total genomic DNA from fish fins was extracted by proteinase K digestion followed by a simplified DNA isolation procedure (Laird et al., 1991). The total data set was first genotyped with 19 neutral microsatellite markers (SST; Lee-Montero et al., 2013; the results are published in Žužul et al., 2019). For this study, a subset of 298 individuals from nine sites of different origins (Tab. 1, Fig. 1) was genotyped with a multiplex comprising of three microsatellite loci linked to three candidate genes (cgSST), including prolactin (PRL), growth hormone (GH), and receptor activity modifying protein 3 gene (RAMP3) (see Tab. S1; Astola et al., 2003;Launey et al., 2003;Almuly et al., 2005), while genotype data of SST were obtained from the GenoBase of the Institute of Oceanography and Fisheries (http://jadran.izor.hr/~tsegvic/ aquapop/GenoBase.html). It should be noted that the current wild population dataset was created with individuals that had with neutral markers in Žužul et al. (2019) a Bayesian assignment score with Structure analysis greater than 90%, representing the fish group of "pure wild origin". Amplification of cgSST markers was performed in 10 mL reactions, and the final concentrations of all primers were uniformly set at 0.2 mM. PCR conditions were as follows: initial denaturation at 95°C for 5 min, 26 cycles at 95°C for 30 s, annealing at 60°C for 90 s, and elongation at 72°C for 30 s; and then a final elongation of 30 min at 60°C. The ABI3130 automated sequencer (Applied Biosystems) was used to separate fragments, while electropherograms and genotypes were analysed using GeneMapper software v.3.5 (Applied Biosystems).

Genetic diversity and population structuring
The presence and frequency of null alleles were tested using MICROCHECKER 2.2.3 software (Van Oosterhout et al., 2004) and FREENA (Chapuis and Estoup, 2007), respectively. GENEPOP v.3.1b (Raymond and Rousset, 2003) was used to perform an exact test for linkage disequilibrium (LD) and to test for significant deviations from Hardy-Weinberg equilibrium (HWE) for each locus and population. The Bonferroni correction (Rice, 1989) was applied to adjust significance levels for multiple tests. The mean effective number of alleles across all loci (Ae) and the mean number of alleles per locus (A) were calculated using POPGENE v.1.32 (Yeh et al., 2000). The inbreeding coefficient (FIS) and allelic richness (Ar) were calculated using FSTAT v.2.3 (Goudet et al., 2002), while the polymorphism information content (PIC) of each locus was estimated using Cervus v3.0.3 (Kalinowski et al., 2007). ARLEQUIN v.3.5 software was used to calculate observed (Ho) and expected (He) heterozygosity. Deviation from neutrality was performed in LOSITAN (Antao et al., 2008) using the FST method for outlier detection (Beaumont and Nichols, 1996) for candidate loci. A stepwise mutation model was applied with the following settings: 50,000 simulations, 95% CI, forced mean FST with a false discovery rate of 0.05.
Statistical power of genetic homogeneity tests for the set of markers and sample size was evaluated using POWSIM (Ryman et al., 2006). Pairwise and global FST values were calculated using ARLEQUIN v.3.5, with statistical significance assessed after 10,000 permutations. The Bayesian clustering program STRUCTURE 2.3.4 (Pritchard et al., 2000) was used to analyse the genetic structure of the included populations, with the parameters: population admixture and a priori location (LOCPRIOR) with a burn-in of 10,000 interactions followed by a run of 500,000 MCMC steps and K values from 1 to the maximum number of groups examined, with 20 replicates each. The most likely number of clusters was determined using the ad hoc statistic deltaK implemented in Structure Harvester 0.6.93 (Earl and von Holdt, 2012) and visualised using CLUMPP (Jakobsson and Rosenberg, 2007) and DISTRUCT (Rosenberg, 2004). The same analysis was repeated for i) the data set of 19 SST genotypes, and ii) the dataset combining the genotypes of the three cgSSTs and 19 SSTs for 298 individuals each. In addition, a non-model-based discriminant analysis of principal components (DAPC) method implemented in the Adegenet package (Jombart, 2008) for R was applied to the original and combined datasets, with the retained number of principal components (PCs) optimised using the xvalDapc function (Jombart, 2008).

Genetic diversity
A total of 298 Sparus aurata individuals were genotyped with three cgSST microsatellite markers, while genotype data for the SST dataset were obtained from GenoBase, IOR (Tab. 1, Fig. 1). The percentage of missing data per locus ranged from 0 to 1.5%, with an average of 0.3% and 0.8% for the SST and cgSST datasets, respectively.
Several populations showed significant deviation from Hardy-Weinberg equilibrium with trends toward heterozygote deficiency at the candidate gene locus (PRL) (Tab. S2, Supplementary information). The PRL locus showed the presence of null alleles at a low frequency (<8%), and this locus was retained because the estimated F ST values with and without application of the ENA correction method were Tab. 1. Information of sampling locations, year and population code, along with the origin and number of individual gilthead seabream that were genetically assayed with 3 microsatellites linked to three candidate genes (cgSST). All loci examined were polymorphic (mean PIC > 0.5), with the number of alleles per locus ranging from 4 to 38 for the SST and from 10 to 21 for the cgSST dataset (Tab. S2). The expected heterozygosity (He) ranged from 0.73 to 0.81 for the SST dataset and from 0.65 to 0.84 for He of the cgSST dataset. The observed heterozygosity (Ho) was similar among the studied populations (0.72-0.73) for the cgSST dataset, whereas Ho ranged from 0.76 to 0.72 for the SST dataset, with the lowest value found in the farmed group (Tab. 2). Concerning the cgSST loci examined, only the effective number of alleles per locus (5.1 vs. 8.0) and allelic richness (10.1 vs. 7.5) were found to be significantly reduced in the farmed populations in contrast to wild populations. The inbreeding coefficient, F IS , ranged from 0.01 to 0.08 in the SST dataset and was significantly greater than zero in 2 out of the 9 populations in the cgSST dataset, ranging from -0.03 to 0.12 (Tab. 2). Among the cgSST loci, the PRL locus showed a different distribution of alleles when populations were grouped by origin (wild vs. farmed), where alleles 328 and 344 were more frequent in the farmed than in the wild group (22 vs. 9%; 23 vs. 7%, respectively). No major changes in allele frequency distribution between fish origins were detected for the other two loci (Fig. S1).

Genetic differentiation among populations
The performance of cgSST markers to detect significant population differences was high. Simulations in POWSIM yielded a 95% probability (x 2 , Fisher's test) of detecting a true differentiation of F ST = 0.005 under different scenarios of effective population size (Ne) and number of drift generations (t) with 1000 replications (Tab. S3). Following tests of neutrality using the LOSITAN program, all three loci linked to candidate genes fell into the neutral category. The global F ST across all nine populations was 0.023 (p < 0.001) for the SST dataset and 0.019 (p < 0.001) for the cgSST dataset, demonstrating a low to moderate level of differentiation between populations. Pairwise F ST across all gilthead seabream samples ranged from -0.003 to 0.085 (Tab. 3), with 29 of 36 pairwise comparisons significant for the SST dataset and 18 of 36 significant for the cgSST dataset (p < 0.001 after Bonferroni correction). All significant cgSST comparisons were found between the farmed and the other populations, as well as among the three farmed populations, while an additional significant comparison was found between wild adults and wild farm-associated and juveniles for the SST dataset (Tab. 3).
Structure analysis of the three cgSST dataset of nine gilthead seabream populations revealed the presence of four clusters (K = 4, Fig. 2a). All farmed populations of different origins (CRO, ITA, FR) were each assigned to a separate cluster (blue, yellow, and red) with a high individual assignment score (≥88%) for each cluster. In contrast, wild populations from different sources (adults, juveniles, farmassociated) showed homogeneity and were assigned to the fourth cluster (green) with high assignment values ranging from 0.6 to 0.95% (Tab. S4). When hierarchical structuring was applied, no further structure was detected within the wild cluster. Structure analysis of the dataset containing 19 SSTs and the dataset in which the genotypes of 19 SSTs were added to those of the three cgSSTs revealed K = 5 (Fig. 2b, Fig. S2). This corresponds to the same pattern for the farm populations as for the cgSST dataset, though the wild populations that previously showed homogeneity in the cgSST dataset were assigned to two separate clusters (orange and green; Fig. 2b). The orange cluster included only wild-sampled populations, while the green cluster included wild farm-associated populations and juveniles, respectively. The DAPC analysis showed a similar pattern of clustering to that observed in STRUCTURE (Fig. S3).

Discussion
In the present study, three cgSST microsatellite markers were used for origin screening of 298 specimens of sea bream (wild adults and juveniles, farmed associated adults, and farmed individuals of different broodstock origin) collected in the coastal region of the eastern Adriatic Sea. Data obtained on the same individuals were compared with the results of 19 putative neutral markers (Žužul et al., 2019). These cgSST markers were selected because (1) their molecular functions are associated with traits of high interest for aquaculture (growth rate, osmoregulatory capacity) and are therefore subject to artificial selection in farmed fish, and (2) the GH and PRL genes have known effects on quantitative traits, e.g., growth rate or osmoregulatory capacity in fish, where several authors have linked the length variation of a polymorphic microsatellite in the promoter region of a gene to its transcriptional activity (Almuly et al., 2005(Almuly et al., , 2008Astola et al., 2003;Blel et al., 2010;He et al., 2012).
Candidate gene markers were identified as selectively neutral in the present study, in contrast to previous reports on GH and PRL loci when habitat-based genetic differentiation was tested in gilthead seabream (Chaoui et al., 2012;Guinand et al., 2016), but their power to detect significant population differentiation between wild and farmed populations was high. Indeed, farmed populations were the main driver of genetic divergence (Tab. 3), as observed in previous studies (Šegvić-Bubić et al., 2011b;Loukovitis et al., 2012;Cossu et al., 2019), although the magnitude of overall genetic differentiation within farmed populations was much larger (0.076) than that estimated with 19 neutral microsatellite loci (0.041; Žužul et al., 2019). For both types of markers, significantly lower allelic richness, effective size, and heterozygosity were observed in farmed populations compared with wild populations (Tab. 2). Such a pattern of indices has been reported for several fish species (Glover et al., 2013;Šegvić-Bubić et al., 2017;Cossu et al., 2019) and generally reflects high variance in the family size of breeding populations used in hatcheries and, in the case of gilthead seabream, a bias in the contribution of males and females to each mass spawning event (Brown, 2003). Reduced effective population size due to limited broodstock size may favour both the occurrence of the founder effect and genetic drift, which in turn may lead to a loss of genetic diversity (Loukovitis et al., 2012) and consequently limit the potential for selective breeding. The extent to which diversity is lost depends on several factors, including the number, size, and origin of broodstocks and spawning methods (García-Fernández et al., 2018).
While the polymorphism information content (PIC) for the PRL locus was the highest among the cgSST loci, only PRL showed significantly positive F IS values exclusively affecting wild populations (Tabs. S1 and S2). The presence of heterozygote deficiencies, which can be partially explained by the presence of null alleles, was previously noted for the candidate gene when genetic variation patterns in relation to habitat type were studied in wild gilthead sea bream (Chaoui et al., 2012). Furthermore, only the PRL locus showed a bias in Table 3. Pairwise FST values based on 3 cgSST loci (below diagonal) and pairwise FST values based on 19 neutral SST loci (above diagonal) among 9 populations of gilthead seabream from the Adriatic Sea, including wild, farmed, farm-associated and wild juvenile populations. Significant FST values are underlined at p < 0.001 (Bonferroni correction). Population codes are explained in Table 1 the distribution of allele frequencies between the pooled wild and farmed populations, and the contribution of two alleles (328 and 344) was significantly increased in the farmed populations in contrast to wild populations (Fig. S1). The discrepancy in the distribution of allele frequencies within wild populations represented by adults and juveniles from different areas of the eastern Adriatic was not captured. With regard to the genetic structuring of the populations tested in this study, Bayesian structural analysis showed that the markers PRL, GH, and RAMP3 were able to distinguish farmed from wild populations and, in addition, to assign farmed populations of different origins (CRO, ITA, FR) to a separate cluster with a high individual assignment score (>88%, Fig. 2, Tab. S4). However, the tested markers were not able to distinguish different wild groups (wild adults, juveniles and farm-associated) identified when using 19 neutral microsatellite markers (Fig. S2). It appears that candidate markers are more affected by artificial selection and therefore have great potential to be used as a tool for farm origin screening in an escaped fish surveillance programme. The lack of genetic structure within wild adults (15WT and 16WV) and reduced genetic connectivity with wild offspring and tuna farmassociated collections was also reported by Žužul et al. (2019) with a larger dataset. It could be argued that wild populations sampled over several years from large shoals that first emerge in the northern Adriatic in autumn and then migrate to the southern parts, perform exclusively trophic north-south migrations along the eastern Adriatic coast, and spawning occurs elsewhere in the Mediterranean. On the other hand, Žužul et al. (2019) confirmed a perennial genetic connection of wild fish associated with tuna farms and offspring from the coastal nursery areas, suggesting that tuna farm environments act as spawning grounds for resident sea bream. The permanent and very frequent aggregation behaviour of sea bream in the bottom layers of semi-offshore tuna farms (Stagličić et al., 2017) was primarily attributed to the FAD effect and the increased availability of tuna baitfish losses composed of small pelagic species.
It should be noted that cases of farm escapees or hybrids in the wild fish origin observed with 19 neutral markers (Žužul et al., 2019) were intentionally omitted from the present study. The wild population dataset was created with individuals of pure origin, i.e., individuals that had a Bayesian assignment score with neutral markers greater than 90%, because the goal was to first test the performance of the three loci in assigning individuals to their origin. Such a design limited the ability to evaluate the performance of candidate genes to the detection of wild and farmed hybrids and introgression per se. As the assignments of farm individuals were consistent with the results of 19 putative neutral markers, further validation should be performed on individuals of mixed origin. This step is very important for the management of escaped fish to clarify whether the candidate markers reflect the admixture pattern of the putative neutral loci, which in turn allows for the detection of hybrids.
Complementary to the genetic approach, a variety of origin descriptors are available to identify escaped gilthead seabream in wild populations. The simplest yet cheapest way to distinguish fish origin is to use external characteristics in combination with morphometrics (e.g., body and otolith shape, condition index) and organoleptic descriptors, but their use is limited to early escaped fish (Arechavala- Lopez et al., 2013;Šegvić-Bubić et al., 2014;Talijančić et al., 2019Talijančić et al., , 2021. Recent studies have shown that when analysing body shape and otolith characteristics, attribution of farm seabream to their true origin is more than 75% correct (Talijančić et al., , 2021, while Šegvić-Bubić et al. (2020) showed that the pattern of scale shape, supported by scale microchemistry, can serve as a discriminator in controlling recent escapes.
The application of diagnostic SNP arrays is routinely used in the Norwegian monitoring programme to investigate the extent of farmed escapees and introgression into wild salmon populations (Karlsson et al., 2011(Karlsson et al., , 2014Diserud et al., 2020). The pace of genetic progress in species important for Mediterranean aquaculture has recently accelerated through the work of several European projects. The recently developed and publicly available tools for high-throughput genotyping of 30 K SNPs for European seabass and gilthead seabream (Peñaloza et al., 2021) as well as those in Griot et al. (2021) provide a new tool that can be used to assess the genetic contribution of aquaculture escapees to wild populations. Surprisingly, none of the regulatory frameworks for aquaculture in the Mediterranean includes the management of escaped fish, in contrast to countries with highly developed marine aquaculture such as Norway, Canada or Chile.
In the present study, selected cgSST markers demonstrated their potential to discriminate the origin of fish (wild vs. farmed, Bayesian score assignment >88%), although selective breeding of gilthead seabream is still in its infancy and strains of several generations are under mass or family selection (Janssen et al., 2017). Nevertheless, further validation of their efficiency using a larger sample is needed to ensure their usefulness for the management of escaped gilthead sea bream.
Altogether, there is a solid knowledge base that can become an integral part of the management system for escaped fish in the Mediterranean Sea. In coastal areas, the spatial distribution of escaped fish is closely associated with fish farms (Šegvić-Bubić et al., 2017), where fish remain near the farm for several weeks to adapt. This time frame is considered the most appropriate for recapture of escaped fish, as described in Izquierdo-Gomez et al. (2016), where almost half of the biomass of escaped fish was captured in the first days after escape. Therefore, identification and recapture of escaped fish is an important step to address this emerging problem.