Does matK, trnH-psbA, trnL-F and ITS1 Identification Mirror Morphological Identification in Hibisceae?

Abstract

DNA barcoding applied to conservation and food authentication contributes to attaining the United Nations goals for sustainability. However, congruence between morphological and molecular data in plant species identification needs further study in specific taxa sampled in particular areas. Malaysian samples were used to test the congruence between morphologically identified Hibisceae species and their identification based on matK, trnH-psbA, trnL-F, and ITS1. Morphological characters provisionally identified the taxa as Hibiscus rosa-sinensis, Hibiscus sabdariffa, and Malvaviscus arboreus (previously named Hibiscus malvaviscus). Basic Local Alignment Search Tool (BLAST) searches and a maximum likelihood tree using ITS1 sequences ambiguously assigned H. rosa-sinensis to several Hibiscus species. BLAST using ITS1 identified M. arboreus as the closely related M. penduliflorus, owing to the absence of M. arboreus ITS1 sequences in the database. Still, in the maximum likelihood tree the samples formed a separate clade, suggesting that the two species were different. BLAST using trnH-psbA assigned the samples to the morphologically identified taxa, but tree topology could only discriminate M. arboreus. BLAST of trnL-F identified both H. rosa-sinensis and M. arboreus as H. rosa-sinensis, as the database lacks M. arboreus trnL-F sequences. The trnL-F locus also incorrectly identified H. sabdariffa as Gossypium nelsonii using BLAST and could not discriminate all three species using tree topology. Similarly, matK could not confirm the identity of any of the three species. DNA barcoding of Hibisceae is not trouble-free and thus needs further study of loci and analysis methods. However, DNA barcoding can guide morphological examination so that it is done selectively and more effectively.

Keywords: Hibisceae, trnL-F, maturase K, trnH-psbA, ITS1, species identification

Introduction

Species within the tribe Hibisceae, such as Hibiscus sabdariffa L., Hibiscus rosa-sinensis, and Malvaviscus arboreus (formerly known as Hibiscus malvaviscus), are valuable to society as sources of medicine or food (Chen et al., 2010). Correct identification of these species ensures the safety and efficacy of their products.

Morphology-based identification, while the gold standard for species identification (Chan et al., 2014), is not straightforward. H. rosa-sinensis has many forms (Bates, 1965; Singh & Khoshoo, 1989), perhaps due to its mixed ancestry, which involved crosses between many species, including H. kokio and H. arnottianus (Palmer & Palmer, 1954; Wilcox & Holt, 1913). In the case of M. arboreus, Malvaviscus penduliflorus is very similar and is sometimes regarded as a variety of that species (M. arboreus var. penduliflorus; Schery, 1942). Species delimitation within the section Furcaria, to which H. sabdariffa belongs, is also difficult due to overlapping morphologies (Sivarajan & Pradeep, 1996).

The difficulties and resulting errors in morphological identification and taxonomy may be overcome by combining morphology with molecular identification (Batovska et al., 2016). While molecular methods (RAPD-PCR, ISSR-PCR or AFLP) have been used to identify Hibiscus species (Kadve et al., 2012; Khafaga, 2013; Omalsaad et al., 2014; Tang et al., 2003), DNA barcoding has not been widely used. Studies using loci such as atpB, rbcL, and ndhF have focused on phylogeny at higher taxonomic levels (Pfeil et al., 2002; Pfeil & Crisp, 2005; Tate & Simpson, 2003). Only two studies specifically on Hibisceae and DNA barcoding were found in November 2019, namely those of Poovitha et al. (2016) and Liu et al. (2014).

DNA barcoding is an easy method (Kress & Erickson, 2008) which uses one or a few short, standardised DNA region(s) and checks this query against a database (Hebert et al., 2003). As such, DNA barcoding is widely used (Kress, 2017), and could help overcome the problems with identification of Hibisceae using morphology.

Among the barcoding loci recommended for species-level identification are the maturase K gene (matK) (Hollingsworth et al., 2011), intergenic spacer trnH-psbA (Kress et al., 2005), internal transcribed spacer, ITS1 (Wang et al., 2015) and intergenic spacer trnL-F (Hao et al., 2009). These loci, however, do not perform well across all plants (Dong et al., 2015). The matK locus exhibits low amplification and sequencing rates due to lack of universality of primers and the presence of mononucleotide repeats (Yu et al., 2011). The trnH-psbA marker often poses a problem in sequencing due to homopolymer tails resulting in stutter peaks (Shinde et al., 2003). The ITS locus is a multicopy marker, where homogenisation may be incomplete (Harpke & Peterson, 2006), leading to poor sequence quality. The trnL-F locus, in turn, poses a problem because it may have mononucleotide repeats, duplicated copies of the trnF gene as in Brassicaceae, or lost the intergenic spacer as in some taxa (Hollingsworth et al., 2011).

Problem Statement

Correct species identification is essential in all fields of biology (Tosh et al., 2016). As expertise for species identification is often unavailable (Coissac et al., 2012), molecular identification has been proposed (Hebert et al., 2003). However, molecular identification has its own problems (Hollingsworth et al., 2011), and optimising these molecular tools has the potential to make species identification more accessible and accurate. The success of species identification using DNA barcoding depends as much on the taxa in question as on the marker used (Amandita et al., 2019). For example, in Hibiscus, including H. rosa-sinensis and H. sabdariffa, matK was found to be the most suitable locus when compared with rbcLa, trnH-psbA, and ITS2 (Poovitha et al., 2016). However, ITS2 was recommended for correctly discriminating Hibiscus syriacus from its adulterants (Liu et al., 2014). While there are limited studies on using DNA barcodes to identify members of the Hibisceae, sequences have been uploaded to the database, for example by Sukrong, Phadungcharoen, and Tungphatthong (2019) for psbA-trnH (GenBank: LC461812.1), which adds to the reference database required for DNA barcoding to succeed. Biodiversity assessments have also contributed sequences of these species to the database (Papadopoulou et al., 2015). However, biodiversity assessments may not include closely related species, and while a barcode may function well for comparisons of distant species, it may not work as well for closely related species (Yan et al., 2011). Additionally, sequences of a species from a specific region such as Malaysia may, owing to local evolutionary forces, vary considerably from the sequences of that species held by the National Center for Biotechnology Information, NCBI (Yang et al., 2017). The lack of DNA barcoding studies that compare local samples of closely related members of Malvaceae against the global database is thus a gap that should be addressed.

Research Questions

The question is whether DNA-based identification using ITS1, trnL-F, trnH-psbA, and matK is congruent with morphological identification in Hibisceae. We hypothesise that each of the four DNA sequences tested will provide the same species assignment as morphology in the three species of Hibisceae used.

Purpose of the Study

This study aims to:

1) identify three Hibisceae species in Malaysia, using morphology, and

2) test whether the loci (ITS1, trnL-F, trnH-psbA, and matK) can be amplified and sequenced, and whether they confirm the identities of the species using the Basic Local Alignment Search Tool (BLAST) and maximum likelihood tree topology.

Research Methods

Morphological identification, followed by molecular identification, are described below.

Sample Collection and Morphological Identification

Three individuals from each of the three morphologically distinct taxa within the tribe Hibisceae were obtained and characterised using the morphology of flowers and leaves. Ten replicates were observed or measured for each species.

Molecular Identification

Molecular identification involved obtaining sequences, checking for noise and evaluating these sequences using two sequence analysis methods for better reliability.

DNA Extraction, Amplification, and Sequencing

A modified CTAB protocol (Doyle & Doyle, 1987) was used to extract DNA. The modification involved the addition of 0.04 g polyvinylpyrrolidone and 5.0 µl β-mercaptoethanol per ml of buffer. Polymerase chain reactions (PCR) using MyTaq™ Mix (Bioline, USA) were performed according to the manufacturer’s protocol using the primers and conditions listed in Table 1. PCR products were purified and sequenced at MyTACG Bioscience Enterprise.

Table 1 -

Sequence Analysis

DNA Sequence Assembler v4 software (2013) was used to assess sequence quality, generate consensus sequences, remove low-quality sequence ends and trim primer sequences. Further analysis used consensus sequences or, where none was produced, single reads with a Phred score > 30, which indicated moderate to high quality (Ratnasingham & Hebert, 2007).
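To make the Phred threshold concrete, the sketch below converts a Phred score Q into the base-calling error probability it encodes, P = 10^(-Q/10), and applies a cut-off of 30 to a read. It is only an illustration: the function names are hypothetical, and it assumes the threshold is applied to the mean score of a read, which the text does not specify.

```python
def phred_to_error_probability(q: float) -> float:
    """Error probability encoded by a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_phred(qualities: list[int]) -> float:
    """Arithmetic mean of the per-base Phred scores of one read."""
    if not qualities:
        raise ValueError("read has no quality scores")
    return sum(qualities) / len(qualities)


def passes_threshold(qualities: list[int], threshold: float = 30) -> bool:
    """True if the mean Phred score is strictly above the threshold.

    Assumption: the cut-off applies to the read mean; the study may have
    used a different summary (for example, per-base scores).
    """
    return mean_phred(qualities) > threshold


if __name__ == "__main__":
    # Q = 30 corresponds to an expected error rate of 1 in 1000 bases.
    print(phred_to_error_probability(30))
    # Hypothetical per-base scores for a short read.
    print(passes_threshold([35, 32, 31, 38]))
```

Under this reading, a score above 30 means fewer than one expected miscall per thousand bases.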

Species assignment compared the final (query) sequence against the NCBI database using BLAST (Altschul et al., 1990) to find the best match. The hit with the lowest E-value and the highest score and percentage identity indicated the most probable species (Fassler & Cooper, 2011). Additionally, a phylogenetic tree was drawn to determine discrimination between the taxa, using sequences from this study and GenBank sequences from the same and closely related species. Sequences were aligned with T-Coffee (Di Tommaso et al., 2011). The maximum likelihood phylogenetic tree was then generated using the best-fitting model of nucleotide substitution for each locus, as indicated by the Akaike Information Criterion (AIC). MEGA X software was used to calculate AIC and generate the tree (Kumar et al., 2018). One thousand bootstrap replicates were used to ensure adequate sampling (Pattengale et al., 2009), and bootstrap support was categorised as weak (50–70%), moderate (70–85%) and strong (>85%), according to Kress et al. (2002). Loci were analysed independently to accommodate the potentially different histories and rates of change among loci (Maddison, 1997; Kubatko & Degnan, 2007). Monophyly was used to infer species discrimination and true identity.
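The model selection and bootstrap categories described above can be sketched as follows. This is not the MEGA X implementation: it uses the textbook formula AIC = 2k - 2 ln L, the log-likelihoods and parameter counts are invented for illustration, and because the stated ranges share the boundary values 70% and 85%, the code assumes both fall in the moderate category.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]]) -> str:
    """Name of the model with the lowest AIC.

    `fits` maps a model name to (log-likelihood, number of parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


def support_category(bootstrap_percent: float) -> str:
    """Bootstrap category: weak 50-70%, moderate 70-85%, strong >85%.

    Assumption: the shared boundaries 70 and 85 are treated as moderate.
    """
    if bootstrap_percent > 85:
        return "strong"
    if bootstrap_percent >= 70:
        return "moderate"
    if bootstrap_percent >= 50:
        return "weak"
    return "unsupported"


if __name__ == "__main__":
    # Hypothetical fits for one locus: (ln L, free parameters).
    fits = {"JC": (-1250.0, 20), "T92": (-1235.0, 22), "T92+G": (-1234.5, 23)}
    print(best_model(fits))
    print(support_category(92), support_category(64))
```

In the hypothetical fits, the extra parameter of the third model improves the likelihood too little to offset the AIC penalty, so the simpler of the two better-fitting models is chosen.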

Detecting Possible Noise

Fragment length and GC content were determined using MEGA X (Kumar et al., 2018); values similar to those previously reported indicated the authenticity of the amplified DNA loci (Buckler & Holtsford, 1996). For the coding locus matK, authenticity was also indicated by the absence of stop codons in the reading frame, checked using tools in the Barcode of Life Data System (BOLD) portal (Ratnasingham & Hebert, 2007). We also tested for non-homogeneity and substitution saturation, as either would reduce the accuracy of the analysis. The Disparity Index analysis carried out in MEGA (Kumar & Gadagkar, 2001) determined whether the molecular data were homogeneous, and the Iss statistic, calculated using the program Data Analysis in Molecular Biology and Evolution, DAMBE (Xia & Xie, 2001), determined substitution saturation.
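Two of the checks above are simple enough to sketch directly: the percentage GC of a sequence and a scan for stop codons in a chosen reading frame. The code below is an illustration only, not the MEGA X or BOLD implementation; the function names and the example sequence are hypothetical, and ambiguous bases are assumed to be excluded from the GC calculation.

```python
# Stop codons shared by the standard and the plant plastid genetic codes.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return 100 * sum(base in "GC" for base in counted) / len(counted)


def stop_codons_in_frame(seq: str, frame: int = 0) -> list[int]:
    """0-based start positions of stop codons in the given frame (0, 1 or 2)."""
    seq = seq.upper()
    return [
        i
        for i in range(frame, len(seq) - 2, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    fragment = "ATGGAAGAATTCCAAGGATAT"  # hypothetical coding fragment
    print(round(gc_content(fragment), 1))
    # An empty list means no stop codon interrupts this reading frame.
    print(stop_codons_in_frame(fragment, frame=0))
```

A partial matK amplicon need not start at codon position one, so in practice all three forward frames would be scanned and the frame without internal stops taken as the reading frame.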

Findings

Morphological and molecular characterisation are presented individually, followed by the comparison of species identification by the two methods.

Morphology

Gross morphology differentiated the three species and identified them as members of the tribe Hibisceae. The three species were identified as Hibiscus rosa-sinensis (HRS-W), Hibiscus sabdariffa (HS) and Malvaviscus arboreus (MA).

H. sabdariffa was identified mainly by the thickened midribs and marginal ribs of the calyx (Figure 1B), as reported by Ross (2003). Additional characters supported this identification, such as the bell-shaped corolla of pale pink colour with a dark red centre (Figure 1A), the five obovate petals, the 0.9 ± 1 cm long staminal column, and the bell-shaped calyx (Figure 1B, C) with five triangular lobes 2.0 ± 0.1 cm long, all of which approximate the values and descriptions reported for H. sabdariffa (Monaco Nature Encyclopaedia, n.d.). The size of the flower (3.4 ± 0.4 cm in width), the number of bracts (8–10) and the presence of alternate, lobed leaves with toothed margins, 9 ± 1.2 cm long, also tally with the characteristics of this species described by Morton (1987). Other characters recorded in this study, namely petal length and width (means of 2.8 ± 0.4 cm and 2.1 ± 0.1 cm, respectively) and peduncle length (mean 0.7 ± 0.3 cm), are smaller than those reported in the Monaco Nature Encyclopaedia (n.d.), which could be due to the different varieties of H. sabdariffa having different characteristics (Torres-Morán et al., 2011).

M. arboreus was identified to the genus Malvaviscus by its leaf shape (Figure 1H) and its floral architecture (Figure 1E): the flower never opens fully but remains a contorted tube, each auriculate petal overlapping the next, as reported in Turner and Mendenhall (1993). Naskar and Mandal (2014) recorded ten style branches and a staminal tube approximately 2 cm long (Figure 1F, G), which tally with the results of this study and differentiate this Malvaviscus species from the two Hibiscus species used in this study. M. arboreus was distinguished from Malvaviscus penduliflorus by the size of its corollas and calyces, which have mean lengths of 2.5 ± 0.3 cm and 1.1 ± 0.1 cm, respectively. These lengths are in the range exhibited by M. arboreus and not M. penduliflorus (Turner & Mendenhall, 1993).

Samples identified as H. rosa-sinensis had characters comparable to those reported for this species by El Sayed, Ateya, and Fekry (2012) as well as other researchers. For example, the leaf was simple and ovate to oblong-lanceolate in shape, with an acuminate apex, an entire margin in the lower half and a dentate margin in the upper half, as in Figure 1K (El Sayed et al., 2012; Salamah, Prihatiningsih, Rostina, & Dwiranti, 2018). El Sayed et al. (2012) also described the epicalyx (Figure 1J) as consisting of green bracts forming a whorl outside the calyx, 6–8 in number, linear-lanceolate in shape, and measuring 0.8–1.2 cm in length, similar to our results. The H. rosa-sinensis bracteoles are also free, unlike in H. sabdariffa (Ayanbamiji, Ogundipe, & Olowokudejo, 2012). The structure of the calyx (Figure 1J) also conformed to that expected in H. rosa-sinensis, that is, united to about half its length, oblong-lanceolate in shape, green in colour and 2.1 ± 0.2 cm in length. Each flower is 5 to 7 cm in length and has five free petals, as reported by El Sayed et al. (2012). The number of stigmas is five, as reported by Salamah et al. (2018) as well as Naskar and Mandal (2014) (see Figure 1D and 1I).

Figure 1: Hibiscus sabdariffa A. Floral morphology B. Epicalyx and calyx C. Staminal column D. Leaf morphology; Malvaviscus arboreus E. Floral morphology F. Staminal column G. Epicalyx and calyx H. Leaf; H. rosa-sinensis I. Floral morphology including staminal column J. Epicalyx and calyx K. Leaf morphology

DNA Sequences

The four DNA loci were successfully amplified and sequenced for all nine samples of the three species studied. The absence of stop codons in the reading frame of the matK sequences, as well as the conformity of the length and percentage GC of all the sequences obtained in this study to those reported in the literature (Table 2), supports, though does not ensure, the orthology of the sequences. However, among the matK sequences, four comparisons showed a lack of homogeneity in the Disparity Index (ID) test calculated from a comparison of the 25 sequences, and substitution saturation (Iss, 1.28 > Iss.c, 0.78) was present. As inhomogeneous data could lead to a biased result (Tamura et al., 2013), and base substitution saturation decreases the amount of phylogenetic information contained in sequence data, which disrupts analysis (Xia & Xie, 2001), the phylogenetic tree of this locus is viewed with reservations.

Table 2 -

According to the best match in BLAST, only trnH-psbA assigned all three species to the species provisionally identified by morphology (Table 3). However, this correct identification was made in the absence of trnH-psbA sequences of the closely related M. penduliflorus in the database. It would be interesting to see whether the correct identification of M. arboreus would be maintained if these sequences were in the database.

BLAST analysis of ITS1 (Table 3) identified H. sabdariffa correctly to species level. BLAST of this locus could, however, only identify the Malvaviscus sample to genus level. Species-level identification was hindered by the lack of ITS1 sequences of the correct species in the database. BLAST provided an ambiguous identification of H. rosa-sinensis as either H. rosa-sinensis, H. clayi, H. arnottianus, or H. kokio, perhaps explained by the ancestry of H. rosa-sinensis, which involved crosses between many different species.
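The distinction between a clear and an ambiguous BLAST assignment can be sketched as a ranking of hits by lowest E-value, then highest score, then highest percentage identity, with every species tied for first place reported. This is an illustration rather than the procedure actually used: the `BlastHit` record and function names are hypothetical, and the hit values are invented, not taken from Table 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlastHit:
    species: str
    evalue: float
    bit_score: float
    percent_identity: float


def top_hits(hits: list[BlastHit]) -> list[BlastHit]:
    """All hits tied for best: lowest E-value, then highest score and identity."""
    if not hits:
        return []

    def rank(hit: BlastHit) -> tuple[float, float, float]:
        return (hit.evalue, -hit.bit_score, -hit.percent_identity)

    best = min(rank(hit) for hit in hits)
    return [hit for hit in hits if rank(hit) == best]


def assign_species(hits: list[BlastHit]) -> str:
    """One species name, or an explicit 'ambiguous' label when several tie."""
    species = sorted({hit.species for hit in top_hits(hits)})
    if not species:
        return "no hit"
    if len(species) == 1:
        return species[0]
    return "ambiguous: " + ", ".join(species)


if __name__ == "__main__":
    # Invented values: two species tie on every criterion.
    hits = [
        BlastHit("Hibiscus rosa-sinensis", 1e-120, 440.0, 99.6),
        BlastHit("Hibiscus kokio", 1e-120, 440.0, 99.6),
        BlastHit("Hibiscus sabdariffa", 1e-80, 300.0, 91.0),
    ]
    print(assign_species(hits))
```

Reporting ties explicitly, instead of silently taking the first row of the hit table, keeps an ambiguous result from being read as a species-level identification.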

The BLAST analysis based on the matK locus showed incorrect identification of H. sabdariffa and ambiguous identification of both H. rosa-sinensis and M. arboreus (Table 3). These findings contradict those reported by Poovitha et al. (2016), who found that matK was a suitable barcode for Hibiscus, including H. rosa-sinensis and H. sabdariffa. Hence, the purity of the species or the evolution of the species at different localities may influence the effectiveness of DNA barcoding in identifying species. These data support the use of regional DNA databases, as in the study of Clerc‐Blain, Starr, Bull, and Saarela (2010), in which a regional DNA database provided better species resolution than a global DNA database.

The BLAST analysis of trnL-F (Table 3) identified both H. rosa-sinensis and M. arboreus as the same species, and assigned H. sabdariffa to an incorrect species. As of October 2019 there were no trnL-F sequences in the database for H. sabdariffa and M. arboreus, so each was assigned to the closest species with sequences in the database, which was the wrong species. Thus, it is evident that building up a complete DNA barcode database for all species present on earth is a prerequisite for DNA barcoding to be used accurately.

Table 3 -

The trnH-psbA sequence-based tree showed that two of the downloaded samples did not cluster together with their species (Figure 2). The tree may be biased, as the ID test showed these sequences to be non-homogeneous in 18 comparisons. As the downloaded sequences are from plants originating from different geographical regions, they could have undergone different selection pressures. Geography is known to have such an effect: as samples of the same species are compared at local, national, regional and continental scales, monophyly is increasingly lost (Bergsten et al., 2012).

Figure 2: Maximum Likelihood Tree based on trnH-psbA inferred by using the Tamura 3-parameter model (Tamura, 1992). Bootstrap support is shown next to the branches. HRS-W is Hibiscus rosa-sinensis, HS is Hibiscus sabdariffa, and MA is Malvaviscus arboreus

Tree topology of ITS1 (Figure 3) identified H. sabdariffa correctly, with the samples in this study and downloaded sequences of the same species forming a monophyletic clade. The ITS1 sequences of M. arboreus formed a separate clade with strong bootstrap support, which reinforces its morphological identification. Tree topology of ITS1 sequences reinforced the ambiguous identification of H. rosa-sinensis, which is explained by the hybrid origins of this species.
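The monophyly criterion used to read these trees can be sketched with a toy rooted tree written as nested tuples: a set of samples is monophyletic if some clade contains exactly those tips and no others. This is a simplified illustration under stated assumptions, not the study's analysis; the tree, tip labels and function names are hypothetical, and the tree is assumed to be rooted.

```python
def tips(node) -> frozenset:
    """Set of tip labels below a node; a tip is a plain string."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Yield the tip set of this node and of every node below it."""
    yield tips(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, group) -> bool:
    """True if some clade of the rooted tree holds exactly the given tips."""
    target = frozenset(group)
    return any(clade == target for clade in clades(tree))


if __name__ == "__main__":
    # Hypothetical topology: the MA samples form their own clade, while a
    # reference sequence of another species nests among the HRS samples.
    tree = (
        ("HS_1", "HS_2"),
        (("MA_1", "MA_2"), ("HRS_1", ("HRS_2", "H_kokio_ref"))),
    )
    print(is_monophyletic(tree, {"MA_1", "MA_2"}))
    print(is_monophyletic(tree, {"HRS_1", "HRS_2"}))
```

In the toy tree the two MA tips satisfy the criterion, whereas the HRS tips do not because a sequence of another species falls inside their smallest enclosing clade, which is the pattern described for H. rosa-sinensis.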

Figure 3: Maximum Likelihood Tree based on ITS1 sequences using the Jukes-Cantor model (Jukes & Cantor, 1969) with a discrete Gamma distribution and invariable sites. Bootstrap support is shown next to the branches. HRS-W is Hibiscus rosa-sinensis, HS is Hibiscus sabdariffa, and MA is Malvaviscus arboreus

The matK tree, like the BLAST analysis, could not identify the species, as none of the clades was monophyletic (Figure 4). It was concluded that matK was not useful in species identification among the Hibisceae, probably because variation at this locus is insufficient, mutation being restricted in a coding locus, or alternatively because the mutational history cannot be recovered owing to substitution saturation of these sequences (Hillis, 1991).

Figure 4: Maximum Likelihood Tree based on matK sequences inferred using the Tamura 3-parameter model (Tamura, 1992) using a discrete Gamma distribution (4 categories) and allowing for some sites to be evolutionarily invariable. Bootstrap support is shown next to the branches. HRS-W is Hibiscus rosa-sinensis, HS is Hibiscus sabdariffa, and MA is Malvaviscus arboreus

Tree topology of trnL-F (Figure 5), unlike BLAST, did not show M. arboreus to be closely related to H. rosa-sinensis. Data analysis methods thus influence inferences, as has been reported previously (Kreuzer et al., 2019). Tree reliability is dependent on multiple sequence alignment (Xia, 2016) and may be reduced by the non-homogeneity of sequences (7 out of 105 comparisons). Also, compressing information across the sequence into a single measure of genetic similarity hides specific differences (DeSalle & Goldstein, 2019). Identification by trnL-F in this study is inconclusive.

Figure 5: Maximum Likelihood Tree based on trnL-F sequences inferred using the Tamura 3-parameter model (Tamura, 1992) using a discrete Gamma distribution (4 categories) and allowing for some sites to be evolutionarily invariable. Bootstrap support is shown next to the branches. HRS-W is Hibiscus rosa-sinensis, HS is Hibiscus sabdariffa, and MA is Malvaviscus arboreus

Limitations of Barcoding and Revisiting Morphology

The difficulty in identifying M. arboreus lies mainly in the lack of sequences of this species in the database. Also contributing to the challenge, however, is the lack of agreement on whether M. arboreus and M. penduliflorus are separate species or subspecies; the name M. arboreus variety penduliflorus is also in use (Turner & Mendenhall, 1993). Besides, occasional hybridisation between M. arboreus and M. penduliflorus is likely (Turner & Mendenhall, 1993), exacerbating the confusion between the two species. The clustering of H. sabdariffa with H. mechowii in three of the loci could be because H. mechowii is possibly the primitive form of H. sabdariffa (Edmonds, 1991). The ambiguous identification of H. rosa-sinensis prompted us to make further morphological comparisons. The H. rosa-sinensis leaf morphology (Figure 1K) did not correspond to that of the white-flowered H. arnottianus, whose leaves were described by Bhat (1995) as elliptic, with an obtuse apex and base and an entire margin. The white petal colour found in our samples is not the reported flower colour of either H. clayi or H. kokio (Native Plants of Hawaii, n.d.). Unlike the leaves of the H. rosa-sinensis samples, leaves of H. clayi are smooth, or occasionally toothed near the tip (Native Plants of Hawaii, n.d.). However, with its hybrid origins, H. rosa-sinensis may be a composite of characters from different ancestors, making it challenging to identify with certainty.

Conclusion

DNA-based species identification in this study was not consistent or clear-cut. BLAST analysis of trnH-psbA and morphology-based identification were consistent; however, the tree topology showed ambiguities. ITS1 and matK identified all three species correctly to genus level, though trnL-F could not identify M. arboreus even to genus level. In H. rosa-sinensis and H. sabdariffa, the inconsistencies in identification may be due to their hybrid or evolutionary origins. A different problem exists in identifying M. arboreus, namely an incomplete database combined with different taxonomists having different preferences in naming the species. While problems exist, DNA barcoding still has a role in identification. It directed us to review closely related species carefully and to look for morphological characters which could negate or confirm the identification. It highlighted taxa that may be hybrids. It also directed attention to species whose naming needs to be reviewed by taxonomists. On an earth populated by some 391,000 species of vascular plants, this is no small contribution. Additionally, it is useful in cases which do not require species-level resolution; for example, knowing that a food or drink contains something other than what is on the label is sufficient warning of danger. The importance of increased sampling within and across all species, as well as across geographical regions, cannot be overstated. Sequence data analysis methods should also be explored to find approaches that are not affected by, for example, non-homogeneity of sequences.

Acknowledgments

INTI International University funded this project but had no role in study design, data collection, analysis, or preparation of the manuscript.

Copyright information

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Publisher: European Publisher

First Online: 12.10.2020

DOI: 10.15405/epsbs.2020.10.02.34

Online ISSN: 2357-1330