In Silico Approach to Construct the 3D Structures of Spike Glycoproteins of Novel Variants of Severe Acute Respiratory Syndrome Coronavirus 2

ABSTRACT

Additionally, the VOCs show reduced responses to known therapeutics (14). Mutations in the RBD of the spike glycoprotein alter its amino acid sequence and can cause structural modifications that may affect viral pathogenesis (15). A better understanding of the possible structural alterations of the variants of concern, and of their structural similarity, is therefore required. To date, no structural data are available for these novel variants of SARS-CoV-2. Thus, a computational approach was used to model the structures of these proteins, which may pave the way for novel therapeutic strategies.

MATERIALS AND METHODS

NCBI bibliographic database
The research status of protein modeling of the novel SARS-CoV-2 variants of concern, including Omicron, was analyzed using the NCBI (https://www.ncbi.nlm.nih.gov) and GISAID (https://gisaid.org) databases, which provide a robust body of literature and experimental data. This allowed us to establish a suitable strategy for this study. A search query for the virus retrieved all the required information (2).
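
The article does not state how the database queries were issued. As a minimal sketch, assuming programmatic access instead of the web interface, the snippet below builds a query URL for NCBI's E-utilities ESearch endpoint; the search term is a hypothetical example, and no request is actually sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, db: str = "pubmed", retmax: int = 20) -> str:
    """Return an ESearch URL for `term` in database `db` (no network access here)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Hypothetical search term, for illustration only.
QUERY = "SARS-CoV-2 spike glycoprotein homology modeling"
url = build_esearch_url(QUERY)
print(url)
```

Fetching such a URL (for example with `urllib.request.urlopen`) would return a JSON document listing the matching record IDs; NCBI asks frequent users to register an API key and to respect its request-rate limits.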

Sequence retrieval
NCBI Virus is an integrative, value-added resource designed to support the retrieval, display, and analysis of a curated collection of virus sequences and large sequence datasets. The protein sequences of the spike glycoproteins of the novel variants of concern (VOCs) were retrieved from this database and saved as FASTA files for further study. The sequences were analyzed using the highlight-sequence feature. The database search query included the gene accession number and Pango lineage information, along with other parameters [3][4].
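
Files downloaded from NCBI Virus are in FASTA format. A minimal pure-Python sketch of parsing such a file and keeping only the records of one Pango lineage follows. It assumes, hypothetically, that each header ends with `|<lineage>` (the real header layout depends on how the download is configured), and the records shown are toy fragments, not real entries.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def select_lineage(records: Dict[str, str], lineage: str) -> Dict[str, str]:
    """Keep records whose header ends in '|<lineage>' (hypothetical header layout)."""
    return {h: s for h, s in records.items() if h.split("|")[-1].strip() == lineage}


# Toy input: hypothetical IDs and short spike N-terminal fragments, not real entries.
EXAMPLE = """>hypothetical_0001 |B.1.1.529
MFVFLVLLPL
VSSQ
>hypothetical_0002 |B.1.617.2
MFVFLVLLPL
"""

records = parse_fasta(EXAMPLE.splitlines())
omicron = select_lineage(records, "B.1.1.529")
print(omicron)
```

For real downloads, Biopython's `SeqIO.parse(handle, "fasta")` performs the same parsing more robustly.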

Sequence alignment
The protein Basic Local Alignment Search Tool (BLASTp; https://blast.ncbi.nlm.nih.gov) is an online tool that finds regions of local similarity between sequences; the BLAST suite handles both nucleotide and protein sequences (5, 16). The retrieved sequences were compared against the protein sequences deposited in the Protein Data Bank (PDB), and the statistical significance of each match was evaluated. The aligned sequences sharing the highest percentage of similarity were shortlisted.
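
BLASTp is a heuristic: it seeds on short word hits, scores with a substitution matrix such as BLOSUM62, and reports E-values. The sketch below swaps in the exact Smith-Waterman local alignment that BLAST approximates, with a toy scoring scheme (match +2, mismatch -1, linear gap -2, all assumptions), to illustrate what "local similarity" and percent identity mean for a single pair of sequences. It is not a reimplementation of BLAST.

```python
from typing import List, Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until a zero cell ends the alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical positions as a percentage of the alignment length (gaps included)."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a) if aligned_a else 0.0


# Toy sequences (hypothetical): one substitution in an 8-residue stretch.
score, qa, qb = smith_waterman("MFVFLVLL", "MFVFAVLL")
print(score, qa, qb, percent_identity(qa, qb))  # 13 MFVFLVLL MFVFAVLL 87.5
```

Shortlisting then amounts to ranking candidate templates by such identity (BLAST reports it per hit) and keeping the top entries.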

Structure Prediction
SWISS-MODEL is a fully automated, web-based homology-modeling server for predicting the 3D structures of proteins; it is accessible via the ExPASy web server or from the program DeepView (Swiss-PdbViewer) (21, 22). The server aims to make protein modeling accessible to all life science researchers worldwide. Building a homology model comprises four main steps: identification of the template structure(s), alignment of the target sequence with the template structure(s), model building, and model quality evaluation (23). The retrieved sequences in FASTA format were used to construct the 3D protein structures with SWISS-MODEL (https://swissmodel.expasy.org/). The quality of the templates used to build the structures was analyzed with the built-in global and local quality estimation tools (27-28). Multiple models were generated for each submitted protein and stored in PDB format.
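
Each model is saved as a PDB file, and SWISS-MODEL's per-residue local quality estimates are, to our understanding, written into the B-factor column; treat that as an assumption to verify against the SWISS-MODEL documentation. A minimal sketch that reads the fixed-column ATOM records and averages that column per residue is shown below. The `atom_line` helper and the toy model are hypothetical, built only to exercise the parser.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ResidueKey = Tuple[str, int, str]  # (chain ID, residue number, residue name)


def per_residue_bfactor(pdb_lines: Iterable[str]) -> Dict[ResidueKey, float]:
    """Mean B-factor (PDB columns 61-66) over the atoms of each residue."""
    sums: Dict[ResidueKey, float] = defaultdict(float)
    counts: Dict[ResidueKey, int] = defaultdict(int)
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM, headers, TER, END, ...
        key = (line[21], int(line[22:26]), line[17:20].strip())
        sums[key] += float(line[60:66])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def atom_line(serial: int, name: str, res_name: str, chain: str,
              res_seq: int, xyz: Tuple[float, float, float], b: float) -> str:
    """Hypothetical helper: format one ATOM record in PDB fixed columns."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{b:>6.2f}")


# Toy model: two residues with made-up coordinates and quality values.
TOY_MODEL = [
    atom_line(1, " N", "MET", "A", 1, (0.0, 0.0, 0.0), 0.50),
    atom_line(2, " CA", "MET", "A", 1, (1.458, 0.0, 0.0), 0.75),
    atom_line(3, " N", "PHE", "A", 2, (3.0, 1.0, 0.0), 0.90),
    "TER",
]
quality = per_residue_bfactor(TOY_MODEL)
print(quality)
```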

Model Analysis
The 3D protein structures were assessed qualitatively with the MolProbity tool integrated in SWISS-MODEL (2). The reliability and stereochemical quality of each modeled protein were inspected using the Ramachandran plot (24-25). The percentage of residues in the Ramachandran favored region was designated as the confidence score for each modeled protein (30). The Ramachandran plot was also used to check whether the phi (φ) and psi (ψ) backbone dihedral angles of each residue fall in the favored region (30-31). Both the Ramachandran favored score and the Ramachandran outliers (residues outside the allowed regions) were considered when screening the modeled proteins. Only models with at least 90% of their residues in the favored or allowed regions, and with the highest confidence score, were selected.
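
Each point on a Ramachandran plot is a residue's pair of backbone dihedral angles: φ about the N-Cα bond, defined by atoms C(i-1), N(i), Cα(i), C(i), and ψ about the Cα-C bond, defined by N(i), Cα(i), C(i), N(i+1). A minimal sketch computing them from backbone coordinates with the standard atan2 dihedral formula follows. It assumes a continuous chain without breaks and does not reproduce MolProbity's favored/allowed region contours, which are derived from high-quality reference structures; the toy coordinates are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, float, float]
Residue = Dict[str, Vec]  # backbone atoms keyed "N", "CA", "C"


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(chain: List[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; phi is undefined for the first residue, psi for the last."""
    angles = []
    for i, res in enumerate(chain):
        phi = (dihedral(chain[i - 1]["C"], res["N"], res["CA"], res["C"])
               if i > 0 else None)
        psi = (dihedral(res["N"], res["CA"], res["C"], chain[i + 1]["N"])
               if i < len(chain) - 1 else None)
        angles.append((phi, psi))
    return angles


# Hypothetical toy backbone (not real protein geometry).
TOY_CHAIN: List[Residue] = [
    {"N": (0.0, 0.0, 0.0), "CA": (1.0, 0.0, 0.0), "C": (1.0, 1.0, 0.0)},
    {"N": (2.0, 1.0, 0.0), "CA": (2.0, 1.0, 1.0), "C": (3.0, 1.0, 1.0)},
    {"N": (3.0, 2.0, 1.0), "CA": (4.0, 2.0, 1.0), "C": (4.0, 2.0, 2.0)},
]
for i, (phi, psi) in enumerate(phi_psi(TOY_CHAIN)):
    print(i, phi, psi)
```

The favored score is then the share of residues whose (φ, ψ) pair falls inside the favored contours.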

Model Submission
The predicted models were submitted to PMDB (http://srv00.recas.ba.infn.it/PMDB/main.php), a public repository of manually built 3D protein models. Through this repository, models published in the scientific literature can be accessed together with the experimental data supporting them [2-3].

Building Homology Model
A comprehensive literature search and analysis was performed using GISAID's CoVsurver and NCBI Virus. The sequences of the novel SARS-CoV-2 variants retrieved from the SARS-CoV-2 Data Hub of NCBI Virus were analyzed and used to construct homology models with the SWISS-MODEL web tool, which yielded one to five structural models for each entry. The best model for each protein was selected using Ramachandran plot analysis.

Ramachandran Plot Validation
The stereochemical quality of the 3D protein structures was validated using Ramachandran plot analysis. The reliability of each modeled structure was assessed using the Ramachandran favored score reported by MolProbity within the SWISS-MODEL online tool. Predicted models with at least 90% of their residues in the Ramachandran favored or allowed regions were considered significant, and among the multiple models generated for each protein, the one with the highest percentage of residues in the Ramachandran favored region was selected. Fig. 1 shows the Ramachandran plots of a preferred model, with >90% of residues in the favored region, and of a less preferred model, with <90%. In total, 40 modeled proteins achieved a substantial Ramachandran score and were selected for further processing.
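
The screening rule described here can be written as a small filter-then-maximum step. A minimal sketch follows, assuming each model's MolProbity percentages have already been collected into records; the field names, protein labels, and percentages are hypothetical and are not the study's results. The filter keeps models with at least 90% of residues favored or allowed, and the model kept for each protein is the one with the highest favored percentage.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class ModelScore:
    protein: str         # variant/protein label (hypothetical here)
    model_id: str        # model identifier (hypothetical here)
    favored_pct: float   # % of residues in the Ramachandran favored region
    allowed_pct: float   # % of residues allowed but not favored


def select_best_models(scores: Iterable[ModelScore],
                       threshold: float = 90.0) -> Dict[str, ModelScore]:
    """Per protein, keep the passing model with the highest favored percentage."""
    best: Dict[str, ModelScore] = {}
    for s in scores:
        if s.favored_pct + s.allowed_pct < threshold:
            continue  # fewer than `threshold` % of residues favored or allowed
        current = best.get(s.protein)
        if current is None or s.favored_pct > current.favored_pct:
            best[s.protein] = s
    return best


# Illustrative numbers only; these are not results from the study.
SCORES = [
    ModelScore("variant_A", "model_1", 93.0, 5.0),
    ModelScore("variant_A", "model_2", 95.5, 3.5),
    ModelScore("variant_B", "model_1", 80.0, 8.0),  # 88% favored+allowed: rejected
]
best = select_best_models(SCORES)
print({name: m.model_id for name, m in best.items()})  # {'variant_A': 'model_2'}
```

Keeping the threshold as a parameter makes it easy to switch to a favored-only cutoff if that is the intended criterion.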

Submission to PMDB
The 40 models selected on the basis of Ramachandran plot analysis were submitted to the PMDB online database (https://bioinformatics.cineca.it/PMDB/), which makes them publicly accessible for research purposes. Table 1 summarizes the details of each constructed model, its PMDB entry ID, and its confidence score.

CONCLUSION
The study aimed to construct the 3D structures of the spike glycoproteins of novel SARS-CoV-2 variants using a computational approach. Homology modeling was used to build models with significant confidence scores, as judged by Ramachandran plot analysis, and the resulting structures of the novel SARS-CoV-2 variants of concern were deposited in PMDB, a public database of computational protein models. These models could support drug discovery and the development of targeted drugs that inhibit the binding of spike glycoproteins to host receptor proteins. A major concern in treating SARS-CoV-2 infection is the presence of significant mutations in the RBD of the spike glycoprotein, which increase viral transmissibility and evasiveness. It is therefore of utmost priority to understand and model the 3D structures of the novel SARS-CoV-2 variants, and homology modeling could help address the rapid development of mutations and the resulting changes in sensitivity to therapeutics. The models may also benefit structure-based computational drug design against coronaviruses, helping to develop effective drugs that meet the current challenges healthcare faces in impeding the infection.

ACKNOWLEDGMENTS
The author would like to express special gratitude to Prof. Rakesh Kumar Gupta, Principal, and to the University of Delhi for supporting this research.

DECLARATION OF CONFLICTING INTEREST
The author declares no conflict of interest.

FINANCIAL DISCLOSURE
No financial support was received for this study.