Overview

Accurate assignment of amino-acid side chains remains a major bottleneck in macromolecular structure determination, particularly for low-resolution structures, samples derived from natural sources, and proteins exhibiting sequence heterogeneity. Even when global validation metrics are satisfactory, local side-chain ambiguities can propagate errors into functional interpretation and downstream computational analyses. The SEQUENCE SLIDER framework provides a foundation for automated sequence assignment by integrating structural biology data and phylogenetic analysis (PMID: 32133987; PMID: 35104880). However, its reliance on local installation and large-scale calculations limits accessibility for many experimental biologists.

SEQUENCE SLIDER-ML is an interactive web server that enables users to evaluate, rank, and visualize residue-specific side-chain hypotheses directly from structural data. Users upload a coordinate file (PDB or mmCIF) together with an experimental electron density map. The server analyzes the local structural environment and ranks alternative amino-acid hypotheses for each residue using an internal machine-learning–assisted confidence scoring model.

Results are presented through interactive sequence-level and 3D visualizations (with Moorhen; see its GitHub repository), allowing users to inspect alternative hypotheses in the context of the experimental map. The server highlights low-confidence cases that require expert judgment, providing objective decision support rather than automated assignments.

The machine-learning component is used exclusively as an internal scoring engine. It is trained on curated experimentally solved protein structures and validated using blind, independent test sets, with model interpretability provided through SHAP analysis.

The server complements existing model-building and refinement software and is particularly useful for ambiguous regions, heterogeneous samples, and post-refinement validation.

Usage

Select PDB File:

Load the PDB file (as complete and refined as possible) for sequence validation and evaluation.

Select MTZ File:

Load an MTZ file containing a single column label for either amplitudes or intensities.
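Before uploading, it can help to confirm that the MTZ file really carries only one amplitude or intensity label. The following is a minimal sketch, not part of the server: it works on a list of (label, type) pairs and relies on the standard MTZ column type codes "F" (amplitude) and "J" (intensity). The function names and the example column listing are hypothetical.

```python
# MTZ column type codes: "F" = structure-factor amplitude, "J" = intensity.
DATA_TYPES = {"F", "J"}


def data_labels(columns):
    """Return the labels of all amplitude or intensity columns.

    `columns` is an iterable of (label, type_code) pairs.
    """
    return [label for label, type_code in columns if type_code in DATA_TYPES]


def has_single_data_label(columns):
    """True when exactly one amplitude or intensity column is present."""
    return len(data_labels(columns)) == 1


# Hypothetical column listing, for illustration only.
# With the gemmi library installed, the pairs could be read from a file as:
#   [(c.label, c.type) for c in gemmi.read_mtz_file(path).columns]
example = [
    ("H", "H"), ("K", "H"), ("L", "H"),
    ("FP", "F"), ("SIGFP", "Q"), ("FreeR_flag", "I"),
]

print(data_labels(example))
print(has_single_data_label(example))
```

A file for which the check fails would need its extra amplitude or intensity columns removed before upload.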

Click on RUN SEQUENCE SLIDER:

  • Each of the 20 natural amino acids is modeled at every residue position, and its theoretical electron density is compared with the experimental one
  • Local structural analysis is performed
  • The ML model scores the probability and confidence of each hypothesis
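The idea of comparing theoretical and experimental density for each hypothesis can be illustrated with a toy ranking. This sketch assumes that both densities have been sampled at the same points and uses a plain Pearson correlation as the agreement measure; the metric actually used by the server is not described here, and the function names and numbers below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_hypotheses(experimental, theoretical_by_aa):
    """Sort amino-acid hypotheses by density agreement, best first."""
    scores = {
        aa: pearson(experimental, density)
        for aa, density in theoretical_by_aa.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical density samples at five points around one side chain.
experimental = [0.1, 0.8, 1.2, 0.9, 0.2]
theoretical = {
    "W": [0.0, 0.7, 1.3, 1.0, 0.1],
    "A": [1.0, 0.5, 0.1, 0.0, 0.0],
    "G": [0.0, 0.0, 0.0, 0.0, 0.0],
}

ranking = rank_hypotheses(experimental, theoretical)
for aa, score in ranking:
    print(aa, round(score, 3))
```

In the server, density agreement is only one input: the ranking also draws on the local structural analysis and the ML scoring model.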

System Log and Status:

  • Shows progress of calculations
  • When the calculations finish, a link to the results is shown

Output

Sequence Confidence Visualization

A figure showing the sequence of each chain, with the confidence of the assignment for every residue present in the PDB file:

Sequence confidence visualization

Residue Inspection

Users can inspect any residue in the Inspect field by typing its single-letter chain identifier in the "Chain" field and its residue number in the "Residue" field, then clicking "Go":

Residue inspection interface

A Moorhen page opens showing the structure and the side-chain polder omit map (PMID: 28177311).

All 20 Natural Modelled Side-Chains:

All 20 modelled side-chains

Correct Assignment Example (28W):

Correct side-chain assignment

Other Output Files:

  • FASTA Files: predicted_sequence.fasta - Contains the best-scoring sequence
  • Prediction Figure (25 residues per chunk): <pdb_id>_logo_chunk_<number>.jpg - Contains the global probabilities for each residue
  • Prediction Results (Table): <pdb_name>_result_df.cs - Contains PDBid, Chain, ResN (residue number), ResT (residue type), probabilities for each amino acid (A, C, D, ..., Y), Predicted_AA, and Probability
  • Logs: sequence_slider.log
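The results table lends itself to simple post-processing, for example listing residues where the prediction disagrees with the modeled residue or falls below a probability cutoff. The sketch below assumes a comma-separated table with a header row using the column names listed above, and assumes that ResT and Predicted_AA use the same one-letter codes. The 0.5 cutoff, the function name, and the sample rows are hypothetical.

```python
import csv
import io


def flag_residues(table_text, min_probability=0.5):
    """Return residues whose prediction disagrees with the model or is uncertain.

    Assumes comma-separated text with a header row containing at least
    Chain, ResN, ResT, Predicted_AA and Probability.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(table_text)):
        probability = float(row["Probability"])
        disagrees = row["Predicted_AA"] != row["ResT"]
        uncertain = probability < min_probability
        if disagrees or uncertain:
            flagged.append(
                (row["Chain"], int(row["ResN"]), row["ResT"],
                 row["Predicted_AA"], probability)
            )
    return flagged


# Hypothetical table content with only two of the 20 probability columns.
sample = """PDBid,Chain,ResN,ResT,A,W,Predicted_AA,Probability
1abc,A,27,A,0.91,0.01,A,0.91
1abc,A,28,W,0.02,0.95,W,0.95
1abc,A,29,A,0.30,0.41,W,0.41
"""

flagged = flag_residues(sample)
for chain, number, modeled, predicted, probability in flagged:
    print(f"{chain}{number}: modeled {modeled}, predicted {predicted} ({probability:.2f})")
```

Flagged residues are natural candidates for the Residue Inspection view.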

Color Scheme

The sequence logos and HTML visualization use the following color scheme for amino acids:

Acidic (D, E) #FF0000
Basic (K, R) #0000FF
Basic (H) #0066CC
Polar (S, T, N, Q) #00AA00
Polar (Y) #00CC66
Hydrophobic (A, V, I, L, M, F, W, P) #1f77b4
Glycine (G) #FFAA00
Cysteine (C) #FFFF00
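For users who want to reproduce the same colors in their own figures, the scheme above can be written as a lookup table. This is a convenience sketch built directly from the listed values; the names COLOR_GROUPS, AA_COLOR and colors_for are hypothetical and not part of the server output.

```python
# Hex colors from the scheme above, keyed to the one-letter codes they cover.
COLOR_GROUPS = {
    "#FF0000": "DE",        # Acidic
    "#0000FF": "KR",        # Basic
    "#0066CC": "H",         # Basic (histidine)
    "#00AA00": "STNQ",      # Polar
    "#00CC66": "Y",         # Polar (tyrosine)
    "#1f77b4": "AVILMFWP",  # Hydrophobic
    "#FFAA00": "G",         # Glycine
    "#FFFF00": "C",         # Cysteine
}

# Invert the grouping into a per-residue lookup.
AA_COLOR = {aa: color for color, group in COLOR_GROUPS.items() for aa in group}


def colors_for(sequence):
    """Return the hex color for every residue in a one-letter sequence."""
    return [AA_COLOR[aa] for aa in sequence]


print(colors_for("DKW"))
```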

Machine-learning Model Performance and Evaluation

Per-class performance of the XGBoost model on a blind validation set, comparing the true residue types with the predicted ones:

ML model performance metrics
Class Precision Recall F1-Score Support
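The column headings follow the standard classification definitions: precision is the fraction of predictions of a class that are correct, recall is the fraction of true members of a class that are recovered, F1 is their harmonic mean, and support is the number of true members. The sketch below computes these from paired true and predicted residue types; the function name and the toy labels are hypothetical and do not reproduce the server's reported figures.

```python
from collections import Counter


def per_class_metrics(true_labels, predicted_labels):
    """Compute precision, recall, F1 and support for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, predicted in zip(true_labels, predicted_labels):
        if true == predicted:
            tp[true] += 1
        else:
            fp[predicted] += 1  # predicted class gains a false positive
            fn[true] += 1       # true class gains a false negative

    metrics = {}
    for cls in sorted(set(true_labels) | set(predicted_labels)):
        predicted_total = tp[cls] + fp[cls]
        support = tp[cls] + fn[cls]
        precision = tp[cls] / predicted_total if predicted_total else 0.0
        recall = tp[cls] / support if support else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[cls] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
    return metrics


# Toy example: five residues, one alanine mispredicted as tryptophan.
true_types = list("AAWWG")
predicted_types = list("AWWWG")

metrics = per_class_metrics(true_types, predicted_types)
for cls, values in metrics.items():
    print(cls, values)
```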

Contact

For support or inquiries, contact:

Rafael Junqueira Borges:

rjborges@unicamp.br

License

This software is provided for research purposes. Please contact the authors for licensing details.