seqconsensus

Calculate consensus sequence

Syntax

CSeq = seqconsensus(Seqs)
[CSeq, Score] = seqconsensus(Seqs)
CSeq = seqconsensus(Profile)
seqconsensus(..., 'PropertyName', PropertyValue,...)
seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue)

Arguments

Seqs

Set of multiply aligned amino acid or nucleotide sequences. Enter a character array, string vector, cell array of character vectors, or an array of structures with the field Sequence.

Profile

Sequence profile. Enter a profile from the function seqprofile. Profile is a matrix of size [20 (or 4) x Sequence Length] with the frequency or count of amino acids (or nucleotides) for every position. Profile can also have 21 (or 5) rows if gaps are included in the consensus.

ScoringMatrixValue

Either of the following:

  • Character vector or string specifying the scoring matrix to use for the alignment. Choices for amino acid sequences are:

    • 'BLOSUM62'

    • 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

    • 'BLOSUM100'

    • 'PAM10' increasing by 10 up to 'PAM500'

    • 'DAYHOFF'

    • 'GONNET'

    Default is:

    • 'BLOSUM50' — When AlphabetValue equals 'AA'

    • 'NUC44' — When AlphabetValue equals 'NT'

    Note

    The above scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the 'Scale' property to specify an additional scale factor to convert the output score from bits to another unit.

  • A 21x21, 5x5, 20x20, or 4x4 numeric array. For the gap-included cases, gap scores (last row/column) are set to mean(diag(ScoringMatrix)) for a gap matching with another gap, and set to mean(nodiag(ScoringMatrix)) for a gap matching with another symbol.

    Note

    If you use a scoring matrix that you created, the matrix does not include a scale factor. The output score will be returned in the same units as the scoring matrix.
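The gap-score rule for numeric scoring matrices can be sketched in plain MATLAB. This is an illustration only, not the function's internal code: the 4x4 matrix below is a hypothetical example, and "nodiag" is interpreted here as "all off-diagonal elements", which is an assumption.

```matlab
% Hypothetical 4x4 nucleotide scoring matrix (match = 5, mismatch = -4),
% used only for illustration.
M = [ 5 -4 -4 -4
     -4  5 -4 -4
     -4 -4  5 -4
     -4 -4 -4  5];

n = size(M, 1);

% Score for a gap aligned with another gap: mean of the diagonal.
gapGap = mean(diag(M));

% Score for a gap aligned with any other symbol: mean of the
% off-diagonal entries (assumed meaning of "nodiag").
offDiag = M(~eye(n));
gapSym = mean(offDiag);

% Append the gap row and column to form the 5x5 gap-included matrix.
Mgap = [M, repmat(gapSym, n, 1); repmat(gapSym, 1, n), gapGap];
```

With this matrix, the gap-versus-gap score equals the match score and the gap-versus-symbol score equals the mismatch score, because all diagonal entries and all off-diagonal entries are identical.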

Note

If you need to compile seqconsensus into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a character vector or string for ScoringMatrixValue.
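As a minimal sketch of passing a matrix rather than a name, the following assumes that blosum with 'Extended' set to false returns the 20x20 matrix for the standard amino acids in the order seqconsensus expects. The three short sequences are made up for illustration.

```matlab
% Hypothetical aligned amino acid sequences (one sequence per row).
seqs = ['MKTAY'
        'MKSAY'
        'MRTAY'];

% Numeric BLOSUM50 matrix restricted to the 20 standard amino acids
% (assumption: 'Extended', false yields a 20x20 matrix).
M = blosum(50, 'Extended', false);

% Pass the numeric matrix instead of the name 'BLOSUM50'.
[CSeq, Score] = seqconsensus(seqs, 'ScoringMatrix', M);
```

Because a numeric matrix carries no scale factor, Score is returned in the units of M.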

Description

CSeq = seqconsensus(Seqs), for a multiply aligned set of sequences (Seqs), returns a character vector with the consensus sequence (CSeq). The frequency of symbols (20 amino acids, 4 nucleotides) in the set of sequences is determined with the function seqprofile. For ambiguous nucleotide or amino acid symbols, the frequency or count is added to the standard set of symbols.

[CSeq, Score] = seqconsensus(Seqs) also returns the conservation score (Score) of the consensus sequence. Scores are computed with the scoring matrix BLOSUM50 for amino acids or NUC44 for nucleotides. Each score is the average Euclidean distance between the scored symbol and the M-dimensional consensus value, where M is the size of the alphabet. The consensus value is the profile weighted by the scoring matrix.
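A minimal sketch of the two-output call, using a small made-up nucleotide alignment. The 'Alphabet' option is set explicitly because the input is nucleotide data; the assumption that Score holds one value per consensus position is reflected in the comments.

```matlab
% Hypothetical multiple alignment: three nucleotide sequences, four columns.
seqs = ['ACGT'
        'ACGT'
        'ACCT'];

% Consensus sequence and per-position conservation scores (NUC44 by default
% for the 'NT' alphabet).
[CSeq, Score] = seqconsensus(seqs, 'Alphabet', 'NT');

% CSeq is a character vector; Score is assumed to have one entry per column.
disp(CSeq)
disp(Score)
```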

CSeq = seqconsensus(Profile) returns a character vector with the consensus sequence (CSeq) from a sequence profile (Profile).
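The profile form can be sketched as follows. The alignment is made up, and the example assumes that seqconsensus recognizes a 4-row profile as nucleotide data from its size, as the description of Profile suggests.

```matlab
% Hypothetical multiple alignment of nucleotide sequences.
seqs = ['ACGT'
        'ACGT'
        'ACCT'];

% Build a 4 x 4 profile (rows: A, C, G, T; columns: alignment positions).
P = seqprofile(seqs, 'Alphabet', 'NT');

% Derive the consensus directly from the profile
% (assumption: the alphabet is inferred from the number of rows).
CSeq = seqconsensus(P);
```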

seqconsensus(..., 'PropertyName', PropertyValue,...) defines optional properties using property name/value pairs.

seqconsensus(..., 'ScoringMatrix', ScoringMatrixValue) specifies the scoring matrix.

The following input parameters behave like the corresponding parameters of the function seqprofile, with the alphabet restricted to 'AA' or 'NT'.

seqconsensus(..., 'Alphabet', AlphabetValue)

seqconsensus(..., 'Gaps', GapsValue)

seqconsensus(..., 'Ambiguous', AmbiguousValue)

seqconsensus(..., 'Limits', LimitsValue)
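These options can be combined in one call. The sketch below uses a made-up alignment and assumes the option values match those of seqprofile ('all' for Gaps, a two-element column range for Limits).

```matlab
% Hypothetical nucleotide alignment containing gaps.
seqs = ['AC-GT'
        'ACCGT'
        'A--GT'];

% Restrict the consensus to columns 2 through 4 and count gaps
% at every position (assumed to behave as in seqprofile).
[CSeq, Score] = seqconsensus(seqs, ...
    'Alphabet', 'NT', ...
    'Gaps', 'all', ...
    'Limits', [2 4]);
```

With 'Limits' set to [2 4], CSeq is expected to cover only the three selected columns.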

Examples

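  % Read a set of multiply aligned sequences from a FASTA file.
  seqs = fastaread('pf00002.fa');
  % Consensus (C) and scores (S) for columns 50 to 60, with gaps counted at all positions.
  [C,S] = seqconsensus(seqs,'limits',[50 60],'gaps','all')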

Version History

Introduced before R2006a