getBaseCoverage

Class: BioMap

Return base-by-base alignment coverage of reference sequence in BioMap object

Syntax

Cov = getBaseCoverage(BioObj, StartPos, EndPos)
Cov = getBaseCoverage(BioObj, StartPos, EndPos, R)
Cov = getBaseCoverage(..., Name,Value)
[Cov, BinStart] = getBaseCoverage(...)

Description

Cov = getBaseCoverage(BioObj, StartPos, EndPos) returns Cov, a row vector of nonnegative integers that gives the base-by-base alignment coverage of a range or set of ranges in the reference sequence in BioObj, a BioMap object. StartPos and EndPos define the range or set of ranges. They can be two nonnegative integers such that StartPos is less than EndPos and both are smaller than the length of the reference sequence, or two column vectors representing a set of ranges (overlapping or segmented). When StartPos and EndPos specify a segmented range, Cov contains NaN values for base positions between segments.
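The segmented-range behavior can be sketched as follows. This is a minimal illustration that assumes the ex1.sam file used in the examples on this page; the segment boundaries and variable names are hypothetical, and no output is shown because the values depend on the data.

```matlab
% Construct a BioMap object from a SAM file (same file as in the Examples section)
BMObj1 = BioMap('ex1.sam');

% Two non-adjacent segments, 1:5 and 20:25, given as column vectors
% (illustrative values chosen for this sketch)
startPos = [1; 20];
endPos   = [5; 25];

% Coverage spans min(startPos):max(endPos), that is, positions 1:25
covSeg = getBaseCoverage(BMObj1, startPos, endPos);

% Positions between the segments (6:19 here) are reported as NaN
gapMask = isnan(covSeg);
```

Per the description above, covSeg should have numel(1:25) elements, with gapMask marking the positions that fall between the two segments.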

Cov = getBaseCoverage(BioObj, StartPos, EndPos, R) calculates the coverage on the reference sequence specified by R.

Cov = getBaseCoverage(..., Name,Value) returns alignment coverage information with additional options specified by one or more Name,Value pair arguments.

[Cov, BinStart] = getBaseCoverage(...) returns BinStart, a row vector of positive integers specifying the start position of each bin (when binning occurs).

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

R

Positive integer indexing the SequenceDictionary property of BioObj, or a string specifying the actual name of the reference.
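A minimal sketch of passing R as an index, assuming the ex1.sam file from the examples on this page and assuming that the first entry of SequenceDictionary is the reference of interest. Variable names are illustrative.

```matlab
% Construct a BioMap object from a SAM file (same file as in the Examples section)
BMObj1 = BioMap('ex1.sam');

% Inspect the reference names stored in the object
refNames = BMObj1.SequenceDictionary;

% Select the reference by its position in SequenceDictionary
% (index 1 is an assumption made for this sketch)
covRef = getBaseCoverage(BMObj1, 1, 12, 1);
```

Passing the reference name as a string in place of the index is the other form described above.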

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'binWidth'

Positive integer specifying the bin width, in base pairs (bp). Bins are centered within min(StartPos) and max(EndPos), so the first and last bins extend approximately equally beyond the range from min(StartPos) to max(EndPos).

    Note:   You cannot specify both binWidth and numberOfBins.

'numberOfBins'

Positive integer specifying the number of equal-width bins to use to span the requested region. Bins are centered within min(StartPos) and max(EndPos), so the first and last bins extend approximately equally beyond the range from min(StartPos) to max(EndPos).

    Note:   You cannot specify both binWidth and numberOfBins.
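As a sketch of the numberOfBins option, the following call reuses the ex1.sam file from the examples on this page and requests 10 bins across positions 1 to 1000. Variable names are illustrative, and output is omitted because it depends on the data.

```matlab
% Construct a BioMap object from a SAM file (same file as in the Examples section)
BMObj1 = BioMap('ex1.sam');

% Request a fixed number of equal-width bins instead of a fixed bin width
[covBins, binStarts] = getBaseCoverage(BMObj1, 1, 1000, 'numberOfBins', 10);

% One coverage value and one start position are returned per bin
nBins = numel(covBins);
```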

'binType'

String specifying the binning algorithm. Choices are:

  • 'max': Uses the alignment coverage value of the base position in the bin that has the most reads aligned to it.

  • 'min': Uses the alignment coverage value of the base position in the bin that has the fewest reads aligned to it.

  • 'mean': Uses the average alignment coverage, computed from all base positions within the bin.

Default: 'max'
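The three reductions can be illustrated without the toolbox. The sketch below is not the toolbox's implementation; it takes the 12 per-base coverage values shown in the first example on this page and reduces them with a hypothetical bin width of 4, which divides the range evenly so that no bin extends past the range.

```matlab
% Per-base coverage for positions 1:12, copied from the first example on this page
baseCov  = [1 1 2 2 3 4 4 4 5 5 5 5];
binWidth = 4;   % hypothetical width chosen for this sketch

% Reshape so that each column holds the base positions of one bin
bins = reshape(baseCov, binWidth, []);

covMax  = max(bins, [], 1);   % analogous to 'binType','max'  -> [2 4 5]
covMin  = min(bins, [], 1);   % analogous to 'binType','min'  -> [1 3 5]
covMean = mean(bins, 1);      % analogous to 'binType','mean' -> [1.5 3.75 5]
```

With the toolbox, the same choice is made by adding 'binType' and one of these strings to a call that also sets 'binWidth' or 'numberOfBins'.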

'complementRanges'

Logical specifying whether to return the alignment coverage for the base positions between segments, instead of within segments. If true, the length of Cov is numel(min(StartPos):max(EndPos)), and Cov contains NaN values for base positions within segments.

Default: false
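A minimal sketch of complementRanges, assuming the ex1.sam file from the examples on this page. The segment boundaries and variable names are hypothetical, and output is omitted because it depends on the data.

```matlab
% Construct a BioMap object from a SAM file (same file as in the Examples section)
BMObj1 = BioMap('ex1.sam');

% Two segments, 1:5 and 20:25 (illustrative values chosen for this sketch)
startPos = [1; 20];
endPos   = [5; 25];

% Return coverage for the positions between the segments (6:19 here)
covGap = getBaseCoverage(BMObj1, startPos, endPos, 'complementRanges', true);

% Positions inside the segments are reported as NaN
insideMask = isnan(covGap);
```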

'Spliced'

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

Output Arguments

Cov

Row vector of nonnegative integers. This vector specifies the number of read sequences that align with each base position or bin in the requested regions. A set of ranges can be overlapping or segmented. For a range, the length of Cov is numel(StartPos:EndPos). For a segmented range, the length of Cov is numel(min(StartPos):max(EndPos)). Cov contains NaN values for base positions between segments. When binning occurs, the number of elements in Cov equals the number of bins.

BinStart

Row vector of positive integers specifying the start position of each bin. BinStart is the same length as Cov. If no binning occurs, then BinStart equals min(StartPos):max(EndPos).

Examples

Construct a BioMap object, and then return the alignment coverage of each of the first 12 base positions of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to each of
% the first 12 base positions of the reference sequence
cov = getBaseCoverage(BMObj1, 1, 12)
cov =

     1     1     2     2     3     4     4     4     5     5     5     5
 

Construct a BioMap object, and then return the alignment coverage of the range between 1 and 1000, on a bin-by-bin basis, using bins with a width of 100 bp:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the number of reads that align to each 100-bp bin
% in the 1:1000 range of the reference sequence. Also return the
% start position of each bin
[cov, bin_starts] = getBaseCoverage(BMObj1, 1, 1000, 'binWidth', 100)
cov =

    17    20    41    44    45    48    48    45    46    42


bin_starts =

     1   101   201   301   401   501   601   701   801   901
