The GenomicVectors Types and Methods
Index
GenomicVectors.GenomeInfo
GenomicVectors.GenomicPositions
GenomicVectors.GenomicRanges
GenomicVectors.chromosomes
GenomicVectors.chrpos
GenomicVectors.genopos
GenomicVectors.nearest
GenomicVectors.overlap
GenomicVectors.overlapin
RLEVectors.ends
RLEVectors.starts
RLEVectors.widths
Types
GenomicVectors.GenomeInfo
— Type.
GenomeInfo Type
A GenomeInfo holds information about a genome including its name, chromosome names, chromosome lengths and chromosome offsets into a concatenated, linear genome (genopos). Indexing returns the genopos end of the indexed chromosome.
Examples
chrinfo = GenomeInfo("hg19",["chr1","chr2","chrX"],Int64[3e5,2e5,1e4])
genome(chrinfo)
chr_names(chrinfo)
chr_lengths(chrinfo)
chr_ends(chrinfo)
chr_offsets(chrinfo)
chrinfo[2] # 5e5
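The relationship between chromosome lengths, ends, and offsets can be sketched in plain Julia without calling the package. The variable names below are illustrative, and the sketch assumes that each offset is the total length of the chromosomes preceding it, which is consistent with chrinfo[2] giving 5e5 above.

```julia
# Plain-Julia sketch; does not call GenomicVectors.
lens = Int64[300_000, 200_000, 10_000]   # chr1, chr2, chrX lengths from the example

# End of each chromosome in the concatenated, linear genome.
chr_end = cumsum(lens)                   # [300000, 500000, 510000]

# Offset of each chromosome: combined length of the chromosomes before it (assumed).
chr_offset = chr_end .- lens             # [0, 300000, 500000]

println(chr_end[2])                      # 500000
```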
GenomicVectors.GenomicPositions
— Type.
GenomicPositions(chrpos, chromosomes, genomeinfo)
GenomicPositions(genopos, genomeinfo)
Represents single-nucleotide positions in a genome.
This type uses its (immutable) GenomeInfo slot object to describe the corresponding genome. Positions can be expressed relative to the concatenated, linearized genome or relative to the chromosome containing a given position.
Sorting is by chromosome, as ordered by chrinfo.
By convention, all positions in a GenomicPositions are considered to be on the plus strand.
Examples
genomeinfo = GenomeInfo("hg19",["chr1","chr2","chrX"],Int64[3e5,2e5,1e4])
chrs = ["chr2","chr1","chr2","chrX"]
pos = Int64[3e4,4.2e3,1.9e5,1e4]
gpos = genopos(pos,chrs,genomeinfo)
x = GenomicPositions(pos,chrs,genomeinfo)
y = GenomicPositions(gpos,genomeinfo)
same_genome(x, y)
sort!(y)
convert(DataFrame, y)
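Ordering by chromosome as listed in the genome description, rather than alphabetically, can be sketched in plain Julia. This is not the package's sort! method. The names chr_rank and order are hypothetical, and breaking ties by position within a chromosome is an assumption.

```julia
# Plain-Julia sketch; does not call GenomicVectors.
chrnames = ["chr1", "chr2", "chrX"]
chr_rank = Dict(name => i for (i, name) in enumerate(chrnames))

chrs = ["chr2", "chr1", "chr2", "chrX"]
pos  = [30_000, 4_200, 190_000, 10_000]

# Order by (chromosome rank, position); the within-chromosome tie-break is assumed.
order = sortperm(collect(zip([chr_rank[c] for c in chrs], pos)))

println(chrs[order])   # ["chr1", "chr2", "chr2", "chrX"]
println(pos[order])    # [4200, 30000, 190000, 10000]
```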
GenomicVectors.GenomicRanges
— Type.
GenomicRanges
GenomicRanges represent closed ranges in a genome. This type uses its (immutable) GenomeInfo slot object to describe the corresponding genome. Positions can be expressed relative to the concatenated, linearized genome or relative to the chromosome containing a given position.
Examples
chrinfo = GenomeInfo("hg19",["chr1","chr2","chrX"],Int64[3e5,2e5,1e4])
chrs = ["chr1","chr2","chr2","chrX"]
starts = [100, 200, 300, 400]
ends = [120, 240, 350, 455]
gr = GenomicRanges(chrs,starts,ends,chrinfo)
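Because the ranges are closed, both endpoints belong to the range. A minimal plain-Julia sketch of what that means for widths, using the coordinates from the example and hypothetical variable names:

```julia
# Plain-Julia sketch; does not call GenomicVectors.
range_starts = [100, 200, 300, 400]
range_ends   = [120, 240, 350, 455]

# Closed ranges include both endpoints, so the width is end - start + 1.
range_widths = range_ends .- range_starts .+ 1

println(range_widths)   # [21, 41, 51, 56]
```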
Indexing
Indexing a GenomicRanges with an array produces a new GenomicRanges.
Getting/setting by a scalar gives/takes a Bio.Intervals.Interval. The leftposition and rightposition in this Interval must be in genome location units and correspond to the same chromosome. The seqname must match the genome of the GenomicRanges. Outgoing Intervals will have the index i as their metadata. This makes it possible to recover the original ordering of Intervals after conversion to, say, an IntervalCollection. Any metadata for an incoming Interval is ignored.
The each function produces an iterator of (start,end) two-tuples in genome location units. This is used by many internal functions, such as sorting. It is intentionally similar to RLEVectors.each.
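The idea of iterating (start,end) two-tuples in genome location units, and of sorting by them, can be illustrated in plain Julia. This is a sketch under stated assumptions rather than the package's each: the chr_offset dictionary and all variable names are hypothetical, and the offsets assume the chr1, chr2, chrX lengths used in the examples above.

```julia
# Plain-Julia sketch; does not call GenomicVectors.
chr_offset = Dict("chr1" => 0, "chr2" => 300_000, "chrX" => 500_000)

chrs         = ["chr2", "chrX", "chr1", "chr2"]
range_starts = [300, 400, 100, 200]
range_ends   = [350, 455, 120, 240]

# Shift chromosome coordinates into genome location units.
linear_starts = [s + chr_offset[c] for (s, c) in zip(range_starts, chrs)]
linear_ends   = [e + chr_offset[c] for (e, c) in zip(range_ends, chrs)]

# An iterator of (start, end) two-tuples.
pairs_iter = zip(linear_starts, linear_ends)

# Tuples compare element by element, so sorting them orders ranges along the genome.
sorted_pairs = sort(collect(pairs_iter))
println(sorted_pairs)
```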
Genome Location API
GenomicVectors.jl has ... All AbstractGenomicVectors implement the API for GenomeInfo for access to their genome descriptions.
Accessing position info
RLEVectors.starts
— Function.
RLEVectors
RLEVectors is an alternate implementation of the Rle type from Bioconductor's IRanges package by H. Pages, P. Aboyoun and M. Lawrence. RLEVectors represent a vector with repeated values as the ordered set of values and repeat extents. In the field of genomics, data of various types measured across the ~3 billion letters in the human genome can often be represented in a few thousand runs. It is useful to know the bounds of genome regions covered by these runs, the values associated with these runs, and to be able to perform various mathematical operations on these values.
RLEVectors can be created from a single vector or a vector of values and a vector of run ends. In either case runs of values or zero length runs will be compressed out. RLEVectors can be expanded to a full vector with collect.
Aliases
Several aliases are defined for specific types of RLEVector (or collections thereof).
FloatRle RLEVector{Float64,UInt32}
IntegerRle RLEVector{Int64,UInt32}
BoolRle RLEVector{Bool,UInt32}
StringRle RLEVector{String,UInt32}
RLEVectorList{T1,T2} Vector{ RLEVector{T1,T2} }
Constructors
RLEVectors can be created by specifying a vector to compress or the run values and run ends.
x = RLEVector([1,1,2,2,3,3,4,4,4])
x = RLEVector([4,5,6],[3,6,9])
Describing RLEVector objects
RLEVectors implement the usual descriptive functions for an array as well as some that are specific to the type.
length(x): The full length of the vector, uncompressed
size(x): Same as length, as for any other vector
size(x,dim): Returns (length(x),1) for dim == 1
starts(x): The index of the beginning of each run
widths(x): The width of each run
ends(x): The index of the end of each run
values(x): The data value for each run
isempty(x): Returns boolean, as for any other vector
nrun(x): Returns the number of runs represented in the array
eltype(x): Returns the element type of the runs
endtype(x): Returns the element type of the run ends
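The run-value and run-end representation described above can be sketched in plain Julia. This is an illustration of run-length encoding in general, not the RLEVectors implementation. The functions rle_encode and rle_decode are hypothetical names, and the starts and widths are derived from the run ends only to show how the three quantities relate.

```julia
# Plain-Julia sketch of run-length encoding; does not call RLEVectors.
function rle_encode(v::Vector{T}) where T
    runvalues = T[]
    runends = Int[]
    for (i, x) in enumerate(v)
        if !isempty(runvalues) && runvalues[end] == x
            runends[end] = i          # extend the current run
        else
            push!(runvalues, x)       # start a new run
            push!(runends, i)
        end
    end
    return runvalues, runends
end

# Expand run values and run ends back to a full vector.
function rle_decode(runvalues, runends)
    run_widths = diff(vcat(0, runends))
    return vcat((fill(val, w) for (val, w) in zip(runvalues, run_widths))...)
end

vals, run_ends = rle_encode([1,1,2,2,3,3,4,4,4])
println(vals)                                   # [1, 2, 3, 4]
println(run_ends)                               # [2, 4, 6, 9]

# Starts and widths follow from the run ends.
run_starts = vcat(1, run_ends[1:end-1] .+ 1)    # [1, 3, 5, 7]
run_widths = run_ends .- run_starts .+ 1        # [2, 2, 2, 3]

expanded = rle_decode([4,5,6], [3,6,9])
println(expanded)                               # [4, 4, 4, 5, 5, 5, 6, 6, 6]
```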
RLEVectors.widths
— Function.
widths(x) gives the width of each run in an RLEVector. The docstring attached to this function is the general RLEVectors description, which is given in full under RLEVectors.starts above.
RLEVectors.ends
— Function.
ends(x) gives the index of the end of each run in an RLEVector. The docstring attached to this function is the general RLEVectors description, which is given in full under RLEVectors.starts above.
GenomicVectors.genopos
— Function.
Given chromosome and chromosome position information and a description of the chromosomes (a GenomeInfo object), calculate the corresponding positions in the linear genome.
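As a rough illustration of this calculation, the sketch below adds each chromosome's offset to the within-chromosome position. It is plain Julia rather than the package's genopos method; to_genopos and its arguments are hypothetical, and the offsets are assumed to be the cumulative lengths of the preceding chromosomes.

```julia
# Hypothetical helper; not the package's genopos implementation.
function to_genopos(pos::Vector{Int}, chrs::Vector{String},
                    chrnames::Vector{String}, lens::Vector{Int})
    offsets = cumsum(lens) .- lens              # bases preceding each chromosome
    lookup = Dict(zip(chrnames, offsets))       # chromosome name => offset
    return [p + lookup[c] for (p, c) in zip(pos, chrs)]
end

chrnames = ["chr1", "chr2", "chrX"]
lens     = [300_000, 200_000, 10_000]

gpos = to_genopos([30_000, 4_200, 190_000, 10_000],
                  ["chr2", "chr1", "chr2", "chrX"], chrnames, lens)
println(gpos)   # [330000, 4200, 490000, 510000]
```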
GenomicVectors.chrpos
— Function.
Given positions in the linear genome, calculate the position on the relevant chromosome.
GenomicVectors.chromosomes
— Function.
Given positions in the linear genome, determine the chromosome that contains each position.
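Going from a linear genome position back to a chromosome and a within-chromosome position can be sketched with a sorted search over the chromosome ends. This is plain Julia, not the package's chrpos or chromosomes methods; chr_index, to_chromosome, and to_chrpos are hypothetical helpers.

```julia
# Hypothetical helpers; not the package's implementation.
chrnames = ["chr1", "chr2", "chrX"]
chr_end  = cumsum([300_000, 200_000, 10_000])   # [300000, 500000, 510000]

# Index of the first chromosome whose end is >= the linear position.
chr_index(g) = searchsortedfirst(chr_end, g)

# Name of the chromosome containing linear position g.
to_chromosome(g) = chrnames[chr_index(g)]

# Position within that chromosome: subtract the end of the previous chromosome.
to_chrpos(g) = g - (chr_index(g) == 1 ? 0 : chr_end[chr_index(g) - 1])

println(to_chromosome(330_000), " ", to_chrpos(330_000))   # chr2 30000
println(to_chromosome(510_000), " ", to_chrpos(510_000))   # chrX 10000
```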
Modifying positions
slide
slide!
Querying positions
As in Bioconductor, location query operations discriminate between exact and overlapping matches. In addition to exact versus overlapping coordinates, exact matching includes strand, while overlap matching does not. In GenomicVectors.jl, the standard set operations use exact matching and custom overlap functions are defined for AbstractGenomicVector.
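The distinction between the two kinds of match can be sketched in plain Julia. The tuple representation and the functions exact_match and overlaps are hypothetical and are not the package's API. The sketch assumes closed coordinates, as GenomicRanges uses.

```julia
# Hypothetical representation: (start, stop, strand) tuples with closed coordinates.
a = (100, 200, '+')
b = (150, 250, '-')
c = (100, 200, '+')

# Exact matching: identical coordinates and identical strand.
exact_match(x, y) = x == y

# Overlap matching: the closed ranges share at least one position; strand is ignored.
overlaps(x, y) = x[1] <= y[2] && y[1] <= x[2]

println(exact_match(a, c))   # true
println(exact_match(a, b))   # false
println(overlaps(a, b))      # true
```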
Overlap functions
overlap
overlapin
overlapindex
nearest