Welcome to pymer’s documentation!

This package provides several classes and utilities for counting k-mers in DNA sequences.

Examples

Note

The API demonstrated below applies to all counters, though initialisation varies between counter classes.

>>> from pymer import ExactKmerCounter
>>> ksize = 4
>>> kc = ExactKmerCounter(ksize)

DNA sequences are counted using the consume method:

>>> kc.consume('ACGTACGTACGTAC')
>>> kc['ACGT']
3

Sequences can be subtracted using the unconsume method:

>>> kc.unconsume('ACGTA')
>>> kc['ACGT']
2
>>> kc['CGTA']
2
>>> kc['GTAC']
3
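
The counts shown above can be checked by hand with a few lines of plain Python. This is only an illustrative sketch using the standard library's `collections.Counter`; the helper name `count_kmers` is hypothetical and is not part of pymer, and it says nothing about how pymer stores counts internally.

```python
from collections import Counter

def count_kmers(seq, k):
    """Return a Counter mapping each k-mer in seq to its number of occurrences.

    Hypothetical helper for illustration; not part of pymer.
    """
    # Slide a window of width k over the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# "Consume" a sequence, then "unconsume" a shorter one by subtracting
# the counts of its k-mers again.
counts = count_kmers('ACGTACGTACGTAC', 4)
counts.subtract(count_kmers('ACGTA', 4))
print(counts['ACGT'], counts['CGTA'], counts['GTAC'])  # prints: 2 2 3
```

The sequence `'ACGTA'` contains only the 4-mers `ACGT` and `CGTA`, which is why those two counts drop by one while `GTAC` is unchanged.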

Counters can be added and subtracted:

>>> kc += kc
>>> kc['GTAC']
6
>>> kc -= kc
>>> kc['GTAC']
0

Counters can be written to and read from a file using bcolz:

>>> from tempfile import mkdtemp
>>> from shutil import rmtree
>>> tmpdir = mkdtemp()
>>> filename = tmpdir + '/kc.bcz'

(Above we simply create a temporary directory to hold the saved counts.)

>>> kc.write(filename)
>>> new_kc = ExactKmerCounter.read(filename, ksize)
>>> (kc.array == new_kc.array).all()
True
>>> rmtree(tmpdir)

Data Structures

Summary

ExactKmerCounter(k[, alphabet, array]) Count k-mers in DNA sequences exactly using an array.

Exact K-mer Counting

class pymer.ExactKmerCounter(k, alphabet='ACGT', array=None)

Count k-mers in DNA sequences exactly using an array.

Parameters:

k : int

K-mer length

alphabet : list-like (str, bytes, list, set, tuple) of letters

Alphabet over which values are defined, defaults to “ACGT”
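
Counting "exactly using an array" is commonly done by mapping every possible k-mer to its own slot in an array of length len(alphabet)^k. The sketch below shows one such mapping, treating a k-mer as a number written in base len(alphabet). The helper names and the index ordering are assumptions made for illustration; the layout pymer actually uses is not described here.

```python
def kmer_index(kmer, alphabet='ACGT'):
    """Encode a k-mer as an integer in base len(alphabet).

    Hypothetical helper; pymer's real array layout may differ.
    """
    index = 0
    for letter in kmer:
        # Shift the digits seen so far and append the new letter's digit.
        index = index * len(alphabet) + alphabet.index(letter)
    return index

def count_into_array(seq, k, alphabet='ACGT'):
    """Count the k-mers of seq in a flat list with len(alphabet)**k slots."""
    array = [0] * len(alphabet) ** k
    for i in range(len(seq) - k + 1):
        array[kmer_index(seq[i:i + k], alphabet)] += 1
    return array

array = count_into_array('ACGTACGTACGTAC', 4)
print(len(array), array[kmer_index('ACGT')])  # prints: 256 3
```

With this layout memory use grows as 4^k for DNA, but lookups and updates take constant time and no k-mer is ever miscounted. In this sketch a letter outside the alphabet raises `ValueError`; how pymer treats such letters is not stated here.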

Methods

`consume`(seq)	Counts all k-mers in sequence.
`consume_file`(filename)	Counts all k-mers in all sequences in a FASTA/FASTQ file.
`print_table`([sparse, file, sep])
`read`(filename, kmersize)
`readall`(filename)
`to_dict`([sparse])
`unconsume`(seq)	Subtracts all k-mers in sequence.
`write`(filename)

Markovian K-mer Counting

class pymer.TransitionKmerCounter(k, alphabet='ACGT', array=None)

Counts Markovian state transitions in DNA sequences.

This class counts transitions between (k-1)-mers (or stems) and the bases that follow them. These counts describe the (k-1)th-order Markov process that may have generated the underlying DNA sequences.
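
To make the idea of stems and following bases concrete, here is a minimal pure-Python sketch that counts stem-to-base transitions and normalises each row into probabilities. The function name `transition_matrix` and the row ordering are assumptions for illustration only; this is not pymer's implementation.

```python
from itertools import product

def transition_matrix(seq, k, alphabet='ACGT'):
    """Return (stems, P), where P[i][j] estimates Pr(alphabet[j] | stems[i]).

    Hypothetical helper for illustration; not part of pymer.
    """
    # Enumerate every possible (k-1)-mer stem in lexicographic order.
    stems = [''.join(p) for p in product(alphabet, repeat=k - 1)]
    row = {stem: i for i, stem in enumerate(stems)}
    counts = [[0] * len(alphabet) for _ in stems]
    # Each k-mer is a stem (first k-1 letters) plus the base that follows it.
    for i in range(len(seq) - k + 1):
        stem, base = seq[i:i + k - 1], seq[i + k - 1]
        counts[row[stem]][alphabet.index(base)] += 1
    # Normalise each row to sum to 1; rows for unseen stems stay all zero.
    P = []
    for r in counts:
        total = sum(r)
        P.append([c / total if total else 0.0 for c in r])
    return stems, P

stems, P = transition_matrix('ACGTACGTACGTAC', 3)
print(P[stems.index('AC')])  # prints: [0.0, 0.0, 1.0, 0.0]
```

In this sequence the stem `AC` is always followed by `G`, so its row puts all probability on `G`.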

A normalised, condensed transition matrix of shape (4^(k-1), 4) or a sparse, complete transition matrix of shape (4^(k-1), 4^(k-1)) can be returned. In addition, the steady-state vector is calculated from the complete transition matrix via eigendecomposition.
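
The relationship between the condensed and complete matrices can be sketched as follows. Appending a base to a stem and dropping the stem's first letter yields the next stem, so each condensed row spreads over at most len(alphabet) columns of the complete matrix. Note that this sketch swaps in power iteration for the eigendecomposition mentioned above, purely to stay dependency-free; the function names, the `n_letters` parameter, and the dense list-of-lists representation are all assumptions and do not reflect pymer's code.

```python
def complete_matrix(P, n_letters=4):
    """Expand a condensed (n_stems x n_letters) matrix to (n_stems x n_stems).

    Assumes stems are indexed as base-n_letters numbers, first letter
    most significant (an assumption, not pymer's documented layout).
    """
    n_stems = len(P)
    full = [[0.0] * n_stems for _ in range(n_stems)]
    for s, row in enumerate(P):
        for b, prob in enumerate(row):
            # Append letter b to stem s and drop the stem's first letter.
            full[s][(s * n_letters + b) % n_stems] += prob
    return full

def steady_state(full, iterations=1000):
    """Approximate the stationary distribution by power iteration.

    Stand-in for eigendecomposition; only reliable for chains that are
    irreducible and aperiodic.
    """
    n = len(full)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        # One step of pi <- pi * full (row vector times matrix).
        pi = [sum(pi[i] * full[i][j] for i in range(n)) for j in range(n)]
    return pi

# Toy example: two-letter alphabet with k = 2, so stems are single letters.
P = [[0.9, 0.1], [0.5, 0.5]]
pi = steady_state(complete_matrix(P, n_letters=2))
print([round(x, 4) for x in pi])
```

For this toy matrix the exact stationary distribution is (5/6, 1/6), which the iteration approaches. An eigendecomposition-based version would instead take the left eigenvector of the complete matrix for eigenvalue 1 and rescale it to sum to 1.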

Parameters:

k : int

K-mer length

alphabet : str

Alphabet over which values are defined, defaults to “ACGT”

Attributes

`P`
`steady_state`
`stem_frequencies`	Compute the frequencies of each stem.
`transitions`

Methods

`consume`(seq)	Counts all k-mers in sequence.
`consume_file`(filename)	Counts all k-mers in all sequences in a FASTA/FASTQ file.
`read`(filename, kmersize)
`readall`(filename)
`unconsume`(seq)	Subtracts all k-mers in sequence.
`write`(filename)
