get_emission_prob_mat module

itrails.get_emission_prob_mat.JC69_analytical_integral(aa, bb, cc, dd, t, mu, k)

Calculate the integrated probability of observing nucleotides bb, cc, and dd given a starting nucleotide aa, a mutation rate mu, and a coalescent rate k. This corresponds to P(b = bb, c = cc, d = dd | a = aa, mu, k, t), obtained by integrating the coalescent process over the interval t.

Parameters:
  • aa (int or str) – Nucleotide at position a.

  • bb (int or str) – Nucleotide at position b.

  • cc (int or str) – Nucleotide at position c.

  • dd (int or str) – Nucleotide at position d.

  • t (numeric) – Total time of the interval (from positions a/b/c to d).

  • mu (numeric) – Mutation rate for the JC69 model.

  • k (numeric) – Coalescent rate.

Returns:

The integrated probability.

Return type:

numeric

itrails.get_emission_prob_mat.JC69_analytical_integral_double(aa, bb, cc, dd, ee, ff, t, mu)

Calculate the integrated probability for observing nucleotides bb, cc, dd, ee, and ff given a starting nucleotide aa, using the JC69 model. This computes P(b = bb, c = cc, d = dd, e = ee, f = ff | a = aa, mu, t) by integrating over the coalescent process for two coalescent events. Note: The coalescent rate is 1 for two sequences and 3 for three sequences.

Parameters:
  • aa (int or str) – Nucleotide at position a.

  • bb (int or str) – Nucleotide at position b.

  • cc (int or str) – Nucleotide at position c.

  • dd (int or str) – Nucleotide at position d.

  • ee (int or str) – Nucleotide at the first coalescent event.

  • ff (int or str) – Nucleotide at the second coalescent event.

  • t (numeric) – Total time of the interval.

  • mu (numeric) – Mutation rate for the JC69 model.

Returns:

The integrated probability.

Return type:

numeric

itrails.get_emission_prob_mat.b_c_d_given_a_to_dict_a_b_c_d(data)

Convert a list of tuples (a, b, c, d, probability) into a nested dictionary. The resulting dictionary is structured as: dct[a][b][c][d] = probability

Parameters:

data (list) – List of tuples (a, b, c, d, probability), where a, b, c, d are nucleotide letters.

Returns:

Nested dictionary mapping nucleotide combinations to probability.

Return type:

dict

itrails.get_emission_prob_mat.b_c_given_a_to_dict_a_b_c(data)

Convert a list of tuples (a, b, c, probability) into a nested dictionary. The resulting dictionary is structured as: dct[a][b][c] = probability

Parameters:

data (list) – List of tuples (a, b, c, probability).

Returns:

Nested dictionary mapping nucleotide combinations to probability.

Return type:

dict

itrails.get_emission_prob_mat.b_given_a_to_dict_a_b(data)

Convert a list of tuples (a, b, probability) into a nested dictionary. The resulting dictionary is structured as: dct[a][b] = probability

Parameters:

data (list) – List of tuples (a, b, probability).

Returns:

Dictionary mapping nucleotide ‘a’ to a dictionary mapping ‘b’ to probability.

Return type:

dict
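The three converters above share one pattern: every element of a tuple except the last becomes a dictionary key, and the last element is the stored probability. A minimal sketch of that pattern follows. The function name tuples_to_nested_dict and the probabilities are hypothetical and chosen for illustration; this is not the module's implementation.

```python
def tuples_to_nested_dict(data):
    """Turn [(k1, ..., kn, prob), ...] into dct[k1]...[kn] = prob."""
    dct = {}
    for *keys, prob in data:
        node = dct
        # Descend one dictionary level per key, creating levels as needed.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # The last key holds the probability itself.
        node[keys[-1]] = prob
    return dct


# Made-up probabilities, for illustration only.
pairs = [("A", "A", 0.7), ("A", "C", 0.1), ("C", "A", 0.2)]
triples = [("A", "C", "G", 0.02), ("A", "C", "T", 0.03)]

two_level = tuples_to_nested_dict(pairs)      # like dct[a][b]
three_level = tuples_to_nested_dict(triples)  # like dct[a][b][c]
print(two_level["A"]["C"])
print(three_level["A"]["C"]["T"])
```

The same helper handles (a, b, c, d, probability) tuples without change, because the nesting depth follows the tuple length.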

itrails.get_emission_prob_mat.calc_emissions_double_JC69(a0_a1_t_vec, b0_b1_t_vec, c0_c1_t_vec, a1b1c1_abc0_t, d0_abc0_t_vec, a0_a1_mu_vec, b0_b1_mu_vec, c0_c1_mu_vec, a1b1c1_abc0_mu, d0_abc0_mu_vec)

Compute the emission probabilities for a hidden state containing two coalescent events occurring in the same time interval.

Parameters:
  • a0_a1_t_vec (list of numeric) – List of time intervals for mutation from a0 to a1.

  • b0_b1_t_vec (list of numeric) – List of time intervals for mutation from b0 to b1.

  • c0_c1_t_vec (list of numeric) – List of time intervals for mutation from c0 to c1.

  • d0_abc0_t_vec (list of numeric) – List of time intervals for mutation from d0 to abc0.

  • a1b1c1_abc0_t (numeric) – Time interval for the coalescent events.

  • a0_a1_mu_vec (list of numeric) – Mutation rates for intervals in a0_a1_t_vec.

  • b0_b1_mu_vec (list of numeric) – Mutation rates for intervals in b0_b1_t_vec.

  • c0_c1_mu_vec (list of numeric) – Mutation rates for intervals in c0_c1_t_vec.

  • a1b1c1_abc0_mu (numeric) – Mutation rate for the coalescent interval.

  • d0_abc0_mu_vec (list of numeric) – Mutation rates for intervals in d0_abc0_t_vec.

Returns:

A dictionary mapping concatenated nucleotide strings to their emission probabilities.

Return type:

dict
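To make the shape of the returned dictionary concrete, the sketch below enumerates candidate keys and checks that a set of emission probabilities sums to 1. It assumes that each key is a four-letter string with one nucleotide per sequence; the documentation above only says "concatenated nucleotide strings", so the key length and letter order are assumptions. The helper names and the uniform placeholder values are hypothetical and are not output from the module.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_site_patterns(n_sequences=4):
    """List every concatenated nucleotide string of the given length."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=n_sequences)]


def is_normalized(emissions, tol=1e-9):
    """Check that the emission probabilities sum to 1 within a tolerance."""
    return abs(sum(emissions.values()) - 1.0) < tol


patterns = all_site_patterns()
# Placeholder distribution: uniform over all patterns, for illustration only.
uniform = {pattern: 1.0 / len(patterns) for pattern in patterns}

print(len(patterns), patterns[:3])
print(is_normalized(uniform))
```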

itrails.get_emission_prob_mat.calc_emissions_single_JC69(a0_a1_t_vec, b0_b1_t_vec, a1b1_ab0_t, ab0_ab1_t_vec, ab1c1_abc0_t, c0_c1_t_vec, d0_abc0_t_vec, a0_a1_mu_vec, b0_b1_mu_vec, a1b1_ab0_mu, ab0_ab1_mu_vec, ab1c1_abc0_mu, c0_c1_mu_vec, d0_abc0_mu_vec, coal_rate_1, coal_rate_2)

Compute the emission probabilities for a hidden state containing two coalescent events occurring in different time intervals.

Parameters:
  • a0_a1_t_vec (list of numeric) – List of time intervals for mutation from a0 to a1.

  • b0_b1_t_vec (list of numeric) – List of time intervals for mutation from b0 to b1.

  • c0_c1_t_vec (list of numeric) – List of time intervals for mutation from c0 to c1.

  • d0_abc0_t_vec (list of numeric) – List of time intervals for mutation from d0 to abc0.

  • ab0_ab1_t_vec (list of numeric) – List of time intervals for mutation from ab0 to ab1.

  • a1b1_ab0_t (numeric) – Time interval for the first coalescent event.

  • ab1c1_abc0_t (numeric) – Time interval for the second coalescent event.

  • a0_a1_mu_vec (list of numeric) – Mutation rates for the intervals in a0_a1_t_vec.

  • b0_b1_mu_vec (list of numeric) – Mutation rates for the intervals in b0_b1_t_vec.

  • c0_c1_mu_vec (list of numeric) – Mutation rates for the intervals in c0_c1_t_vec.

  • d0_abc0_mu_vec (list of numeric) – Mutation rates for the intervals in d0_abc0_t_vec.

  • ab0_ab1_mu_vec (list of numeric) – Mutation rates for the intervals in ab0_ab1_t_vec.

  • a1b1_ab0_mu (numeric) – Mutation rate for the first coalescent interval.

  • ab1c1_abc0_mu (numeric) – Mutation rate for the second coalescent interval.

  • coal_rate_1 (numeric) – Coalescent rate for the first coalescent event.

  • coal_rate_2 (numeric) – Coalescent rate for the second coalescent event.

Returns:

A dictionary mapping concatenated nucleotide strings to their emission probabilities.

Return type:

dict

itrails.get_emission_prob_mat.get_emission_prob_mat(t_A, t_B, t_AB, t_C, t_upper, t_out, rho_A, rho_B, rho_AB, rho_C, rho_ABC, coal_A, coal_B, coal_AB, coal_C, coal_ABC, n_int_AB, n_int_ABC, mu_A, mu_B, mu_C, mu_D, mu_AB, mu_ABC, cut_AB='standard', cut_ABC='standard')

Compute the emission probabilities for all hidden states given a set of population genetics parameters.

Parameters:
  • t_A (numeric) – Time between present and the first speciation event for species A.

  • t_B (numeric) – Time between present and the first speciation event for species B (should equal t_A).

  • t_AB (numeric) – Time between speciation events.

  • t_C (numeric) – Time between present and the second speciation event for species C (should equal t_A + t_AB).

  • t_upper (numeric) – Time between the last ABC interval and the third speciation event.

  • t_out (numeric) – Time from present to the third speciation event for species D (includes divergence time).

  • rho_A (numeric) – Recombination rate for species A.

  • rho_B (numeric) – Recombination rate for species B.

  • rho_AB (numeric) – Recombination rate for the AB interval.

  • rho_C (numeric) – Recombination rate for species C.

  • rho_ABC (numeric) – Recombination rate for the ABC interval.

  • coal_A (numeric) – Coalescent rate for species A.

  • coal_B (numeric) – Coalescent rate for species B.

  • coal_AB (numeric) – Coalescent rate for the AB interval.

  • coal_C (numeric) – Coalescent rate for species C.

  • coal_ABC (numeric) – Coalescent rate for the ABC interval.

  • n_int_AB (int) – Number of intervals in the AB portion of the tree.

  • n_int_ABC (int) – Number of intervals in the ABC portion of the tree.

  • mu_A (numeric) – Mutation rate for species A.

  • mu_B (numeric) – Mutation rate for species B.

  • mu_C (numeric) – Mutation rate for species C.

  • mu_D (numeric) – Mutation rate for species D.

  • mu_AB (numeric) – Mutation rate for the AB interval.

  • mu_ABC (numeric) – Mutation rate for the ABC interval.

  • cut_AB (str or array-like) – Option for cutpoints in the AB interval; if “standard”, default cutpoints are computed.

  • cut_ABC (str or array-like) – Option for cutpoints in the ABC interval; if “standard”, default cutpoints are computed.
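The parameter descriptions state two relations between the time arguments: t_B should equal t_A, and t_C should equal t_A + t_AB. A small pre-flight check along these lines can catch inconsistent inputs before the call. The helper check_time_parameters and its tolerance are hypothetical; the module is not known to provide or require such a check.

```python
import math


def check_time_parameters(t_A, t_B, t_AB, t_C, rel_tol=1e-9):
    """Return messages for violated relations (empty list if consistent)."""
    problems = []
    if not math.isclose(t_A, t_B, rel_tol=rel_tol):
        problems.append(f"t_B ({t_B}) should equal t_A ({t_A})")
    if not math.isclose(t_C, t_A + t_AB, rel_tol=rel_tol):
        problems.append(f"t_C ({t_C}) should equal t_A + t_AB ({t_A + t_AB})")
    return problems


# Consistent values give an empty list; the second call reports one problem.
print(check_time_parameters(t_A=1.0, t_B=1.0, t_AB=0.5, t_C=1.5))
print(check_time_parameters(t_A=1.0, t_B=1.2, t_AB=0.5, t_C=1.5))
```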

Returns:

A tuple containing:

  • A list of state identifiers (tuples).

  • A list of the corresponding emission probability dictionaries.

Return type:

tuple(list, list)
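Because the two returned lists are parallel, they can be paired by position. The sketch below shows one way to do that and to lay the probabilities out as a states-by-patterns table. The state tuples, the key strings, and the probabilities are placeholders invented for illustration and do not come from the module.

```python
# Placeholder stand-ins for the two returned lists.
states = [(0, 0, 0), (1, 0, 1)]
emissions = [
    {"AAAA": 0.6, "AAAC": 0.4},
    {"AAAA": 0.3, "AAAC": 0.7},
]

# Pair each state identifier with its emission dictionary.
by_state = dict(zip(states, emissions))

# Fix a column order so every row lists the same site patterns.
columns = sorted(set().union(*emissions))
matrix = [[em.get(col, 0.0) for col in columns] for em in emissions]

print(by_state[(1, 0, 1)]["AAAC"])
print(columns)
print(matrix)
```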

itrails.get_emission_prob_mat.p_b_c_d_given_a_JC69_analytical(t, mu)

Compute the probability P(b, c, d | a) for all nucleotide combinations under the JC69 model.

Parameters:
  • t (numeric) – Total time of the interval (from positions a/b/c to d).

  • mu (numeric) – Mutation rate for the JC69 model.

Returns:

A list of tuples (a, b, c, d, probability) where a, b, c, and d are nucleotide letters.

Return type:

list

itrails.get_emission_prob_mat.p_b_c_given_a_JC69_analytical(t, mu, k)

Compute the probability P(b, c | a) for all combinations of nucleotides under the JC69 model.

Parameters:
  • t (numeric) – Total time of the interval.

  • mu (numeric) – Mutation rate for the JC69 model.

  • k (numeric) – Coalescent rate.

Returns:

A list of tuples (a, b, c, probability) where a, b, and c are nucleotide letters.

Return type:

list

itrails.get_emission_prob_mat.p_b_given_a(t, Q)

Calculate the probability of observing a nucleotide b given a starting nucleotide a, a set of time intervals, and corresponding rate matrices.

For each pair of nucleotides (a, b), the function computes the probability by first summing the products of each time interval with its corresponding rate matrix, and then applying the matrix exponential.

Parameters:
  • t (numeric or list of numeric) – Time interval, or list/array of time intervals, over which the process is considered; each interval is paired with its corresponding rate matrix in Q.

  • Q (list or numpy.ndarray) – A list or array of 4x4 rate matrices for a substitution model.

Returns:

A list of tuples of the form (starting nucleotide, ending nucleotide, probability).

Return type:

list
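The sum-then-exponentiate computation described above can be sketched with NumPy and SciPy. This is an illustration under stated assumptions, not the module's code: the nucleotide order A, C, G, T, the helper names, and the JC69 scaling (off-diagonal mu/4) are all assumptions. Note that exponentiating the summed matrix matches the product of per-interval transition matrices only when the rate matrices commute, which holds for JC69 matrices.

```python
import numpy as np
from scipy.linalg import expm

NUCLEOTIDES = ["A", "C", "G", "T"]  # assumed ordering


def p_b_given_a_sketch(t, Q):
    """Return (a, b, P(b | a)) tuples from intervals t and rate matrices Q."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    Q = np.asarray(Q, dtype=float).reshape(-1, 4, 4)
    if len(t) != len(Q):
        raise ValueError("need one rate matrix per time interval")
    # Sum t_i * Q_i over the intervals, then take one matrix exponential.
    P = expm(np.einsum("i,ijk->jk", t, Q))
    return [
        (a, b, float(P[i, j]))
        for i, a in enumerate(NUCLEOTIDES)
        for j, b in enumerate(NUCLEOTIDES)
    ]


def jc69(mu):
    # Assumed JC69 scaling: off-diagonal mu/4, diagonal -3*mu/4.
    return np.full((4, 4), mu / 4) - np.diag([mu] * 4)


# Two intervals with different mutation rates.
probs = p_b_given_a_sketch([0.1, 0.3], [jc69(1.0), jc69(2.0)])
print(probs[0])
```

Under the assumed scaling, the probability of ending on the starting nucleotide is 1/4 + 3/4 · exp(-Σ t_i mu_i), which gives a quick check on the numerical result.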

itrails.get_emission_prob_mat.rate_mat_JC69(mu)

Return the rate matrix for the JC69 model. The JC69 model assumes equal base frequencies and equal substitution rates.

Parameters:

mu (numeric) – Mutation rate.

Returns:

A 4x4 numpy array representing the JC69 rate matrix.

Return type:

numpy.ndarray
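As an illustration of what such a matrix looks like, the sketch below builds a JC69 rate matrix under one common parameterization: every off-diagonal entry is mu/4 and every diagonal entry is -3·mu/4, so each row sums to zero. The scaling used by the module may differ, and the function name jc69_rate_matrix_sketch is hypothetical.

```python
import numpy as np


def jc69_rate_matrix_sketch(mu):
    """Build a 4x4 JC69 rate matrix (assumed scaling: off-diagonal mu/4)."""
    # All substitutions share one rate, so start from a constant matrix.
    off_diagonal = np.full((4, 4), mu / 4.0)
    # Subtracting mu on the diagonal leaves -3*mu/4 there.
    return off_diagonal - np.diag([mu] * 4)


Q = jc69_rate_matrix_sketch(0.5)
print(Q)
print(Q.sum(axis=1))  # each row of a rate matrix sums to zero
```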