int_get_emission_prob_mat module

itrails.int_get_emission_prob_mat.JC69_analytical_integral(aa, bb, cc, dd, t, mu, k)

This function calculates the probability of observing the nucleotides bb, cc and dd given aa, t and mu. aa and bb are the starting nucleotides, cc is the end nucleotide, and dd is the nucleotide at the time of the coalescent. t is the total time of the interval. The returned value is obtained by integrating over all possible times of the coalescent at d within t.

P(b = bb, c = cc, d = dd | a = aa, mu, t)

      c       ^
      |       |
    __d__     | t
   |     |    |
   a     b    |

Parameters:
  • aa (integer or string) – Nucleotide at a

  • bb (integer or string) – Nucleotide at b

  • cc (integer or string) – Nucleotide at c

  • dd (integer or string) – Nucleotide at d

  • t (numeric) – Total time of the interval (from a/b to c)

  • mu (numeric) – The mutation rate for the JC69 model

  • k (numeric) – The coalescent rate
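The integral described above can be illustrated numerically. The following is a minimal sketch, not the package's closed-form implementation: it uses a midpoint rule in place of the analytical solution, and it assumes (a) a JC69 scaling in which mu is the total substitution rate, (b) a coalescent time drawn from an exponential with rate k, truncated to the interval [0, t], and (c) that a and b evolve for time s up to d, after which d evolves to c for the remaining t - s. The names jc69_prob and single_coal_integral are hypothetical.

```python
import math

NUCS = "ACGT"

def jc69_prob(x, y, t, mu):
    """JC69 transition probability x -> y over time t (assumed: total rate mu)."""
    decay = math.exp(-4.0 * mu * t / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def single_coal_integral(aa, bb, cc, dd, t, mu, k, n_steps=2000):
    """Midpoint-rule estimate of P(b=bb, c=cc, d=dd | a=aa, mu, t)."""
    norm = 1.0 - math.exp(-k * t)  # probability of coalescing within [0, t]
    h = t / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h  # candidate coalescent time
        density = k * math.exp(-k * s) / norm  # truncated exponential (assumption)
        total += (density
                  * jc69_prob(aa, dd, s, mu)       # a up to the coalescent d
                  * jc69_prob(dd, bb, s, mu)       # d back down to b
                  * jc69_prob(dd, cc, t - s, mu)   # d up to the end nucleotide c
                  ) * h
    return total

# Summing over every bb, cc and dd for a fixed aa should give a total close to 1.
total = sum(single_coal_integral("A", b, c, d, t=1.0, mu=0.1, k=1.0)
            for b in NUCS for c in NUCS for d in NUCS)
```

Checking that the probabilities sum to 1 for a fixed aa is a cheap sanity test that applies to any implementation of this conditional distribution.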

itrails.int_get_emission_prob_mat.JC69_analytical_integral_double(aa, bb, cc, dd, ee, ff, t, mu)

This function calculates the probability of observing the nucleotides bb, cc, dd, ee and ff given aa, t and mu. aa, bb and cc are the starting nucleotides and dd is the end nucleotide. ee is the nucleotide at the time of the first coalescent and ff is the nucleotide at the time of the second coalescent. t is the total time of the interval. The returned value is obtained by integrating over all possible times of the coalescents at e and f within t.

Note that the coalescent rate is always 1 between two sequences and 3 among three sequences.

P(b = bb, c = cc, d = dd, e = ee, f = ff | a = aa, mu, t)

        d         ^
        |         |
     ___f___      |
    |       |     | t
  __e__     |     |
 |     |    |     |
 a     b    c     |

Parameters:
  • aa (integer or string) – Nucleotide at a

  • bb (integer or string) – Nucleotide at b

  • cc (integer or string) – Nucleotide at c

  • dd (integer or string) – Nucleotide at d

  • ee (integer or string) – Nucleotide at e

  • ff (integer or string) – Nucleotide at f

  • t (numeric) – Total time of the interval (from a/b/c to d)

  • mu (numeric) – The mutation rate for the JC69 model
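The fixed rates of 3 and 1 determine how likely it is that both coalescent events fall inside the interval. As a worked illustration (an assumption about how such a normalizing term could be derived, not a statement of what the package computes), the sketch below compares the closed form 1 - 1.5 exp(-t) + 0.5 exp(-3t) with a numerical integral over the time of the first event. Both function names are hypothetical.

```python
import math

def prob_two_coalescents_within(t):
    """Closed form for P(both coalescent events occur within time t).

    The first event has rate 3 (three lineages), the second rate 1 (two lineages).
    """
    return 1.0 - 1.5 * math.exp(-t) + 0.5 * math.exp(-3.0 * t)

def prob_two_coalescents_numeric(t, n_steps=20000):
    """Midpoint-rule check: integrate over the time s of the first event."""
    h = t / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * h
        first = 3.0 * math.exp(-3.0 * s)     # density of the first event at time s
        second = 1.0 - math.exp(-(t - s))    # second event happens in the time left
        total += first * second * h
    return total

closed = prob_two_coalescents_within(2.0)
numeric = prob_two_coalescents_numeric(2.0)
```

The two values should agree closely, and the closed form tends to 0 as t shrinks and to 1 as t grows.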

itrails.int_get_emission_prob_mat.b_c_d_given_a_to_dict_a_b_c_d(df)

This function converts the data frame returned by p_b_c_given_a_single_coal or p_b_c_given_a_double_coal into a nested dictionary, indexed as follows:

P(b, c, d | a) = dct[a][b][c][d]

Parameters:

df (data frame) – As outputted by p_b_c_given_a_double_coal
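The nested layout can be sketched without the package. In the example below, plain (a, b, c, d, prob) tuples stand in for the rows of the data frame, since the actual column names are not given here; rows_to_nested_dict is a hypothetical helper and the uniform probabilities are placeholder values only.

```python
from itertools import product

NUCS = "ACGT"

def rows_to_nested_dict(rows):
    """Turn (a, b, c, d, prob) rows into dct[a][b][c][d] = prob."""
    dct = {}
    for a, b, c, d, prob in rows:
        dct.setdefault(a, {}).setdefault(b, {}).setdefault(c, {})[d] = prob
    return dct

# Placeholder table: a uniform distribution over (b, c, d) for every a.
rows = [(a, b, c, d, 1.0 / 64) for a, b, c, d in product(NUCS, repeat=4)]
dct = rows_to_nested_dict(rows)

# P(b="C", c="G", d="T" | a="A") is then a chain of four lookups.
p = dct["A"]["C"]["G"]["T"]
```

The same pattern with fewer nesting levels gives the dct[a][b][c] and dct[a][b] layouts.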

itrails.int_get_emission_prob_mat.b_c_given_a_to_dict_a_b_c(df)

This function converts the data frame returned by p_b_c_given_a_single_coal or p_b_c_given_a_double_coal into a nested dictionary, indexed as follows:

P(b, c | a) = dct[a][b][c]

Parameters:

df (data frame) – As outputted by p_b_c_given_a_single_coal

itrails.int_get_emission_prob_mat.b_given_a_to_dict_a_b(df)

This function converts the data frame returned by p_b_given_a into a nested dictionary, indexed as follows:

P(b | a) = dct[a][b]

Parameters:

df (data frame) – As outputted by p_b_given_a

itrails.int_get_emission_prob_mat.calc_emissions_double_JC69(a0_a1_t_vec, b0_b1_t_vec, c0_c1_t_vec, a1b1c1_abc0_t, d0_abc0_t_vec, a0_a1_mu_vec, b0_b1_mu_vec, c0_c1_mu_vec, a1b1c1_abc0_mu, d0_abc0_mu_vec)

This function returns the emission probabilities of a hidden state containing two coalescent events in the same time interval.

----------abc0-------        |
     ______|______           |
  ___|___        |           |
  |     |        |           |
--a1----b1-------c1--        |
  |     |        |           |
  a0    b0       c0          d0

Parameters:
  • a0_a1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • b0_b1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • c0_c1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • d0_abc0_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • a1b1c1_abc0_t (numeric) – Time interval for the coalescent events to happen.

  • a0_a1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • b0_b1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • c0_c1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • d0_abc0_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • a1b1c1_abc0_mu (numeric) – Mutation rates for the interval where coalescents happen.
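The pairing of each *_t_vec with its *_mu_vec can be illustrated with a small sketch. It assumes that a branch passes through its intervals in order and that the JC69 transition matrices of consecutive intervals are multiplied; whether the package does exactly this is an assumption, as is the rate scaling (mu taken as the total substitution rate). The helper names are hypothetical.

```python
import math

def jc69_matrix(mu, t):
    """4x4 JC69 transition matrix for rate mu over time t (assumed scaling)."""
    decay = math.exp(-4.0 * mu * t / 3.0)
    same, diff = 0.25 + 0.75 * decay, 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def matmul(x, y):
    """Plain 4x4 matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def branch_matrix(t_vec, mu_vec):
    """Multiply the per-interval matrices along one branch."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for t, mu in zip(t_vec, mu_vec):
        result = matmul(result, jc69_matrix(mu, t))
    return result

t_vec = [0.5, 1.0, 0.25]     # time spent in each interval
mu_vec = [0.02, 0.05, 0.01]  # mutation rate in the matching interval
stepwise = branch_matrix(t_vec, mu_vec)

# Under JC69 the product collapses to one matrix over the summed mu * t.
distance = sum(t * mu for t, mu in zip(t_vec, mu_vec))
collapsed = jc69_matrix(1.0, distance)
```

Because JC69 matrices with different rates commute, only the sum of mu * t along a branch matters in this sketch, which is why the two lists must have the same length and order.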

itrails.int_get_emission_prob_mat.calc_emissions_single_JC69(a0_a1_t_vec, b0_b1_t_vec, a1b1_ab0_t, ab0_ab1_t_vec, ab1c1_abc0_t, c0_c1_t_vec, d0_abc0_t_vec, a0_a1_mu_vec, b0_b1_mu_vec, a1b1_ab0_mu, ab0_ab1_mu_vec, ab1c1_abc0_mu, c0_c1_mu_vec, d0_abc0_mu_vec, coal_rate_1, coal_rate_2)

This function returns the emission probabilities of a hidden state containing two coalescent events in different time intervals.

----------abc0-------        |
     ______|______           |
     |           |           |
-----ab1---------c1--        |
     |           |           |
-----ab0----     |           |
  ___|___        |           |
  |     |        |           |
--a1----b1--     |           |
  |     |        |           |
  a0    b0       c0          d0

Parameters:
  • a0_a1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • b0_b1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • c0_c1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • d0_abc0_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • ab0_ab1_t_vec (numeric list) – Each list contains the interval time for a site to mutate with a certain mutation rate, specified by *mu_vec

  • a1b1_ab0_t (numeric) – Time interval in which the first coalescent can happen.

  • ab1c1_abc0_t (numeric) – Time interval in which the second coalescent can happen.

  • a0_a1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • b0_b1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • c0_c1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • d0_abc0_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • ab0_ab1_mu_vec (numeric list) – Each list contains the mutation rates for each interval defined in *t_vec

  • a1b1_ab0_mu (numeric) – Mutation rate for the first coalescent interval.

  • ab1c1_abc0_mu (numeric) – Mutation rate for the second coalescent interval.

  • coal_rate_1 (numeric) – Coalescent rate of the first coalescent event.

  • coal_rate_2 (numeric) – Coalescent rate of the second coalescent event.
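To make "emission probabilities" concrete, the sketch below computes the probability of every site pattern (a0, b0, c0, d0) on the tree ((a0, b0), c0), d0. It is a deliberate simplification: the coalescent times are held fixed instead of being integrated over their intervals, each branch is summarized by a single distance (mutation rate times time), and the JC69 scaling is an assumption. The function names and the branch distances are hypothetical.

```python
import math
from itertools import product

NUCS = "ACGT"

def jc69_prob(x, y, dist):
    """JC69 transition probability x -> y over a distance dist = mu * t."""
    decay = math.exp(-4.0 * dist / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def site_pattern_prob(a0, b0, c0, d0, d_a, d_b, d_ab, d_c, d_d):
    """P(a0, b0, c0, d0) for the tree ((a0, b0), c0), d0 with fixed distances."""
    total = 0.0
    for abc in NUCS:      # nucleotide at the deepest coalescent (abc0)
        for ab in NUCS:   # nucleotide at the first coalescent (ab0)
            total += (0.25                        # uniform JC69 stationary frequency
                      * jc69_prob(abc, d0, d_d)   # abc0 to d0
                      * jc69_prob(abc, c0, d_c)   # abc0 to c0
                      * jc69_prob(abc, ab, d_ab)  # abc0 to ab0
                      * jc69_prob(ab, a0, d_a)    # ab0 to a0
                      * jc69_prob(ab, b0, d_b))   # ab0 to b0
    return total

branches = dict(d_a=0.01, d_b=0.01, d_ab=0.02, d_c=0.03, d_d=0.1)
emissions = {"".join(p): site_pattern_prob(*p, **branches)
             for p in product(NUCS, repeat=4)}
```

The resulting dictionary has one entry per site pattern (256 in total) and its values sum to 1, which is the property an emission distribution for a single hidden state must satisfy.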

itrails.int_get_emission_prob_mat.get_emission_prob_mat_introgression(t_A, t_B, t_AB, t_C, t_upper, t_out, t_m, rho_A, rho_B, rho_AB, rho_C, rho_ABC, coal_A, coal_B, coal_AB, coal_BC, coal_C, coal_ABC, n_int_AB, n_int_ABC, mu_A, mu_B, mu_C, mu_D, mu_AB, mu_ABC, cut_AB, cut_ABC)

This function returns the emission probabilities of all hidden states given a set of population genetics parameters.

[Species tree diagram; layout lost in extraction. Tips: A, B, C and D. Ancestral populations: AB and ABC.]

Parameters:
  • t_A (numeric) – Time between present time and the first speciation time for species A.

  • t_B (numeric) – Time between present time and the migration event for species A.

  • t_AB (numeric) – Time between speciation events.

  • t_C (numeric) – Time between present time and the migration event for species C.

  • t_upper (numeric) – Time between the last ABC interval and the third speciation time.

  • t_peak (numeric) – Mean divergence time between ABC and D after the third speciation time. It should be 4*coal_ABC (or it can be estimated instead).

  • rho_A (numeric) – Recombination rate for the A interval.

  • rho_B (numeric) – Recombination rate for the B interval.

  • rho_AB (numeric) – Recombination rate for the AB interval.

  • rho_C (numeric) – Recombination rate for the C interval.

  • rho_ABC (numeric) – Recombination rate for the ABC interval.

  • coal_A (numeric) – Coalescent rate for the A interval.

  • coal_B (numeric) – Coalescent rate for the B interval.

  • coal_AB (numeric) – Coalescent rate for the AB interval.

  • coal_C (numeric) – Coalescent rate for the C interval.

  • coal_ABC (numeric) – Coalescent rate for the ABC interval.

  • n_int_AB (integer) – Number of intervals in the AB part of the tree.

  • n_int_ABC (integer) – Number of intervals in the ABC part of the tree.

  • mu_A (numeric) – Mutation rate for the A interval.

  • mu_B (numeric) – Mutation rate for the B interval.

  • mu_C (numeric) – Mutation rate for the C interval.

  • mu_D (numeric) – Mutation rate for the D interval.

  • mu_AB (numeric) – Mutation rate for the AB interval.

  • mu_ABC (numeric) – Mutation rate for the ABC interval.

itrails.int_get_emission_prob_mat.p_b_c_d_given_a_JC69_analytical(t, mu)

This function returns a data frame with the values of P(b, c, d | a) for all combinations of nucleotides.

Parameters:
  • t (numeric) – Total time of the interval (from a/b/c to d)

  • mu (numeric) – The mutation rate for the JC69 model

itrails.int_get_emission_prob_mat.p_b_c_given_a_JC69_analytical(t, mu, k)

This function returns a data frame with the values of P(b, c | a) for all combinations of nucleotides.

Parameters:
  • t (numeric) – Total time of the interval (from a/b to c)

  • mu (numeric) – The mutation rate for the JC69 model

  • k (numeric) – The coalescent rate
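A table "for all combinations of nucleotides" has 4 x 4 x 4 = 64 rows for P(b, c | a). The sketch below builds such a table by summing P(b, c, d | a) over the nucleotide d at the coalescent. It is an illustration only: the coalescent is placed at a fixed point of the interval instead of being integrated out, the two distances are made-up values, the JC69 scaling is an assumption, and tuples stand in for data frame rows.

```python
import math
from itertools import product

NUCS = "ACGT"

def jc69_prob(x, y, dist):
    """JC69 transition probability x -> y over a distance dist = mu * t."""
    decay = math.exp(-4.0 * dist / 3.0)
    return 0.25 + 0.75 * decay if x == y else 0.25 - 0.25 * decay

def p_bcd_given_a_fixed(a, b, c, d, dist_lower, dist_upper):
    """P(b, c, d | a) with the coalescent at a fixed point of the interval."""
    return (jc69_prob(a, d, dist_lower)     # a up to the coalescent d
            * jc69_prob(d, b, dist_lower)   # d back down to b
            * jc69_prob(d, c, dist_upper))  # d up to c

# One row per (a, b, c); d is summed out to obtain P(b, c | a).
rows = []
for a, b, c in product(NUCS, repeat=3):
    prob = sum(p_bcd_given_a_fixed(a, b, c, d, 0.02, 0.03) for d in NUCS)
    rows.append((a, b, c, prob))
```

For each starting nucleotide a, the 16 probabilities over (b, c) sum to 1.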

itrails.int_get_emission_prob_mat.p_b_given_a(t, Q)

This function calculates the probability of observing the nucleotide b given a, t and Q. a is the starting nucleotide, while b is the end nucleotide. t is the total time of the interval.

P(b = bb | a = aa, Q, t)

Parameters:
  • t (numeric) – Total time of the interval (from a to b)

  • Q (numpy array) – A 4x4 rate matrix for any substitution model
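For a general rate matrix Q, the transition probabilities over time t are the entries of the matrix exponential exp(Q t). The sketch below computes it in pure Python with scaling and squaring plus a truncated Taylor series, so that it runs without dependencies; with SciPy installed, scipy.linalg.expm serves the same purpose. This is not necessarily how the package computes it. The function names are hypothetical, and the JC69 matrix used as input assumes off-diagonal entries of mu / 3.

```python
import math

def mat_mul(x, y):
    """Plain square matrix product on nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, squarings=10, terms=12):
    """exp(m) by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    scaled = [[v / 2 ** squarings for v in row] for row in m]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for order in range(1, terms + 1):
        # term holds scaled**order / order!
        term = [[v / order for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)  # undo the scaling
    return result

def transition_probs(t, Q):
    """P[a][b] = P(b | a, Q, t), with nucleotides indexed 0 to 3."""
    return mat_exp([[q * t for q in row] for row in Q])

mu = 0.3
Q = [[-mu if i == j else mu / 3 for j in range(4)] for i in range(4)]
P = transition_probs(2.0, Q)

# JC69 has a closed form that the numerical result can be compared against.
expected_same = 0.25 + 0.75 * math.exp(-4.0 * mu * 2.0 / 3.0)
```

Each row of P is a probability distribution over the end nucleotide b, so it sums to 1.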

itrails.int_get_emission_prob_mat.rate_mat_JC69(mu)

This function returns the rate matrix for the JC69 model.

Parameters:

mu (numeric) – Mutation rate
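A JC69 rate matrix has equal off-diagonal entries and rows that sum to zero. The sketch below uses one common scaling, in which every off-diagonal entry is mu / 3 so that the total substitution rate is mu. This scaling is an assumption; the package may scale the matrix differently (for example with off-diagonal entries of mu / 4). jc69_rate_matrix is a hypothetical stand-in that returns nested lists, not a numpy array.

```python
def jc69_rate_matrix(mu):
    """4x4 JC69 rate matrix with total substitution rate mu (assumed scaling)."""
    return [[-mu if i == j else mu / 3.0 for j in range(4)] for i in range(4)]

Q = jc69_rate_matrix(0.3)

# Every row sums to zero, up to floating-point error.
row_sums = [sum(row) for row in Q]
```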