The library provides a fast dynamic programming minimum free energy folding algorithm as described by Zuker & Stiegler (1981).
Associated functions are:
float fold (char* sequence, char* structure);
float circfold (char* sequence, char* structure);
float energy_of_structure(const char *string, const char *structure, int verbosity_level);
float energy_of_circ_structure( const char *string, const char *structure, int verbosity_level);
void update_fold_params(void);
void free_arrays(void);
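To illustrate the kind of dynamic programming recursion involved, the following is a minimal sketch that maximizes the number of nested base pairs (a Nussinov-style recursion). It is a deliberately simplified stand-in and not the energy-based algorithm implemented by fold(): the function name max_pairs, the pairing rules in can_pair, and the MIN_LOOP constant are assumptions made for this example only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MIN_LOOP 3  /* assumed minimum number of unpaired bases in a hairpin */

/* Returns 1 if bases a and b can pair (Watson-Crick or GU wobble). */
static int can_pair(char a, char b)
{
  return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
         (a == 'G' && b == 'C') || (a == 'C' && b == 'G') ||
         (a == 'G' && b == 'U') || (a == 'U' && b == 'G');
}

/* Maximum number of nested base pairs in seq (toy objective, not an energy).
   Returns -1 if memory allocation fails. */
int max_pairs(const char *seq)
{
  assert(seq != NULL);
  int n = (int)strlen(seq);
  if (n == 0) return 0;

  /* N[i*n + j] holds the best pair count for the subsequence i..j */
  int *N = calloc((size_t)n * (size_t)n, sizeof *N);
  if (N == NULL) return -1;

  for (int span = MIN_LOOP + 1; span < n; span++) {
    for (int i = 0; i + span < n; i++) {
      int j = i + span;
      int best = N[i * n + (j - 1)];              /* case 1: j is unpaired */
      for (int k = i; k < j - MIN_LOOP; k++) {     /* case 2: j pairs with k */
        if (!can_pair(seq[k], seq[j])) continue;
        int left   = (k > i) ? N[i * n + (k - 1)] : 0;
        int inside = N[(k + 1) * n + (j - 1)];
        if (left + 1 + inside > best) best = left + 1 + inside;
      }
      N[i * n + j] = best;
    }
  }
  int result = N[n - 1];  /* entry for the full sequence 0..n-1 */
  free(N);
  return result;
}
```

For example, max_pairs("GGGAAACCC") returns 3. The real folding routines fill similar triangular tables, but score loops with measured energy parameters instead of counting pairs.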
Instead of the minimum free energy structure, the partition function over all possible structures, and from it the pairing probability of every possible base pair, can be calculated using a dynamic programming algorithm as described by McCaskill (1990). The following functions are provided:
float pf_fold ( char* sequence, char* structure)
void free_pf_arrays (void)
void update_pf_params (int length)
char *get_centroid_struct_pl( int length, double *dist, plist *pl);
char *get_centroid_struct_pr( int length, double *dist, double *pr);
double mean_bp_distance_pr( int length, double *pr);
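As an illustration of what can be derived from pair probabilities, the sketch below builds a centroid structure (all pairs with probability above 0.5) and the mean base pair distance of the ensemble from a probability matrix. The function names and the plain n x n row-major matrix layout are assumptions of this example; the library routines above use their own array layout and should be consulted directly.

```c
#include <assert.h>
#include <stdlib.h>

/* p is an n x n row-major matrix; p[i*n + j] with i < j is the probability
   that 0-based positions i and j pair (layout assumed for this sketch).
   Returns a malloc'ed dot-bracket string, or NULL on allocation failure. */
char *centroid_from_probs(const double *p, int n)
{
  assert(p != NULL && n >= 0);
  char *s = malloc((size_t)n + 1);
  if (s == NULL) return NULL;
  for (int i = 0; i < n; i++) s[i] = '.';
  s[n] = '\0';
  /* a pair belongs to the centroid if it is present in more than half
     of the ensemble */
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      if (p[i * n + j] > 0.5) { s[i] = '('; s[j] = ')'; }
  return s;
}

/* Expected base pair distance between two structures drawn independently
   from the ensemble: a pair with probability q is present in exactly one
   of the two with probability 2*q*(1-q). */
double mean_bp_distance(const double *p, int n)
{
  assert(p != NULL && n >= 0);
  double d = 0.0;
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      d += 2.0 * p[i * n + j] * (1.0 - p[i * n + j]);
  return d;
}
```

The caller owns the string returned by centroid_from_probs and must release it with free().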
We provide two functions that search for sequences with a given structure, thereby inverting the folding routines.
float inverse_fold (char *start, char *target)
float inverse_pf_fold ( char *start, char *target)
The following global variables define the behavior or show the results of the inverse folding routines:
char *symbolset
The following functions compute suboptimal structures:
SOLUTION *subopt (char *sequence, char *constraint, int *delta, FILE *fp)
SOLUTION *subopt_circ ( char *sequence, char *constraint, int *delta, FILE *fp)
SOLUTION *zukersubopt(const char *string);
The following functions sample structures from the ensemble by stochastic backtracking:
char *TwoDpfold_pbacktrack ( TwoDpfold_vars *vars, unsigned int d1, unsigned int d2)
char *alipbacktrack (double *prob)
char *pbacktrack(char *sequence);
char *pbacktrack_circ(char *sequence);
The function of an RNA molecule often depends on its interaction with other RNAs. The following routines therefore allow the prediction of structures formed by two RNA molecules upon hybridization.
One approach to co-folding two RNAs consists of concatenating the two sequences and keeping track of the concatenation point in all energy evaluations. Correspondingly, many of the cofold() and co_pf_fold() routines below take one sequence string as argument and use the global variable cut_point to mark the concatenation point. Note that while the RNAcofold program uses the '&' character to mark the chain break in its input, you should not use an '&' when using the library routines (set cut_point instead).
In a second approach to co-folding two RNAs, cofolding is seen as a stepwise process. In the first step, the probability of an unpaired region is calculated. In the second step, this probability is multiplied by the probability of an interaction between the two RNAs. This approach is implemented for the interaction between a long target sequence and a short ligand RNA. Function pf_unstru() calculates the partition function over all unpaired regions in the input sequence. Function pf_interact(), which calculates the partition function over all possible interactions between two sequences, needs both sequences as separate strings as input.
int cut_point
float cofold (char *sequence, char *structure)
void free_co_arrays (void)
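Since the library routines expect a concatenated sequence without the '&' character, input in the RNAcofold style has to be converted first. The following is a minimal sketch of such a conversion. The helper name strip_chain_break is hypothetical, and the convention that the chain break is reported as the 1-based position of the first nucleotide of the second strand (with -1 meaning no chain break) is an assumption that should be checked against the cut_point documentation before assigning the result to cut_point.

```c
#include <assert.h>
#include <string.h>

/* Copies input into out without the '&' character. Returns the 1-based
   position of the first nucleotide of the second strand in out, or -1 if
   input contains no '&' (assumed convention, see lead-in).
   out must provide room for strlen(input) + 1 bytes. */
int strip_chain_break(const char *input, char *out)
{
  assert(input != NULL && out != NULL);
  const char *amp = strchr(input, '&');
  if (amp == NULL) {
    strcpy(out, input);           /* single molecule: copy unchanged */
    return -1;
  }
  size_t first = (size_t)(amp - input);  /* length of the first strand */
  memcpy(out, input, first);
  strcpy(out + first, amp + 1);          /* append the second strand */
  return (int)first + 1;
}
```

With the input "GGGG&CCCC" the helper writes "GGGGCCCC" to out and returns 5.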
Partition Function Cofolding
To simplify the implementation, the partition function computation is done internally in a null model that does not include the duplex initiation energy (i.e. the entropic penalty for producing a dimer from two monomers). The resulting free energies and pair probabilities are initially relative to that null model. In a second step, the free energies can be corrected to include the dimerization penalty, and the pair probabilities can be divided into the conditional pair probabilities given that a dimer is formed or not formed.
cofoldF co_pf_fold( char *sequence, char *structure);
void free_co_pf_arrays(void);
Cofolding all Dimers, Concentrations
After computing the partition functions of all possible dimers, one can compute the base pair probabilities, the equilibrium concentrations resulting from given start concentrations, and so on.
void compute_probabilities( double FAB, double FEA, double FEB, struct plist *prAB, struct plist *prA, struct plist *prB, int Alength)
ConcEnt *get_concentrations(double FEAB, double FEAA, double FEBB, double FEA, double FEB, double * startconc)
Partition Function Cofolding as a stepwise process
In this approach to cofolding, the interaction between two RNA molecules is seen as a stepwise process. In a first step, the target molecule has to adopt a structure in which a binding site is accessible. In a second step, the ligand molecule hybridizes with a region that is accessible to an interaction. Consequently, the algorithm is designed as a two-step process. The first step is the calculation of the probability that a region within the target is unpaired, or equivalently, the calculation of the free energy needed to expose a region. In the second step, we compute the free energy of an interaction for every possible binding site. Associated functions are:
pu_contrib *pf_unstru ( char *sequence, int max_w)
void free_pu_contrib_struct (pu_contrib *pu)
interact *pf_interact( const char *s1, const char *s2, pu_contrib *p_c, pu_contrib *p_c2, int max_w, char *cstruc, int incr3, int incr5)
void free_interact (interact *pin)
Local structures can be predicted by a modified version of the fold() algorithm that restricts the span of all base pairs.
float Lfold ( const char *string, char *structure, int maxdist)
float aliLfold( const char **strings, char *structure, int maxdist)
float Lfoldz (const char *string, char *structure, int maxdist, int zsc, double min_z)
plist *pfl_fold ( char *sequence, int winSize, int pairSize, float cutoffb, double **pU, struct plist **dpp2, FILE *pUfp, FILE *spup)
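The span restriction can be made concrete with a small check on a dot-bracket string. The sketch below computes the largest distance j - i over all base pairs (i,j) and tests it against a limit. The helper names are hypothetical, and interpreting maxdist as an upper bound on j - i is an assumption of this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns the largest j - i over all pairs (i,j) in a dot-bracket string,
   0 if there are no pairs, or -1 if the brackets are unbalanced or memory
   allocation fails. */
int max_pair_span(const char *structure)
{
  assert(structure != NULL);
  size_t n = strlen(structure);
  int *stack = malloc((n ? n : 1) * sizeof *stack);
  if (stack == NULL) return -1;

  int top = 0, best = 0;
  for (size_t j = 0; j < n; j++) {
    if (structure[j] == '(') {
      stack[top++] = (int)j;                 /* remember opening position */
    } else if (structure[j] == ')') {
      if (top == 0) { free(stack); return -1; }
      int span = (int)j - stack[--top];      /* distance to its partner */
      if (span > best) best = span;
    }
  }
  free(stack);
  return (top == 0) ? best : -1;
}

/* Returns 1 if every base pair spans at most maxdist positions. */
int respects_maxdist(const char *structure, int maxdist)
{
  int span = max_pair_span(structure);
  return span >= 0 && span <= maxdist;
}
```

For the structure "((...))..(...)" the largest span is 6, so the structure satisfies a limit of 6 but not a limit of 5.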
Consensus structures can be predicted by a modified version of the fold() algorithm that takes a set of aligned sequences instead of a single sequence. The energy function consists of the mean energy averaged over the sequences, plus a covariance term that favors pairs with consistent and compensatory mutations and penalizes pairs that cannot be formed by all sequences. For details see Hofacker (2002).
float alifold (const char **strings, char *structure)
float circalifold (const char **strings, char *structure)
void free_alifold_arrays (void)
float energy_of_alistruct ( const char **sequences, const char *structure, int n_seq, float *energy)
struct pair_info
double cv_fact
double nc_fact
The following global variables change the behavior of the folding algorithms or contain additional information after folding.
int noGU
int no_closingGU
int noLonelyPairs
int tetra_loop
int energy_set
float temperature
int dangles
char *nonstandards
int cut_point
float pf_scale
int fold_constrained
int do_backtrack
char backtrack_type
Include fold_vars.h if you want to change any of these variables from their defaults.
A default set of parameters, identical to the one described in Mathews et al. (2004), is compiled into the library.
Alternatively, parameters can be read from and written to a file.
void read_parameter_file (const char fname[])
void write_parameter_file (const char fname[])
To preserve some backward compatibility, RNAlib also provides a function to convert energy parameter files from the format used in versions 1.4-1.8 into the new format used since version 2.0.
void convert_parameter_file ( const char *iname, const char *oname, unsigned int options)