NEWS file for the rpart package

Changed example data solder to solder.balance. The full version of the data is available in the survival package.

rpart() would fail with a formula having ~ . - x on the right-hand side, owing to a simple bookkeeping error in creating an index.
Added a section to the vignette on user written functions, which explains why and when one can avoid checking all 2^k splits for a categorical predictor with k levels.
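The vignette's own argument is not reproduced here. One well-known case where the full search can be avoided is squared-error regression: if the k levels are sorted by their mean response, an optimal two-way partition is always one of the k-1 "cut points" in that order, so not every subset of levels needs to be scored. The Python below is a minimal sketch of that result only. The (weight, weighted sum) summary per level and the function names `split_score`, `best_split_exhaustive` and `best_split_ordered` are hypothetical and are not part of rpart.

```python
from itertools import combinations

# Hypothetical input format: level -> (total weight, weighted sum of responses).


def split_score(levels, left):
    """S_L^2/n_L + S_R^2/n_R; a larger value means a larger drop in squared error."""
    n_left = sum(levels[k][0] for k in left)
    s_left = sum(levels[k][1] for k in left)
    n_right = sum(n for k, (n, s) in levels.items() if k not in left)
    s_right = sum(s for k, (n, s) in levels.items() if k not in left)
    return s_left ** 2 / n_left + s_right ** 2 / n_right


def best_split_exhaustive(levels):
    """Score all 2^(k-1) - 1 distinct two-way partitions of the levels."""
    keys = sorted(levels)
    first, rest = keys[0], keys[1:]
    best = None
    # Keep the first level on the left so each partition is visited once;
    # r < len(rest) keeps the right-hand side non-empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = frozenset((first,) + combo)
            score = split_score(levels, left)
            if best is None or score > best[0]:
                best = (score, left)
    return best


def best_split_ordered(levels):
    """Sort levels by mean response and score only the k-1 ordered cut points."""
    order = sorted(levels, key=lambda k: levels[k][1] / levels[k][0])
    best = None
    for i in range(1, len(order)):
        left = frozenset(order[:i])
        score = split_score(levels, left)
        if best is None or score > best[0]:
            best = (score, left)
    return best


levels = {"a": (4, 8.0), "b": (3, 3.0), "c": (5, 20.0), "d": (2, 1.0)}
print(best_split_exhaustive(levels)[0], best_split_ordered(levels)[0])  # 96.0 96.0
```

Both searches find the same best score (level c alone versus the rest), but the ordered search scores 3 splits instead of 7; the gap grows exponentially with k.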

The C and R code has been reformatted for legibility.
The old compatibility function rpconvert() has been removed.
The cross-validation functions allow a user interrupt at the end of evaluating each split.
Variable Reliability in data set car90 is corrected to be an ordered factor, as documented.
Surrogate splits are now considered only if they send two or more cases with non-zero weight each way. For numeric/ordinal variables only the non-zero-weight condition is new; for categorical variables the restriction as a whole is new.
Surrogate splits which improve only by rounding error over the default split are no longer returned. Where weights and missing values are present, the splits component for some of these was not returned correctly.
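The surrogate eligibility rule above (two or more cases with non-zero weight sent each way) can be written as a small predicate. This is a hypothetical Python helper, not rpart's C code; it assumes each case is coded -1 (sent left by the candidate surrogate), +1 (sent right) or 0 (surrogate variable missing), together with its case weight.

```python
def surrogate_is_eligible(directions, weights, min_each=2):
    """Return True if the candidate surrogate sends at least `min_each`
    cases with non-zero weight in each direction.

    directions: -1 = sent left, +1 = sent right, 0 = surrogate variable missing
    weights:    case weights; zero-weight cases are not counted
    """
    n_left = sum(1 for d, w in zip(directions, weights) if d < 0 and w != 0)
    n_right = sum(1 for d, w in zip(directions, weights) if d > 0 and w != 0)
    return n_left >= min_each and n_right >= min_each


# Two cases go right, but one has zero weight, so only one counts.
print(surrogate_is_eligible([-1, -1, -1, 1, 1, 0], [1, 1, 1, 1, 0, 1]))  # False
print(surrogate_is_eligible([-1, -1, 1, 1], [1, 0.5, 2, 1]))            # True
```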

A fit of class ‘"rpart"’ now contains a component for variable ‘importance’, which is reported by the summary() method.
The text() method gains a minlength argument, like the labels() method. This adds finer control: the default remains pretty = NULL, minlength = 1L.
The handling of fits with zero and fractional weights has been corrected: the results may be slightly different (or even substantially different when the proportion of zero weights is large).
Some memory leaks have been plugged.
There is a second vignette, ‘longintro.Rnw’, a version of the original Mayo Technical Report on rpart.
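As described in the rpart documentation, a variable's importance sums the goodness of split for every split where it is the primary variable, plus goodness times adjusted agreement for every split where it serves as a surrogate; summary() prints these totals rescaled to sum to 100. The Python below is a sketch of that bookkeeping under simplified assumptions. The per-node record layout and the names `variable_importance` and `scaled_to_100` are hypothetical, and the numbers are made up for illustration.

```python
from collections import defaultdict


def variable_importance(splits):
    """splits: one record per internal node (hypothetical layout):
        {"primary": (variable, goodness),
         "surrogates": [(variable, adjusted_agreement), ...]}
    """
    total = defaultdict(float)
    for node in splits:
        var, goodness = node["primary"]
        total[var] += goodness                 # credit for the primary split
        for s_var, adj in node["surrogates"]:
            total[s_var] += goodness * adj     # partial credit for each surrogate
    return dict(total)


def scaled_to_100(importance):
    """Rescale the totals so that they sum to 100."""
    grand = sum(importance.values())
    return {v: 100.0 * x / grand for v, x in importance.items()}


splits = [
    {"primary": ("age", 10.0), "surrogates": [("weight", 0.5), ("height", 0.2)]},
    {"primary": ("weight", 4.0), "surrogates": [("age", 0.25)]},
]
imp = variable_importance(splits)
print(imp)  # {'age': 11.0, 'weight': 9.0, 'height': 2.0}
```

Note that "height" never appears as a primary split yet still receives credit, which is the point of counting surrogates.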

Added dataset car90, a corrected version of the S-PLUS dataset car.all (used with permission).
This version does not use paste0(), which first appeared in R 2.15.0, and so works with R 2.14.x.

Merged in a set of S-PLUS code changes that had accumulated at Mayo over the course of a decade. The primary one is a change in how indexing is done in the underlying C code, which leads to a major speed increase for large data sets. Essentially, for the lower leaves all our time used to be eaten up by bookkeeping; this has been replaced by a different approach. The primary routine also uses .Call() so as to be more memory efficient.
The other major change was a fix for an error with asymmetric loss matrices, prompted by a user query. With an asymmetric loss matrix L, the altered priors were computed incorrectly: they used L' (the transpose of L) instead of L. The upshot was that the tree would not necessarily choose the optimal splits for the given loss matrix. Once chosen, splits were evaluated correctly, though the printed “improvement” values are of course wrong as well. It is interesting that for my little test case, with L quite asymmetric, the early splits in the tree are unchanged: a good split still looks good.
Add the return.all argument to xpred.rpart().
Added a set of formal tests, i.e., cases with known answers to which we can compare.
Add a ‘usercode’ vignette, explaining how to add user defined splitting functions.
The class method now also returns the node probability.
Add the stagec data set, used in some tests.
The plot.rpart routine needs to store a value that will be visible to the rpartco routine at a later time. This is now done in an environment in the namespace.
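The asymmetric-loss fix described above can be illustrated numerically. This is a minimal sketch, not rpart's code: it assumes the usual altered-prior definition (each prior multiplied by the corresponding row sum of L, then renormalised) and the convention that L[i][j] is the loss for classifying a true class-i case as class j. Using the transpose swaps row sums for column sums, which changes the altered priors whenever L is asymmetric. The helpers `altered_priors` and `transpose` are hypothetical.

```python
def altered_priors(prior, loss):
    """prior_i * L_i / sum_j prior_j * L_j, where L_i is the i-th row sum of loss.

    Assumed convention: loss[i][j] is the cost of predicting class j
    when the true class is i (diagonal entries are zero).
    """
    weighted = [p * sum(row) for p, row in zip(prior, loss)]
    total = sum(weighted)
    return [w / total for w in weighted]


def transpose(m):
    """Return the transpose of a square matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


prior = [0.5, 0.5]
loss = [[0, 4],   # true class 0 predicted as class 1 costs 4
        [1, 0]]   # true class 1 predicted as class 0 costs 1

print(altered_priors(prior, loss))             # [0.8, 0.2]
print(altered_priors(prior, transpose(loss)))  # [0.2, 0.8], the L' mistake
```

For a symmetric L the two calls agree, which is consistent with the error only affecting asymmetric loss matrices.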

Change description of ‘margin’ in ?plot.rpart as suggested by Bill Venables.