Invited Speaker - Florencia Leonardi (São Paulo)
Title: Context tree selection and linguistic rhythm retrieval from written texts.
Abstract: We introduce a new criterion for consistently selecting the probabilistic context tree that generates a sample. The basic idea is to construct a totally ordered set of candidate trees. This set is composed of the "champion trees", the ones that maximize the likelihood of the sample for each number of degrees of freedom. The smallest maximizer criterion selects the infimum of the subset of champion trees whose gain in likelihood is negligible. In addition, we propose a new algorithm based on resampling to implement this criterion. This study was motivated by the linguistic challenge of retrieving rhythmic features from written texts. Applied to a data set consisting of texts extracted from daily newspapers, our algorithm identifies different context trees for European Portuguese and Brazilian Portuguese. This is compatible with the long-standing conjecture that European Portuguese and Brazilian Portuguese belong to different rhythmic classes. Moreover, these context trees have several interesting, linguistically meaningful properties. This is joint work with A. Galves, C. Galves and N. Garcia.
NUMEC - USP, São Paulo, Brasil, 2009