Expected cases (beta version)

The purpose of this tool is to estimate the expected number of cases in a given population, whose population at risk is provided in the population data file, using the incidence observed in the data file.

This population could be the same as that of the data file but for a different time period. In that case, the expected cases are the cases projected for that population in, for example, a future period. The tool can also predict the number of cases in another area, taking into account the time trend of the data file population. For example, suppose the data file contains incidence data for a region of Catalonia covered by a cancer registry, and we want to use these data to predict the incidence in another region, not covered by a cancer registry, over a certain period. We then need a population data file for that area containing each combination of sex, age group and time period, together with the person-years at risk during that period.

Predictions are uncertain, and the results should always be interpreted with caution. The authors accept no responsibility for any reliance on, or use of, the results.

It is advisable not to project more than 5 years into the future, and only when at least as many years have been observed as are being projected.
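The rule of thumb above can be written as a small check. This is a minimal sketch, not part of the tool: the function name `projection_is_advisable` and the `max_horizon` parameter are hypothetical, and the horizon is assumed to be counted from the last observed year.

```python
def projection_is_advisable(observed_years, projected_years, max_horizon=5):
    """Hypothetical check: project at most `max_horizon` years beyond the last
    observed year, and only if at least as many years have been observed."""
    horizon = max(projected_years) - max(observed_years)
    n_observed = len(set(observed_years))
    return horizon <= max_horizon and n_observed >= horizon


# 10 observed years (2005-2014), projecting 2015-2019: within the rule
print(projection_is_advisable(range(2005, 2015), range(2015, 2020)))  # True
```

When the population data file covers the same period as the data file (for example, another area), the horizon is zero or negative and the check passes.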

The tool produces a table, a file and, optionally, a graph.

The table describes the statistical model selected for each specified sex and group.

The file contains, for each sex, age group and year, the expected cases calculated from the model described in the table and the population in the population data file.
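The models actually tested by the tool are described in the methodology document and are not reproduced here. As an illustration only, the sketch below fits one simple candidate, a log-linear Poisson trend of the rate over calendar year for a single sex, age group and group, and multiplies the projected rate by the person-years at risk taken from the population data file. The function names and the synthetic numbers are hypothetical.

```python
import math


def fit_loglinear_trend(years, cases, population, max_iter=50, tol=1e-10):
    """Poisson maximum-likelihood fit of log(rate) = a + b * (year - t0).

    log(population) acts as an offset, so a is the log-rate at the mean
    year t0 and b is the annual log-linear trend. Solved by Newton-Raphson.
    """
    t0 = sum(years) / len(years)
    t = [y - t0 for y in years]
    a = math.log(sum(cases) / sum(population))  # start at the crude rate
    b = 0.0                                      # start with no trend
    for _ in range(max_iter):
        mu = [p * math.exp(a + b * ti) for p, ti in zip(population, t)]
        # Score vector (gradient of the Poisson log-likelihood)
        ga = sum(y - m for y, m in zip(cases, mu))
        gb = sum((y - m) * ti for y, m, ti in zip(cases, mu, t))
        # 2x2 information matrix (negative Hessian)
        iaa = sum(mu)
        iab = sum(m * ti for m, ti in zip(mu, t))
        ibb = sum(m * ti * ti for m, ti in zip(mu, t))
        det = iaa * ibb - iab * iab
        # Newton step: solve the 2x2 system information * delta = score
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b, t0


def expected_cases(model, year, population):
    """Expected cases = projected rate for `year` times person-years at risk."""
    a, b, t0 = model
    return population * math.exp(a + b * (year - t0))


# Synthetic stratum: 50 cases per 100,000 in 2012, growing 2% per year (log scale)
years = [2010, 2011, 2012, 2013, 2014]
pops = [100_000] * 5
obs = [50 * math.exp(0.02 * (y - 2012)) for y in years]
model = fit_loglinear_trend(years, obs, pops)
print(round(expected_cases(model, 2016, 100_000), 2))  # 54.16
```

In this sketch the fit would be repeated for every sex, age group and group combination; the tool itself chooses among several models per group and sex, which this example does not attempt.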

The graphs for each group are optional and can only be plotted if the years in the population data file are later than those in the data file. Likewise, sex and group combinations for which no model fits are not plotted.

Methodology (PDF)

Files must be plain ASCII text with semicolon-separated (";"), unquoted values (e.g. ".txt" or ".csv" files).
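A minimal sketch of reading such a file with Python's standard `csv` module, assuming the first line is a header containing the column names listed below. The function name and the choice of which columns are converted to numbers are assumptions for illustration.

```python
import csv
import io

# Hypothetical type mapping for the column names used by the three files
INT_COLUMNS = {"sex", "age.group", "year", "cases"}
FLOAT_COLUMNS = {"population", "weights"}


def read_semicolon_table(text):
    """Parse ';'-separated, unquoted text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            key, value = key.strip(), value.strip()
            if key in INT_COLUMNS:
                row[key] = int(value)
            elif key in FLOAT_COLUMNS:
                row[key] = float(value)  # person-years may be non-integer
            else:
                row[key] = value  # e.g. the "group" name
        rows.append(row)
    return rows
```

For a file on disk, something like `read_semicolon_table(open("data.txt", encoding="ascii").read())` would work, where `data.txt` is a placeholder name.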

Variables included in those files are:

Data file:
a) sex: 1=Male ; 2=Female ; 0=Both sexes. It is not necessary to include all the categories.
b) age.group: ID number of the age group. Age groups must be consecutive and homogeneous in order to use the descriptive function afterwards.
c) year: Year or reference year (in case of period).
d) group: Name of the groups to study. For example, when studying cancer sites it could be "Bladder", "Pancreas", "All sites"... Or when studying a pathology by geographical areas it could be "Italy", "France", "Spain"...
e) cases: Number of cases in each group.
f) population: Population at risk corresponding to each group.
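Before uploading, a data file can be checked against the rules above. This hypothetical sketch assumes the rows have already been parsed into dictionaries keyed by the column names, and it only checks that the age-group IDs form an unbroken integer sequence; whether the groups are homogeneous in width cannot be verified from the IDs alone.

```python
VALID_SEX = {0, 1, 2}  # 1 = Male, 2 = Female, 0 = Both sexes


def validate_data_rows(rows):
    """Return a list of human-readable problems found in parsed data-file rows."""
    problems = []
    for i, row in enumerate(rows, start=1):
        if row["sex"] not in VALID_SEX:
            problems.append(f"row {i}: invalid sex code {row['sex']}")
        if row["cases"] < 0 or row["population"] <= 0:
            problems.append(f"row {i}: cases must be >= 0 and population > 0")
    # Age-group IDs should form an unbroken sequence, e.g. 1, 2, 3, ...
    ages = sorted({row["age.group"] for row in rows})
    if ages and ages != list(range(ages[0], ages[-1] + 1)):
        problems.append(f"age groups are not consecutive: {ages}")
    return problems
```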

Population data file:
a) sex: 1=Male ; 2=Female ; 0=Both sexes. The same categories as in the data file.
b) age.group: Must match the age groups of the data file (defined above).
c) year: Year or reference year (in case of period).
d) population: Population at risk for which projected values are computed. It may correspond to a future period, another area, or both.
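Every stratum in the population data file needs a corresponding stratum in the data file. A minimal, hypothetical check, assuming rows are dictionaries keyed by the column names above, lists the (sex, age.group) pairs of the population data file that do not occur in the data file:

```python
def unmatched_strata(data_rows, population_rows):
    """(sex, age.group) pairs in the population file that are missing from the data file."""
    known = {(r["sex"], r["age.group"]) for r in data_rows}
    wanted = {(r["sex"], r["age.group"]) for r in population_rows}
    return sorted(wanted - known)
```

An empty result means every stratum to be projected has observed data behind it.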

Weights file (optional):
a) age.group: Must match the age groups of the data file (defined above).
b) weights: Weights of the reference population (must sum to 1).
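This page does not say how the weights are used. A common use of reference-population weights is direct age standardization, where the standardized rate is the weighted sum of the age-specific rates; the sketch below assumes that use. The function names and the rate multiplier `per` are hypothetical.

```python
import math


def check_weights(weights):
    """Raise ValueError unless the weights sum to 1 (within rounding error)."""
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"weights sum to {total}, not 1")


def age_standardized_rate(cases, population, weights, per=100_000):
    """Direct standardization: per * sum over age groups of w_a * cases_a / population_a."""
    check_weights(weights)
    return per * sum(w * cases[a] / population[a] for a, w in weights.items())
```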

Confidence value (1-α): The confidence level used for the goodness-of-fit test. It must be between 0 and 1 (usually 0.95).
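The page does not specify the goodness-of-fit statistic. One common choice for Poisson models, assumed here, is the deviance compared against a chi-square distribution with (number of observations minus number of parameters) degrees of freedom; the model is accepted when the p-value exceeds α = 1 - confidence value. The sketch uses only the standard library, and all names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1.

    Uses Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if x <= 0:
        return 1.0
    k, q = (1, math.erfc(math.sqrt(x / 2))) if df % 2 else (2, math.exp(-x / 2))
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def poisson_deviance(observed, fitted):
    """D = 2 * sum(y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 when y = 0."""
    d = 0.0
    for y, mu in zip(observed, fitted):
        d += (y * math.log(y / mu) if y > 0 else 0.0) - (y - mu)
    return 2 * d


def model_fits(observed, fitted, n_params, confidence=0.95):
    """Accept the model if the deviance test is not significant at alpha = 1 - confidence."""
    df = len(observed) - n_params
    p_value = chi2_sf(poisson_deviance(observed, fitted), df)
    return p_value > 1 - confidence
```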

NOTE: This process may take a few minutes because it tests 4 models for each group and sex.

Warning: file size must be less than 10MB.
