Ticket #684 (assigned Task)

Opened 8 years ago

Last modified 8 years ago

Documenting non-overlapping "ensembles" for CMIP5

Reported by: charlotte Owned by: charlotte
Priority: critical Milestone: V1.2 Questionnaire release
Component: WP6 - CMIP5 Questionnaire Version: 1.0
Keywords: Cc: Mark, Gerry, Bryan, allyn, paco

Description (last modified by charlotte)

Rethink how we document partially overlapping and non-overlapping "ensembles" for CMIP5, such as the 1.1 decadal hindcasts and possibly also the TAMIP experiments (#683).

It is not necessary to implement a solution to this for V1.0, but we will need an agreed strategy as it may impact CMOR and the DRS.

See the discussion at tickets/684.

Proposed naming convention for TAMIP: TAMIP. This also covers its application to the decadal hindcast/forecast experiments.

How will the ensemble page in the questionnaire look? - #712


ActivityRequirements.png (61.9 KB) - added by charlotte 8 years ago.
CIM Experiments
ActivityImplementation.png (89.8 KB) - added by charlotte 8 years ago.
CIM Simulations
Calendar.png (36.7 KB) - added by charlotte 8 years ago.
CIM Calendar

Change History

Changed 8 years ago by charlotte

CIM Experiments

Changed 8 years ago by charlotte

CIM Simulations

Changed 8 years ago by charlotte

CIM Calendar

comment:1 Changed 8 years ago by charlotte

  • Status changed from new to assigned
  • Requirement modified

Mark Elkington wrote (#683):

My thought was that we could have a single experiment and a single simulation with 64 ensemble members, but it's not really an ensemble in the normal sense unless we can somehow define the period covered as an 'initialisation method'.

My second thought was that we could have a single experiment with multiple simulations. We can specify the start and end of each simulation in the CMIP5 questionnaire. The simulation duration is not reflected directly in the DRS, except that, for these very short runs, we can probably assume that an atomic dataset will be in a single file, and the DRS <time-period> would then record the different simulation periods. It's not clear to me, however, that the DRS supports hours in the time period.

comment:2 Changed 8 years ago by charlotte

  • Cc Mark, Gerry, Bryan, allyn added; Mark Gerry Bryan removed

Method 1. We could define a new ensemble "type" in the CIM, say "climatology ensemble", to describe a set of ensemble members that have no temporal overlap but nevertheless cover the same portion of the annual cycle, and so could be considered together to create ensemble statistics.
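One way to read Method 1 is that ensemble statistics would be computed across members aligned by time since each member's own initialisation, not by calendar date. Here is a minimal sketch, assuming each member is a list of values sampled at the same fixed interval from its own start. The `lead_time_mean` name and this data layout are hypothetical, not part of the CIM.

```python
from statistics import mean


def lead_time_mean(members: list[list[float]]) -> list[float]:
    """Average across ensemble members at each lead time.

    Each inner list holds one member's values indexed by lead time
    (index 0 = the member's own initialisation). Members need not share
    calendar dates: they are aligned purely by position. Only lead times
    present in every member are averaged.
    """
    n_leads = min(len(m) for m in members)
    return [mean(m[k] for m in members) for k in range(n_leads)]


print(lead_time_mean([[1.0, 2.0, 3.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Because alignment is by lead time, two members started months apart still contribute to the same statistic. That is what lets a temporally non-overlapping set behave like an ensemble.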

Method 2. We could use the separation between "Experiment" and "NumericalExperiment" in the CIM; then we can show users a much reduced list of experiments in the questionnaire. For example, we could display just one 1.1_DecadalHindcast experiment in the initial experiment view, and only when the user chooses to view this experiment would they see the full list of NumericalExperiments associated with it.
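Method 2 amounts to grouping NumericalExperiments under their parent Experiment and listing only the parents in the initial view. Here is a minimal sketch of that grouping. The pairs are illustrative labels made up for the example, not the questionnaire's actual identifiers.

```python
from collections import defaultdict

# Illustrative (Experiment, NumericalExperiment) pairs; hypothetical labels.
NUMERICAL_EXPERIMENTS = [
    ("1.1_DecadalHindcast", "decadal1960"),
    ("1.1_DecadalHindcast", "decadal1965"),
    ("1.1_DecadalHindcast", "decadal1970"),
    ("3.1_PreIndustrialControl", "piControl"),
]


def group_by_experiment(pairs):
    """Map each top-level Experiment to its NumericalExperiments, keeping input order."""
    groups = defaultdict(list)
    for experiment, numerical in pairs:
        groups[experiment].append(numerical)
    return dict(groups)


groups = group_by_experiment(NUMERICAL_EXPERIMENTS)
initial_view = list(groups)  # only the parent experiments are listed at first
print(initial_view)  # ['1.1_DecadalHindcast', '3.1_PreIndustrialControl']
```

The full list for a parent, `groups["1.1_DecadalHindcast"]`, would only be shown once the user opens that experiment.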

comment:3 Changed 8 years ago by mark

  • Description modified

I do think we should try and make sure that the 'experiment' can include all of the model runs as a group.  In the case of TAMIP and the decadal experiment, the conformance should also be done for all the model runs as a group against the experiment requirements.

For the decadal experiment you could have one experiment with multiple simulations, each with a different start and end time. The problem with this is that you have to do the conformance for each simulation separately. I also think the DRS has a problem, since it can't distinguish between the atomic datasets of the same parameter in different runs: the simulation information is not stored in the DRS structure. (Note: I don't think the DRS issue is a problem for TAMIP. The runs are so short that each atomic dataset will be a single file, so the start and end date/time of the file, as recorded in the DRS, will also be the start and end of the model run.)

I don't think it is feasible to have a single experiment and 64 simulations and do the conformance for each simulation as would be required for TAMIP.

Defining a non-overlapping ensemble is an interesting approach. It would have the least impact on the rest of the CMIP5 system, because the rip (realization, initialization, physics) mechanism can distinguish the members. However, we would need to be able to specify start and end dates for each ensemble element and pass those through to the curator interface somehow.

comment:4 Changed 8 years ago by charlotte

Right now the DRS requires a different experiment name for each of the decadal hindcast simulations.  So we have documented them as separate experiments to ensure the correct mapping between simulation and experiment name.  This makes it more difficult to change the way we document the decadal simulations without making big changes to the questionnaire.

However, the TAMIP simulations are yet to be defined in the DRS, so we have the opportunity to make things simpler for the user by having just one experiment description. This is how I would present it in the questionnaire:

When the user describes an ensemble member, we can present them with a "Duration" panel, as on the simulation page; this only appears if the ensemble type is initialisation. There is no need to ask the user about "related simulations", so the duration information can fit on a single line and be placed alongside "member # uses" in the questionnaire interface.

comment:5 Changed 8 years ago by mark

Seems OK to me, as long as a) the duration start/end dates are going to support times, and b) the content entered into these fields is going to be verified as a date/time.

I would prefer that we don't use an 'initialisation' ensemble type. Instead, we could use a different name to show that it is a different type of ensemble from the normal ensembles, whose elements cover the same time period, e.g. 'discrete period' or some such.

I assume we will just use r1, r2, r3 ... r64 to cover the 64 elements of this ensemble.
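Under this scheme only the realization index changes from member to member. Here is a minimal sketch that generates the labels, assuming the initialization and physics indices stay at i1 and p1 for every TAMIP member. The ticket does not settle that, and `member_labels` is a hypothetical helper.

```python
def member_labels(n_members: int, initialization: int = 1, physics: int = 1) -> list[str]:
    """Return CMIP5-style rip labels, varying only the realization index r."""
    # Assumption: i and p are fixed for all members; only r counts 1..n_members.
    return [f"r{r}i{initialization}p{physics}" for r in range(1, n_members + 1)]


labels = member_labels(64)
print(labels[0], labels[-1])  # r1i1p1 r64i1p1
```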

If we are going to suggest a different TAMIP experiment name we need to get this to Karl/Charles? quickly before they start changing MIP tables. The TAMIP runs will be starting next week here.

comment:6 Changed 8 years ago by charlotte

Verifying the date/time should be no problem, as we can tell Django to expect a date/time entry. I guess we could even provide the user with drop-down lists to define the date and time.
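Here is a minimal sketch of the kind of check the Duration panel would need. It uses only the standard library so that it runs on its own. The accepted formats and the `parse_duration_field`/`validate_duration` helpers are assumptions for illustration. In the questionnaire itself, Django's `forms.DateTimeField` (configured through its `input_formats` argument) would do the parsing step.

```python
from datetime import datetime

# Assumed accepted entry formats; date-only entries are treated as 00:00.
INPUT_FORMATS = ("%Y-%m-%d %H:%M", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def parse_duration_field(text: str) -> datetime:
    """Parse one start/end entry, raising ValueError if no accepted format matches."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"not a valid date/time: {text!r}")


def validate_duration(start_text: str, end_text: str) -> tuple[datetime, datetime]:
    """Parse both entries and check that the member's end is after its start."""
    start, end = parse_duration_field(start_text), parse_duration_field(end_text)
    if end <= start:
        raise ValueError("end must be after start")
    return start, end
```

For example, `validate_duration("2008-10-15 00:00", "2008-10-20 00:00")` returns the two datetimes. An entry such as `"15th October"` raises `ValueError`, which the form would report back to the user.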

I am happy to go with the r1, r2 ...r64 method to label the ensemble members of the TAMIP experiment.

I don't know what experiment names have been suggested for TAMIP in CMIP5, but if I were Karl I would recommend using 4 tamip experiment names:

  • 8.1 tamip200810
  • 8.2 tamip200901
  • 8.3 tamip200904
  • 8.4 tamip200907

with a drs url looking something like:


Extract from the TAMIP experiment design:
4 sets of 16 hindcasts are to be run. The first in each set starts at 00Z on the 15th of each of the following months, and subsequent hindcasts start at 30-hour intervals: October 2008, January 2009, April 2009 and July 2009. This ensures sampling throughout the annual and diurnal cycles at each grid point for a given lead time. These periods have been chosen to tie in with the Year of Tropical Convection (YOTC) and various IOPs (see below).
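The schedule in this extract can be generated directly. Here is a minimal sketch that keys each set by the tamipYYYYMM names suggested above and lists the 16 start times in each. The function name is hypothetical. The dates follow only from the stated rule: 00Z on the 15th, then every 30 hours.

```python
from datetime import datetime, timedelta

SET_MONTHS = [(2008, 10), (2009, 1), (2009, 4), (2009, 7)]
RUNS_PER_SET = 16
INTERVAL = timedelta(hours=30)


def tamip_start_times() -> dict[str, list[datetime]]:
    """Map each set name (tamipYYYYMM) to the start times of its 16 hindcasts."""
    schedule = {}
    for year, month in SET_MONTHS:
        first = datetime(year, month, 15, 0)  # 00Z on the 15th
        schedule[f"tamip{year}{month:02d}"] = [
            first + k * INTERVAL for k in range(RUNS_PER_SET)
        ]
    return schedule


s = tamip_start_times()
print(s["tamip200810"][1])   # 2008-10-16 06:00:00
print(s["tamip200810"][-1])  # 2008-11-02 18:00:00
```

Because 30 hours is 6 hours more than a day, the start hour within each set cycles through 00Z, 06Z, 12Z and 18Z. This is how the design samples the diurnal cycle. The last hindcast of each set starts 450 hours (18 days 18 hours) after the first.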

comment:7 Changed 8 years ago by charlotte

I think I got too many zeros on that example TAMIP filename; it should have read:


where the temporal range has the form yyyymmddhh
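Here is a minimal sketch of producing that yyyymmddhh-yyyymmddhh token from a run's start and end. It assumes the end stamp is simply start plus the 5-day run length mentioned later in this ticket. Whether CMOR writes the run end or the last sample time into the filename is a convention the ticket does not settle. `drs_time_range` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def drs_time_range(start: datetime, end: datetime) -> str:
    """Format a file's temporal range as yyyymmddhh-yyyymmddhh."""
    return f"{start:%Y%m%d%H}-{end:%Y%m%d%H}"


start = datetime(2008, 10, 15, 0)
print(drs_time_range(start, start + timedelta(days=5)))  # 2008101500-2008102000
```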

comment:8 Changed 8 years ago by charlotte

Comment from Mark Elkington: I agree this would be a lot better than 64 experiment names. But if the Q development team can handle the non-overlapping ensemble type and allow us to put a different start and end date/time into the ensemble, I think we could go to a single experiment called tamip.


This would then appear in the DRS as:




I have spoken to the tamip coordinator here and he told me:
- it's likely that most organisations will deliver all 64 runs, so I'm not sure there is much merit in dividing the experiment into 4 sets
- the only reason they went with the 64 experiments was to be compliant with the decadal experiment approach
- they may be adding another 64 'experiments' - I don't think we want 128 additional experiments in CMIP5Q

  • the simulation runs do not overlap so the start and end date/times in the DRS filenames will identify which files belong to which runs
  • the highest frequency diagnostics run at 30 minute intervals (I know CMOR supports hours in the start and end times for the filename, but I haven't checked it with sub-hour information)
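Within a set, every run has a distinct start time, so the start stamp in a filename is enough to recover which run a file belongs to. Here is a minimal sketch, assuming (as above) that each atomic dataset is a single file whose start stamp equals the run's start. `run_index` is a hypothetical helper. It does not address the open sub-hour question: a 30-minute sample time such as 00:30 cannot be written in a yyyymmddhh stamp at all.

```python
from datetime import datetime, timedelta


def run_index(file_start: str, first_start: datetime,
              interval: timedelta = timedelta(hours=30), n_runs: int = 16) -> int:
    """Return the 0-based index of the run whose start matches a yyyymmddhh stamp.

    Raises ValueError if the stamp is not one of the scheduled start times.
    """
    t = datetime.strptime(file_start, "%Y%m%d%H")
    k, remainder = divmod(t - first_start, interval)  # timedelta // timedelta
    if remainder or not 0 <= k < n_runs:
        raise ValueError(f"{file_start} is not a scheduled start time")
    return k


print(run_index("2008110218", datetime(2008, 10, 15)))  # 15
```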


Given that we seem to be homing in on a possible solution I think we need to:


1) Check with the questionnaire team that they can implement what is necessary for V1.2 (see your email below).


2) Check with an ESG representative (I think you have a tame one at home ;-)) whether this will have any knock on effects that we have missed in ESG and DRS.


3) Talk to PCMDI and the TAMIP team about implementing it this way


I can do (3) - you are probably better placed to do (1) and (2).


A parallel question also arises: if this approach works for TAMIP, can it also be made to work for decadal, which might improve the questionnaire? So should we open up consideration of this question to decadal as well?


comment:9 Changed 8 years ago by charlotte

Suggested names for this type of ensemble:

  • temporal shift ensemble (t#)
  • hindcast ensemble (h#)

I think hindcast works for TAMIP and the decadal hindcast experiments but may not be general enough.

Also, the experiments do overlap (5-day experiments initialised at 30-hour intervals).

And the DRS does have a 30-minute interval.
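The overlap follows from simple arithmetic. Each run lasts 120 hours and a new one starts every 30 hours, so consecutive runs share 120 - 30 = 90 hours, and at most ceil(120 / 30) = 4 runs are active at once. Here is a minimal sketch that checks this over one set of 16 runs, treating each run as the half-open window [start, start + 120 h).

```python
RUN_LENGTH_H = 5 * 24  # 5-day hindcasts
INTERVAL_H = 30        # a new run every 30 hours
N_RUNS = 16            # one TAMIP set

starts = [k * INTERVAL_H for k in range(N_RUNS)]

# Hours shared by two consecutive runs.
consecutive_overlap = RUN_LENGTH_H - INTERVAL_H


def runs_active(hour: int) -> int:
    """Count the runs whose [start, start + RUN_LENGTH_H) window contains `hour`."""
    return sum(s <= hour < s + RUN_LENGTH_H for s in starts)


max_active = max(runs_active(h) for h in range(starts[-1] + RUN_LENGTH_H))
print(consecutive_overlap, max_active)  # 90 4
```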

comment:10 Changed 8 years ago by charlotte

Comment from Mark Elkington: We need to try to get some resolution on the TAMIP experiment issue. At the moment the proposal on the table is for 64 experiments named tamip1 ... tamip64. Karl recommended this on the grounds that it follows the approach used for the decadal experiments. This doesn't feel like a good solution to me, and I can't see many modelling groups willingly filling in 64 experiments that are essentially identical apart from the start date. I don't think this is going to sit well in the CMIP5Q either, especially as the 64 experiments might grow to 128. The proposals I have heard are:

  • give TAMIP a separate instance of the questionnaire - solves the CMIP5 problem but doesn't really help the TAMIP folks (but is that our problem?)
  • allow simulation cloning within an experiment - i.e. the experiment would have 64 simulations with the duration changed. But simulation is not reflected in the DRS
  • overloading the ensemble data structure to deal with non-ensemble-related model runs - this looks OK from a Q point of view, but does it just move the problem to data access/use?
  • are there any others?

It would be nice if the solution also worked to allow the decadal experiments to be collapsed into a single experiment.

comment:11 Changed 8 years ago by bryan

  • Description modified

comment:12 Changed 8 years ago by charlotte

I have made a first stab at documenting experiment 1.1 DecadalHindcasts as a staggered start ensemble.


This document uses a "requirement set" to describe the staggered start initial conditions. It also uses a "requirement option" to make the choice between emissions and concentrations more intuitive for users.

Please do comment on how I have implemented the requirement set and requirement option concepts.  Can they be used in this form in the conformance pages of the questionnaire?
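To make the two concepts concrete for the conformance pages, here is one possible reading, sketched in Python. A requirement option is met when exactly one of its alternatives is used. A requirement set is met when every requirement in it is satisfied, e.g. one per start date. The class names mirror the ticket's terms, but the fields and the conformance rules are assumptions, not the CIM schema.

```python
from dataclasses import dataclass


@dataclass
class RequirementOption:
    """A choice of exactly one alternative, e.g. emissions vs concentrations."""
    name: str
    alternatives: tuple[str, ...]

    def conforms(self, chosen: set[str]) -> bool:
        # Conformant only if exactly one of the alternatives was used.
        return len(chosen & set(self.alternatives)) == 1


@dataclass
class RequirementSet:
    """A group of requirements that must all be met, e.g. one per start date."""
    name: str
    members: tuple[str, ...]

    def conforms(self, satisfied: set[str]) -> bool:
        # Conformant if every member requirement is among those satisfied.
        return set(self.members) <= satisfied
```

For example, `RequirementOption("forcing", ("emissions", "concentrations")).conforms({"concentrations"})` is `True`, while choosing both alternatives, or neither, is not conformant.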

comment:13 Changed 8 years ago by charlotte

  • Cc paco added
  • Description modified

comment:14 Changed 8 years ago by charlotte

Bryan, Gerry - Are we going to implement requirementOption in the next release of the questionnaire? I would like to know as it determines how I resolve #561 - emissions/concentrations.

comment:15 Changed 8 years ago by charlotte

Two similar questions about staggered start ensemble axes:

How/where do we describe the staggered start ensemble "axis"?

  1. Hard coded in the conformance pages - using requirement set.
  2. Entered by users when they describe the characteristics of the staggered start ensemble.

Where do we ask users to tell us about the initial conditions for each staggered start ensemble member?

  1. Hard coded as a requirement set in the conformance part of the questionnaire.
  2. User enters initial condition information when they hit the "Describe Ensemble" button and tell us about the characteristics of each ensemble member

These might seem like the same question, but I think they may have different answers, and I keep vacillating ;-)

I would like to see how Bryan expects us to implement requirement sets first.

comment:16 Changed 8 years ago by charlotte

Spoke with Gerry, we are going to implement requirement option in the next release of the questionnaire.

comment:17 Changed 8 years ago by charlotte

  • Description modified

comment:18 Changed 8 years ago by charlotte

Comment from Paco:

I've had a look at the documentation available for ticket 684 and have a few comments about it.

  • I still have the impression that both TAMIP and the decadal hindcasts are examples of the same problem: initialized simulations that can have ensembles, i.e. simulations started on the same date with different initial conditions. I think this is important because, if there is a risk of 128 experiments for TAMIP, there might be a lot of decadal hindcast experiments too as soon as some CMIP5 contributors decide to run the ARGO period runs (starting once per year over the 21st century). Also, some of the decadal hindcast experiments might construct their ensemble by lagging the start date by a few hours, just as in TAMIP.


  • In Charlotte's attempt to document experiment 1.1 DecadalHindcasts, the requirementSet that describes the ocean initial conditions does not, in my opinion, allow for staggered starts. Some CMIP5 contributors might create their ensemble hindcasts by starting the ocean model in a staggered way: say, one member with data from 30 October at 00 GMT, the next from 31 October and the next from 1 November. In all these cases, the soil, atmosphere and sea-ice initial conditions might be from a different date, say 1 November for all three members. I know this sounds complicated to document, but it reflects a very likely case.


  • Also in Charlotte's document: are the soil, atmosphere and sea-ice initial conditions going to be included in a similar requirementSet, or in a requirementOption? The initialization of the components other than the ocean might be an important piece of information. I realize that sea ice appears as a BoundaryCondition type later in the document, but most decadal hindcast systems won't use specified sea ice; they will use dynamical sea-ice models coupled to the ocean and the atmosphere.


  • In the calendar section of this document, I noticed that the start date is 19601231, but many contributors might start their simulations as early as September 1960. In those cases, they might run their simulations for a little longer than 10 years, so that they end at the end of December of the last year.
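Paco's staggered-start case suggests recording initial-condition dates per component for each ensemble member, not one date per member. Here is a minimal sketch using his example. The year 1960 and the rip labels are assumptions added for illustration, since the comment gives only days and months. `MemberInitialisation` is a hypothetical structure, not part of the CIM.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MemberInitialisation:
    """Initial-condition dates per model component for one ensemble member."""
    label: str
    ocean: date
    atmosphere: date
    soil: date
    sea_ice: date

    def is_staggered(self) -> bool:
        # True when the components were not all initialised from the same date.
        return len({self.ocean, self.atmosphere, self.soil, self.sea_ice}) > 1


# Paco's example: staggered ocean starts, a common 1 November start elsewhere.
nov1 = date(1960, 11, 1)  # year assumed for illustration
members = [
    MemberInitialisation("r1i1p1", date(1960, 10, 30), nov1, nov1, nov1),
    MemberInitialisation("r2i1p1", date(1960, 10, 31), nov1, nov1, nov1),
    MemberInitialisation("r3i1p1", nov1, nov1, nov1, nov1),
]
print([m.is_staggered() for m in members])  # [True, True, False]
```

A single per-member date could not describe the first two members. Neither could a requirementSet that fixes one ocean date for all members.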