Ticket #930 (closed Issue: fixed)

Opened 2 years ago

Last modified 2 years ago

Experiment Names and RIP

Reported by: charlotte Owned by: charlotte
Priority: blocker Milestone: V1.1 Questionnaire Release
Component: WP6 - CMIP5 Questionnaire Version: 1.1
Keywords: Cc: gerry, bryan, rupert, paul
Requirement: http://metaforclimate.eu/Work-Package-2/Developing-the-CIM/Project-Requirements-summary.htm

Description

Where is the best place to address the complexity of CMIP5 experiments:
in the experiment names or in the "rip" indices?

Dear Metafor:

I noticed that the list of experiments for CMIP5 in the questionnaire is quite long and not as consistent as it might be with what we specify in the appendices of:
 http://cmip-pcmdi.llnl.gov/cmip5/docs/cmip5_data_reference_syntax.pdf

Here are some things to consider:
1. There should be no need for an I or E prefix on any of the runs.
2. I don't think you need separate entries for 1.1, 1.2, and 1.5 decadal expts.
3. I would omit for now the 1.6 decadalChemistry simulation since it is just a place-holder.
4. What is 3.1-S piControl? I don't think this belongs.
5. There should be no need for the "L"-prefixed rcp runs.
6. There should be no need for 6.1S 1pctCo2.
7. The variously forced historical runs are now being considered part of a family (see the DRS document).

I think it would be clearer if the user initially saw only the list of experiment names shown in Appendix 1 of the DRS document. Sometimes these represent a set of closely related experiments that are distinguished by different "rip" values in the DRS ensemble designation. In the case of the decadal runs, the initialization year may differ across the family. After selecting the experiment family name, the user should be asked to input the "rip" value for the simulation. Much of the information entered for a family applies across the family. The user would only have to alter:
1. if the "r" value differs from previously entered simulations, information about the initial conditions and possibly the length of the run. (Note this removes the need for E- and S-prefixed runs, I think.)
2. if the "i" value differs from previously entered simulations, information about the initialization method would be asked for.
3. if the "p" value differs from previously entered simulations, information about the model physics and/or model forcing would be asked for.

In the case of decadal runs, the user would be asked to enter the initialization year before moving to the "rip" question page.
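The r/i/p rule above can be sketched as a small lookup. This is only an illustration: the function name `sections_to_reenter` and the section labels are hypothetical and are not taken from the questionnaire, and the sketch assumes each simulation's rip value is already available as three integers.

```python
from typing import Iterable, List, NamedTuple


class Rip(NamedTuple):
    """Ensemble member indices (assumed already parsed to integers)."""
    r: int
    i: int
    p: int


# Hypothetical section labels; the questionnaire's real page names may differ.
SECTION_FOR_INDEX = {
    "r": "initial conditions (and possibly run length)",
    "i": "initialization method",
    "p": "model physics and/or forcing",
}


def sections_to_reenter(new: Rip, previous: Iterable[Rip]) -> List[str]:
    """Return the sections a user must fill in for a new family member.

    An index value not seen in any previously entered simulation of the
    same experiment family triggers the matching section.
    """
    previous = list(previous)
    sections = []
    for index in ("r", "i", "p"):
        seen = {getattr(member, index) for member in previous}
        if getattr(new, index) not in seen:
            sections.append(SECTION_FOR_INDEX[index])
    return sections
```

With no previously entered simulations, all three sections are returned, so the first member of a family is described in full and later members only add what differs.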

I don't think information about the length of simulations should appear in the experiment name. This is an independent piece of information that should be recorded elsewhere. Invariably an E-prefix run is the same experiment just carried out for a longer period of time. Note that different control simulations will run for different lengths of time, but we don't distinguish among these by including time information in the experiment name.

I hope you find these comments useful.

Best regards,
Karl

Change History

comment:1 Changed 2 years ago by charlotte

Hi Karl, Thanks for the clarification. As Bryan says, some of the changes you ask for we already do and we can handle the rest of your suggested changes to experiment names - but there are consequences. I've listed the changes you requested in terms of things that are: Done (or in hand), relatively easy, harder, would prefer not to.

That said, the consequences of changes to experiment names for the rest of the CMIP5 data and metadata pipeline must be considered - I'm going to draft a message to GOESSP-tech after this.

Done

  • RIP - we capture this information for all simulations and ensemble members.

We will have an extra page added by the end of the week where users can tell us how to interpret their RIP indices.

  • Experiment numbers - the experiment numbers do not (at least I hope they don't) form part of the experiment names in the CIM documents that are output from the questionnaire. We keep them there in the list of experiments that users see because it is helpful for the list of experiments to appear in numerical order. The experiment list would be more difficult to negotiate if it was in alphabetical order!

Things we can do "relatively" easily

  • removing the E, I and S suffixes
  • removing the length of the experiment from the experiment name. There are some behind-the-scenes database issues to get straight before we can do this, but we have a solution in the pipeline.

Harder things

  • Removing the start year from the decadal experiment names. The questionnaire hard-codes experiment names, so if we remove the start year from the decadal experiment names we will have to build some kind of mechanism to capture the start date from the simulation description and assemble a new experiment name. We already capture all the information needed to re-assemble the decadal experiment names, which is a good thing, but making that happen will have to be implemented in the post-processing code and may take time.

Things we don't want to do

  • historicalMisc. If we change the individual forcing experiment names to historicalMisc then the workflow for conformance becomes really messy. When a user comes to tell us how their simulation conformed to the requirements of the historicalMisc experiment, they will be faced with a list of all possible forcings and will have to check all the ones that don't apply as well as telling us which ones do. We think it would be more straightforward for users to see only the requirements that are relevant to the forcing they are applying - this would happen naturally if the individual forcing experiments keep their individual names.

All the best,
Charlotte

Charlotte Pascoe
+44 (0)1235 445869
BADC - CEDA - RAL - STFC


From: Bryan Lawrence bryan.lawrence@…
Sent: 26 January 2011 07:17
To: Karl Taylor; Gerry Devine
Cc: Pascoe, Charlotte (STFC,RAL,SSTD); Eric Guilyardi
Subject: Re: CMIP5 simulation names

Hi Karl

It's not just a "list of experiments", it's rather more sophisticated than that ... and what you ask about ensemble types is already in there.

That said, Gerry is more than half way to dealing with this, but it's my sense that we should do this after the rest of the questionnaire is bullet proof. I will talk to Charlotte and Gerry about this tomorrow.

It might help to recognise that this is an *entry* tool; how one searches the information, and how one lays it out once collected, will be very different.

Cheers Bryan

Dear Charlotte,

I said on the phone today that the suggestions I made in my previous email (copied below) shouldn't hold up release of the data. However, to avoid confusion by those filling out the questionnaire, it should be a reasonably high priority to address at least some of them.

All the valid experiment names in CMIP5 (as defined by the DRS) are listed in the Appendix (see "Short Name of Experiment") of the DRS document available at:

 http://cmip-pcmdi.llnl.gov/cmip5/output_req.html?submenuheader=2#req_format

I don't see that there is any added value in listing additional experiments on the questionnaire. (Reiterating what I said earlier, you're already going to ask modelers to record the length of the simulation, so including any indication of this in the experiment name is unnecessary. Similarly, we encourage ensembles of all experiments, so there is no need to indicate this in the experiment name.)

In general the number prefixes (like 1.1, 1.2, 3.3, 1.1-E, etc.) on the experiments are not very useful any more. The "Short Names" given in the DRS document are definitive. The length of each simulation should be recorded in the metadata, not as part of the name of the run. Whether a run is the first or a subsequent member of an ensemble shouldn't place it in a different category (they're all equal). By omitting the numbers, you can reduce the number of experiment names considerably without any important loss of information.

I would hope that simply changing the names appearing in the list of experiments wouldn't be too difficult. Please let me know if it is. I think these names should be limited to and consistent with what is allowed by the DRS.

A more difficult issue relates to ensembles. There are 3 different types of ensembles that need to be distinguished, and it would be very helpful to users if it were obvious which simulations are considered part of the same ensemble. [In the DRS, directory structure, and filenames, ensemble members are distinguishable only by the "rip" values.]

Here are the ensemble types:

1) initial condition ensemble: members are started from different, but equally realistic, initial conditions. The model used in each simulation is exactly the same. In the DRS, each of these simulations is assigned a different "r" value in the "rip" designation of ensemble member, but all members share the same "experiment" name and "model" name (among other things). [Note, however, that in decadal runs different prescriptions of ocean temperatures or land surface characteristics based on different observational datasets would be considered part of the "initialization method", discussed next; if the only difference in the decadal runs is in the atmosphere's initial state, however, these would all be considered to be the same "initialization method", but the members would comprise an "initial condition" ensemble.] For Transpose AMIP, the 16 member initial condition ensembles are initialized from observations spaced 30 hours apart.

2) initialization method ensemble (needed only for "decadal" runs): members differ only in being initialized from different observational data sets or initialization methods that might be expected to affect the "decadal time scale trajectory of climate" that is being investigated in the decadal experiments. In the DRS, each of these simulations is assigned a different "i" value in the "rip" designation of ensemble member.

3) perturbed physics, perturbed forcing ensembles: members differ only in a relatively small difference in model formulation (e.g., a slight perturbation to an uncertain parameter in the model) or a difference in the "forcings" included in a historical run (to distinguish among these in the historicalMisc experiment). In the case of perturbed physics, all members share the same experiment name and model name (even though the models aren't identical). In the perturbed forcing case, all members share the same model name and experiment name (even though the forcing isn't identical). In the DRS, each of these simulations is assigned a different "p" value in the "rip" designation of ensemble member.

The bottom line is that if I simply ask you for documentation about a model and experiment, you can't tell me everything I'd like to know unless I also specify the "rip" values.

When asking a modeler to fill out the questionnaire, under each experiment, they should be asked whether an ensemble of simulations has been performed, and the differences should be described for each r, i, and p value. Note that from one model to the next, the meaning of each r, i, and p value won't be standardized. For example, an rip value of r1i1p1 for model A will not normally carry over to r1i1p1 for model B. They could start from different initial conditions, use different initialization methods, and in the case of historicalMisc, they could include different "forcings".
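As an illustration of why documentation has to be looked up by model, experiment and "rip" together, here is a minimal sketch that parses a DRS-style ensemble member string such as r1i1p1 and builds a lookup key. The function names `parse_rip` and `documentation_key` are hypothetical, and the sketch assumes the r<N>i<M>p<L> form with plain integer indices.

```python
import re
from typing import Tuple

# Ensemble member designation of the form r<N>i<M>p<L>, each index an integer.
_RIP_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)")


def parse_rip(value: str) -> Tuple[int, int, int]:
    """Split a string such as 'r1i1p1' into its (r, i, p) integers."""
    match = _RIP_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"not a valid rip value: {value!r}")
    r, i, p = (int(group) for group in match.groups())
    return r, i, p


def documentation_key(model: str, experiment: str, rip: str) -> Tuple[str, str, int, int, int]:
    """Build a lookup key for one simulation's documentation.

    The model is part of the key because the meaning of a given r, i or p
    value is not standardised from one model to the next.
    """
    return (model, experiment) + parse_rip(rip)
```

Under this sketch, `documentation_key("modelA", "historicalMisc", "r1i1p1")` and the same call for "modelB" give different keys, matching the point that r1i1p1 for model A does not normally carry over to model B. The model names here are made-up placeholders.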

I hope this clarifies what's needed. My sense is that it shouldn't be too hard to change the list of experiments, but making it possible to describe the various ensemble members is much more complicated. Let me know how long you think it might take to fix any current problems in this regard.

comment:2 Changed 2 years ago by gerry

Suggested Plan for capturing/displaying r.i.p value:

1) Three new separate models/database tables for storing the integer value and description of the r, i, and p values of the DRS.
2) A new tab page for entering details of each, where the entry page has a separate 'block' for each of r, i and p. In each block there will be a straightforward name/value form field entry. May try to introduce a little AJAX to allow 'add new' without the need for a save.
3) On the ensembles page (or simulation page), there will be a link/button to open a new window that lists all the rip values currently associated with that centre. This will mean that the user is not redirected off the current page.
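A rough sketch of the data this plan would hold, using plain Python dataclasses as a stand-in for the real database models. All class, field and method names here (`IndexEntry`, `CentreRipRegistry`, `add`, `listing`) are hypothetical and only illustrate the shape of the plan: one value/description store per index type, plus a per-centre listing.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IndexEntry:
    """One r, i or p value and the centre's description of what it means."""
    value: int
    description: str


@dataclass
class CentreRipRegistry:
    """All rip index descriptions entered by one modelling centre."""
    centre: str
    entries: Dict[str, List[IndexEntry]] = field(
        default_factory=lambda: {"r": [], "i": [], "p": []}
    )

    def add(self, kind: str, value: int, description: str) -> None:
        """Store a new value for one index type ('r', 'i' or 'p')."""
        if kind not in self.entries:
            raise ValueError(f"unknown index type: {kind!r}")
        if any(entry.value == value for entry in self.entries[kind]):
            raise ValueError(f"{kind}{value} is already described")
        self.entries[kind].append(IndexEntry(value, description))

    def listing(self) -> List[str]:
        """Lines for the 'all rip values for this centre' window."""
        lines = []
        for kind in ("r", "i", "p"):
            for entry in sorted(self.entries[kind], key=lambda e: e.value):
                lines.append(f"{kind}{entry.value}: {entry.description}")
        return lines
```

Keeping the three index types in separate stores mirrors point 1 of the plan, and `listing` corresponds to the pop-up window in point 3.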

comment:3 Changed 2 years ago by bryan

See ticket/930 for a discussion of experiment naming!

comment:4 Changed 2 years ago by bryan

So, I've finished my notes for the moment.

I think we should decide on something, and get Taylor et al rewritten consistently, so that the consumer can begin with Taylor et al, go via metadata and end up in DRS land.

comment:5 Changed 2 years ago by charlotte

  • Cc bryan, rupert added

I've updated the wiki #ticket/930 with my understanding of how Bryan's proposal would work in practice.

Bryan I have my head around this now but no time left to email Karl until much later this evening. 

Rupert I'm cc-ing you because you'll be conflating on this
- conflate: combine (two or more texts, ideas, etc.) into one. 

comment:6 Changed 2 years ago by charlotte

Link to Bob Drach's controlled vocab for CMIP5 derived from DRS specification

 http://esg-pcmdi.llnl.gov/internal/esg-data-node-documentation/cmip5_controlled_vocab.txt/view

comment:7 Changed 2 years ago by charlotte

Added a controversial proposal for historicalMisc experiments to the wiki.

comment:8 Changed 2 years ago by charlotte

  • Cc paul added
  • Status changed from new to assigned

I have revised the historical experiment xml documents r2605
 http://metaforclimate.eu/trac/browser/cmip5q/trunk/cmip5q/cmip5q/data/experiments/revised
There are now 8 experiments for input which become 3 experiments for output: decadalXXXX, noVolcXXXX and volc2010, where XXXX is the start year of the simulation.  

XML file name           experiment shortname (input)   experiment name (output)
1.1_Decadal_10yr        1.1 decadal                    decadalXXXX
1.1E_Decadal_10yr_O10   1.1-E decadal                  decadalXXXX
1.1I_Decadal_Initial    1.1-I decadal                  decadalXXXX
1.2_Decadal_30yr        1.2 decadal                    decadalXXXX
1.2E_Decadal_30yr       1.2-E decadal                  decadalXXXX
1.3_Decadal_NoVolc      1.3 noVolc                     noVolcXXXX
1.4_Decadal_Pinatubo    1.4 volc2010                   volc2010
1.5_Decadal_AlterInit   1.5 decadal                    decadalXXXX

The experiment basename (second field of the input shortname) needs to be concatenated with the start year of the simulation to create the experiment name for that simulation in the output xml.
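The concatenation rule could look roughly like the following. This is a sketch, not the post-processing code itself: the function name `output_experiment_name` is hypothetical, and treating volc2010 as a fixed name that never gets a start year appended is an assumption read off the listing above.

```python
# Basenames that are already complete output names. The listing above maps
# "1.4 volc2010" to "volc2010" with no start year; handling that through a
# fixed set is an assumption of this sketch.
FIXED_NAMES = {"volc2010"}


def output_experiment_name(input_shortname: str, start_year: int) -> str:
    """Build the output experiment name from an input shortname.

    The input shortname has two fields, e.g. "1.1-E decadal": the experiment
    number and the basename. Only the basename survives into the output.
    """
    fields = input_shortname.split()
    if len(fields) != 2:
        raise ValueError(f"expected '<number> <basename>', got {input_shortname!r}")
    basename = fields[1]
    if basename in FIXED_NAMES:
        return basename
    return f"{basename}{start_year:04d}"
```

For example, `output_experiment_name("1.1-E decadal", 1960)` gives "decadal1960", so the 1.1, 1.1-E, 1.1-I, 1.2, 1.2-E and 1.5 inputs all collapse onto the same output name for a given start year.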

comment:9 Changed 2 years ago by charlotte

All the experiment documents have been revised and are now in the /revised directory r2612

One outstanding experiment 7.4 historicalExt has no xml description at time of writing.

There were ~119 experiments; there are now ~43.

comment:10 Changed 2 years ago by charlotte

  • Status changed from assigned to closed
  • Resolution set to fixed

Experiment documents have been revised and will go into the live questionnaire in the next day or so.
