Ticket #1004 (assigned Issue)

Opened 2 years ago

Last modified 2 years ago

No common key between TDS-DRS data and CIM instances

Reported by: spascoe Owned by: gerry
Priority: blocker Milestone:
Component: WP6 - CMIP5 Questionnaire Version:
Keywords: Cc: charlotte, bryan, gerry
Requirement: http://metaforclimate.eu/Work-Package-2/Developing-the-CIM/Project-Requirements-summary.htm

Description (last modified by spascoe)

As discussed in the telco on 2012-01-05.

To merge data and metadata and to link to the CIM simulation instances, we need a common key between the TDS-DRS oriented data and the CIM instances. If a common key does not exist we need to find another way of achieving the link.

There are 2 problems to overcome.

1. Experiment mapping

In CIM the decadal experiments have been collapsed into a single experiment name, whereas the DRS uses a set of names of the form decadalXXXX, where XXXX is the start year.

For instance, the NCAR Gateway currently lists the following decadal experiments for CMIP5 (dataset counts in parentheses):

decadal1959 (123), decadal1960 (1061), decadal1961 (54), decadal1962 (54), decadal1963 (54), decadal1964 (177), decadal1965 (1104),
decadal1966 (54), decadal1967 (54), decadal1968 (39), decadal1969 (177), decadal1970 (1104), decadal1971 (54), decadal1972 (54),
decadal1973 (54), decadal1974 (177), decadal1975 (1104), decadal1976 (54), decadal1977 (31), decadal1978 (186), decadal1979 (310),
decadal1980 (1204), decadal1981 (195), decadal1982 (163), decadal1983 (195), decadal1984 (286), decadal1985 (1188),
decadal1986 (163), decadal1987 (163), decadal1988 (163), decadal1989 (274), decadal1990 (1193), decadal1991 (163), decadal1992 (163),
decadal1993 (195), decadal1994 (286), decadal1995 (1191), decadal1996 (195), decadal1997 (163), decadal1998 (198), decadal1999 (287),
decadal2000 (1190), decadal2001 (499), decadal2002 (499), decadal2003 (531), decadal2004 (619), decadal2005 (1183), decadal2006 (519),
decadal2007 (441), decadal2008 (442), decadal2009 (193), decadal2010 (262)
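
To make the experiment-mapping problem concrete, here is a minimal Python sketch of a two-way mapping between DRS experiment names and the collapsed CIM name. It assumes the collapsed CIM name is exactly "decadal" and that non-decadal names are shared verbatim; the function names are hypothetical. What it illustrates is that the DRS-to-CIM direction discards the start year, so the reverse direction only works if the start year is recorded somewhere else in the CIM instance.

```python
import re
from typing import Optional, Tuple

# Assumption: the collapsed CIM experiment name is "decadal"; other DRS
# experiment names (e.g. "rcp45") are assumed to be identical in CIM.
DECADAL_RE = re.compile(r"decadal(\d{4})")


def drs_to_cim_experiment(drs_experiment: str) -> Tuple[str, Optional[int]]:
    """Map a DRS experiment name to (CIM experiment name, start year)."""
    match = DECADAL_RE.fullmatch(drs_experiment)
    if match:
        return "decadal", int(match.group(1))
    return drs_experiment, None


def cim_to_drs_experiment(cim_experiment: str, start_year: Optional[int] = None) -> str:
    """Inverse mapping; for decadal runs the start year must come from elsewhere."""
    if cim_experiment == "decadal":
        if start_year is None:
            raise ValueError("a decadal CIM experiment needs a start year to form a DRS name")
        return f"decadal{start_year:04d}"
    return cim_experiment


print(drs_to_cim_experiment("decadal1960"))    # ('decadal', 1960)
print(drs_to_cim_experiment("rcp45"))          # ('rcp45', None)
print(cim_to_drs_experiment("decadal", 2005))  # decadal2005
```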

2. Ensemble RIP to Simulation mapping

DRS has no concept of a simulation. We might assume that we could map simulations to DRS like this:

cim-simulation == (drs-institute, drs-model, drs-experiment)

However, this assumes that all ensemble members for a given model/experiment belong to the same simulation, which is not guaranteed (do we have examples?). Two CIM records may refer to the same institute/model/experiment but with different collections of ensemble RIP values. Therefore, in general:

cim-simulation == (drs-institute, drs-model, drs-experiment, [drs-ensemble, drs-ensemble, ...])

This cannot be represented as a single key without some syntax for a collection of ensemble members, e.g. a wildcard or a comma-separated list. Alternatively, a 1-to-many mapping between cim-simulation and drs-ensemble needs to exist somewhere.
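
Both options can be sketched in Python. The sketch relies on the CMIP5 RIP syntax r<N>i<M>p<L>; the class and function names, the "." separator, and the placeholder institute/model names are hypothetical. `as_string` shows the comma-separated-list encoding of the composite key, and `build_ensemble_index` shows the 1-to-many alternative, which also refuses to let two simulations claim the same ensemble member.

```python
import re
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

# CMIP5 DRS ensemble members are named r<N>i<M>p<L>
# (realization, initialization method, physics version).
RIP_RE = re.compile(r"r(\d+)i(\d+)p(\d+)")


def parse_rip(rip: str) -> Tuple[int, int, int]:
    match = RIP_RE.fullmatch(rip)
    if match is None:
        raise ValueError(f"not a RIP identifier: {rip!r}")
    r, i, p = (int(g) for g in match.groups())
    return r, i, p


@dataclass(frozen=True)
class SimulationKey:
    """Hypothetical key: (drs-institute, drs-model, drs-experiment, {drs-ensemble, ...})."""
    institute: str
    model: str
    experiment: str
    ensembles: FrozenSet[str]

    def as_string(self) -> str:
        # One possible single-string encoding: a comma-separated member list,
        # sorted numerically so that r2i1p1 comes before r10i1p1.
        members = ",".join(sorted(self.ensembles, key=parse_rip))
        return f"{self.institute}.{self.model}.{self.experiment}.{members}"


DrsKey = Tuple[str, str, str, str]


def build_ensemble_index(keys: Iterable[SimulationKey]) -> Dict[DrsKey, SimulationKey]:
    """The 1-to-many alternative: look up the simulation for each DRS ensemble member."""
    index: Dict[DrsKey, SimulationKey] = {}
    for key in keys:
        for rip in key.ensembles:
            drs = (key.institute, key.model, key.experiment, rip)
            if drs in index:
                raise ValueError(f"{drs} is claimed by more than one simulation")
            index[drs] = key
    return index


main = SimulationKey("INST", "MODEL", "rcp45", frozenset({"r1i1p1"}))
extra = SimulationKey("INST", "MODEL", "rcp45", frozenset({"r2i1p1", "r3i1p1"}))
index = build_ensemble_index([main, extra])
print(index[("INST", "MODEL", "rcp45", "r3i1p1")].as_string())  # INST.MODEL.rcp45.r2i1p1,r3i1p1
```

The index approach avoids inventing a list syntax inside the key, at the cost of storing one entry per ensemble member.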

Change History

comment:1 Changed 2 years ago by spascoe

  • Description modified
  • Summary changed from Questionnaire decadal experiment names are not compatible with DRS experiment names to No common key between TDS-DRS data and CIM instances

comment:2 Changed 2 years ago by spascoe

  • Description modified

comment:3 Changed 2 years ago by bryan

Well, in theory case 2 could exist, but I'm not aware of any actual instances. In all cases I know of (except those associated with your case 1), all ensemble members should be described by one "simulation" CIM record.

comment:4 Changed 2 years ago by bryan

  • Cc bryan added

comment:5 Changed 2 years ago by spascoe

Sylvia wrote by email:

If simulation and experiment are the same thing for CMIP5, then the simple solution here is to create an experiment for every CMIP5 experiment. I know that Charlotte mentioned that the decadals were grouped together for fear of intimidating users with the volume of things they need to do. Alas, they are already intimidated, and if each of the decadals really is a separate experiment then we should treat it as such. We already have the metadata for each of those (groups are submitting simulations); we just need the experiments to match.

  1. Strictly speaking, a CMIP5 "experiment" is the design of a simulation performed by multiple modelling groups, e.g. pre-industrial control, RCP 4.5, etc. When we say "simulation and experiment are the same thing in CMIP5" we mean (institute, model, experiment).
  2. Item #2 on this ticket disputes this equivalence, at least in theory: it says a simulation is also tied to a subset of ensemble members. Bryan thinks this hasn't happened in practice. I think it is a very high priority to establish whether that is true. If it is, we should enforce the constraint somewhere to stop it from happening.

comment:6 Changed 2 years ago by spascoe

From Mark:

For example, some organisations have defined their metadata for the rcp45 experiment as a 'main' rcp45 experiment (i.e. r1i1p1) and an 'ensemble' rcp45 experiment (i.e. r2/r3/r4) in the questionnaire. They are sort of pushed in this direction because there is currently no way to indicate that different ensemble members can cover different periods, and typically r1 covers the period 2006-2300 while the other members cover only 2006-2100. ... It is a relatively easy fix in the questionnaire to allow the data provider to have one simulation record for each DRS experiment; we just need to allow ensemble members of different lengths.
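
Mark's suggested fix amounts to storing a period per ensemble member rather than per simulation. A minimal Python sketch of such a record follows; the class and field names are hypothetical and not the questionnaire's actual schema, and the full RIP identifiers for r2 to r4 (with i1p1) are assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SimulationRecord:
    """Hypothetical questionnaire record: one per DRS (institute, model, experiment)."""
    institute: str
    model: str
    experiment: str
    # RIP identifier -> (first year, last year) simulated by that member, inclusive.
    members: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def add_member(self, rip: str, start: int, end: int) -> None:
        if end < start:
            raise ValueError(f"{rip}: end year {end} precedes start year {start}")
        self.members[rip] = (start, end)

    def overall_period(self) -> Tuple[int, int]:
        """Earliest start and latest end across all ensemble members."""
        if not self.members:
            raise ValueError("record has no ensemble members")
        starts, ends = zip(*self.members.values())
        return min(starts), max(ends)


# Mark's rcp45 example; the i1p1 suffixes on r2-r4 are assumed.
rec = SimulationRecord("INST", "MODEL", "rcp45")
rec.add_member("r1i1p1", 2006, 2300)
for rip in ("r2i1p1", "r3i1p1", "r4i1p1"):
    rec.add_member(rip, 2006, 2100)
print(rec.overall_period())  # (2006, 2300)
```

With periods held per member, the 'main' and 'ensemble' rcp45 records collapse into one record keyed by the DRS experiment.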

comment:7 Changed 2 years ago by spascoe

  • Owner changed from sylvia to gerry

Assigning to Gerry to comment on this suggested course of action:

  1. Change the questionnaire to allow only one simulation record for each (institute, model, experiment)
  2. Identify all CIM Simulation records that relate to the same DRS (institute, model, experiment) but different sets of ensemble members.
  3. Help modelling centres merge these records into a single simulation
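
Steps 2 and 3, plus a check that enforces step 1, can be sketched in Python over a simplified, hypothetical view of the CIM Simulation records (the record type and function names are illustrative, not part of the CIM or questionnaire code). The merge only handles the mechanical part, unioning ensemble members under the first record's id; reconciling the rest of each record's metadata would still need the modelling centre's input.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class CimSimulation(NamedTuple):
    """Hypothetical flattened view of a CIM Simulation record."""
    record_id: str
    institute: str
    model: str
    experiment: str             # DRS experiment name, e.g. "rcp45" or "decadal1960"
    ensembles: Tuple[str, ...]  # RIP identifiers, e.g. ("r1i1p1",)


Triple = Tuple[str, str, str]


def rip_sort_key(rip: str) -> List[int]:
    # Sort r2i1p1 before r10i1p1 by comparing the numbers, not the strings.
    return [int(n) for n in re.findall(r"\d+", rip)]


def find_split_simulations(records: Iterable[CimSimulation]) -> Dict[Triple, List[CimSimulation]]:
    """Step 2: group by DRS (institute, model, experiment); keep groups with >1 record."""
    groups: Dict[Triple, List[CimSimulation]] = defaultdict(list)
    for rec in records:
        groups[(rec.institute, rec.model, rec.experiment)].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def check_one_simulation_per_experiment(records: Iterable[CimSimulation]) -> None:
    """Step 1 as a validation trap: reject any triple with more than one record."""
    split = find_split_simulations(records)
    if split:
        raise ValueError(f"multiple simulations for {sorted(split)}")


def merge_group(group: List[CimSimulation]) -> CimSimulation:
    """Step 3 (mechanical part only): keep the first record, union the members."""
    members = {rip for rec in group for rip in rec.ensembles}
    return group[0]._replace(ensembles=tuple(sorted(members, key=rip_sort_key)))


records = [
    CimSimulation("sim-a", "INST", "MODEL", "rcp45", ("r1i1p1",)),
    CimSimulation("sim-b", "INST", "MODEL", "rcp45", ("r2i1p1", "r3i1p1", "r4i1p1")),
    CimSimulation("sim-c", "INST", "MODEL", "historical", ("r1i1p1",)),
]
split = find_split_simulations(records)
merged = [merge_group(group) for group in split.values()]
print(merged[0].ensembles)  # ('r1i1p1', 'r2i1p1', 'r3i1p1', 'r4i1p1')
```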

comment:8 Changed 2 years ago by gerry

  • Status changed from new to assigned

Stephen,

The change needed to the questionnaire (i.e. item 1) should be possible, particularly since this is, to some degree, how the questionnaire was set up previously. I'll need to chat some more about the need for per-member ensemble durations (i.e. Mark's comments). I've already looked into the plausibility of doing this and think we can get it into the CIM (if we still want to). The second task will be a little more cumbersome: we will have to look at what has already been published for the decadal-type experiments/simulations to see what can be salvaged or modified and what needs to be republished. I'll set up a telco for next week to discuss this, and in the meantime try to get a handle on how much we will need to change.

comment:9 Changed 2 years ago by sylvia

I had a conversation with Bryan yesterday and received one clarification:

1) The DOI must point to a human-readable version of the metadata, so pointing to the instance in the Atom feed will not work; at least for now, the DOI must point to the ESG trackback. This complicates things...

Currently, the trackback combines the model name and the simulation name (because that was the unique identifier until now) to create the instance URL. If the questionnaire is changed so that simulation = experiment (in fact, why not remove the concept of a simulation name and automatically replace it with the experiment name?), then the ESG software will need a major modification to do the same.

Also, we will need a new experiment.owl file. It was generated by Rupert from the experiment XML files in the repository.

So to get this to really work we have to:

1) Change the way experiments are represented
2) Change the way the questionnaire is structured
3) Make sure ensembles are handled correctly
4) Change the ESG harvesting software, UNLESS you all just do it in the XML
5) Somehow (and this is not easy) remove all the old instances with the old names in the system so the new names come through.

Sylvia

comment:10 Changed 2 years ago by spascoe

Following the telco on 16 January, we made the following decisions:

The ensemble issue can mostly be solved with Gerry's solution above, which will result in a single CIM Simulation per modelled drs-experiment. Gerry will implement this, along with traps to ensure the constraint holds. The Simulation CIM already has DRS experiment identifiers (e.g. decadal1960); however, Trackback is based on the Experiment CIM used as input to the questionnaire, which contains "decadal". If Trackback can be adapted to create URLs from (model, drs-experiment), then we can use Trackback as our metadata DOI link. If this is impractical, an alternative is to use the CIM metadata services portal, which is planned to be operational by the end of February (MarkM: correct me if that's wrong).
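
Why Trackback needs the drs-experiment rather than the Experiment CIM name can be shown with a small Python sketch. The base URL and path layout are hypothetical (the real Trackback scheme is not given here), and the run list is illustrative: keying URLs on the collapsed "decadal" name makes distinct start-year runs collide, while (model, drs-experiment) keeps them apart.

```python
from urllib.parse import quote

# Hypothetical base URL and path layout; the real Trackback URL scheme is not
# described in this ticket.
BASE_URL = "https://trackback.example.org/metadata"


def trackback_url(model: str, experiment: str) -> str:
    """Build a metadata URL from (model, experiment), percent-encoding each segment."""
    return f"{BASE_URL}/{quote(model, safe='')}/{quote(experiment, safe='')}"


def collapse(drs_experiment: str) -> str:
    """The collapsed name used by the Experiment CIM for all decadal runs."""
    return "decadal" if drs_experiment.startswith("decadal") else drs_experiment


runs = [("MPI-ESM-LR", "decadal1960"), ("MPI-ESM-LR", "decadal1965"), ("MPI-ESM-LR", "rcp45")]
urls_from_drs = {trackback_url(model, exp) for model, exp in runs}
urls_from_cim = {trackback_url(model, collapse(exp)) for model, exp in runs}
# Keying on the collapsed name merges both decadal runs into one URL.
print(len(urls_from_drs), len(urls_from_cim))  # 3 2
```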

Therefore Sylvia needs to indicate whether this can be implemented.

comment:11 Changed 2 years ago by bryan

As it happens, when I was at NCAR last week we discussed the need for a RESTful URL for metadata records, and I understand it won't be a major problem. More on that, and related issues, by the end of the week!

comment:12 Changed 2 years ago by spascoe

  • Cc gerry added

An example of a Simulation feed entry is shown below. It gives the drs-experiment as "decadal1961", whereas the title says "decadal1990". Is there a reason for this inconsistency?

<entry cmip5qn:qnDRS="MPI-M_MPI-ESM-LR_1.1 decadal1961" cmip5qn:centre="MPI-M">
    <id>urn:uuid:06d9cbfa-e44c-11e0-8519-00163e9152a5</id>
    <title>decadal1990-LR (decadal1990-LR) - Version 1</title>
    <updated>2011-12-05T13:27:39Z</updated>
    <published>2011-12-05T13:27:39Z</published>
    <author>
      <name>Marco Giorgetta</name>
      <email>marco.giorgetta@zmaw.de</email>
    </author>
    <link href="http://q.cmip5.ceda.ac.uk/cmip5/simulation/06d9cbfa-e44c-11e0-8519-00163e9152a5/1/"/>
    <summary>decadal hindcast experiment</summary>
    <content src="/cmip5/simulation/06d9cbfa-e44c-11e0-8519-00163e9152a5/1/" type="application/xml"/>
</entry>
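
This kind of inconsistency can be detected automatically. The Python sketch below parses an entry like the one above with the standard library's ElementTree and compares the drs-experiment in cmip5qn:qnDRS with any decadalXXXX year in the title. The excerpt omits namespace declarations, so the sample adds them: the Atom namespace is the standard one, but the cmip5qn URI is a hypothetical placeholder, and treating the last token of qnDRS as the drs-experiment is an assumption based on this single example.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical: the real cmip5qn namespace URI is not shown in the excerpt.
CMIP5QN_NS = "urn:example:cmip5qn"


def check_entry(entry_xml: str) -> Optional[str]:
    """Return a message if the qnDRS experiment and the title disagree, else None."""
    entry = ET.fromstring(entry_xml)
    qn_drs = entry.get(f"{{{CMIP5QN_NS}}}qnDRS")
    if not qn_drs:
        return "entry has no cmip5qn:qnDRS attribute"
    # Assumes the drs-experiment is the last whitespace-separated token.
    drs_experiment = qn_drs.split()[-1]
    title = entry.findtext(f"{{{ATOM_NS}}}title", default="")
    title_years = set(re.findall(r"decadal(\d{4})", title))
    match = re.fullmatch(r"decadal(\d{4})", drs_experiment)
    if match and title_years and title_years != {match.group(1)}:
        return f"qnDRS says {drs_experiment} but title says decadal{'/'.join(sorted(title_years))}"
    return None


SAMPLE = f"""<entry xmlns="{ATOM_NS}" xmlns:cmip5qn="{CMIP5QN_NS}"
    cmip5qn:qnDRS="MPI-M_MPI-ESM-LR_1.1 decadal1961" cmip5qn:centre="MPI-M">
  <title>decadal1990-LR (decadal1990-LR) - Version 1</title>
</entry>"""
print(check_entry(SAMPLE))  # qnDRS says decadal1961 but title says decadal1990
```

Run over the whole feed, a check like this would list every Simulation entry whose title and DRS experiment disagree.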