Version 15 (modified by rupert, 4 years ago)
Processing the mindmaps (bundled version) to create XML suitable for ingestion by the CMIP5 questionnaire
At the telco on the 29th September 2009 it was decided to change the format of the mindmaps to support a (hopefully) improved version of the questionnaire.
Background
Marie-Pierre et al. have been keeping two versions of the mindmaps: the original "scientific" mindmaps and a set of "flattened" mindmaps. In the flattened mindmaps, component properties have no hierarchy, hence the term flattened. Marie-Pierre has been manually translating from the scientific mindmaps to the flattened mindmaps. The flattened mindmaps are subject to a set of rules which allow them to be translated into an XML format that is more suitable for ingestion by the CMIP5 questionnaire.
It has been proposed that the flattened mindmaps are replaced by a "bundled" mindmap. The bundled mindmaps allow component properties to be grouped together by a label (hence the term bundle). They also allow component properties to be grouped together by a constraint.
The bundled mindmap rules are described below. The associated CMIP5 questionnaire XML rules are documented in ticket 343.
Rules
Golden Rule: Don't create ambiguity where no ambiguity exists by using different names for the same thing.
1: The mindmaps should be named <name>_bdl.xml where <name> is one of the seven realms.
2: Controlled Vocabulary (CV) nodes should be to the right on the MindMap (MM). Anything to the left will be ignored. [We can't actually check this; all we can do is check that nodes to the right conform to the rest of the rules defined here]
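As a rough illustration of rules 1 and 2, the Python sketch below checks a file name against the `<name>_bdl.xml` pattern and picks out the right-hand branches of a mindmap. It is not the repository's XSL. It assumes FreeMind's XML layout, in which the central node is the first `<node>` under `<map>` and only its direct children carry a `POSITION="left"` or `POSITION="right"` attribute. The realm list is passed in by the caller because it is not spelled out here, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Rule 1: "<name>_bdl.xml", where <name> must be one of the realms.
BUNDLE_NAME = re.compile(r"(?P<realm>.+)_bdl\.xml")


def is_bundle_filename(filename, realms):
    """Return True if filename follows rule 1 for one of the given realms."""
    match = BUNDLE_NAME.fullmatch(filename)
    return match is not None and match.group("realm") in realms


def right_hand_nodes(map_element):
    """Rule 2: return the first-level branches to the right of the central node.

    Assumes the FreeMind layout <map><node TEXT="central">...</node></map>,
    with POSITION set only on children of the central node. A missing
    POSITION is treated as "right" here, which is an assumption.
    """
    central = map_element.find("node")
    if central is None:
        return []
    return [child for child in central.findall("node")
            if child.get("POSITION", "right") != "left"]
```

A real file would be loaded with `ET.parse(path).getroot()` and the result passed to `right_hand_nodes`; everything returned is subject to the remaining rules.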
3: Specifying work in progress
- A CV node and its children will be ignored if it includes the yellow triangular warning icon (messagebox_warning). This icon is meant to be used when the node is not yet complete. The xsl stylesheet MMtodo.xsl outputs any "to be completed" nodes.
- A CV node and its children will be ignored if its text is in an italic font, e.g. enumerations of controlled vocabularies that have not yet been agreed.
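The two work-in-progress markers can be handled by pruning a subtree as soon as either marker is seen. The Python sketch below does that and also lists the flagged nodes, roughly what MMtodo.xsl is described as doing. It assumes FreeMind stores icons as `<icon BUILTIN="messagebox_warning"/>` and italics as `<font ITALIC="true"/>` children of a `<node>`; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

WARNING_ICON = "messagebox_warning"


def has_warning_icon(node):
    """True if the node carries the yellow triangular warning icon."""
    return any(icon.get("BUILTIN") == WARNING_ICON for icon in node.findall("icon"))


def is_work_in_progress(node):
    """Rule 3: a node is skipped if it has the warning icon or italic text."""
    font = node.find("font")
    is_italic = font is not None and font.get("ITALIC") == "true"
    return has_warning_icon(node) or is_italic


def active_nodes(node):
    """Yield node and its descendants, pruning every work-in-progress subtree."""
    if is_work_in_progress(node):
        return
    yield node
    for child in node.findall("node"):
        yield from active_nodes(child)


def todo_nodes(node):
    """List the TEXT of every node flagged with the warning icon."""
    return [n.get("TEXT") for n in node.iter("node") if has_warning_icon(n)]
```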
4: All CV nodes that are not covered by rules 2: and 3: must conform to the following visual rules
- a: component: Bold
- b: parameter bundle: Purple (#990099)
- c: constraint: Blue (#0033ff)
- d: parameter: Brown (#996600)
- e: value: node style fork
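The visual rules amount to a classifier from node style to node kind. The Python sketch below assumes FreeMind's attributes (`<font BOLD="true"/>`, a `COLOR` attribute, a `STYLE` attribute) and checks bold first, then colour, then fork style; that precedence is an assumption. It treats a missing `STYLE` as no match, whereas FreeMind can inherit styles from the parent, so a real checker may need to resolve inheritance first. `node_kind` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Rule 4 colours, compared in lower case.
COLOUR_TO_KIND = {
    "#990099": "parameter bundle",
    "#0033ff": "constraint",
    "#996600": "parameter",
}


def node_kind(node):
    """Classify a CV node by its visual style (rule 4)."""
    font = node.find("font")
    if font is not None and font.get("BOLD") == "true":
        return "component"
    colour = (node.get("COLOR") or "").lower()
    if colour in COLOUR_TO_KIND:
        return COLOUR_TO_KIND[colour]
    if node.get("STYLE") == "fork":
        return "value"
    raise ValueError(f"node {node.get('TEXT')!r} matches no rule 4 style")
```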
5: Hierarchy rules:
- There is one mindmap for each of the seven realm components.
- a component contains
  - [0..n] components
  - [0..n] parameter bundles
  - at least one component or parameter bundle
- a parameter bundle contains
  - [0..n] parameters
  - [0..n] constraints
  - at least one parameter or constraint
- a constraint contains
  - [1..n] parameters
- a parameter contains
  - [1..n] values
- values are atomic
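The hierarchy rules can be written as a table of allowed child kinds plus a minimum count. The Python sketch below works on kind strings ("component", "parameter", ...) rather than XML, so it can sit on top of whatever classification a checker uses. It reads "at least one component or parameter bundle" as a minimum over the allowed child kinds only. The table and function are hypothetical.

```python
# Rule 5: for each kind, the kinds allowed directly beneath it and the
# minimum number of allowed children it must have.
CONTAINMENT = {
    "component": ({"component", "parameter bundle"}, 1),
    "parameter bundle": ({"parameter", "constraint"}, 1),
    "constraint": ({"parameter"}, 1),
    "parameter": ({"value"}, 1),
    "value": (set(), 0),  # values are atomic
}


def check_hierarchy(kind, child_kinds):
    """Return the rule 5 violations for one node, given its children's kinds."""
    allowed, minimum = CONTAINMENT[kind]
    errors = [f"{kind} may not contain {c}" for c in child_kinds if c not in allowed]
    if sum(c in allowed for c in child_kinds) < minimum:
        errors.append(f"{kind} needs at least {minimum} child of type "
                      + " or ".join(sorted(allowed)))
    return errors
```

Keeping the rules in one table makes it easy to compare them with the old flattened hierarchy rules kept below for posterity.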
6: Specifying different types of values (controlled vocab, keyboard input, coupled):
Different types of values are indicated by adding the following icons to a value node
- keyboard input (numerical value) is indicated by the Purple-1 icon [name + format?]
- keyboard input (string value) is indicated by the pencil icon [name + format?]
- controlled vocabulary definition is the default when no icons are found
- coupled parameter is indicated by [TBD]
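In code, the value-type rule is an icon lookup with a default. The Python sketch below assumes FreeMind's built-in icon names `pencil` and `full-1` for the pencil and Purple-1 icons; those names and the function are assumptions, and coupled parameters are left out because their icon is still TBD.

```python
import xml.etree.ElementTree as ET

# Assumed FreeMind BUILTIN names for the rule 6 icons.
VALUE_TYPE_ICONS = {
    "full-1": "numerical keyboard input",
    "pencil": "string keyboard input",
}


def value_type(value_node):
    """Rule 6: infer a value node's type from its icons."""
    found = sorted({VALUE_TYPE_ICONS[icon.get("BUILTIN")]
                    for icon in value_node.findall("icon")
                    if icon.get("BUILTIN") in VALUE_TYPE_ICONS})
    if len(found) > 1:
        raise ValueError(f"{value_node.get('TEXT')!r} has conflicting value-type icons")
    # No type icon means a controlled vocabulary definition (the default).
    return found[0] if found else "controlled vocabulary"
```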
7: Specifying XOR, OR or AND for parameter values:
If a parameter has more than one potential value associated with it then the mindmaps must define whether only one value can be specified for the parameter or whether more than one may be specified. This is indicated by including the tick icon (button_ok) for OR, the cross icon (button_cancel) for XOR or the yellow star icon (bookmark) for AND for all of the value nodes that are children of the parameter in question. A consequence of this rule is that a value node and all its siblings will have the same icon.
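The consequence stated above (every sibling value node carries the same icon) is straightforward to check. The Python sketch below maps the tick, cross and star icons to OR, XOR and AND using the FreeMind names given in the rule, returns `None` for a single-valued parameter, and raises if an icon is missing or siblings disagree. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

GROUP_ICONS = {"button_ok": "OR", "button_cancel": "XOR", "bookmark": "AND"}


def value_group(value_nodes):
    """Rule 7: return "OR", "XOR" or "AND" for a parameter's value nodes."""
    if len(value_nodes) < 2:
        return None  # a single value needs no group icon
    groups = set()
    for node in value_nodes:
        found = {GROUP_ICONS[icon.get("BUILTIN")] for icon in node.findall("icon")
                 if icon.get("BUILTIN") in GROUP_ICONS}
        if len(found) != 1:
            raise ValueError(f"value {node.get('TEXT')!r} needs exactly one "
                             "tick, cross or star icon")
        groups |= found
    if len(groups) != 1:
        raise ValueError("sibling values use different icons: " + ", ".join(sorted(groups)))
    return groups.pop()
```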
* UP TO HERE *
8: parameters which do not naturally fit into a parameter group will be placed in a default parameter group which has the name "<component name> Attributes".
9: Vocabulary Definitions:
- a: A controlled vocabulary value node can include a note (pen and book icon) which provides a description of the value. --This is particularly useful for CV lists which contain "other", the note can be used to inform the user of the information we require about the other item--
- b: A leaf parameter can include a note (pen and book icon) which provides information about the parameter.
- c: Notes associated with any other nodes will be ignored.
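A translator only needs to keep notes for two kinds of node. The Python sketch below assumes notes are stored as `<richcontent TYPE="NOTE">` elements holding HTML, as in FreeMind 0.9; older FreeMind versions may differ. The node kind is passed in by the caller, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def note_text(node):
    """Return a node's note as plain text, or None if it has no note."""
    for rich in node.findall("richcontent"):
        if rich.get("TYPE") == "NOTE":
            # Collapse the HTML body's text and whitespace into one line.
            return " ".join("".join(rich.itertext()).split())
    return None


def kept_note(node, kind):
    """Rule 9: keep notes on CV value nodes and leaf parameters only."""
    if kind in ("controlled vocabulary value", "leaf parameter"):
        return note_text(node)
    return None
```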
10: Component names must not end in "Scheme" (this produces a warning in the checker but does not cause an error).
- Avoid component names that end with "Scheme" (e.g. TracersLateralAdvectionScheme -> TracerLateralAdvection). To know whether it is a scheme or not, rule 11 is sufficient.
11: If a component describes a parametrization/numerical scheme then it must have a SchemeType parameter (SchemeName is not mandatory as some schemes have no usual name and are only identified by their type).
12: There must be a one-to-one mapping between a parameter name and the type of numerical value it contains. This is to reduce confusion when we process the questionnaire responses.
13: The mindmap will interpret the syntax [name](units) as follows:
- the contents of [] are the name/description of the parameter that appears next to the keyboard input box in the questionnaire
- the contents of () are the units that correspond to a numerical parameter.
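The Scheme naming rule, the SchemeType rule and the `[name](units)` syntax can each be checked by a few lines of code. In the Python sketch below, units are treated as optional, because string inputs have none. Whether a component describes a scheme is passed in as a flag, since the mindmap alone does not say. The naming rule is reported as a warning, matching the checker described above. All names are hypothetical.

```python
import re

# "[name](units)": [] is the label shown next to the keyboard input box,
# () the units of a numerical parameter (optional in this sketch).
LABEL_SYNTAX = re.compile(r"\[(?P<name>[^\]]+)\](?:\((?P<units>[^)]*)\))?")


def parse_label(text):
    """Split a keyboard-input value such as "[name](units)" into (name, units)."""
    match = LABEL_SYNTAX.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"{text!r} does not follow the [name](units) syntax")
    return match.group("name").strip(), match.group("units")


def check_scheme_rules(component_name, parameter_names, describes_scheme):
    """Return (warnings, errors) for the two Scheme-related rules."""
    warnings, errors = [], []
    if component_name.endswith("Scheme"):
        warnings.append(f"{component_name}: avoid component names ending in 'Scheme'")
    if describes_scheme and "SchemeType" not in parameter_names:
        errors.append(f"{component_name}: scheme components need a SchemeType parameter")
    return warnings, errors
```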
Checking
Translating
[Add info about mmversion???]
[Add dummy definitions if they do not exist]
BELOW ARE THE OLD FLATTENED MINDMAP RULES FOR POSTERITY
Processing mindmaps to create xml for the CMIP5 questionnaire
Rupert, Marie-Pierre and Charlotte had a long email discussion about the rules that Rupert will be using to process the mindmaps we are using to gather controlled vocabulary for the questionnaire (subject: mm to xml). The proposed rules for MindMap? conformance below are the outcome of this discussion.
Proposed rules for MindMap Conformance
Golden Rule: Don't create ambiguity where no ambiguity exists by using different names for the same thing.
1: Controlled Vocabulary (CV) nodes should be to the right on the MindMap (MM). Anything to the left will be ignored. [We can't actually check this; all we can do is check that nodes to the right conform to the rest of the rules defined here]
2: Work in progress
- A CV node and its children will be checked but not go through to the questionnaire if it includes the yellow triangular warning icon (messagebox_warning). This icon is meant to be used when the node is not yet complete. The xsl stylesheet MMtodo.xsl outputs any "to be completed" nodes.
- In the development phase there will be a version of the questionnaire that does include the work in progress components (so long as they obey the rules). These nodes will come with a warning.
- A CV node and its children will be ignored if its text is in an italic font, e.g. enumerations of controlled vocabularies that have not yet been agreed.
3: All CV nodes that are not covered by rules 1: and 2: must conform to the following visual rules
- a: component: Bold
- b: component ref: as a: + red arrow (LINK attribute defined in MM xml)
- c: leaf parameter: Brown (#996600)
- d: complex parameter: Purple (#990099) NO LONGER ALLOWED ?
- e: common property: Blue (#0033ff)
- f: common property ref: as e: + red arrow (LINK attribute defined in MM xml)
- g: value: NodeStyle fork
Comment: The mm distinguishes between components based on their position in the hierarchy. A root component is 18pt, a child of a root component is 14pt, any other children are 14pt and Purple(#990099). This information is purely a visual aid and is not required in the questionnaire.
4: A value may include the pencil icon or the Purple-1 icon. These icons indicate that a keyboard input is required. The pencil icon indicates that a 'string' is to be input. The Purple-1 icon indicates that a 'numerical' value is to be input. In both cases the text is enclosed in square brackets. This text acts as a description rather than being controlled vocabulary. If a value does not include either icon it is assumed to be a controlled vocabulary definition.
5: If a value node has value node siblings (i.e. more than one value node has the same parent) then it, and all its siblings, must include either the tick icon (button_ok), the cross icon (button_cancel) or the yellow star icon (bookmark). A value node and all its siblings must have the same icon. The tick icon indicates an OR group, the cross icon indicates an XOR group and the yellow star icon indicates an AND group.
6: Notes:
- a: A controlled vocabulary value node can include a note (pen and book icon) which provides a description of the value. --This is particularly useful for CV lists which contain "other", the note can be used to inform the user of the information we require about the other item--
- b: A leaf parameter can include a note (pen and book icon) which provides information about the parameter.
- c: Notes associated with any other nodes will be ignored.
7: Hierarchy rules:
- a: There is one mindmap per level 1 parent component and one for the root (level 0) component. (This rule implies that it is not valid to have both atmos and ocean in the same mindmap)
- b: A component will contain 0 or more components and/or component references.
- c: A component will contain 0 or more common CV nodes. Common CV nodes may also be children of the central node, in which case they apply to all component nodes. Any common CV nodes may be references. Rupert made this up; we need to talk about it and come back to this.
- d: Leaf components (components not containing other components) will contain 0 or more parameters.
- e: Common CV nodes will contain 1 or more parameters. hmmmm
- f: complex parameters (a choice based parameter) will contain 1 or more complex parameters and/or parameters NO LONGER REQUIRED but we may be able to flatten these using a piece of code if we come up with rules to govern it.
- g: parameters will contain one or more values (of the same type - see rule 10) Oh dear. Suggestion that we should have more control over user responses.
- h: values are atomic, they may not contain any other nodes. [MMCheckValues.xsl] (True for atmosphere not true for ocean right now, we may be able to get round this with the flattening rules in point f)
8: Component names must not end in Scheme.
- Avoid component names that end with "Scheme" (e.g. TracersLateralAdvectionScheme -> TracerLateralAdvection). To know whether it is a scheme or not, rule 9 is sufficient.
9: If a component describes a parametrization/numerical scheme then it must have a SchemeType parameter (SchemeName is not mandatory as some schemes have no usual name and are only identified by their type).
10: There must be a one to one mapping between a parameter name and the type of numerical value it contains. This is to reduce confusion when we process the questionnaire responses.
11: the mindmap will interpret the syntax: [name](units) as follows...
- the contents of [] is the name/description of the parameter that appears next to the keyboard input box in the questionnaire
- the contents of () are the units that correspond to a numerical parameter.
We need a check for this.
12: Express the option "no/none" with the parameter "modelled" This rule replaces the "no X phenomenon / scheme" in (Scheme)Type drop-list of values.
- "Modelled" maps onto what is called "represented" in the CIM
- The suggested possible values of modelled are (to be discussed):
- yes, activated
- yes, but not activated
- no.
13: Icons we are using in the mindmaps
- triangle caution: there is still work to do
- pencil: a string keyboard input value is required
- Purple number 1: a numerical keyboard input value is required
- Question Mark: author is not sure about this
- Green tick: Boolean OR
- Red cross: Boolean XOR
- Star: Boolean AND
- red traffic light*: a current ticket 244 rule does not match
- electric light bulb*: suggestion of modification for a current ticket 244 rule
- Letter*: 2 options proposed (to be voted on... or suggest an alternative representation)
* indicates that these icons are used for communication with each other, not for implementation
Outstanding questions about the MindMap rules
- Do we care about the font, font size and/or choice of bubble or fork visual representation being consistent for the same types of nodes in the mindmaps? Currently font sizes are consistent (should sizes therefore be part of all clauses in rule 4?). However, the type of font is currently not consistent (in Atmosphere.mm). Should we enforce consistency? If so, for everything, or a subset?
- Hmmm, I hadn't even realised that I had been using sans-serif and not ariel for parts of the mindmap, I guess that shows us that it doesn't really matter.
I am not sure what it means to be common so I'm reserving comment on these questions until I have had more time to think.
- Is it correct to assume that common CV is shared by all components at a lower level of the tree than its definition. If not how is it determined which components treat it as common and which do not?
- What does it actually mean to be common? Does it mean that each component shares these definitions i.e. we define them once for all these components?
- Can a component override a common parameter by declaring a local parameter of the same name or must they be distinct?
- Is it correct to allow common CV to exist at any level (not just at the "top" level)?
- Is it correct to limit reference nodes to be common CV or a component (of any type) or can other types of nodes be references too e.g. a parameter?
A prototype schema has been created and is in the repository
The translator (MMReader.xsl) has been written and is in the repository. The xml output that the xsl creates from the Atmosphere mindmap has also been added to the repository.
Two constraint checks have been written so far ...
The easiest way to run the xsl is to use the command line program xsltproc although you can obviously use whatever xsl processor you like. To produce the Atmosphere.xml file I ran the following.
xsltproc xsl/MMReader.xsl Atmosphere.mm > Atmosphere.xml
So here is a quick summary of what I am doing.
1: I am finding out what the constraints on the mindmaps should be
2: I am writing a set of checks that test that the mindmaps conform to these constraints. I have written one check so far: MMCheckSchemeAndType.xsl
3: Part of these checks includes the output of all of the "to be done" parts of the mindmaps (MMtodo.xsl).
4: I am writing a translator (MMReader.xsl) that converts the mindmap xml to a more generic xml form as long as they conform to the constraints
5: I will write a schema which defines the more generic xml structure
6: I am writing a translator (MMWriter.xsl) that converts the more generic form into a mindmap.
Charlotte wrote:
I get what you mean now when you talk about complex parameters.
ComponentDomain, Space, HorizontalDiscretisation in the Atmosphere mindmap has three parameters and we need to ask for different information depending on which one the user chooses to describe their model. So yes, we do need to have a concept of Complex Parameter.
Rupert wrote:
Regarding complex parameter I was assuming that a complex parameter is a parameter which contains other parameters but perhaps this should be a component? Such things are coloured purple (but are not bold). There is one example in the atmosphere mindmap which is "Space" in "ComponentDomain". I can't see how that could be a component. Such entities are also used in the ocean mindmap e.g. "Lateral" in "Tracers Scheme" in "OceanAdvection". Advice anyone?
Charlotte wrote:
Here are my amendments/comments on the constraints for MM/xml
1: All leaf components must have Scheme and Type parameters (changed)
2: All MM nodes must conform to the specified formats (agreed)
3: A component may contain another component and/or parameters (agreed)
4: I don't understand what a complex parameter is. (???)
5: A parameter only contains controlled values (agreed, unless its value has been specified as user defined by a pencil icon or number1 icon)
6: Vocab is to the right. (agreed)
Now for your questions...
- Must a parameter have a name? Ocean has examples that do not, or are they parameter values?
- I removed "name" from the Atmosphere mindmap and replaced it with Scheme because "name" was simply referring to the name of the scheme.
- What is a common property and why is it needed? Ocean says "mostly common"! as a comment so how does this help?
- I will need to look more closely at the ocean mindmap to answer that; maybe Eric can tell you what he means. However, there are some examples of common properties in the Atmosphere mindmap too. See the properties of the components in AtmosClouds, where some of the properties of the different types of cloud components are the same. We may decide that it makes more sense to move these properties to the left and call them the common properties of AtmosClouds.
- Is a complex parameter a required concept?
- I don't understand what a complex parameter is.
- Do all (XOR,AND,OR) options have to be the same at a particular level or can they be mixed?
- Yes, they have to be the same.
- Can a value contain another value?
- No, we need to flatten our mindmaps so this is not the case.
- If something is not a string or float then what is it - it must be a set of defined values?
- What a marvellously open question.... I don't know, do you have an example?
- Do we want to maintain component/subcomponent/scheme info in the xml?
- Yes
- Presumably we want to keep the "notes" in the xml too?
- Yes, if a user selects "other" as a parameter value I want a text box to appear which uses the note to tell them what further information is required.
- How does the keyboard entry keyword relate to the String and Integer options?
- Keyboard entry is denoted by square brackets. The number 1 icon indicates that a number is required and a pencil icon indicates that a string is required.
- valorisation vs. choices for valorisation? what does this mean? Is that the name of a parameter vs the values it can take?
- Valorisation is French and I think it means valuation in this context. So I reckon "valorisation vs. Choices for valorisation" means "a user enters a number vs. we give the user a list of numbers to choose from".
Hope this helps you Rupert, do shout if you need to know more.
I guess I should wiki-fy this email exchange.
Charlotte
From: Pascoe, CL (Charlotte)
Sent: 28 April 2009 13:02
Subject: RE: MindMap constraints to produce reasonable xml for use by questionnaire
Hi Rupert,
I have just committed my latest version of the atmosphere mindmap Atmosphere.mm to the subversion repository at revision 453.
Each leaf component has a Scheme and a Type (exception for the GW parameterisations which have a type for propagation and a type for dissipation). Scheme and Type are offered to the user as XOR choices.
Each leaf component may also have some properties. For many components the properties are called "properties" (having been known as keywords yesterday) but for other components the properties have names like "PrecipitatingHydrometeors" and "NumberOfChannels". If choices are required then properties are offered to the user as OR choices.
I'll review what this means in terms of your constraints when I've had my lunch :-) Charlotte
From: Rupert Ford [mailto:rupert@manchester.ac.uk]
Sent: 28 April 2009 12:26
To: Pascoe, CL (Charlotte); Lawrence, BN (Bryan)
Subject: MindMap constraints to produce reasonable xml for use by questionnaire
Hi Charlotte and Bryan,
I said I would pester you about the sort of constraints you would like to have in the MindMaps so here I am. Should I be emailing a larger group of people i.e. the people identified in the telco?
Here are some obvious constraints I've thought of (which may be wrong, I suppose) and I've written xsl to check the one we discussed at the telco. Are there any more that you can think of?
Any constraints for MM/xml?
1: All leaf components must have Name and Type parameters - attached xsl checks for this
2: All MM nodes must conform to the specified formats
3: A component may contain another component and/or parameters
4: A complexparameter only contains parameters
5: A parameter only contains controlled values
6: vocab is to the right
I also have a load of questions which relate to how I am going to translate the mindmaps into a cleaner xml structure.
Must a parameter have a name? Ocean has examples that do not, or are they parameter values?
What is a common property and why is it needed? Ocean says "mostly common"! as a comment so how does this help?
Is a complex parameter a required concept?
Do all (XOR,AND,OR) options have to be the same at a particular level or can they be mixed?
Can a value contain another value?
If something is not a string or float then what is it - it must be a set of defined values?
Do we want to maintain component/subcomponent/scheme info in the xml?
Presumably we want to keep the "notes" in the xml too?
How does the keyboard entry keyword relate to the String and Integer options?
valorisation vs. choices for valorisation? what does this mean? Is that the name of a parameter vs the values it can take?
Many thanks for your wisdom and sorry for all the questions.
-- Rupert
