Changes between Version 30 and Version 31 of tickets/78


Timestamp:
30/09/09 14:15:58 (4 years ago)
Author:
allyn
Comment:

--

  • tickets/78

    v30 v31  
    6969Other users may benefit from a more structured query.  In order to support anything more complicated than an unrestricted search, a subset of elements or attributes within the CIM must be identified as “facets,” or dimensions to search on.  ''Advanced search'' uses a set of pre-constrained queries into which a user can plug search terms.  For example, “Find me all simulations which use a forcing condition with double CO2.”  In that example, the fact that the query should search the forcing conditions of all simulations for the text “double CO2” is built into the query.  It is obvious what the user expects the query to return: simulation documents.  An advanced query could in principle be arbitrarily complex.  But, in practise, searching more than a few dimensions at once is likely to be confusing for anybody other than “hard-core” users who will be using the query tool on a regular basis and are therefore willing to invest more time in understanding a complicated interface.  It would be useful to use advanced search to process the sorts of common searches that are made by regular users. 
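As a rough illustration of a pre-constrained query, the sketch below hard-wires the "forcing conditions of all simulations" part and leaves only the search term to the user. The XML layout, the element names (`simulation`, `forcing`) and the function name are hypothetical stand-ins, not the real CIM schema or an agreed portal interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are assumptions
# made for illustration and are not taken from the CIM.
SAMPLE = """
<documents>
  <simulation id="sim-1"><forcing>double CO2 from year 1</forcing></simulation>
  <simulation id="sim-2"><forcing>pre-industrial control</forcing></simulation>
  <experiment id="exp-1"><forcing>double CO2</forcing></experiment>
</documents>
"""


def simulations_with_forcing(root, term):
    """Pre-constrained query: which simulations mention `term` in a forcing?

    The document type (simulation) and the dimension searched (forcing)
    are fixed by the query; the caller supplies only the search term.
    """
    needle = term.lower()
    hits = []
    for sim in root.findall("simulation"):
        for forcing in sim.findall("forcing"):
            if needle in (forcing.text or "").lower():
                hits.append(sim.get("id"))
                break  # one matching forcing is enough for this simulation
    return hits


root = ET.fromstring(SAMPLE)
result = simulations_with_forcing(root, "double CO2")
```

Because the return type is fixed by the query, the experiment document is never returned even though its forcing text matches.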
    7070 
    71 ''Faceted search'' allows users to browse the full breadth of CIM instances.  It successively narrows down the search space along multiple dimensions.  At any stage a facet can be removed from the search and the previous set of query results will be active.  This is different from advanced search where a single (albeit a potentially very complex) query is performed all at once.   
     71''Faceted search'' allows users to browse the full breadth of CIM instances.  It successively narrows down the search space along multiple dimensions.  At any stage a facet can be removed from the search and the previous set of query results will be active.  This is different from advanced search where a single (albeit a potentially very complex) query is performed all at once. 
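The successive narrowing described above can be sketched as follows. This is only a minimal in-memory illustration: the records, the facet names (`type`, `model`, `forcing`) and the helper functions are hypothetical and do not reflect the CIM or any chosen storage technology.

```python
# Hypothetical records standing in for CIM instances; the field names are
# illustrative assumptions, not taken from the CIM.
DOCUMENTS = [
    {"id": "sim-1", "type": "simulation", "model": "ModelA", "forcing": "double CO2"},
    {"id": "sim-2", "type": "simulation", "model": "ModelB", "forcing": "double CO2"},
    {"id": "sim-3", "type": "simulation", "model": "ModelA", "forcing": "control"},
    {"id": "exp-1", "type": "experiment", "model": "ModelA", "forcing": "control"},
]


def apply_facets(documents, facets):
    """Return the documents that match every active facet/value pair."""
    return [
        doc for doc in documents
        if all(doc.get(name) == value for name, value in facets.items())
    ]


def facet_counts(documents, facet):
    """Count how many documents carry each value of one facet."""
    counts = {}
    for doc in documents:
        value = doc.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


active = {}

# Narrow along a first dimension.
active["type"] = "simulation"
step1 = apply_facets(DOCUMENTS, active)

# Narrow further along a second dimension.
active["forcing"] = "double CO2"
step2 = apply_facets(DOCUMENTS, active)

# Removing a facet restores the previous result set.
del active["forcing"]
step3 = apply_facets(DOCUMENTS, active)
```

Counts such as `facet_counts(step1, "model")` are what a faceted interface would typically show next to each remaining choice, so that users can browse without already knowing the values.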
    7272 
    7373Faceted search is most useful when there are three or more dimensions of a classification.  Otherwise a simpler hierarchical or tree classification system, where each new group is a sub-type of its parent, is preferred.  Using facets also has the advantage that users are not required to have complete knowledge of the entities being classified nor their relationships.  This is precisely one of the issues that METAFOR aims to address.  However, the CIM may not map well onto faceted search because the dimensions that users will want to search on (based on the use-cases above) are not necessarily restricted to a “closed” CV^[http://metaforclimate.eu/#sdfootnote4sym 4]^.  Thus, the facets are hard to define.  Not only are the facet values not well-understood, but there is an overwhelming set of elements and attributes in the CIM to consider searching as facets.  Identifying the most effective subset to support is difficult.  This makes focusing early effort on faceted search risky. 
     
    115115 7. a “viewer” for secondary results 
    116116 
    117 This may be something as simple as a stylesheet applied to incoming XML representations of results.  Again, it should be possible to fetch the results via a REST API.  For example, facets and facetValues can exist as parameters embedded within a URL. 
     117This may be something as simple as a stylesheet applied to incoming XML representations of results.  Again, it should be possible to fetch the results via a REST API.  For example, facets and facetValues can exist as parameters embedded within a URL. 
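One possible way to embed facets and facetValues in a URL is sketched below. The base URL and the parameter names are hypothetical; no REST interface has been defined for the portal yet.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical endpoint; the real portal URL is not yet decided.
BASE_URL = "http://portal.example.org/search"


def build_search_url(selections):
    """Encode (facet, facetValue) pairs as URL query parameters."""
    return BASE_URL + "?" + urlencode(selections)


def parse_search_url(url):
    """Recover the (facet, facetValue) pairs from a search URL."""
    return parse_qsl(urlsplit(url).query)


url = build_search_url([("documentType", "simulation"), ("forcing", "double CO2")])
selections = parse_search_url(url)
```

Keeping the whole search state in the URL means a result set can be bookmarked or fetched by another tool, and dropping one parameter corresponds to removing that facet.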
    118118 
    119119Regarding the list of primary results, it is unlikely to come from the same storage artifact as the CIM instances themselves so some technique of identifying, storing, and accessing the subset of CIM documents which need to be displayed in the primary viewer is needed: 
     
    133133![TODO: ANOTHER DIAGRAM, THIS ONE WITH TECHNOLOGY CHOICES LABELED] 
    134134 
    135 The CIM is already very XML-centric.  And the emerging code from other WPs is already very Python-centric.  This will influence my choice of technologies in the first instance, since I am interested in rapid prototyping.  I will be assuming that the portal is built on Pylons. 
     135The CIM is already very XML-centric.  And the emerging code from other WPs is already very Python-centric.  This will influence my choice of technologies in the first instance, since I am interested in rapid prototyping.  I will be assuming that the portal is built on Pylons.  I will also assume a JavaScript front-end for the query interface. 
    136136 
    137  1. Database of CIM instances - a native XML database, eXist [http://www.exist-db.org/] will be used 
     137 1. Database of CIM instances - a native XML database, eXist [http://www.exist-db.org/] will be used; queries into eXist can use the jQuery [http://jquery.com/] JavaScript library. 
    138138 1. unrestricted search interface - 
    139139 1. advanced search interface; this involves creating one or more pre-constrained queries - This can be a webform which builds a query.  In the short-term, I am only concerned with searching on a single dimension and so a simple GUI will do. 
    140  1. faceted search interface - TBD.  Previous discussions have suggested that using SPARQL queries against an RDF implementation of the CIM is a good way to do faceted search.  Indeed, this is the approach that Curator took. ![TODO: PROS & CONS OF RDF/OWL/SPARQL]  
     140 1. faceted search interface - TBD.  Previous discussions have suggested that using SPARQL queries against an RDF implementation of the CIM is a good way to do faceted search.  Indeed, this is the approach that Curator took. ![TODO: PROS & CONS OF RDF/OWL/SPARQL] 
    141141 1. identify the facets; store them (may be a separate store from the aforementioned instance database) - initially, elements and attributes within the CIM could be identified as “facets”; this information could then be extracted from the CIM into whatever format is appropriate as a storage medium. 
    142142 1. a “viewer” for primary results - XSLT 
     
    154154Controlled Vocabulary Server - look at what BODC has done. 
    155155 
    156 Faceted Search using OWL/RDF requires, obviously, converting from UML/XSD.  There are existing tools (such as GRDDL [http://www.w3.org/TR/grddl-primer/]) which generate RDF from HTML/XML (but not UML). 
     156Faceted Search using OWL/RDF requires, obviously, converting from UML/XSD.  There are existing tools (such as GRDDL [http://www.w3.org/TR/grddl-primer/]) which generate RDF from HTML/XML (but not UML). 
    157157 
    158 Faceted Search using an XML-specific technology such as XFML?  
     158Faceted Search using an XML-specific technology such as XFML? 
    159159 
    160160Faceted Search via standard RDBMS?