
Conclusions of the Discussion Panel

 

The discussion during the panel session focused on 3 key questions:

 

Question 1 - How to engage the community at large in the development and use of common ontologies and vocabularies?

 

Before generating ontological components, we first have to encourage scientists to invest in open source data and to share their own data. We must therefore take care that data owners do not feel deprived of their data. We have to consider how to encourage sharing: through data set licensing, data set publication, or both? It is also useful to promote the added value of such sharing.

 

The group indicated that it is not appropriate to develop a large-scale ontology spanning the domains of biodiversity, ecology and agriculture. We cannot replicate the examples of AGROVOC, Gene Ontology or SNOMED, as the community involvement needed would be excessive, the cost of maintenance would be too high and practical interest cannot be guaranteed. Gene Ontology and SNOMED are model ontologies that concentrate a lot of knowledge, but they cost a lot of money to maintain and require regular community workshops. It is more efficient to define a research question or a specific use clearly before developing an ontology, so that the ontology is focused and usable. In fact, our communities already hold a series of applied or reference ontologies and annotated data, and the semantic use of these elements is still to be demonstrated through key use cases.

 

 

Question 2 - Shall we aim at developing a thematic and multilingual ontology repository for biodiversity, agriculture and environment on the model of Bioportal?

Biodiversity Information Standards: http://bis.bioportal.bioontology.org/ontologies

 

 

The question was raised of what the added value would be of a customized thematic repository within the Bioportal, which already exists alongside many other repositories. Giving access only to thematic slices of the Bioportal may hide the full range of available ontologies from users.

 

The Semantic Web does not need all data in one place; it needs many more cool URIs that resolve, so that information can be discovered when needed. Getting all our RDF data documented in a single place could be a first challenge for the interest group. The SPARQL endpoints could also be registered in the Bioportal.
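As an illustration of what "a URI that resolves" means in practice, here is a minimal Python sketch that prepares an HTTP request asking for an RDF representation of a resource through content negotiation. The URI is a hypothetical example.org identifier and the function name is ours; whether a given server honours the Accept header is an assumption that has to be checked per data provider.

```python
from urllib.request import Request

# Media types for two common RDF serializations, Turtle preferred.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) a GET request asking for RDF at `uri`."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# Hypothetical identifier, used only for illustration.
request = build_rdf_request("http://example.org/taxon/123")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would send it; a cool URI is then expected to answer with, or redirect to, a machine-readable description of the resource.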

 

Facilitating the sharing of biodiversity data will help accelerate the discovery, access and integration of data sets. It seems more appropriate to organize these data sets through metadata schemes and smaller knowledge models, which provide a solution suitable for modeling realistic problems. In this context, we need to emphasize the objectives to be served by the ontological components that must be developed, and to select use cases relevant to biodiversity.

 

Question 3: How to support a community of practice in biodiversity in promoting and adhering to standards and/or best practices for automating the use of the Semantic Web? E.g. Linked Data/Cool URIs

The challenge is to unlock the data, map them to the ontologies and keep them updated. It was noted that, for example, plant data are not yet published as Linked Open Data because this involves formatting and community engagement.
One solution that was suggested is to publish data through SPARQL endpoints that the community will use and annotate. SPARQL endpoints require publishing the data and the knowledge model, and defining on-demand data mediation services.
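To make the idea of a community-usable SPARQL endpoint more concrete, the following Python sketch builds the GET URL for a query, following the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter. The endpoint address is hypothetical, and the assumption that records carry the Darwin Core term dwc:scientificName is ours; an actual endpoint would document its own knowledge model.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# Assumes the published data use the Darwin Core scientificName term.
QUERY = """PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?record ?name
WHERE { ?record dwc:scientificName ?name }
LIMIT 10"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return the URL for a SPARQL query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


print(build_sparql_get_url(ENDPOINT, QUERY))
```

The resulting URL can be fetched with any HTTP client; asking for `application/sparql-results+json` in the Accept header is the usual way to obtain results in JSON.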

It was mentioned that the SemaGrow project, which aims to develop an infrastructure to support Linked Open Data (LOD), should be represented in the interest group. It was suggested to organize a biohackathon to push data into LOD and have more ‘yummy data’ consumable by the Semantic Web (http://yummydata.org/list.php).

 

Pooling requires a lot of discipline. The data sets will be organized using the knowledge models and metadata schemas. It is also necessary to document the quality and provenance of the data sets. Metadata standards can contribute significantly to the management of quality indicators. It is also desirable to consider environments that enable peer review, as is done in the world of publishing. Journals such as the Semantic Web journal or BMC Bioinformatics already offer facilities for the publication and review of data sets.
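As a rough sketch of how metadata can carry licensing and provenance, the Python function below writes a small Turtle description of a data set using Dublin Core terms (dcterms:title, dcterms:license) and the W3C PROV term prov:wasDerivedFrom. The choice of these three properties, the function names and the example.org URIs are assumptions made for illustration; the panel did not prescribe a particular metadata scheme.

```python
def escape_turtle_string(text: str) -> str:
    """Escape backslashes, double quotes and newlines for a Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def dataset_metadata_turtle(
    dataset_uri: str, title: str, license_uri: str, source_uri: str
) -> str:
    """Describe a data set, its license and its source as Turtle."""
    lines = [
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        f"<{dataset_uri}>",
        f'    dcterms:title "{escape_turtle_string(title)}" ;',
        f"    dcterms:license <{license_uri}> ;",
        f"    prov:wasDerivedFrom <{source_uri}> .",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical data set and source URIs, used only for illustration.
print(
    dataset_metadata_turtle(
        "http://example.org/dataset/plant-traits",
        "Plant trait observations",
        "https://creativecommons.org/licenses/by/4.0/",
        "http://example.org/dataset/field-survey",
    )
)
```

Quality indicators could be added in the same way once the community agrees on a vocabulary for them.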

 

Conclusions have emerged on the steps forward:

 

A.    Develop medium-sized, fit-for-use ontologies with multilingual versions - It is important to provide ontologies, controlled vocabularies and metadata standards that will assist in organizing the data acquired by the community.  For practical use, it seems more appropriate to organize the data sets through metadata schemes and smaller knowledge models, which provide a solution suitable for modeling realistic problems.

B.    Promote open source data, to encourage scientists to invest in open source data and to share their own data

C.    Creation of an interest group within TDWG – The group will be presented at the next TDWG meeting. The workshop participants will be invited to join the group alongside experts from TDWG and GBIF. Any other interested scientist is welcome as well. This group will engage in the promotion of open data standards and will discuss the best solutions to propose for the various domains of biodiversity.

D.    Development of tools for group communication and exchange that are easy to implement and simple to use, such as mailing lists or a dedicated wiki page on the TDWG site. The main focus is essentially on setting up the interest group dynamically.

E.    Definition of a research question by the group members that could be tested as a use case on multiple platforms (SSWAP, SADI, FAO, etc.), using existing ontologies/controlled vocabularies and annotated data.

F.     Promote URIs that resolve.

G.    Encourage the group to document the availability of its RDF data and metadata in one place

H.    Stimulate peer reviewing of published annotated data to support solving integration problems

I.      Promote the use of metadata to support the quality and document the provenance of the datasets

 

It was noted that the results of our discussion panel will be shared with the group that held the ‘Semantics of Biodiversity Workshop’ (May 16 - 18, 2012, University of Kansas Biodiversity Institute).

 