Overview
Tutorials
Tuesday, December 9
Location: Seminaris Campus Hotel (see useful info)
Tutorials in Parallel Tracks 1 & 2
9:00 | Tutorial sessions (parallel tracks 1 & 2) |
10:00 | Coffee break |
10:30 | Tutorial sessions (parallel tracks 1 & 2) |
11:30 | Lunch |
13:00 | Tutorial sessions (parallel tracks 1 & 2) |
15:00 | Coffee break |
15:30 | Tutorial sessions (parallel tracks 1 & 2) |
18:00-21:00 | [Conference Center Ground floor] Pre-workshop poster session with the Berlin Semantic Web Meetup. Presenters of posters and demos are invited to put up their posters on Tuesday evening, when SWAT4LS will be open to the Berlin Semantic Web Meetup group to share experiences in an informal setting. Drinks and finger food will be provided. |
Workshop
Wednesday, December 10, 2014
Location: Seminaris Campus Hotel (see useful info)
Preliminary Program
08:30-09:00 | Registration |
09:00-09:10 | Welcome and introduction |
09:10-09:55 | Keynote: Knowledge Engineering in Radiation Oncology – A real-world application of Semantic Web technology. Andre Dekker, MAASTRO Clinic, NL |
09:55-10:35 | Session 1 (2 full papers) |
10:35-11:00 | Coffee break |
11:00-12:05 | Session 2 (2 full papers + 2 short/position papers) |
12:05-13:00 | Lunch |
13:00-13:45 | Keynote: Life Sciences at the Insight Centre for Data Analytics. Stefan Decker, Insight Centre for Data Analytics, IE |
13:45-14:50 | Session 3 (2 full papers + 2 short/position papers) |
14:50-15:15 | Coffee break |
15:15-16:00 | Keynote: Data-Knowledge Transition Zones within the Biomedical Research Ecosystem. Maryann Martone, UCSD, USA |
16:00-17:05 | Session 4 (2 full papers + 2 short/position papers) |
17:05-17:45 | Industry session: Self-service Semantic Data Federation for the Life Scientist. Chris Baker, IPSNP Computing Inc.; Hans Constandt, Ontoforce |
17:45-19:25 | Posters and demos session |
19:25-19:30 | Closing remarks |
20:00- | Social dinner (registered participants) |
Hackathon
Thursday, December 11
Location: Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB), Computer Science Campus, Freie Universität Berlin
all day | The SWAT4LS Hackathon is organised bottom-up, as an informal roundtable where people team up to tackle problems or simply get advice. See/edit the Google Doc for the programme.
A Hackathon Challenge has been provided by Ontoforce: define a linked-data representation of people, so that author names from PubMed affiliations can be linked to a URI. We offer an iPad mini to the participant who links the most author names in PubMed to a URI. Building on existing sources is allowed. A minimal illustrative sketch of one possible starting point appears below. |
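Purely as an illustration of one possible starting point (not part of the official challenge materials), the sketch below fetches author names for a given PubMed record via the NCBI E-utilities and mints placeholder URIs for them; the base URI and the naive name-based identifiers are assumptions made for this example.

```python
# Illustrative sketch only: fetch author names for a few PubMed records via
# NCBI E-utilities and mint placeholder URIs for them. The base URI and the
# naive name-based identifier are assumptions, not part of the challenge spec.
import urllib.request
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BASE_URI = "http://example.org/person/"          # hypothetical namespace

def author_uris(pmids):
    params = urllib.parse.urlencode({"db": "pubmed", "id": ",".join(pmids),
                                     "retmode": "xml"})
    with urllib.request.urlopen(f"{EUTILS}?{params}") as resp:
        tree = ET.parse(resp)
    for author in tree.iter("Author"):
        last = author.findtext("LastName")
        fore = author.findtext("ForeName")
        if last:  # mint a (very naive) URI from the normalised name
            slug = urllib.parse.quote(f"{fore or ''}-{last}".strip("-").lower())
            yield f"{fore or ''} {last}".strip(), BASE_URI + slug

if __name__ == "__main__":
    for name, uri in author_uris(["23193287"]):   # any PubMed ID will do
        print(name, "->", uri)
```

A real solution would of course disambiguate authors (e.g. against existing identifier schemes) rather than minting a URI per name string.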
Confirmed Keynotes
Andre Dekker, MAASTRO clinic, Maastricht, NL
Knowledge Engineering in Radiation Oncology – A real-world application of Semantic Web technology
Since 2005, MAASTRO Clinic (Maastricht, The Netherlands) has run a research program called Computer Assisted Theragnostics, or CAT. In the various CAT projects (euroCAT, duCAT, chinaCAT, ozCAT, ukCAT), an IT infrastructure is being developed that connects radiation oncology centers. The aim of CAT is to enable cross-institute data sharing & machine learning and more efficient clinical trials: a concept now commonly referred to as “Rapid Learning”. Semantic Web technology is one of the foundations of our approach. The seminar will describe the need and the challenges for rapid learning, show some results of our work, and discuss the challenges we solved and encountered when using Semantic Web technology on a global scale for sharing routine patient data.
Andre Dekker was born in 1974 and studied Physics (MSc) with minors in Business Administration and Biomedical Engineering at Twente University, The Netherlands. During his studies he worked in the radiotherapy department of the Royal Adelaide Hospital, Adelaide, Australia. His master thesis described photo-acoustic detection of small blood vessels. From 1998 to 2002 he trained as a board-certified medical physicist at the University Hospital of Maastricht, The Netherlands, and during this time worked in the pulse-oximetry division of Datex-Ohmeda, Louisville, CO, USA. In 2003 he obtained his PhD in Cardiac Surgery at the University Hospital Maastricht on thermodynamics of the heart during surgical procedures. Since 2003 he has worked at MAASTRO Clinic, Maastricht, The Netherlands, first as a medical physicist, then as the head of medical physics and currently as the head of knowledge engineering and IT and a member of MAASTRO’s management team. He has worked as a visiting scientist at the Princess Margaret Hospital, University of Toronto, Canada, at the Sacred Heart University of Rome, Italy, the Radiation Therapy Oncology Group (RTOG) in Philadelphia, PA, USA, the University of Sydney, Australia and at The Christie Hospital in Manchester, UK. He has published over 75 peer-reviewed publications, holds 10 patents and has an h-index of 31; he has supervised more than 10 PhD students and is currently the (principal) investigator of 8 competitive grant-funded projects. His main research interest is the development of global data sharing infrastructures using Semantic Web technology that are used to machine-learn personalized outcome prediction models. These models have their clinical relevance in decision support systems that help patients and clinicians choose the treatment with the best outcome.
Maryann Martone, University of California, San Diego
Data-Knowledge Transition Zones within the Biomedical Research Ecosystem
In the past few years, data and data science have exploded into the academic and public consciousness, as “Big data” and “Data science” have taken hold. But the transformation of scholarship from a primarily paper-based system to one that accommodates the many digital and physical objects produced by researchers has been ongoing since the advent of the World Wide Web. In this presentation, I will give an overview of the Neuroscience Information Framework (NIF) and its allied projects (SciCrunch, the generic form of NIF; dkNET; and the Monarch Initiative). NIF has been surveying and tracking the biomedical resource landscape, with an emphasis on, but not restricted to, neuroscience since 2008. Through a monitoring and tracking system, we see the fluidity of the resource ecosystem, with resources coming and going and “flowing” from one resource or location to the next. NIF/SciCrunch provides an ideal platform in which to explore the resource ecosystem as it evolves over time, and to take advantage of the connectedness among research tools and data and of differing points of view about the same data. An integral part of this ecosystem is not just the data and databases, but the knowledge and knowledge structures that allow us to compare what we think we know (embodied in community ontologies, human curation and domain expertise) to data on a large scale. The interface between data and knowledge is a critical piece of the biomedical ecosystem, as it allows us to identify uncertainties, potential sources of bias and potential data-knowledge mismatches in the current biomedical data space. Thus, despite the catchy title of the US National Institutes of Health’s BD2K (Big Data to Knowledge) initiative, we think it equally important to bring K2BD: Knowledge to Big Data.
Maryann Martone received her BA from Wellesley College in biological psychology and her Ph.D. in neuroscience in 1990 from the University of California, San Diego, where she is currently a Professor-in-Residence in the Department of Neuroscience. She is the principal investigator of the Neuroscience Information Framework project, a national project to establish a uniform resource description framework for neuroscience, and of the dkNET network coordinating center, which utilizes this resource framework in support of metabolic and digestive system research. Her research focuses on building and using ontologies and other frameworks for data integration across distributed, heterogeneous data. She is currently the president of FORCE11, the Future of Research Communications and e-Scholarship, a grass-roots community dedicated to transforming scholarly communication.
Stefan Decker, Insight Centre for Data Analytics, Ireland
Life Sciences at the Insight Centre for Data Analytics
Industry session
Self-service Semantic Data Federation for the Life Scientist.
State-of-the-art approaches to data integration, such as data warehousing and workflow scripting, are limited in scope, brittle and poorly reusable, and depend on highly trained technical staff for their efficient use. Data federation based on SADI semantic web services is a significant and cost-effective alternative technology offering interoperable access to life science data.
IPSNP’s HYDRA is a SPARQL engine for querying SADI Web Services. HYDRA enables non-technical users to run multiple “self-service” and “ad hoc” queries over networks of web services. Combined with graphical query composition, end users can formulate queries in the terminology of their application domain, without knowing how the underlying data are structured or the specific mechanisms for accessing them, and can thereby perform complex knowledge discovery tasks.
This talk will illustrate a range of complex queries made possible with SADI and HYDRA for online or enterprise-scale data integration, with examples from the domains of Clinical Intelligence and Bioinformatics.
Presented by IPSNP Computing Inc. IPSNP is an early-stage data integration company commercializing a unique query engine that provides fully integrated access to analytical software and to online and enterprise databases. IPSNP middleware technology is broadly targeted at the biomedical sector and has found initial application in clinical surveillance for healthcare-acquired infections.
Confirmed tutorials
RDF linked data at the European Bioinformatics Institute
The European Bioinformatics Institute (http://www.ebi.ac.uk) provides freely available data from life science experiments through various databases and services. In a bid to provide alternative access to the data, several groups are now publishing this data as RDF linked data. This tutorial will provide an overview of RDF resources at the EBI and introduce our SPARQL services, which can be used to explore either individual or combined resources, the latter by means of federated queries. This year's tutorial will introduce several new resources, added since last year, that expand the amount of linked life science data available. We encourage participants to think of ideas for the hackathon that utilise EBI RDF data.
This tutorial will cover:
- Overview of RDF services at the EBI
- The EBI linked data publishing platform
- The EBI SPARQL endpoints
- Data integration with EBI RDF
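For a taste of the hands-on part, the minimal sketch below runs a SELECT query against an EBI SPARQL endpoint from Python using the third-party SPARQLWrapper library; the endpoint URL and the trivial triple pattern are illustrative assumptions and may not match the datasets actually covered in the tutorial.

```python
# Illustrative sketch: run a simple SELECT query against an EBI SPARQL
# endpoint. The endpoint URL and query are assumptions for demonstration only;
# see the EBI RDF platform for the current list of endpoints.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://www.ebi.ac.uk/rdf/services/sparql"   # hypothetical choice

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery("""
    SELECT ?subject ?predicate ?object
    WHERE { ?subject ?predicate ?object }
    LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["subject"]["value"],
          row["predicate"]["value"],
          row["object"]["value"])
```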
presented by Simon Jupp and James Malone, EMBL-EBI
Ontologising the Health Level Seven (HL7) Standard
The need for semantics-preserving integration of complex data has been widely recognized in the Healthcare and Life Sciences (HCLS) domain. While standards (such as HL7, SNOMED, ICD, LOINC, etc.) have been developed in this direction, they have mostly been applied in controlled environments and are still used incoherently across countries, organisations, or hospitals. In a more mobile and global society, patient data and knowledge about medication, treatment, medical history, adverse drug events, etc. are going to be commonly exchanged between various systems at Web scale.
This tutorial (1) briefly introduces key concepts and technologies for using ontologies with the HL7 standard; (2) discusses a concrete, use-case-based ontology building and alignment methodology for the Health Level Seven (HL7) standard; (3) highlights challenges and issues often faced by domain specialists in integrating local or proprietary clinical terminologies with globally defined universal concepts or terminologies; (4) provides a reality check based on a real-life scenario: how far can ontologies tackle interoperability issues arising in clinical applications? For instance, high-level ontologies that are shared by all systems cannot describe all possible subdomains that may be needed by local clinics; clinics must therefore extend common knowledge with domain or application ontologies, as well as define internal policies, rules and vocabularies that are specific to their context, and even globally shared knowledge may be used differently according to the context. Finally, (5) it describes open research issues that may be of interest to knowledge system developers and the broader research community.
Background: The material and discussion presented in this tutorial is based on (1) the HL7 ontologies developed in the Plug and Play Electronic Patient Records (PPEPR) project; and (2) the HL7 Reference Information Model (RIM) Ontology developed by the HL7 OWL working group.
Prerequisite: The tutorial assumes – but does not require – a basic knowledge of the HL7 (Version 2 & Version 3) Messaging Framework, OWL, RDF, and the fundamentals of Description Logic and Rules.
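To make the alignment idea concrete, the following minimal rdflib sketch asserts a mapping between a hypothetical local clinical term and a hypothetical globally shared concept; all namespaces and identifiers are invented for illustration and are not taken from HL7, PPEPR or the RIM ontology.

```python
# Minimal sketch of an ontology alignment assertion with rdflib.
# All URIs below are hypothetical; they do not come from HL7, PPEPR or RIM.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

LOCAL  = Namespace("http://hospital.example.org/terms#")   # local clinic vocabulary
GLOBAL = Namespace("http://global.example.org/hl7#")       # shared/universal concepts

g = Graph()
g.bind("local", LOCAL)
g.bind("global", GLOBAL)

# Declare the two classes and state that the local term is a specialisation
# of the globally shared concept (a typical outcome of an alignment step).
g.add((LOCAL.GlucoseFastingTest, RDF.type, OWL.Class))
g.add((GLOBAL.LaboratoryObservation, RDF.type, OWL.Class))
g.add((LOCAL.GlucoseFastingTest, RDFS.subClassOf, GLOBAL.LaboratoryObservation))

print(g.serialize(format="turtle"))
```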
presented by Ratnesh Sahay, Insight Centre for Data Analytics
Presentation Slides, Prerequisite material
Ontology-Based Data Access with -ontop-
In life science (as in other domains) information is spread among several data sources, such as hospital databases, lab databases, spreadsheets, etc. Moreover, the complexity of each of these data sources might make it difficult for end-users to access them, and even more so to query all of them at the same time.
A solution that has been proposed to this problem is ontology-based data access (OBDA). OBDA is a popular paradigm, developed since the mid 2000s, to query various types of data sources using a common vocabulary familiar to the end-users. In a nutshell, OBDA separates the user from the data sources (relational databases, CSV files, etc.) by means of an ontology: a common terminology that provides the user with a convenient query vocabulary, hides the structure of the data sources, and can enrich incomplete data with background knowledge. About a dozen OBDA systems have been implemented in both academia and industry.
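The toy sketch below illustrates the core OBDA idea of answering an ontology-level request by rewriting it, via a mapping, into SQL over the original source; it is a conceptual illustration only and does not use the -ontop- API or its mapping language.

```python
# Conceptual OBDA sketch: a tiny mapping from ontology terms to SQL, used to
# answer an ontology-level request against a relational source. This is an
# illustration of the idea, not the -ontop- system or its mapping syntax.
import sqlite3

# Toy relational source.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE lab_results (patient_id TEXT, test TEXT, value REAL)")
db.executemany("INSERT INTO lab_results VALUES (?, ?, ?)",
               [("p1", "glucose", 5.4), ("p2", "glucose", 7.9), ("p2", "hdl", 1.2)])

# Hypothetical mapping: each ontology class is defined by a SQL query whose
# rows become instances of that class.
MAPPINGS = {
    "GlucoseMeasurement":
        "SELECT patient_id, value FROM lab_results WHERE test = 'glucose'",
}

def instances_of(class_name):
    """Answer 'give me all instances of <class_name>' via the mapping."""
    return db.execute(MAPPINGS[class_name]).fetchall()

print(instances_of("GlucoseMeasurement"))
# -> [('p1', 5.4), ('p2', 7.9)]
```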
In this tutorial we will give an overview of OBDA and of our system -ontop-, which is currently being used in the context of the European project Optique. We will discuss how to use -ontop- for data integration, in particular concentrating on:
- How to create an ontology (common vocabulary) for a life science domain.
- How to map available data sources to this ontology.
- How to query the database using the terms in the ontology.
- How to check consistency of the data sources w.r.t. the ontology.
presented by Martin Rezk, Free University of Bozen-Bolzano
Natural Language Interfaces to SPARQL Endpoints
Presentation Slides, Hands-On Material
The amount of Linked Open Data (LOD) is increasing rapidly, particularly in the form of RDF triples stored in SPARQL endpoints. However, the vast amount of data that is publicly available in theory is not really available in practice to “the public” beyond Semantic Web experts, mostly due to the complexity of authoring SPARQL queries. While there are a few approaches to ease access to LOD, e.g., graphical query interfaces or agent-based systems, natural language interfaces (NLIs) are receiving increasing interest due to their high expressive power and low learning cost. In the tutorial, after exploring current technology for NLIs to LOD, a hands-on demonstration of actual working systems will be carried out.
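As a flavour of what such systems do, the toy sketch below maps a single fixed question template to a SPARQL query; the question pattern, vocabulary and query shape are illustrative assumptions, and real NLI systems rely on parsing, lexicons and disambiguation rather than one regular expression.

```python
# Toy sketch of a template-based natural language interface to SPARQL.
# The question pattern, vocabulary and endpoint are illustrative assumptions;
# actual NLI systems use parsing, lexicons and disambiguation, not one regex.
import re

TEMPLATE = re.compile(r"which proteins are associated with (?P<disease>.+)\?", re.I)

def question_to_sparql(question):
    match = TEMPLATE.match(question.strip())
    if not match:
        raise ValueError("question not understood by this toy template")
    disease = match.group("disease")
    # Hypothetical vocabulary: ex:associatedWith links proteins to diseases.
    return f"""
    PREFIX ex: <http://example.org/vocab#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?protein WHERE {{
      ?protein ex:associatedWith ?disease .
      ?disease rdfs:label "{disease}"@en .
    }}
    """

print(question_to_sparql("Which proteins are associated with asthma?"))
```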
presented by Jin-Dong Kim (DBCLS)
Presentation Slides
Rule Markup Languages and Rule Technologies
Rule-based languages have been investigated comprehensively in the realms of declarative knowledge representation and expert systems. The basic idea is that users employ rules to express what they want; the responsibility for interpreting this and deciding how to do it is delegated to an interpreter – a rule engine.
Rule markup languages are the vehicle for using rules on the Web and in other distributed rule-based systems. They allow publishing, deploying, executing and communicating rules in a network. They may also play the role of a lingua franca for exchanging rules between different systems and tools. In a narrow sense, a rule markup language is a concrete (XML-based) rule syntax for the Web. In a broader sense, it should have an abstract syntax as a common basis for defining various concrete languages addressing different consumers. The main purposes of a rule markup language are to permit the publication, interchange and reuse of rules.
This tutorial gives an introduction to rule technologies and systems. It will address the basics of rule-based knowledge representation and introduce the RuleML language family (Deliberation and Reaction RuleML) together with related standards such as W3C SWRL and W3C RIF. Practical tools and rule systems such as Protege SWRL, MYNG and Prova will be demonstrated.
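To illustrate the division of labour between rules and a rule engine described above, here is a minimal forward-chaining sketch; the fact and rule encoding is invented for this illustration and is unrelated to the RuleML, SWRL, RIF or Prova syntaxes demonstrated in the tutorial.

```python
# Minimal forward-chaining sketch: rules say *what* should follow, the engine
# decides *how* to derive it. The fact/rule encoding is invented for this
# illustration and is unrelated to RuleML, SWRL, RIF or Prova syntax.

# A rule is (body, head): if every triple in `body` holds, assert `head`.
RULES = [
    ((("?x", "hasParent", "?y"), ("?y", "hasParent", "?z")),
     ("?x", "hasGrandparent", "?z")),
]

def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:                      # apply rules until a fixpoint is reached
        changed = False
        for body, head in rules:
            for binding in list(match(body, facts, {})):
                new = substitute(head, binding)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

def match(body, facts, binding):
    if not body:
        yield binding
        return
    pattern, rest = body[0], body[1:]
    for fact in facts:
        b = unify(pattern, fact, dict(binding))
        if b is not None:
            yield from match(rest, facts, b)

def unify(pattern, fact, binding):
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding

def substitute(head, binding):
    return tuple(binding.get(t, t) for t in head)

facts = {("alice", "hasParent", "bob"), ("bob", "hasParent", "carol")}
print(forward_chain(facts, RULES))
# includes ('alice', 'hasGrandparent', 'carol')
```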
presented by Adrian Paschke and Ralph Schäfermeier (Freie Universität Berlin)
RDF validation and transformation using Shape Expressions
Shape Expressions (ShEx) provides an intuitive, high-level representation of RDF graph structure. Based on patterns from regular expressions and the RELAX NG Compact Syntax, ShExC is like a BNF for RDF data.
Uses include:
- Validating a specified node or nodes with respect to a ShEx schema.
- Finding conformant nodes in a dataset.
- Matching data against a set of schemas.
This tutorial will cover use of existing ShEx implementations for:
- Verifying an instance document in a clinical or scientific domain.
- Isolating well-structured data over which analytical queries can be safely executed.
- Transforming clinical data between different representations.
Feel free to try out a ShEx demo to see if this fits your needs.
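To give a feel for what validation means in practice, the sketch below shows a tiny ShExC-style shape (as a comment) and a hand-rolled check of the same constraints using rdflib; it is a conceptual illustration only, not one of the ShEx implementations used in the tutorial.

```python
# Conceptual illustration of what a ShEx validation does. The shape below is
# shown as a ShExC-style comment; the check itself is hand-rolled with rdflib
# and is NOT a ShEx engine. All URIs are invented for this example.
#
#   <ObservationShape> {
#       ex:patient IRI ;
#       ex:value   xsd:decimal
#   }
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import XSD

EX = Namespace("http://example.org/clinical#")

g = Graph()
obs = URIRef("http://example.org/data/obs1")
g.add((obs, EX.patient, URIRef("http://example.org/data/patient7")))
g.add((obs, EX.value, Literal("5.4", datatype=XSD.decimal)))

def conforms(node):
    """Naive check of the two constraints in the shape above."""
    patients = list(g.objects(node, EX.patient))
    values = list(g.objects(node, EX.value))
    return (len(patients) == 1 and isinstance(patients[0], URIRef)
            and len(values) == 1 and isinstance(values[0], Literal)
            and values[0].datatype == XSD.decimal)

print(conforms(obs))   # True for the data above
```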
Slides here: http://www.w3.org/2014/Talks/1209-shex-egp/
presented by Eric Prud’hommeaux (World Wide Web Consortium)
Full Papers
- Yasar Khan, Muhammad Saleem, Aftab Iqbal, Muntazir Mehdi, Aidan Hogan, Axel-Cyrille Ngonga Ngomo, Stefan Decker and Ratnesh Sahay. SAFE: Policy Aware SPARQL Query Federation Over RDF Data Cubes
- Soumia Melzi and Clement Jonquet. Scoring semantic annotations returned by the NCBO Annotator
- Angel Esteban-Gil, Jesualdo Tomas Fernandez-Breis and Martin Boeker. Analysis and visualization of disease courses in a semantic enabled cancer registry
- Yasunori Yamamoto. A SPARQL Endpoint Profiler for an Efficient Question Answering System
- Guoqian Jiang, Deepak Sharma, Harold Solbrig, Cui Tao, Chunhua Weng and Christopher Chute. Building A Semantic Web-based Metadata Repository for Facilitating Detailed Clinical Modeling in Cancer Genome Studies
- Efthymios Chondrogiannis, Vassiliki Andronikou, Efstathios Karanastasis and Theodora Varvarigou. An Intelligent Ontology Alignment Tool Dealing with Complicated Mismatches
- Vassilis Koutkias and Marie-Christine Jaulent. Leveraging Post-marketing Drug Safety Research through Semantic Technologies: The PharmacoVigilance Signal Detectors Ontology
- Daniel Schober, Remy Choquet, Kristof Depraetere, Frank Enders, Philipp Daumke, Marie-Christine Jaulent, Douglas Teodoro, Emilie Pasche, Christian Lovis and Martin Boeker. DebugIT: Ontology-mediated layered Data Integration for real-time Antibiotics Resistance Surveillance
Position Papers
- Jim Zheng, Jingcheng Du, Jijun Tang and Cui Tao. GSO: A Semantic Web Ontology to Capture 3D Genome Structure
- Syed Ahmad Chan Bukhari, Michael Krauthammer and Christopher Baker. SEBI: An Architecture for Biomedical Image Discovery, Interoperability and Reusability Based on Semantic Enrichment
Short Papers
- Cui Tao, Nnaemeka Okafor, Amit Mehta, Charles L. Maddow, David Robinson, Brent King, Jiajie Zhang and Amy Franklin. A Work Domain Ontology for Modeling Emergency Department Workflow
- Pablo López-García, Stefan Schulz and Roman Kern. Automatic Summarization for Terminology Recommendation: the case of the NCBO Ontology Recommender
- Katerina Gkirtzou, Thanasis Vergoulis, Artemis Hatzigeorgiou, Timos Sellis, and Theodore Dalamagas. Publishing Diachronic Life Science Linked Data
- Andre Freitas, Kartik Asooja, Joao Jares, Stefan Decker and Ratnesh Sahay. A Semantic Web Platform for Improving the Automation and Reproducibility of Finite Element Bio-simulations
Accepted Posters and Demos
Posters
- Marco Roos, Alasdair J G Gray, Andra Waagmeester, Mark Thompson, Rajaram Kaliyaperumal, Eelke van der Horst, Barend Mons and Mark Wilkinson. Bring Your Own Data workshops: a mechanism to aid data owners to comply with Linked Data best practices
- Rajaram Kaliyaperumal, Peter Bram ‘T Hoen, Mark Thompson, Eelke van der Horst and Marco Roos. Genome Annotation using Nanopublications: An Approach to Interoperability of Genetic Data
- Norio Kobayashi, Kai Lenz, Hongyan Wu, Kouji Kozaki and Atsuko Yamaguchi. Prototype implementation of SPARQL Builder for Life-science Databases by intelligent schema analysis on RDF datasets
- Martin Scharm, Florian Wendland, Martin Peters, Markus Wolfien, Tom Theile and Dagmar Waltemath. The CombineArchive Toolkit – facilitating the transfer of research results.
- Christian Rosenke and Dagmar Waltemath. How can semantic annotations support the identification of network similarities?
- Nihad Omerbegovic and Nedim Omerbegovic. Effects of two-way data binding on better user experience and easier development of Clinical information systems
- Robert Hoehndorf, Luke Slater, Paul N. Schofield and Georgios Gkoutos. Aber-OWL: a framework for ontology-based data access in biology
- Olga Krebs, Katy Wolstencroft, Natalie Stanford, Martin Golebiewski, Stuart Owen, Quyen Nguyen, Peter Kunszt, Bernd Rinn, Jacky Snoep, Wolfgang Mueller and Carole Goble. FAIRDOM approach for semantic interoperability of systems biology data and models
- Soumia Melzi and Clement Jonquet. Representing NCBO Annotator results in standard RDF with the Annotation Ontology
Demos
- Bernardo Cuenca Grau, Evgeny Kharlamov, Sarunas Marciuska, Dmitriy Zheleznyakov and Yujiao Zhou. Querying Life Science Ontologies with SemFacet
- Paolo Ciccarese and Tim Clark. Annotopia: An Open Source Universal Annotation Server for Biomedical Research
- Timo Böhme, Matthias Irmer, Anett Püschel, Claudia Bobach, Ulf Laube and Lutz Weber. OCMiner: Text Processing, Annotation and Relation Extraction for the Life Sciences
- Martynas Jusevičius. Graphity: generic processor for declarative Linked Data applications
- Emily Merrill, Shannan Ho Sui, Stéphane Corlosquet, Tim Clark and Sudeshna Das. Using eXframe to build Semantic Web Genomics Repositories
- Ron Henkel and Dagmar Waltemath. MaSyMoS: Finding hidden treasures in model repositories
- Toshiaki Katayama. D3SPARQL: JavaScript library for visualization of SPARQL results
- Andreas Schwarte, Hanka Venselaar, Peter Haase and Gert Vriend. The NewProt Self-Service Portal for Protein Engineering
- Martin Scharm, Florian Wendland, Martin Peters, Markus Wolfien, Tom Theile and Dagmar Waltemath. The CombineArchive Toolkit – facilitating the transfer of research results.