Introducing Linked Data as a Service
Guest post by Peter Haase
Linked Data has become a powerful technology for businesses across virtually every industry. With a high degree of interoperability and ease of reuse, Linked Data makes it possible to semantically interlink and connect resources at data level no matter what the structure of the data is, who created it, or where it came from.
Linked Data has become a prominent choice for publishing data on the web with over 200 Linked Open Data sets currently available to the public. The range of information covered in these sets includes media, geography data, publication information, open government data, life-science ontologies, and numerous cross-domain data resources.
Linked Data principles are also becoming increasingly popular in the enterprise world. Businesses are representing their internal data as Linked Enterprise Data, allowing them to semantically integrate and interlink data scattered among different information systems, thus breaking down isolated data silos. This helps generate the most useful, and most valuable, knowledge out of enterprise data.
The fusion of Linked Open Data and Linked Enterprise Data paves the way for modern and innovative data management with such features as intelligent data aggregation; discovery and integration; simplified publishing and sharing of data; enrichment and contextualization of data; and improved user experience for search, presentation, and visualization. As more institutions, companies and governments continue to open up their data, the possibilities derived from combining Linked Open Data and Linked Enterprise Data will only continue to grow.
A core enabler for the automated development of Linked Open Data applications is the concept of Data-as-a-Service (DaaS) for virtualized data access. Following the paradigm of other XaaS (Everything as a Service) concepts, where the product or service is provided in a self-service fashion, DaaS is based on the idea that data can be provided on demand to the user regardless of the geographic or organizational separation of provider and consumer. With the right DaaS solution, a company could combine Linked Open Data from around the world with its own data and use the resulting knowledge to improve its business.
Yet, the development of specific Linked Open Data as a Service applications remains a time-consuming and costly task. Three distinct challenges stand in the way:
On the data integration and management side, developers are faced with a variety of new data formats and query languages (such as RDF, OWL, and SPARQL). They also struggle with heterogeneity across data sets, since Linked Data may be available via HTTP lookups, RDF dumps, or SPARQL endpoints. Solving these problems can require various new database systems and tools to store, process, and access disparate data.
Developers need tools that allow for the fast integration of company-internal data with external data sources regardless of origin, format, or author. With the right tool, companies could potentially link their internal data with Linked Open Data sources, legacy data sources, internal systems, and data sources on the Web (including social media such as Twitter, Facebook, and YouTube).
After integration, Linked Data applications require new data interaction paradigms to deal with the specific challenges, and opportunities, of the underlying data formats, such as schema flexibility and data semantics. Leveraging the benefits of Linked Data requires the dynamic discovery of available data sources, seamless integration of Linked Data from multiple sources, provenance tracking, and information quality assessment. The Data as a Service paradigm must be followed here: users should be able to discover, integrate, and consume available heterogeneous data sources ad hoc and on demand.
The importance of end-user interfaces that implement generic visualization, exploration, and interaction paradigms for Linked Data cannot be overlooked. Users should be provided with widgets for visualization and analytic capabilities which allow them to dynamically define charts and reports based on individual queries. An ideal user interface should also allow users to sort and filter data sets and to visualize results in various formats. This gives users the freedom not only to view the data they are interested in, but also to visualize and explore it according to their personal needs.
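To make the first challenge more concrete, here is a minimal sketch of how one of the access paths mentioned above, a SPARQL endpoint, is commonly addressed: the query travels as the `query` parameter of an HTTP GET request. The choice of DBpedia's public endpoint, the example query, and the function name `build_sparql_request` are assumptions made for illustration and do not describe any particular product. The code only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for illustration: DBpedia's public SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: up to five resources typed as dbo:MusicalArtist.
EXAMPLE_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?artist WHERE { ?artist a dbo:MusicalArtist } LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL query as an HTTP GET request."""
    # The query text is URL-encoded into the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query.strip()})
    # Ask for results in the SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(DBPEDIA_ENDPOINT, EXAMPLE_QUERY)
print(request.full_url)
```

HTTP lookups and RDF dumps would each need a different access routine, which is exactly the heterogeneity that makes a uniform DaaS layer attractive.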
A tool offering Linked Open Data as a Service could have uses in industries ranging from health care to personal shopping. Consider, for example, a web application for organizing your music library. With the power of Linked Data as a Service, users could extract metadata from their media files and enrich that data with other relevant information from sources such as DBpedia, Flickr, Last.fm, BBC Music, YouTube, Facebook, and Twitter. They could then perform searches, run analyses, and generate reports. For example, when searching a music database for a specific song, the results page could include links to the corresponding music video, other songs by the same artist, a cover picture, the artist's upcoming concerts, the artist's Facebook and Twitter feeds, and more.
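The enrichment step in the music library example can be sketched in a few lines. The sketch below assumes that both the local metadata and the external facts are available as simple (subject, predicate, object) triples and that they share an artist URI; all URIs, property names, and function names are hypothetical and chosen only for illustration. For brevity it keeps a single value per property, which real Linked Data does not guarantee.

```python
from collections import defaultdict

# Hypothetical metadata extracted from a local media file.
LOCAL_TRIPLES = [
    ("file:track1.mp3", "title", "Example Song"),
    ("file:track1.mp3", "artist", "http://example.org/artist/1"),
]

# Hypothetical facts an external source might publish about the same artist URI.
EXTERNAL_TRIPLES = [
    ("http://example.org/artist/1", "name", "Example Artist"),
    ("http://example.org/artist/1", "homepage", "http://example.org/artist/1/home"),
]


def index_by_subject(triples):
    """Group triples into {subject: {predicate: object}}."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj
    return index


def enrich(local, external):
    """Attach external facts to each local track via the shared artist URI."""
    external_index = index_by_subject(external)
    enriched = {}
    for track, props in index_by_subject(local).items():
        record = dict(props)
        # The shared URI is what links the two data sets together.
        record["artist_info"] = dict(external_index.get(props.get("artist"), {}))
        enriched[track] = record
    return enriched


library = enrich(LOCAL_TRIPLES, EXTERNAL_TRIPLES)
print(library["file:track1.mp3"]["artist_info"]["name"])
```

The design point is that no schema mapping is needed: because both sources identify the artist by the same URI, the join is a plain lookup.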
Peter Haase is a lead architect at fluid Operations, where he leads the research and development activities at the interface of semantic technologies and cloud computing. Haase will be giving two presentations at SemTechBiz Conference in San Francisco (June 3-7): "Linked Data as a Service with the Information Workbench," which discusses the potential of Linked Enterprise Data, in particular when combined with Linked Open Data, and "Dynamic Semantic Publishing Empowering the BBC Sports Site and the 2012 Olympics," presented jointly with Jem Rayfield, Senior Technical Architect at the BBC, and Borislav Popov, Head of Semantic Annotation and Search Group at Ontotext Lab.