Professur Datenmanagement
Publications

  1. Base Platform for Knowledge Graphs with Free Software by Simon Bin, Claus Stadler, Norman Radtke, Kurt Junghanns, Sabine Gründer-Fahrer and Michael Martin in Proceedings of the International Workshop on Linked Data-driven Resilience Research 2023 (Editors: Sebastian Tramp, Ricardo Usbeck, Natanael Arndt, Julia Holze and Sören Auer) [Bibsonomy]
    Abstract We present an Open Source base platform for the CoyPu knowledge graph project in the resilience domain. We report on our experiences with several tools which are used to create, maintain, serve, view and explore a modular large-scale knowledge graph, as well as the adaptations that were necessary to enable frictionless interaction from both performance and usability perspectives. We provide a broad view of different programs which are of relevance to this domain. We demonstrate that while it is already possible to achieve good results with free software, there are still several pain points that need to be addressed. Resolving these issues is often not only a matter of configuration but requires modification of the source code as well.
  2. Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering by Lars-Peter Meyer, Johannes Frey, Kurt Junghanns, Felix Brei, Kirill Bulert, Sabine Gründer-Fahrer and Michael Martin in Proceedings of the Poster Track of Semantics 2023 [Bibsonomy]
    Abstract As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE), accompanied by three challenges addressing syntax and error correction, fact extraction and dataset generation. We show that, while useful as a tool, LLMs are not yet fit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support the tracking of prompt engineering and model performance.
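    As an illustration of what such automatic evaluation can look like, the following is a minimal sketch of a single benchmark check in Python with rdflib; the canned answer string stands in for a model call, and none of this is the actual LLM-KG-Bench API:

      import rdflib

      def score_turtle_syntax(answer: str) -> float:
          """Return 1.0 if an LLM answer parses as RDF Turtle, else 0.0."""
          try:
              rdflib.Graph().parse(data=answer, format="turtle")
              return 1.0
          except Exception:
              return 0.0

      # A canned response standing in for a real model call:
      answer = '@prefix ex: <http://example.org/> . ex:Leipzig a ex:City .'
      print(score_turtle_syntax(answer))  # 1.0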
  3. LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT by Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert and Michael Martin [Bibsonomy]
    Abstract Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains of society and of industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experience with graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.
  4. Scaling RML and SPARQL-based Knowledge Graph Construction with Apache Spark by Claus Stadler, Lorenz Bühmann, Lars-Peter Meyer and Michael Martin in 4th International Workshop on Knowledge Graph Construction @ ESWC 2023 [Bibsonomy]
    Abstract Approaches for the construction of knowledge graphs from heterogeneous data sources range from ad-hoc scripts to dedicated mapping languages. Two common foundations are RML and SPARQL. So far, the two approaches have been treated separately: on the one hand, there are tools specifically for processing RML, whereas on the other hand there are tools that extend SPARQL in order to incorporate additional data sources. In this work, we first show how this gap can be bridged by translating RML to a sequence of SPARQL CONSTRUCT queries and introduce the necessary SPARQL extensions. In a subsequent step, we employ techniques to optimize SPARQL query workloads as well as individual query execution times in order to obtain a sequence of queries that is optimized with respect to the order and uniqueness of the generated triples. Finally, we present a corresponding SPARQL query execution engine based on the Apache Spark Big Data framework. In our evaluation on benchmarks, we show that our approach achieves RML mapping execution performance that surpasses the current state of the art.
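    To make the idea of CONSTRUCT-based mapping concrete, the sketch below uses a plain SPARQL CONSTRUCT query as a mapping step with rdflib. It is illustrative only; the paper's engine runs on Spark and relies on SPARQL extensions for reading non-RDF sources, whereas here the source data is already RDF and all IRIs are invented:

      import rdflib

      src = rdflib.Graph()
      src.parse(data="""
          @prefix raw: <http://example.org/raw/> .
          raw:row1 raw:name "Leipzig" ; raw:population "601866" .
      """, format="turtle")

      mapping = """
      PREFIX raw: <http://example.org/raw/>
      PREFIX ex:  <http://example.org/onto/>
      PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
      CONSTRUCT {
        ?row a ex:City ;
             ex:label ?name ;
             ex:population ?pop .
      } WHERE {
        ?row raw:name ?name ;
             raw:population ?rawPop .
        BIND(xsd:integer(?rawPop) AS ?pop)
      }
      """
      print(src.query(mapping).graph.serialize(format="turtle"))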
  5. Ontoflow: A User-Friendly Ontology Development Workflow by Gordian Dziwis, Lisa Wenige, Lars-Peter Meyer and Michael Martin in Proceedings of the International Workshop on Semantic Industrial Information Modelling (SemIIM) @ ESWC22 [Bibsonomy]
  6. Semantification of Geospatial Information for Enriched Knowledge Representation in Context of Crisis Informatics by Claus Stadler, Simon Bin, Lorenz Bühmann, Norman Radtke, Kurt Junghanns, Sabine Gründer-Fahrer and Michael Martin in Proceedings of the International Workshop on Data-driven Resilience Research 2022 (Editors: Natanael Arndt, Sabine Gründer-Fahrer, Julia Holze, Michael Martin and Sebastian Tramp) [Bibsonomy]
    Abstract In the context of crisis informatics, the integration and exploitation of high volumes of heterogeneous data from multiple sources has so far been one of the great opportunities as well as challenges. Semantic Web technologies have proven a valuable means to integrate and represent knowledge on the basis of domain concepts, which improves the interoperability and interpretability of information resources and allows deriving further knowledge via semantic relations and reasoning. In this paper, we investigate the potential of representing and processing geospatial information within the semantic paradigm. We show, on the technical level, how existing open source means can be used and supplemented so as to efficiently handle geographic information, and we convey exemplary results highly relevant in the context of crisis management applications. When given semantic resources are enriched with geospatial information, new information can be retrieved by combining multi-polygons and geo-coordinates with the power of GeoSPARQL queries. Custom SPARQL extension functions and data types for JSON, XML and CSV as well as for dialects such as GeoJSON and GML allow for succinct integration of heterogeneous data. We implemented these features for the Apache Jena Semantic Web framework by leveraging its plugin systems. Furthermore, significant improvements w.r.t. GeoSPARQL query performance have been contributed to the framework.
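    The kind of GeoSPARQL retrieval described above can be sketched as follows, using SPARQLWrapper against a hypothetical Jena Fuseki endpoint; the endpoint URL, the FloodRiskZone class and the data behind it are assumptions:

      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("http://localhost:3030/crisis/sparql")  # assumed endpoint
      sparql.setQuery("""
      PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
      PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
      SELECT ?feature WHERE {
        ?zone a <http://example.org/FloodRiskZone> ;
              geo:hasGeometry/geo:asWKT ?zoneWkt .
        ?feature geo:hasGeometry/geo:asWKT ?featureWkt .
        FILTER(geof:sfWithin(?featureWkt, ?zoneWkt))
      }
      """)
      sparql.setReturnFormat(JSON)
      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["feature"]["value"])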
  7. Automatic Subject Indexing with Knowledge Graphs by Lisa Wenige, Claus Stadler, Simon Bin, Lorenz Bühmann, Kurt Junghanns and Michael Martin in the LASCAR Workshop at the Extended Semantic Web Conference (ESWC) [Bibsonomy]
  8. Decentralized Collaborative Knowledge Management using Git (Extended Abstract) by Natanael Arndt and Michael Martin in Companion Proceedings of the 2019 World Wide Web Conference (WWW '19 Companion) [Bibsonomy]
  9. Ontology-driven Service Integration into Web Applications: A Declarative Approach by Andreas Both, Didier Cherix and Michael Martin in IADIS International Conference WWW/Internet 2019 (Editor: Pedro Isaías) [Bibsonomy]
    Abstract The majority of web applications nowadays are data-driven. However, that does not mean that all data is available when the respective application is launched. In Web 2.0 applications, data is often fetched on demand from remote web services; for example, once a location is provided, weather data can be fetched and local news loaded. This mashup approach is highly dynamic, i.e., based on the data input of the user, completely different execution paths might be taken. Currently, such workflows are implemented within the application logic, requiring high development effort and continuous maintenance to prevent unintended behavior. In this paper, we present a novel approach for integrating web services dynamically, decreasing deployment and maintenance costs, and enabling the next generation of interlinked data web applications: it enables application architects to (re)define the data integration declaratively in an ontology, validate the workflows and define logical requirements. However, our approach is not just a design method but also a method for ad hoc integration of new services. Our approach significantly reduces the effort of generating and maintaining dynamic applications.
  10. RDF-based Deployment Pipelining for Efficient Dataset Release Management by Claus Stadler, Lisa Wenige, Sebastian Tramp, Kurt Junghanns and Michael Martin in Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems (SEMANTICS'19) [Bibsonomy]
    Abstract Open Data portals often struggle to provide release features (i.e., stable versioning, up-to-date download links, rich metadata descriptions) for their datasets. This hinders wide adoption of publicly available datasets, since consuming applications cannot access fresh data sources or might break due to data quality issues. While there exists a variety of tools to efficiently control release processes in software development, the management of dataset releases is not as clear. This paper proposes a deployment pipeline for efficient dataset releases that is based on automated enrichment of DCAT/DATAID metadata and is a first step towards efficient deployment pipelining for Open Data publishing.
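    A minimal sketch of the kind of automated metadata enrichment such a pipeline performs, written with rdflib; the dataset IRI, version and download URL are invented, and the real pipeline covers DATAID as well:

      from datetime import date
      import rdflib
      from rdflib.namespace import DCAT, DCTERMS, RDF

      g = rdflib.Graph()
      ds = rdflib.URIRef("http://example.org/dataset/weather")          # assumed
      dist = rdflib.URIRef("http://example.org/dataset/weather/1.2.0")  # assumed

      g.add((ds, RDF.type, DCAT.Dataset))
      g.add((ds, DCTERMS.hasVersion, rdflib.Literal("1.2.0")))
      g.add((ds, DCAT.distribution, dist))
      g.add((dist, RDF.type, DCAT.Distribution))
      g.add((dist, DCTERMS.issued, rdflib.Literal(date.today())))
      g.add((dist, DCAT.downloadURL,
             rdflib.URIRef("http://example.org/dl/weather-1.2.0.nt")))
      print(g.serialize(format="turtle"))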
  11. Smarte Daten im Knowledge Graph, die Grundlage einer zukunftssicheren Bereitstellung Offener Daten by Richard Figura, Alexander Willner and Michael Martin [Bibsonomy]
    Abstract Open Data is one of the most important resources of the digital world, with growing economic and societal relevance. Despite numerous efforts, the predicted added value has not yet materialized, which can be attributed, among other things, to incomplete interlinking of the data. This talk presents technologies and processes for adding data to a publicly available knowledge graph and linking it there with data from other sources.
  12. A Decentralized and Remote Controlled Webinar Approach, Utilizing Client-side Capabilities: To Increase Participant Limits and Reduce Operating Costs by Roy Meissner, Kurt Junghanns and Michael Martin in Proceedings of the 14th International Conference on Web Information Systems and Technologies - Volume 1: WEBIST [Bibsonomy]
    Abstract We present a concept and implementation for increasing the efficiency of webinar software through a remote control approach based on WebRTC. This technology enables strong security and privacy, is usable across devices, builds on open-source technology, and brings a new level of interactivity to webinars. We used SlideWiki, WebRTC, and browser speech-to-text engines to provide innovative accessibility features like multilingual presentations and live subtitles. Our solution was assessed for real-world usage, tested within the SlideWiki project, and its technological limits were determined. Such measurements were previously unavailable and show that our approach outperforms open-source market competitors in efficiency and costs.
  13. Applying Linked Data Paradigms for Regional Weather Data Reanalysis by Richard Figura, Alexander Willner and Michael Martin at the International Symposium on Regional Reanalysis (ISSR) 2018 [Bibsonomy]
    Abstract "Data is the new oil": this quote, ascribed to Clive Humby, most clearly describes the increasing impact of information on our society and economy. More and more data sets from various sources are published and used for different kinds of applications. Atmospheric reanalysis represents one of the richest and most valuable data sets for the open source community. However, transforming it into valuable information and linking it to other data sets is a challenge, especially for users from non-meteorological domains. In this presentation, we discuss the advantages of applying Linked (Open) Data principles to meteorological data in order to improve data acquisition for regional reanalysis (COSMO-REA2). By converting a COSMO-REA2 subset and linking it to other Linked Data, we illustrate how much more knowledge can be gained using this approach. Several demonstrated scenarios, such as infrastructure planning for wind farming or transportation, underline the advantages of this approach. Based on that, we argue that data in general, and meteorological data in particular, should be made accessible by following the Linked Data paradigms.
  14. Decentralized Collaborative Knowledge Management using Git by Natanael Arndt, Patrick Naumann, Norman Radtke, Michael Martin and Edgard Marx in Journal of Web Semantics [Bibsonomy]
    Abstract The World Wide Web and the Semantic Web are designed as a network of distributed services and datasets. The distributed character of the Web brings manifold collaborative possibilities to interchange data. The commonly adopted collaborative solutions for RDF data are centralized (e.g. SPARQL endpoints and wiki systems). To support distributed collaboration, however, a system is needed that supports divergence of datasets, brings the possibility to conflate diverged states, and allows distributed datasets to be synchronized. In this paper, we present Quit Store, which was inspired by and builds upon the successful Git system. The approach is based on a formal expression of the evolution and consolidation of distributed datasets. During the collaborative curation process, the system automatically versions the RDF dataset and tracks provenance information. It also provides support to branch, merge, and synchronize distributed RDF datasets. The merging process is guarded by specific merge strategies for RDF data. Finally, we use our reference implementation to show overall good performance and to demonstrate the practical usability of the system.
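    The underlying idea can be illustrated in a few lines (a sketch using rdflib and the git CLI, not the Quit Store implementation); an already initialized repository at REPO is assumed:

      import subprocess
      import rdflib

      REPO = "/tmp/rdf-repo"  # assumed path to an initialized Git repository

      def commit_change(graph: rdflib.Graph, message: str) -> None:
          # N-Triples is line-based, which keeps Git diffs and merges manageable.
          graph.serialize(destination=f"{REPO}/data.nt", format="nt")
          subprocess.run(["git", "-C", REPO, "add", "data.nt"], check=True)
          subprocess.run(["git", "-C", REPO, "commit", "-m", message], check=True)

      g = rdflib.Graph()
      g.add((rdflib.URIRef("http://example.org/s"),
             rdflib.URIRef("http://example.org/p"),
             rdflib.Literal("o")))
      commit_change(g, "add first statement")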
  15. CubeViz.js: A Lightweight Framework for Discovering and Visualizing RDF Data Cubes by Konrad Abicht, Georges Alkhouri, Natanael Arndt, Roy Meissner and Michael Martin in INFORMATIK 2017, Lecture Notes in Informatics (LNI) (Editors: Maximilian Eibl and Martin Gaedke) [Bibsonomy]
    Abstract In this paper we present CubeViz.js, the successor of CubeViz, as an approach for lightweight visualization and exploration of statistical data using the RDF Data Cube vocabulary. In several use cases in which we deployed CubeViz, such as the European Union's Open Data Portal, we were able to gather various requirements that eventually led to the decision to reimplement CubeViz as a JavaScript-only application. As part of this paper we showcase major functionalities of CubeViz.js and its improvements in comparison to the prior version.
  16. Decentralized Evolution and Consolidation of RDF Graphs by Natanael Arndt and Michael Martin in 17th International Conference on Web Engineering (ICWE 2017) [Bibsonomy]
    Abstract The World Wide Web and the Semantic Web are designed as a network of distributed services and datasets. In this network and its genesis, collaboration played and still plays a crucial role. But currently we only have centralized collaboration solutions for RDF data, such as SPARQL endpoints and wiki systems, while decentralized solutions can enable applications for many more use cases. Inspired by a successful distributed source code management methodology in software engineering, we propose a framework to support distributed evolution. The system is based on Git and provides distributed collaboration on RDF graphs. This paper covers the formal expression of the evolution and consolidation of distributed datasets, the synchronization, as well as other supporting operations.
  17. Discover Barrier-free Accessible Locations with the Location Navigator by Konrad Abicht, Simeon Ackermann and Michael Martin in INFORMATIK 2017, Lecture Notes in Informatics (LNI) (Editors: Maximilian Eibl and Martin Gaedke) [Bibsonomy]
    Abstract We present the current version of the Location Navigator, which supports users in finding locations in Leipzig that can be accessed without barriers. Besides the current version of the prototype, we additionally present experiences regarding its engineering process and the previously performed conversion of Open Data provided by the registered association Behindertenverband Leipzig e.V. (BVL). Our vision for the underlying data is an inter-municipal data network that supports persons with special needs and allows developments such as the Location Navigator to be applied to other municipalities. For this purpose, RDF will be used for the representation and linking of data in the future. Besides the presentation of the Location Navigator, we sketch some approaches we evaluated during the creation of the respective data model.
  18. SPARQL Update queries over R2RML mapped data sources by Joerg Unbehauen and Michael Martin in INFORMATIK 2017, Lecture Notes in Informatics (LNI) (Editors: Maximilian Eibl and Martin Gaedke) [Bibsonomy]
    Abstract In the Linked Data life cycle, mapping and extracting data from structured sources is an essential step in building a knowledge graph. In existing data life cycles this process is unidirectional, i.e. the data is extracted from the source, but changes like cleaning and linking are not fed back into the originating system. SPARQL-to-SQL rewriters create virtual RDF without materializing data by exposing SPARQL endpoints. With the Update extension of our SparqlMap system, we provide read/write access to structured data sources to enable a tighter integration of the source systems in the knowledge refinement process. In this paper, we discuss three different update methods and further describe in two scenarios how the source system can benefit from feedback from the Linked Data integration.
  19. Distributed Collaboration on RDF Datasets Using Git: Towards the Quit Store by Natanael Arndt, Norman Radtke and Michael Martin in 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016) [Bibsonomy]
    Abstract Collaboration is one of the most important topics regarding the evolution of the World Wide Web and thus also for the Web of Data. In scenarios of distributed collaboration on datasets, it is necessary to allow multiple different versions of a dataset to exist simultaneously, while also providing support for merging diverged datasets. In this paper we present an approach that uses SPARQL 1.1 in combination with the version control system Git and creates commits for all changes applied to an RDF dataset containing multiple named graphs. Further, the operations provided by Git are used to distribute the commits among collaborators and to merge diverged versions of the dataset. We show the advantages of (public) Git repositories for RDF datasets and how this represents a way to collaborate on and consume RDF data. With SPARQL 1.1 and Git in combination, users are given several opportunities to participate in the evolution of RDF data.
  20. Enforcing scalable authorization on SPARQL queries by Jörg Unbehauen, Marvin Frommhold and Michael Martin in Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS'16) [Bibsonomy]
    Abstract With the adoption of the Linked Data paradigm in the enterprise context, effective measures for securing sensitive data are in higher demand than ever before. For example, integrating enterprise systems containing millions of assets and fine-granular access control rules with large public background knowledge graphs leads to both a high number of triples and a high number of access control axioms, which traditional methods struggle to process. Therefore, we introduce novel approaches for enforcing access control on SPARQL queries and evaluate their implementation using an extension of the Berlin SPARQL Benchmark. Additionally, we discuss the strengths and weaknesses of the respective approaches and outline future work.
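    The abstract leaves the concrete enforcement mechanisms open; the sketch below therefore shows only one generic pattern (not necessarily one of the paper's approaches): rewriting a query so that it can only match data in named graphs the user is authorized for. It assumes a simple 'SELECT ... WHERE { ... }' query shape:

      def restrict_to_graphs(select_query: str, allowed_graphs: list[str]) -> str:
          """Wrap the query's pattern in GRAPH clauses over a whitelist."""
          head, _, body = select_query.partition("WHERE")
          values = " ".join(f"<{g}>" for g in allowed_graphs)
          return (f"{head} WHERE {{ VALUES ?__g {{ {values} }} "
                  f"GRAPH ?__g {body.strip()} }}")

      q = "SELECT ?s ?o WHERE { ?s <http://example.org/p> ?o }"
      print(restrict_to_graphs(q, ["http://example.org/graphs/public"]))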
  21. Executing SPARQL queries over Mapped Document Stores with SparqlMap-M by Jörg Unbehauen and Michael Martin in 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016) [Bibsonomy]
    Abstract With the increasing adoption of NoSQL database systems like MongoDB or CouchDB, more and more applications store structured data according to a non-relational, document-oriented model. Exposing this structured data as Linked Data is currently inhibited by a lack of standards as well as tools, and requires the implementation of custom solutions. While recent efforts aim at expressing transformations of such data models into RDF in a standardized manner, there is a lack of approaches which facilitate SPARQL execution over mapped non-relational data sources. With SparqlMap-M we show how dynamic SPARQL access to non-relational data can be achieved. SparqlMap-M is an extension of our SPARQL-to-SQL rewriter SparqlMap that performs a (partial) transformation of SPARQL queries by using a relational abstraction over a document store. Further, duplicate data in the document store is used to reduce the number of joins, and custom optimizations are introduced. Our showcase scenario employs the Berlin SPARQL Benchmark (BSBM) with different adaptations to a document data model. We use this scenario to demonstrate the viability of our approach and compare it to different MongoDB setups and native SQL.
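    The general idea of answering SPARQL over a document store can be illustrated as follows (a toy reduction, not SparqlMap-M's rewriting algorithm): a fixed mapping ties predicates to document fields, so a single triple pattern becomes a MongoDB find; all names are invented:

      from pymongo import MongoClient

      FIELD_FOR_PREDICATE = {                 # assumed document-to-RDF mapping
          "http://example.org/name": "name",
          "http://example.org/price": "price",
      }

      def match_pattern(collection, predicate: str, obj):
          """Answer the pattern ?s <predicate> obj from mapped documents."""
          field = FIELD_FOR_PREDICATE[predicate]
          for doc in collection.find({field: obj}, {"_id": 1}):
              yield f"http://example.org/product/{doc['_id']}"

      products = MongoClient()["shop"]["products"]   # assumed collection
      for s in match_pattern(products, "http://example.org/name", "Widget"):
          print(s)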
  22. LODStats: The Data Web Census Dataset by Ivan Ermilov, Jens Lehmann, Michael Martin and Sören Auer in Proceedings of the 15th International Semantic Web Conference - Resources Track (ISWC'2016) [Bibsonomy]
    Abstract Over the past years, the size of the Data Web has increased significantly, which makes obtaining general insights into its growth and structure both more challenging and more desirable. The lack of such insights hinders important data management tasks such as quality, privacy and coverage analysis. In this paper, we present LODStats, which provides a comprehensive picture of the current state of a significant part of the Data Web. LODStats integrates RDF datasets from the data.gov, publicdata.eu and datahub.io data catalogs and, at the time of writing, lists over 9,000 RDF datasets. For each RDF dataset, LODStats collects comprehensive statistics and makes these available adhering to the LDSO vocabulary. This analysis has been regularly published and enhanced over the past four years at the public platform lodstats.aksw.org. We give a comprehensive overview of the resulting dataset.
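    A toy version of the kind of statistics LODStats gathers, assuming rdflib; the real framework computes a much larger set of criteria in a streaming fashion over datasets far too large to hold in memory:

      import rdflib

      g = rdflib.Graph()
      g.parse("http://dbpedia.org/resource/Leipzig")  # any dereferenceable RDF resource

      print({
          "triples": len(g),
          "distinct_subjects": len(set(g.subjects())),
          "distinct_predicates": len(set(g.predicates())),
          "distinct_objects": len(set(g.objects())),
      })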
  23. OntoWiki 1.0: 10 Years of Development - What's New in OntoWiki by Philipp Frischmuth, Natanael Arndt and Michael Martin in Joint Proceedings of the Posters and Demos Track of the 12th International Conference on Semantic Systems - SEMANTiCS2016 and the 1st International Workshop on Semantic Change & Evolving Semantics (SuCCESS'16) [Bibsonomy]
    Abstract In this demonstration (with supporting poster) we present the semantic data wiki OntoWiki, which was just recently released in version 1.0. We focus on the changes introduced to the tool in the latest release and showcase the generic data wiki, the improvements we made with regard to the documentation, as well as three success stories where OntoWiki was adapted and deployed.
  24. Structured Feedback: A Distributed Protocol for Feedback and Patches on the Web of Data by Natanael Arndt, Kurt Junghanns, Roy Meissner, Philipp Frischmuth, Norman Radtke, Marvin Frommhold and Michael Martin in Proceedings of the Workshop on Linked Data on the Web co-located with the 25th International World Wide Web Conference (WWW 2016) [Bibsonomy]
    Abstract The World Wide Web is an infrastructure to publish and retrieve information through web resources. It evolved from a static Web 1.0 to a multimodal and interactive communication and information space, used to collaboratively contribute and discuss web resources, which is better known as Web 2.0. The evolution into a Semantic Web (Web 3.0) is proceeding. One of its remarkable advantages is its decentralized and interlinked data composition. Yet, in contrast to this decentralized data distribution, workflows and technologies for decentralized collaborative contribution are missing. In this paper we propose the Structured Feedback protocol as an interactive addition to the Web of Data. It offers support for users to contribute to the evolution of web resources by providing structured data artifacts as patches for web resources, as well as simple plain-text comments. Based on this approach, it enables crowd-supported quality assessment and web data cleansing processes in an ad-hoc fashion that most web users are familiar with.
  25. Towards Versioning of Arbitrary RDF Data by Marvin Frommhold, Rubén Navarro Piris, Natanael Arndt, Sebastian Tramp, Niklas Petersen and Michael Martin in 12th International Conference on Semantic Systems Proceedings (SEMANTiCS 2016) [Bibsonomy]
    Abstract Coherent and consistent tracking of provenance data, and in particular of update history information, is a crucial building block for any serious information system architecture. Version control systems can be a part of such an architecture, enabling users to query and manipulate versioning information as well as content revisions. In this paper, we introduce an RDF versioning approach as a foundation for a full-featured RDF version control system. We argue that such a system needs support for all concepts of the RDF specification, including support for RDF datasets and blank nodes. Furthermore, we place special emphasis on the protection against unperceived history manipulation by hashing the resulting patches. In addition to the conceptual analysis and an RDF vocabulary for representing versioning information, we present a mature implementation which captures versioning information for changes to arbitrary RDF datasets.
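    The tamper-evidence idea can be sketched as follows, assuming a patch is a set of added and removed N-Triples lines and that each patch hash also covers its predecessor, chaining the history; this is an illustration, not the paper's vocabulary or hashing scheme:

      import hashlib

      def patch_hash(added: list[str], removed: list[str], parent: str) -> str:
          h = hashlib.sha256()
          h.update(parent.encode())              # chain to the previous patch
          for line in sorted(added):
              h.update(b"+" + line.encode())
          for line in sorted(removed):
              h.update(b"-" + line.encode())
          return h.hexdigest()

      p1 = patch_hash(['<http://ex.org/s> <http://ex.org/p> "o" .'], [], parent="")
      p2 = patch_hash([], ['<http://ex.org/s> <http://ex.org/p> "o" .'], parent=p1)
      print(p1, p2)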
  26. CubeViz -- Exploration and Visualization of Statistical Linked Data by Michael Martin, Konrad Abicht, Claus Stadler, Sören Auer, Axel-C. Ngonga Ngomo and Tommaso Soru in Proceedings of the 24th International Conference on World Wide Web, WWW 2015 [Bibsonomy]
    Abstract CubeViz is a flexible exploration and visualization platform for statistical data represented according to the RDF Data Cube vocabulary. If statistical data is provided adhering to the Data Cube vocabulary, CubeViz exhibits a faceted browsing widget that allows users to interactively filter the observations to be visualized in charts. Based on the selected structural part, CubeViz offers suitable chart types and options for users to configure the visualization. In this demo we present the CubeViz visualization architecture and components, sketch its underlying API and the libraries used to generate the desired output. By employing advanced introspection, analysis and visualization bootstrapping techniques, CubeViz hides the schema complexity of the encoded data in order to support a user-friendly exploration experience.
  27. Improving the Interoperability of Open Data Portals by Dietmar Gattwinkel, Konrad Abicht, Rene Pietzsch and Michael Martin in Share PSI [Bibsonomy]
    Abstract In the context of the European Digital Agenda, governmental authorities as well as virtual communities (e.g. datahub.io) have published a large number of open datasets. This is a fundamentally positive development; however, one can observe that different vocabularies are in use both for the metadata and for the data itself. Furthermore, the established vocabularies are often augmented with portal-specific metadata standards and published in different (local) languages. If Open Data is to be integrated and aggregated across portals, this entails a lot of effort. In this paper we present examples of how the problems of interoperability and multilingualism could be addressed for key open data asset descriptors. We focus on the analysis of today's problems and ways to solve them.
  28. LinkedSpending: OpenSpending becomes Linked Open Data by Konrad Höffner, Michael Martin and Jens Lehmann in Semantic Web Journal [Bibsonomy]
    Abstract There is a high public demand to increase transparency in government spending. Open spending data has the power to reduce corruption by increasing accountability and to strengthen democracy, because voters can make better-informed decisions. An informed and trusting public also strengthens the government itself, because it is more likely to commit to large projects. OpenSpending.org is an open platform that provides public finance data from governments around the world. In this article, we present its RDF conversion LinkedSpending, which provides more than five million planned and executed financial transactions from 2005 to 2035 in 627 datasets from all over the world as Linked Open Data. This data is represented in the RDF Data Cube vocabulary and is freely available and openly licensed.
  29. RDF Editing on the Web by Claus Stadler, Natanael Arndt, Michael Martin and Jens Lehmann in SEMANTICS 2015 [Bibsonomy]
    Abstract While several tools for simplifying the task of visualizing (SPARQL accessible) RDF data on the Web are available today, there is a lack of corresponding tools for exploiting standard HTML forms directly for RDF editing. The few existing related systems roughly fall into the categories of (a) applications that are not aimed at being reused as components, (b) form generators, which automatically create forms from a given schema -- possibly derived from instance data -- or (c) form template processors, which create forms from a manually created specification. Furthermore, these systems usually come with their own widget library, which can only be extended by wrapping existing widgets. In this paper, we present the AngularJS-based Rdf Edit eXtension (REX) system, which facilitates the enhancement of standard HTML forms as well as many existing AngularJS widgets with RDF editing support by means of a set of HTML attributes. We demonstrate our system through the realization of several usage scenarios.
  30. Entwicklung laufzeitoptimierter semantischer Web-Applikationen: Konzepte, Lösungsansätze und Anwendungsfälle by Michael Martin [Bibsonomy]
    Note: http://d-nb.info/1059148110
    Abstract The main criteria for the successful use of Semantic Web technologies are a user-friendly and intuitive design of user interfaces (especially for web applications) and an acceptable performance with regard to the production, processing and publication of semantically represented data. Data management schemata used in the Semantic Web (triple stores) generally offer a high degree of flexibility for the management of information by means of RDF graphs, taxonomies, vocabularies or ontologies. However, this aspect is accompanied by challenges concerning usability and performance in the development of Semantic Web applications, especially when complex information structures and corresponding queries have to be processed. Therefore, if priority is given to ease of use and performance of the software, development risks have to be taken into account. To minimize these risks, this thesis proposes a categorization model which can be used to assist in the specification of requirements. Furthermore, approaches are presented that foster the reduction and optimization of SPARQL queries on the application side, and thus positively influence the run-time optimization of Semantic Web applications. Dedicated strategies are developed for the exploration and visualization of specific data modalities, such as spatial, statistical, and multilingual data. Based on these concepts, software components are developed, optimized and integrated into existing web applications. The approaches elaborated in this work are evaluated using the Berlin SPARQL Benchmark as well as web applications from different domains such as tourism, finance and statistics.
  31. Exploring the Web of Spatial Data with Facete by Claus Stadler, Michael Martin and Sören Auer in Companion Proceedings of the 23rd International World Wide Web Conference (WWW) [Bibsonomy]
  32. Facilitating the Exploration and Visualization of Linked Data by Christian Mader, Michael Martin and Claus Stadler in Linked Open Data---Creating Knowledge Out of Interlinked Data (Editors: Sören Auer, Volha Bryl and Sebastian Tramp) [Bibsonomy]
    Abstract The creation and improvement of tools that cover exploratory and visualization tasks for Linked Data was one of the major goals of the LOD2 project. Tools that support those tasks are regarded as essential for the Web of Data, since they can act as a user-oriented starting point for data customers. During the project, several development efforts were made whose results either facilitate exploration and visualization directly (such as OntoWiki or the Pivot Browser) or can be used to support such tasks. In this chapter we present three selected solutions: rsine, CubeViz and Facete.
  33. Supporting the Linked Data Life Cycle Using an Integrated Tool Stack by Bert Van Nuffelen, Valentina Janev, Michael Martin, Vuk Mijovic and Sebastian Tramp in Linked Open Data - Creating Knowledge Out of Interlinked Data (Editors: Sören Auer, Volha Bryl and Sebastian Tramp) [Bibsonomy]
    Abstract The core of a Linked Data application is the processing of the knowledge expressed as Linked Data. Therefore the creation, management, curation and publication of Linked Data are critical aspects for an application's success. For all of these aspects the LOD2 project provides components, which have been collected and placed under one distribution umbrella: the LOD2 stack. In this chapter we introduce this component stack. We show how to get access, which component covers which aspect of the Linked Data life cycle, and how using the stack eases access to Linked Data management tools. Furthermore, we elaborate on how the stack can be used to support a knowledge domain; the illustrated domain is statistical data.
  34. Increasing the Financial Transparency of European Commission Project Funding by Michael Martin, Claus Stadler, Philipp Frischmuth and Jens Lehmann in Semantic Web Journal [Bibsonomy]
    Abstract The Financial Transparency System (FTS) of the European Commission contains information about grants for European Union projects starting from 2007. It allows users to get an overview of EU funding, including information on beneficiaries as well as the amount and type of expenditure and information on the responsible EU department. The original dataset is freely available on the European Commission website, where users can query the data using an HTML form and download it in CSV and, most recently, XML format. In this article, we describe the transformation of this data to RDF and its interlinking with other datasets. We show that this allows interesting queries over the data which were very difficult without this conversion. The main benefit of the dataset is an increased financial transparency of EU project funding. The RDF version of the FTS dataset will become part of the EU Open Data Portal and will eventually be hosted and maintained by the European Union itself.
  35. Linked Open Data Statistics: Collection and Exploitation by Ivan Ermilov, Michael Martin, Jens Lehmann and Sören Auer in Proceedings of the 4th Conference on Knowledge Engineering and Semantic Web [Bibsonomy]
  36. Publishing and Interlinking the Global Health Observatory Dataset by Amrapali Zaveri, Jens Lehmann, Sören Auer, Mofeed M. Hassan, Mohamed A. Sherif and Michael Martin in Semantic Web Journal [Bibsonomy]
    Abstract The improvement of public health is one of the main indicators of societal progress. Statistical data for monitoring public health is highly relevant for a number of sectors, such as research (e.g. in the life sciences or economy), policy making, health care, the pharmaceutical industry, insurances etc. Such data is meanwhile available even on a global scale, e.g. in the Global Health Observatory (GHO) of the United Nations' World Health Organization (WHO). GHO comprises more than 50 different datasets; it covers all 198 WHO member countries and is updated as more recent or revised data becomes available or when there are changes to the methodology being used. However, this data is only accessible via complex spreadsheets and, therefore, queries over the 50 different datasets as well as combinations with other datasets are very tedious and require a significant amount of manual work. By making the data available as RDF, we lower the barrier for data re-use and integration. In this article, we describe the conversion and publication process as well as use cases which can be implemented using the GHO data.
  37. Extending the WebID Protocol with Access Delegation by Sebastian Tramp, Henry Story, Andrei Sambra, Philipp Frischmuth, Michael Martin and Sören Auer in Proceedings of the Third International Workshop on Consuming Linked Data (COLD2012) (Editors: Andreas Harth, Olaf Hartig and Juan Sequeda) [Bibsonomy]
  38. LODStats---An Extensible Framework for High-performance Dataset Analytics by Jan Demter, Sören Auer, Michael Martin and Jens Lehmann in Proceedings of the EKAW 2012 [Bibsonomy]
    Note: 29% acceptance rate
  39. Managing the life-cycle of Linked Data with the LOD2 Stack by Sören Auer, Lorenz Bühmann, Christian Dirschl, Orri Erling, Michael Hausenblas, Robert Isele, Jens Lehmann, Michael Martin, Pablo N. Mendes, Bert van Nuffelen, Claus Stadler, Sebastian Tramp and Hugh Williams in Proceedings of the International Semantic Web Conference (ISWC 2012) [Bibsonomy]
    Note: 22% acceptance rate
  40. OLAP2DataCube: An OntoWiki Plugin for Statistical Data Publishing by Percy E. Salas, Michael Martin, Fernando Maia Da Mota, Karin Breitman, Sören Auer and Marco A. Casanova in Proceedings of the 2nd Workshop on Developing Tools as Plug-ins [Bibsonomy]
  41. Publishing Statistical Data on the Web by Percy E. Salas, Michael Martin, Fernando Maia Da Mota, Karin Breitman, Sören Auer and Marco A. Casanova in Proceedings of the 6th International IEEE Conference on Semantic Computing [Bibsonomy]
  42. Publishing Statistical Data on the Web by Percy E. Salas, Fernando Maia Da Mota, Karin Breitman, Marco A. Casanova, Michael Martin and Sören Auer in International Journal of Semantic Computing [Bibsonomy]
  43. The Digital Agenda Scoreboard: A Statistical Anatomy of Europe's way into the Information Age by Michael Martin, Bert van Nuffelen, Stefano Abruzzini and Sören Auer [Bibsonomy]
    Abstract Evidence-based policy is policy informed by rigorously established objective evidence. An important aspect of evidence-based policy is the use of scientifically rigorous studies to identify programs and practices capable of improving policy-relevant outcomes. Statistics represent a crucial means to determine whether progress is made towards policy targets. In May 2010, the European Commission adopted the Digital Agenda for Europe, a strategy to take advantage of the potential offered by the rapid progress of digital technologies. The Digital Agenda contains commitments to undertake a number of specific policy actions intended to stimulate a circle of investment in and usage of digital technologies. It identifies 13 key performance targets. In order to chart the progress of both the announced policy actions and the key performance targets, a scoreboard is published, allowing the monitoring and benchmarking of the main developments of the information society in European countries. In addition to these human-readable browsing, visualization and exploration methods, machine-readable access facilitating re-use and interlinking of the underlying data is provided by means of RDF and Linked Open Data. We sketch the transformation process from raw data up to rich, interlinked RDF, describe its publication, and report the lessons learned.
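    The raw-data-to-RDF step can be sketched with rdflib as turning one indicator value into an RDF Data Cube observation; the IRIs and dimension/measure names below are illustrative, not the Scoreboard's actual model:

      import rdflib
      from rdflib.namespace import RDF, Namespace

      QB = Namespace("http://purl.org/linked-data/cube#")
      EX = Namespace("http://example.org/scoreboard/")   # assumed namespace

      g = rdflib.Graph()
      obs = EX["obs/DE/2011/broadband"]
      g.add((obs, RDF.type, QB.Observation))
      g.add((obs, EX.country, rdflib.Literal("DE")))
      g.add((obs, EX.year, rdflib.Literal(2011)))
      g.add((obs, EX.indicator, EX.broadband_penetration))
      g.add((obs, EX.value, rdflib.Literal(0.317)))
      print(g.serialize(format="turtle"))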
  44. Towards an Open-Governmental Data Web by Ivan Ermilov, Claus Stadler, Michael Martin and Sören Auer in Proceedings of the KESW2012 [Bibsonomy]
    Abstract Up to the present day, much effort has been made to publish government data on the Web. However, such data has been published in different formats. For any particular source and use (e.g. exploration, visualization, integration) of such information, special applications have to be written. This limits the overall usability of the information provided and makes it difficult to access information resources. These limitations can be overcome if the information is provided using a homogeneous data and access model that complies with the Linked Data principles. In this paper we showcase how raw Open Government Data (OGD) from heterogeneous sources can be processed, converted, published and used on the Web of Linked Data. In particular, we report our experience in processing OGD in two use cases: the Digital Agenda Scoreboard and the Financial Transparency System of the European Commission.
  45. Dispedia.de---A Linked Information System for Rare Diseases by Romy Elze, Tom-Michael Hesse and Michael Martin in Information Quality in e-Health (Editors: Andreas Holzinger and Klaus-Martin Simonic) [Bibsonomy]
    Abstract The challenge of developing information systems for rare diseases lies in harmonizing social care and health care conditions with a focus on personalization and patient autonomy. Knowledge about most rare diseases is limited, a result of poorly funded research and the existence of only a few specialized experts. Furthermore, the treatment and care of the affected patients is very complex, cost-intensive and time-critical, and the stakeholders involved are very heterogeneous. The information needed by the patient depends on his or her personal situation and constitution. To support the information logistics between patients with rare diseases and (all) other stakeholders (e.g. physicians, therapists, and researchers), we developed an information system based on Linked Open Data technologies in order to create a platform- and tool-independent solution addressing the heterogeneity of the stakeholders. To engineer the system and data model requirements of our approach, we analyzed the rare disease Amyotrophic Lateral Sclerosis (ALS), which has widespread characteristics. The resulting formal knowledge representation was encoded in OWL, which allows, for instance, a modular development of complex areas and also the re-use of existing knowledge bases.
  46. Managing Multimodal and Multilingual Semantic Content by Michael Martin, Daniel Gerber, Norman Heino, Sören Auer and Timofey Ermilov in Proceedings of the 7th International Conference on Web Information Systems and Technologies [Bibsonomy]
    Abstract With the advent and increasing popularity of Semantic Wikis and Linked Data, the management of semantically represented knowledge has become mainstream. However, certain categories of semantically enriched content, such as multimodal documents as well as multilingual textual resources, are still difficult to handle. In this paper, we present a comprehensive strategy for managing the life-cycle of both multimodal and multilingual semantically enriched content. The strategy is based on extending a number of semantic knowledge management techniques, such as authoring, versioning, evolution, access and exploration, to semantically enriched multimodal and multilingual content. We showcase an implementation and user interface based on the semantic wiki paradigm and present a use case from the e-tourism domain.
  47. ReDD-Observatory: Using the Web of Data for Evaluating the Research-Disease Disparity by Amrapali Zaveri, Ricardo Pietrobon, Sören Auer, Jens Lehmann, Michael Martin and Timofey Ermilov in Proc. of the IEEE/WIC/ACM International Conference on Web Intelligence [Bibsonomy]
  48. The Open Government Data Stakeholder Survey by Michael Martin, Martin Kaltenböck, Helmut Nagy and Sören Auer in Proceedings of the Open Knowledge Conference 2011 [Bibsonomy]
    Abstract This paper describes the results of the LOD2 Open Government Data Stakeholder Survey 2010 (OGD Stakeholder Survey). The objective of the survey was to involve as many relevant stakeholders as possible in the 27 European Union countries in an online questionnaire and to ask them about their needs and requirements in the area of open data as well as regarding the publicdata.eu portal. The main areas of the survey were questions about Open Government Data itself, about the data and its usage, about the requirements for a centralised data catalogue, as well as about the participants themselves. The goal of the OGD Stakeholder Survey was to reach a broad audience of the main stakeholders of open data: citizens, public administration, politics and industry. The survey was open for five weeks from November 2010 to December 2010; in total, 329 participants completed it. The results were published in April 2011 in the form of HTML and PDF, the raw data in CSV. In addition to these publication formats (HTML, PDF, CSV), we also published the data as Linked Data using various vocabularies and tools.
  49. Categorisation of Semantic Web Applications by Michael Martin and Sören Auer in Proceedings of the 4th International Conference on Advances in Semantic Processing (SEMAPRO2010), 25 October -- 30 October, Florence, Italy [Bibsonomy]
  50. Improving the Performance of Semantic Web Applications with SPARQL Query Caching by Michael Martin, Jörg Unbehauen and Sören Auer in Proceedings of the 7th Extended Semantic Web Conference (ESWC 2010), 30 May -- 3 June 2010, Heraklion, Crete, Greece (Editors: Lora Aroyo, Grigoris Antoniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral and Tania Tudorache) [Bibsonomy]
    Abstract The performance of triple stores is one of the major obstacles for the deployment of semantic technologies in many usage scenarios. In particular, Semantic Web applications, which use triple stores as persistence backends, trade performance for the advantage of flexibility with regard to information structuring. In order to get closer to the performance of relational database-backed Web applications, we developed an approach for improving the performance of triple stores by caching query results and even complete application objects. The selective invalidation of cache objects, following updates of the underlying knowledge bases, is based on analysing the graph patterns of cached SPARQL queries in order to obtain information about what kind of updates will change the query result. We evaluated our approach by extending the BSBM triple store benchmark with an update dimension as well as in typical Semantic Web application scenarios.
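    A toy cache in the spirit of this approach: results are keyed by query text, and an update invalidates every cached query that mentions a predicate the update touches. The described system analyses full graph patterns, so keying on IRIs alone, as below, over-invalidates:

      import re

      class NaiveSparqlCache:
          def __init__(self):
              self._results = {}   # query text -> cached result
              self._iris = {}      # query text -> IRIs mentioned in the query

          def put(self, query: str, result) -> None:
              self._results[query] = result
              self._iris[query] = set(re.findall(r"<([^>]+)>", query))

          def get(self, query: str):
              return self._results.get(query)

          def invalidate(self, updated_iris: set) -> None:
              stale = [q for q, iris in self._iris.items() if iris & updated_iris]
              for q in stale:
                  del self._results[q]
                  del self._iris[q]

      FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
      q = f"SELECT ?n WHERE {{ ?s <{FOAF_NAME}> ?n }}"
      cache = NaiveSparqlCache()
      cache.put(q, ["Alice"])
      cache.invalidate({FOAF_NAME})
      print(cache.get(q))  # None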
  51. Knowledge Engineering for Historians on the Example of the Catalogus Professorum Lipsiensis by Thomas Riechert, Ulf Morgenstern, Sören Auer, Sebastian Tramp and Michael Martin in Proceedings of the 9th International Semantic Web Conference (ISWC2010) (Editors: Peter F. Patel-Schneider, Yue Pan, Pascal Hitzler, Peter Mika, Lei Zhang, Jeff Z. Pan, Ian Horrocks and Birte Glimm) [Bibsonomy]
    Abstract Although the Internet, as a ubiquitous medium for communication, publication and research, has already significantly influenced the way historians work, the capabilities of the Web as a direct medium for collaboration in historic research are not much explored. We report on the application of an adaptive, semantics-based knowledge engineering approach for the development of a prosopographical knowledge base on the Web, the Catalogus Professorum Lipsiensis. In order to enable historians to collect, structure and publish prosopographical knowledge, an ontology was developed and knowledge engineering facilities based on the semantic data wiki OntoWiki were implemented. The resulting knowledge base contains information about more than 14,000 entities and is tightly interlinked with the emerging Web of Data. For access and exploration by other historians, a number of interfaces were developed, such as a visual SPARQL query builder, a relationship finder and a Linked Data interface. The approach is transferable to other prosopographical research projects and to historical research in general, thus improving the collaboration in historic research communities and facilitating the reusability of historic research results.
  52. Learning Semantic Web Technologies with the Web-Based SPARQLTrainer by Daniel Gerber, Marvin Frommhold, Michael Martin, Sebastian Tramp and Sören Auer in Proceedings of the 6th International Conference on Semantic Systems 2010 [Bibsonomy]
    Abstract The success of the Semantic Web in research, technology and standardization communities has resulted in a large variety of different approaches, standards and techniques. This diversity and heterogeneity often make it increasingly difficult to become acquainted with Semantic Web technologies. In this work, we present the SPARQLTrainer approach for educating novices in semantic technologies in a playful way. With SPARQLTrainer, educators can devise a SPARQL course by defining a number of exercises, either generically or for a specific domain. Learners complete courses by answering questions of increasing complexity step by step. These questions usually require the learner to build a SPARQL query that queries a certain knowledge base and uses certain SPARQL features. The SPARQL queries created by a learner are compared with example solutions given by the instructor. This comparison takes possible variations into account and gives specific feedback to the learner.
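    The grading idea can be sketched as follows: two syntactically different queries count as equivalent when they return the same results over the course's knowledge base (the actual trainer gives richer feedback than this yes/no check); rdflib and a toy knowledge base are assumed:

      import rdflib

      kb = rdflib.Graph()
      kb.parse(data="""
          @prefix ex: <http://example.org/> .
          ex:alice ex:knows ex:bob . ex:bob ex:knows ex:carol .
      """, format="turtle")

      def same_answers(learner_query: str, reference_query: str) -> bool:
          learner = {tuple(row) for row in kb.query(learner_query)}
          reference = {tuple(row) for row in kb.query(reference_query)}
          return learner == reference

      print(same_answers(
          "SELECT ?x WHERE { ?x <http://example.org/knows> ?y }",
          "PREFIX ex: <http://example.org/> SELECT ?a WHERE { ?a ex:knows [] }",
      ))  # True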
  53. Ortsbezogene Navigation basierend auf einem Vokabular zur Erzeugung geographischer Hierarchien by Michael Martin and Thomas Riechert in Catalogus Professorum Lipsiensis -- Konzeption, technische Umsetzung und Anwendungen für Professorenkataloge im Semantic Web (Editors: Ulf Morgenstern and Thomas Riechert) [Bibsonomy]
    Abstract The historical catalogue of professors of Leipzig University contains, among other things, information on the curricula vitae of professors who have taught at Leipzig University since 1409. This historical information is acquired and published in a collaborative process using the semantic data wiki OntoWiki. The catalogue of professors thus constitutes a knowledge base that also contains geographic information on professors' places of birth, graduation and death. This book chapter presents an evolution process and the use of a suitable vocabulary to reconstruct the implicitly contained geographic hierarchy, which is subsequently used for navigation.
  54. The Catalogus Professorum Lipsiensis---Semantics-based Collaboration and Exploration for Historians by Thomas Riechert, Ulf Morgenstern, Sören Auer, Sebastian Tramp and Michael Martin in Proceedings of the 9th International Semantic Web Conference (ISWC2010) [Bibsonomy]
  55. Update Strategies for DBpedia Live by Claus Stadler, Michael Martin, Jens Lehmann and Sebastian Hellmann in 6th Workshop on Scripting and Development for the Semantic Web, colocated with ESWC, 30th or 31st May 2010, Crete, Greece [Bibsonomy]
    Abstract Wikipedia is one of the largest public information spaces, with a huge user community that collaboratively works on the largest online encyclopedia. Its users add or edit up to 150,000 wiki pages per day. The DBpedia project extracts RDF from Wikipedia and interlinks it with other knowledge bases. In the DBpedia live extraction mode, Wikipedia edits are instantly processed to update information in DBpedia. Due to the high number of edits and the growth of Wikipedia, the update process has to be very efficient and scalable. In this paper, we present different strategies to tackle this challenging problem and describe how we modified the DBpedia live extraction algorithm to work more efficiently.
  56. Developing Semantic Web Applications with the OntoWiki Framework by Norman Heino, Sebastian Dietzold, Michael Martin and Sören Auer in Networked Knowledge - Networked Media (Editors: Tassilo Pellegrini, Sören Auer, Klaus Tochtermann and Sebastian Schaffert) [Bibsonomy]
    Abstract In this paper, we introduce the OntoWiki Application Framework for developing Semantic Web applications with a strong emphasis on collaboration. After presenting OntoWiki as our main showcase for the framework, we give both an architectural overview and a detailed view of the included components. We conclude this paper with a presentation of different use cases where the framework was strongly involved.
  57. Entwicklung semantischer Webapplikationen: Auf dem Weg vom Dokumenten- zum Daten-Web by Sören Auer, Sebastian Dietzold and Michael Martin in T3N Magazin [Bibsonomy]
  58. Performanzsteigerung datenbankgestützter RDF-Triple-Stores by Michael Martin in Tagungsband XInnovations 2008 in Berlin (Editors: Robert Tolksdorf and Johann-Christoph Freytag) [Bibsonomy]
  59. xOperator - Chat with the Semantic Web by Jörg Unbehauen, Sebastian Hellmann, Michael Martin, Sebastian Dietzold and Sören Auer [Bibsonomy]
    Note: Poster at ISWC 2008
  60. Exploring the Netherlands on a Semantic Path by Michael Martin in Proceedings of the 1st Conference on Social Semantic Web (Editors: Sören Auer, Christian Bizer, Claudia Müller and Anna Zhdanova) [Bibsonomy]