Faculty of Computer Science


270th Computer Science Colloquium


Tuesday, 29 August 2017, Straße der Nationen 62, Böttcher-Bau, 1/336

08:00

Prof. Dr. Jens Lehmann

Large-scale Semantic Data Management and Analytics

The presentation briefly introduces semantic technologies for knowledge representation, using the DBpedia knowledge graph, which is extracted from Wikipedia and published as Linked Open Data, as an example. Building on that, it covers the speaker's recent and planned work in three areas of large-scale knowledge management and analytics:

  1. link prediction in knowledge graphs for data integration and management,
  2. question answering to allow natural language queries over knowledge graphs, and
  3. concept learning for predictive analysis in knowledge graphs.

Illustrative examples accompany the theoretical work and algorithms throughout the talk.

09:40

Prof. Dr. Ansgar Scherp

A Parameterized Index Model for Schema-level Search in Large-scale, Distributed Linked Open Data

Linked Open Data (LOD) is about publishing and interlinking graph data on the web. It is the de facto standard for interoperable, distributed open data management and is supported by industry, non-profit organizations, and governments. The Resource Description Framework (RDF) is used to describe data on the LOD cloud. In contrast to relational databases, RDF does not provide a fixed, pre-defined schema. Rather, RDF allows the data schema to be modeled flexibly by attaching RDF types and properties to the entities. Our schema-level index SchemEX allows for mining and searching in large-scale, distributed RDF graph data. The index can be computed efficiently in a stream-based approach with reasonable accuracy over large-scale data sets with billions of RDF triples, such as the Billion Triples Challenge datasets. In order to develop, compare, and validate variants of schema-level indices, we have recently developed a novel, parameterized model for schema-level indices using equivalence relations over RDF entities. Additionally, our parameterized model supports features such as summarizing owl:sameAs-connected entities and inference over RDF Schema. Finally, due to the evolution of the LOD cloud, one observes frequent changes in the data as well as in the data schema, in terms of combinations of RDF types and properties. Thus, current work includes temporal clustering and finding periodicities in entity dynamics over large-scale, weekly snapshots of the LOD cloud. Knowledge about data evolution is relevant to many application fields and to future data-driven applications.
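
The idea of a schema-level index defined by an equivalence relation over RDF entities can be illustrated with a small sketch. This is not the SchemEX implementation: the function names (build_schema_index, type_set, property_set), the example triples, and the choice of "same type set" and "same property set" as equivalence criteria are assumptions made for illustration only.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # abbreviated predicate name, assumed for this sketch


def type_set(entity, by_subject):
    """Schema feature: the set of RDF types attached to an entity."""
    return frozenset(o for p, o in by_subject[entity] if p == RDF_TYPE)


def property_set(entity, by_subject):
    """Schema feature: the set of non-type properties used by an entity."""
    return frozenset(p for p, o in by_subject[entity] if p != RDF_TYPE)


def build_schema_index(triples, features):
    """Group entities into equivalence classes.

    Two entities are equivalent if all chosen feature functions agree on them.
    The list of feature functions is the 'parameter' of this toy index model.
    """
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append((p, o))

    index = defaultdict(set)
    for entity in by_subject:
        key = tuple(f(entity, by_subject) for f in features)
        index[key].add(entity)
    return dict(index)


# Hypothetical example data
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:paper1", RDF_TYPE, "bibo:Article"),
    ("ex:paper1", "dc:creator", "ex:alice"),
]

by_types = build_schema_index(triples, [type_set])
by_types_and_props = build_schema_index(triples, [type_set, property_set])
```

Swapping the list of feature functions yields a coarser or finer index over the same data, which is roughly what makes such a model convenient for comparing index variants.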

11:20

Jun.-Prof. Matthias Hagen

Keyqueries for Related Work Search

I will introduce the concept and theory of keyqueries recently developed in my research group. A keyquery for a given document set is a query that returns these documents in the top result ranks when submitted to a reference search engine. Finding keyqueries can support scholars searching for related work, and can also be used to automatically generate taxonomies for large document collections or to derive and label document clusters.
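
The definition above can be made concrete with a minimal sketch. Everything here is an assumption for illustration: the toy reference engine (ranking by number of matching query terms, ties broken by document id), the cut-off k for "top result ranks", the exhaustive enumeration of candidate queries, and the example corpus. It is not the method presented in the talk.

```python
from itertools import combinations


def search(query, corpus):
    """Toy reference engine: rank documents by how many query terms they contain."""
    scored = []
    for doc_id, text in corpus.items():
        words = set(text.lower().split())
        score = sum(1 for term in query if term in words)
        if score > 0:
            scored.append((-score, doc_id))  # higher score first, then doc id
    return [doc_id for _, doc_id in sorted(scored)]


def is_keyquery(query, targets, corpus, k):
    """A query counts as a keyquery here if all target documents are in the top k."""
    top = search(query, corpus)[:k]
    return set(targets) <= set(top)


def find_keyqueries(targets, corpus, k, max_terms=2):
    """Enumerate short queries built from the target documents' vocabulary."""
    vocab = sorted(set().union(*(set(corpus[d].lower().split()) for d in targets)))
    found = []
    for n in range(1, max_terms + 1):
        for query in combinations(vocab, n):
            if is_keyquery(query, targets, corpus, k):
                found.append(query)
    return found


# Hypothetical example corpus
corpus = {
    "d1": "compressed linear algebra for machine learning",
    "d2": "linear algebra on compressed matrices",
    "d3": "question answering over knowledge graphs",
    "d4": "machine learning for knowledge graphs",
}

keyqueries = find_keyqueries(["d1", "d2"], corpus, k=2)
```

Exhaustive enumeration only works for tiny vocabularies; a realistic setting would need a smarter search over candidate queries and a real retrieval system as the reference engine.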

13:30

Dr. Matthias Boehm

Compressed Linear Algebra for Large-Scale Machine Learning in SystemML

Declarative, large-scale machine learning (ML) aims to simplify the development and usage of large-scale ML algorithms. In SystemML, data scientists specify ML algorithms in a high-level language with R-like syntax, and the system automatically generates hybrid runtime execution plans that combine single-node, in-memory operations with distributed operations on MapReduce or Spark. In this talk, we give a brief overview of SystemML and discuss Compressed Linear Algebra (CLA) as a selected runtime technique with high performance impact. Large-scale ML algorithms are often iterative, using repeated read-only data access and I/O-bound matrix-vector multiplications to converge to an optimal model. It is crucial for performance to fit the data into single-node or distributed main memory and to enable fast matrix-vector operations on in-memory data. General-purpose heavyweight and lightweight compression techniques struggle to achieve both good compression ratios and decompression fast enough to enable block-wise uncompressed operations. Therefore, inspired by database compression and sparse matrix formats, we initiate work on value-based compressed linear algebra, in which heterogeneous, lightweight database compression techniques are applied to matrices, and linear algebra operations such as matrix-vector multiplication are then executed directly on the compressed representation. Our experiments show that CLA achieves in-memory operation performance close to the uncompressed case together with good compression ratios, which enables fitting substantially larger datasets into available memory. We thereby obtain significant end-to-end performance improvements.
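
To illustrate what executing matrix-vector multiplication directly on a compressed representation can look like, here is a minimal sketch. It is not SystemML code: the encoding (per column, a dictionary from each distinct non-zero value to the list of row indices where it occurs) and the function names compress_columns, matvec_compressed, and matvec_dense are assumptions chosen for illustration.

```python
from collections import defaultdict


def compress_columns(matrix):
    """Value-based column compression (assumed encoding for this sketch).

    Each column becomes a dict: distinct non-zero value -> list of row indices.
    """
    n_cols = len(matrix[0])
    compressed = []
    for j in range(n_cols):
        groups = defaultdict(list)
        for i, row in enumerate(matrix):
            if row[j] != 0:
                groups[row[j]].append(i)
        compressed.append(dict(groups))
    return compressed


def matvec_compressed(compressed, n_rows, v):
    """Compute y = X v without decompressing X."""
    y = [0.0] * n_rows
    for j, groups in enumerate(compressed):
        for value, rows in groups.items():
            contribution = value * v[j]  # one multiplication per distinct value
            for i in rows:
                y[i] += contribution
    return y


def matvec_dense(matrix, v):
    """Uncompressed baseline for comparison."""
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


# Hypothetical example: columns with few distinct values compress well
X = [
    [7.0, 0.0, 2.0],
    [7.0, 3.0, 2.0],
    [1.0, 3.0, 0.0],
    [7.0, 0.0, 2.0],
]
v = [1.0, 2.0, 3.0]

cols = compress_columns(X)
y = matvec_compressed(cols, len(X), v)
```

In this sketch the multiplication is done once per distinct value in a column rather than once per cell, so columns with few distinct values need less memory and fewer multiplications.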

Everyone interested is cordially invited!