Graphs in Space: Graph Embeddings for Machine Learning on Complex Data
- CRC 876 - Topical Seminar
Abstract - In today’s world, data in graph and tabular form are being generated at astonishing rates, and machine learning (ML) and data mining (DM) algorithms applied to such data have become established as drivers of modern society. The field of graph embedding is concerned with bridging the “two worlds” of graph data (represented with nodes and edges) and tabular data (represented with rows and columns) by providing means for mapping graphs to tabular data sets, thus unlocking the use of a wide range of tabular ML and DM techniques on graphs. Graph embedding has enjoyed increasing popularity in recent years, with a plethora of new methods being proposed. However, up to now none of them has addressed the dimensionality of the new data space in any depth, which is surprising since it is widely known that dimensionalities greater than 10–15 can lead to adverse effects on tabular ML and DM methods, collectively termed the “curse of dimensionality.” In this talk we will present the most interesting results of our project Graphs in Space: Graph Embeddings for Machine Learning on Complex Data (GRASP), where we investigated the impact of the curse of dimensionality on graph-embedding methods by using two well-studied artifacts of high-dimensional tabular data: (1) hubness (the emergence of highly connected nodes in nearest-neighbor graphs obtained from tabular data) and (2) local intrinsic dimensionality (LID, the number of dimensions needed to express the complexity around a particular point in the data space, based on properties of surrounding distances). After exploring how existing graph-embedding methods (focusing on node2vec) interact with hubness and LID, we will describe new methods based on node2vec that take these factors into account, achieving improved accuracy in at least one of two aspects: (1) graph reconstruction and community preservation in the new space, and (2) success of applications of the produced tabular data to the tasks of clustering and classification.
Finally, we will discuss the potential for future research, including applications to similarity search and link prediction, as well as extensions to graphs that evolve over time.
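For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of how hubness and LID are commonly estimated on tabular data. It is not the GRASP implementation. It assumes hubness is summarized by the skewness of the k-occurrence counts (how often each point appears among other points' k nearest neighbors) and that LID is estimated with the standard maximum-likelihood formula over k-nearest-neighbor distances. The function names, the choice k = 10, and the random Gaussian matrix standing in for an embedding (for example, one produced by node2vec) are all illustrative assumptions.

```python
import numpy as np


def knn(X, k):
    """Return indices and distances of the k nearest neighbors of each row.

    Brute-force Euclidean search; assumes the rows of X are distinct points.
    """
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)              # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]       # neighbors sorted by distance
    dist = np.take_along_axis(D, idx, axis=1)
    return idx, dist


def k_occurrence(idx, n):
    """Count how many k-NN lists each of the n points appears in."""
    return np.bincount(idx.ravel(), minlength=n)


def skewness(values):
    """Third standardized moment; large positive values indicate strong hubs."""
    v = np.asarray(values, dtype=float)
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


def lid_mle(dist):
    """Maximum-likelihood LID estimate per point from sorted k-NN distances."""
    r_k = dist[:, -1:]                       # distance to the k-th neighbor
    return -1.0 / np.log(dist / r_k).mean(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for an embedding matrix (nodes x dimensions).
    X = rng.standard_normal((500, 32))
    idx, dist = knn(X, k=10)
    counts = k_occurrence(idx, n=X.shape[0])
    print("k-occurrence skewness:", skewness(counts))
    print("mean LID estimate:", lid_mle(dist).mean())
```

In this sketch, a strongly right-skewed k-occurrence distribution signals that a few points act as hubs, and the per-point LID estimates can be averaged or inspected individually.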
Miloš Radovanović
Miloš Radovanović is Professor of Computer Science at the Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Serbia. His research interests span many areas of data mining and machine learning, with a special focus on problems related to high data dimensionality, complex networks, time-series analysis, and text mining, as well as techniques for classification, clustering, and outlier detection. He is Managing Editor of the journal Computer Science and Information Systems (ComSIS) and has served as a PC member for a large number of international conferences, including KDD, ICDM, SDM, AAAI, and SISAP.