
Contact

Machine Learning Section

 

Department of Computer Science (DIKU)

University of Copenhagen

 

Sigurdsgade 41

2200 København N

Denmark

 

Room: 1.08

 


 

 

Research

The field of data analysis (e.g., data mining, machine learning, and algorithm engineering) has gained increasing attention in recent years. One reason is that data volumes have increased dramatically during the last decade, leading to so-called big data problems. This is the case, for instance, in astronomy, where current and upcoming projects like the Sloan Digital Sky Survey (SDSS) or the Large Synoptic Survey Telescope (LSST) gather, or will gather, data in the tera- and petabyte range. For such projects, the sheer data volume renders a manual analysis impossible and necessitates the use of automatic data analysis tools.

 

The corresponding data-rich scenarios often involve a large number of patterns (e.g., the number of galaxy images) and/or a large number of dimensions (e.g., pixels per image). Further, a general lack of "labeled data" can often be observed, since manual labeling by experts can be very time-consuming. Dealing with these situations usually requires the adaptation of standard data analysis techniques, and this is part of my research. In particular, I am interested in the following research fields/projects:

 

Large-Scale Data Science

In most cases, the sheer data volumes render a manual analysis impossible. Data science techniques aim at "extracting" knowledge in an automatic manner and have been identified as one of the key drivers for discoveries and innovation. My past and current research activities aim at both the development of scalable algorithms that can effectively deal with huge amounts of data and the application of such tools to real-world problems.

 

My current research aims at the use of "cheap" massively-parallel HPC systems to reduce the practical runtime of existing approaches. In contrast to conventional architectures, such modern systems are based on a large number of "small" specialized compute units, which are well-suited for massively-parallel implementations. The adaptation of data mining and machine learning techniques to such special hardware architectures has gained considerable attention in recent years. If one can successfully adapt an approach to the specific needs of such systems, one can often achieve a significant runtime reduction (at much lower cost than that induced by traditional parallel computing architectures).


Energy Systems

In recent years, there has been a significant increase in energy produced by sustainable resources like wind and solar power plants. This has led to a shift from traditional energy systems to so-called smart grids (i.e., distributed systems of energy suppliers and consumers). While sustainable energy resources are very appealing from an environmental point of view, their volatility renders the integration into the overall energy system difficult.

 

For this reason, short-term wind and solar energy prediction systems are essential for balancing authorities to schedule spinning reserves and reserve energy. This task can be formalized as a regression problem (with patterns based on, e.g., wind turbine measurements), and the resulting models are well-suited for short-term forecasting scenarios; see below for details.
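The regression formulation mentioned above can be illustrated with a minimal sketch. It assumes a sliding-window construction, in which each pattern consists of the last few measurements of a turbine and the label is the measurement a fixed number of steps ahead. The function names, the window construction, and the toy readings are hypothetical and are not taken from the publications listed below; the persistence forecast is only a naive baseline against which a learned regression model would be compared.

```python
from typing import List, Sequence, Tuple


def make_patterns(series: Sequence[float], window: int, horizon: int
                  ) -> Tuple[List[List[float]], List[float]]:
    """Turn a measurement series into (pattern, label) pairs.

    Each pattern holds `window` consecutive measurements; its label is the
    measurement `horizon` steps after the last entry of the pattern.
    """
    patterns, labels = [], []
    for t in range(window, len(series) - horizon + 1):
        patterns.append(list(series[t - window:t]))
        labels.append(series[t + horizon - 1])
    return patterns, labels


def persistence_forecast(pattern: Sequence[float]) -> float:
    """Naive baseline: predict that the most recent measurement persists."""
    return pattern[-1]


def mean_squared_error(predictions: Sequence[float],
                       targets: Sequence[float]) -> float:
    """Average squared deviation between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical power readings of a single turbine (arbitrary units).
power = [1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_patterns(power, window=2, horizon=1)
baseline = [persistence_forecast(x) for x in X]
print(X, y, mean_squared_error(baseline, y))
```

Any regression model (for instance, support vector regression) could then be trained on the pairs `(X, y)` in place of the baseline.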

 

Big Data in Astronomy

Modern telescopes and satellites can gather huge amounts of data. Current catalogs, for instance, contain data in the terabyte range; upcoming projects will encompass petabytes of data. On the one hand, this data-rich situation offers the opportunity to make new discoveries like detecting new, distant objects. On the other hand, managing such data volumes can be very difficult and usually leads to problem-specific challenges.

Data mining techniques have been recognized to play an important role in upcoming surveys. Typical tasks in astronomy are, for instance, the classification of stars, galaxies, and quasars, or the estimation of the redshift of galaxies based on image data. Appropriate models are already in use for current catalogs (see, e.g., the preprocessing pipeline of the SDSS). However, obtaining high-quality models for specific tasks can still be very challenging.

 

I am involved in the development of redshift estimation models (e.g., regression models) for so-called quasi-stellar radio sources (quasars), which are among the most distant objects that can be observed from Earth. To efficiently process the large data volumes, we make use of spatial data structures (like k-d trees), which can be applied to various other tasks as well. See the publications below for more details.
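As a rough illustration of how a spatial data structure supports such regression models, the following sketch builds a plain k-d tree and averages the labels of the k nearest training patterns. It is a single-threaded toy version under stated assumptions: the two-dimensional feature vectors and redshift values are made up, all names are hypothetical, and it is not the GPU-based buffer k-d tree implementation described in the publications below.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


class Node:
    """One k-d tree node: a training pattern, its label, and two subtrees."""

    def __init__(self, point: Point, value: float, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.value, self.axis = point, value, axis
        self.left, self.right = left, right


def build(points: List[Point], values: List[float], depth: int = 0) -> Optional[Node]:
    """Recursively split the patterns at the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0])
    order = sorted(range(len(points)), key=lambda i: points[i][axis])
    mid = len(order) // 2
    lo, hi = order[:mid], order[mid + 1:]
    return Node(points[order[mid]], values[order[mid]], axis,
                build([points[i] for i in lo], [values[i] for i in lo], depth + 1),
                build([points[i] for i in hi], [values[i] for i in hi], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(node: Optional[Node], query: Point, k: int,
        heap: Optional[List[Tuple[float, float]]] = None) -> List[Tuple[float, float]]:
    """Collect the k nearest patterns as (-squared distance, label) pairs."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    d = sq_dist(node.point, query)
    if len(heap) < k:
        heapq.heappush(heap, (-d, node.value))
    elif d < -heap[0][0]:
        heapq.heapreplace(heap, (-d, node.value))
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    knn(near, query, k, heap)
    # Visit the far side only if the splitting plane is closer than the
    # current k-th nearest neighbor (or fewer than k neighbors are known).
    if len(heap) < k or diff * diff < -heap[0][0]:
        knn(far, query, k, heap)
    return heap


def predict(tree: Optional[Node], query: Point, k: int) -> float:
    """Nearest neighbor regression: mean label of the k closest patterns."""
    neighbors = knn(tree, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Made-up feature vectors with made-up redshift labels.
features = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
redshifts = [0.1, 0.2, 0.3, 0.9]
tree = build(features, redshifts)
print(predict(tree, (1.1, 1.0), k=2))
```

The pruning step is what makes the tree worthwhile: subtrees that cannot contain a closer neighbor are skipped, so far fewer distances are computed than in a brute-force scan over all training patterns.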

 

 


Semi- and Unsupervised Learning

The task of classifying patterns is among the most prominent ones in the field of machine learning. Support vector machines are state-of-the-art tools for this task and have been extended to various learning settings, including other supervised learning tasks (e.g., regression or preference learning) as well as so-called semi- and unsupervised scenarios.

 

Among these extensions are, for instance, semi-supervised support vector machines, which take additional unlabeled patterns into account. These unlabeled patterns reveal more about the "structure" of the data and can lead to models with a better performance.

In some cases, no labeled patterns at all are given. This leads to the so-called maximum margin clustering problem. While being very appealing from a practical point of view, both variants induce difficult combinatorial optimization problems, which makes a direct application of these extensions difficult.
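To make the combinatorial nature concrete, one common textbook formulation of the linear semi-supervised support vector machine is sketched below. The notation is an assumption introduced here for illustration (l labeled patterns with known labels y_1, ..., y_l, u unlabeled patterns with unknown labels y_{l+1}, ..., y_{l+u}, and cost parameters C and C'); the publications below may use different loss functions or a kernel-based formulation.

```latex
\begin{equation*}
\min_{\mathbf{w},\, b,\; \mathbf{y}_u \in \{-1,+1\}^{u}}
\;\frac{1}{2}\lVert \mathbf{w} \rVert^{2}
+ C \sum_{i=1}^{l} \max\bigl(0,\, 1 - y_i(\mathbf{w}^{\top}\mathbf{x}_i + b)\bigr)
+ C' \sum_{j=l+1}^{l+u} \max\bigl(0,\, 1 - y_j(\mathbf{w}^{\top}\mathbf{x}_j + b)\bigr)
\end{equation*}
```

Here \mathbf{y}_u = (y_{l+1}, ..., y_{l+u}) collects the unknown labels. Since these are optimization variables, there are 2^u candidate assignments, which is the source of the combinatorial difficulty. Setting l = 0 removes the labeled term and yields the maximum margin clustering setting, where a balance constraint on the label assignment is typically added to exclude the trivial solution that puts all patterns into one class.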

 

Developing efficient optimization schemes for these variants is part of my research; see below for corresponding publications or here for an implementation. Both support vector machines and their extensions can be applied successfully to, e.g., text data, which stems from various application domains like e-commerce or social media.

 

 

     

 

Selected Publications

 

A complete list of my publications can be found here.

 

Large-Scale Data Science

  1. Fabian Gieseke and Christian Igel. Training Big Random Forests with Little Resources. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 2018, Accepted.

  2. Malte Mehren, Fabian Gieseke, Jan Verbesselt, Sabina Rosca, Stéphanie Horion, and Achim Zeileis. Massively-Parallel Break Detection for Satellite Data. In Proceedings of the 30th International Conference on Scientific and Statistical Database Management (SSDBM). 2018, Accepted.

  3. Fabian Gieseke, Justin Heinermann, Cosmin Oancea, and Christian Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. In Proceedings of the 31st International Conference on Machine Learning (ICML) 32(1). 2014, 172-180.   

  4. Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585-594.   

  5. Fabian Gieseke, Gabriel Moruz, and Jan Vahrenhold. Resilient K-d Trees: K-Means in Space Revisited. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). 2010, 815-820.   

 

Big Data in Astronomy

  1. Fabian Gieseke, Steven Bloemen, Cas Bogaard, Tom Heskes, Jonas Kindler, Richard A Scalzo, Valerio A R M Ribeiro, Jan Roestel, Paul J Groot, Fang Yuan, Anais Möller, and Brad E Tucker. Convolutional Neural Networks for Transient Candidate Vetting in Large-Scale Surveys. Monthly Notices of the Royal Astronomical Society (MNRAS), 2017.   

  2. Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy. IEEE Intelligent Systems 32(2):16–22, 2017. 

  3. Jan Kremer, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Nearest Neighbor Density Ratio Estimation for Large-Scale Applications in Astronomy. Astronomy and Computing 12:62–72, 2015.   

  4. Kai Lars Polsterer, Peter Zinn, and Fabian Gieseke. Finding New High-Redshift Quasars by Asking the Neighbours. Monthly Notices of the Royal Astronomical Society (MNRAS) 428(1):226-235, 2013.   

  5. Fabian Gieseke, Kai Lars Polsterer, Andreas Thom, Peter Zinn, Dominik Bomans, Ralf-Jürgen Dettmar, Oliver Kramer, and Jan Vahrenhold. Detecting Quasars in Large-Scale Astronomical Surveys. In Proceedings of the 9th International Conference on Machine Learning and Applications (ICMLA). 2010, 352-357.   

 

Energy Systems

  1. Oliver Kramer, Nils Treiber, and Fabian Gieseke. Machine Learning in Wind Energy Information Systems. In EnviroInfo. 2013, 16-24. 

  2. Oliver Kramer, Fabian Gieseke, and Benjamin Satzger. Wind Energy Prediction and Monitoring with Neural Computation. Neurocomputing 109(0):84-93, 2013.   

  3. Oliver Kramer and Fabian Gieseke. Short-Term Wind Energy Forecasting Using Support Vector Regression. In Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications. 2011, 271-280.   

 

Semi- and Unsupervised Learning

  1. Fabian Gieseke. An Efficient Many-Core Implementation for Semi-Supervised Support Vector Machines. In International Workshop on Machine Learning, Optimization, and Big Data (MOD2015). 2015, 145–157.   

  2. Fabian Gieseke, Tapio Pahikkala, and Christian Igel. Polynomial Runtime Bounds for Fixed-Rank Unsupervised Least-Squares Classification. In Proceedings of the 5th Asian Conference on Machine Learning (ACML). 2013, 62-71.   

  3. Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines. Neurocomputing (ICPRAM 2012 Special Issue) 123(10):23-32, 2014.   

  4. Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585-594.   

  5. Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast Evolutionary Maximum Margin Clustering. In Proceedings of the 26th International Conference on Machine Learning (ICML). 2009, 361-368.   
