Publications
You can find my Google Scholar profile here.
Journals
A D'Isanto, S Cavuoti, Fabian Gieseke, and Kai L Polsterer. Return of the features: Efficient feature selection and interpretation for photometric redshifts. Astronomy & Astrophysics 616:A97, 2018.
Corneliu Florea and Fabian Gieseke. Artistic Movement Recognition by Consensus of Boosted SVM Based Experts. Journal of Visual Communication and Image Representation 56:220–233, 2018.
Fabian Gieseke, Cosmin Oancea, and Christian Igel. bufferkdtree: A Python library for massive nearest neighbor queries on multi-many-core devices. Knowledge-Based Systems 120:1–3, 2017.
Fabian Gieseke, Steven Bloemen, Cas Bogaard, Tom Heskes, Jonas Kindler, Richard A Scalzo, Valerio A R M Ribeiro, Jan Roestel, Paul J Groot, Fang Yuan, Anais Möller, and Brad E Tucker. Convolutional Neural Networks for Transient Candidate Vetting in Large-Scale Surveys. Monthly Notices of the Royal Astronomical Society (MNRAS) 472(3):3101–3114, 2017.
R S Souza, M L L Dantas, M V Costa-Duarte, E D Feigelson, M Killedar, P Y Lablanche, R Vilalta, A Krone-Martins, R Beck, and Fabian Gieseke. A probabilistic approach to emission-line galaxy classification. Monthly Notices of the Royal Astronomical Society (MNRAS), 2017.
Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy. IEEE Intelligent Systems 32(2):16–22, 2017.
Robert Beck, Chieh A Lin, Emille O Ishida, Fabian Gieseke, Rafael S Souza, Marcus V Costa-Duarte, Mohammed W Hattab, and Alberto Krone-Martins. On the realistic validation of photometric redshifts. Monthly Notices of the Royal Astronomical Society (MNRAS) 468(4):4323–4339, 2017.
Kristoffer Stensbo-Smidt, Fabian Gieseke, Andrew Zirm, Kim Steenstrup Pedersen, and Christian Igel. Sacrificing information for the greater good: how to select photometric bands for optimal accuracy. Monthly Notices of the Royal Astronomical Society (MNRAS) 464(3):2577–2596, 2017.
Michele Sasdelli, E O Ishida, R Vilalta, M Aguena, V C Busti, H Camacho, A M M Trindade, Fabian Gieseke, R S Souza, Y T Fantaye, and P A Mazzali. Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: A machine learning approach. Monthly Notices of the Royal Astronomical Society (MNRAS) 461(2):2044–2059, 2016.
Jan Kremer, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Nearest Neighbor Density Ratio Estimation for Large-Scale Applications in Astronomy. Astronomy and Computing 12:62–72, 2015.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines. Neurocomputing (ICPRAM 2012 Special Issue) 123(10):23–32, 2014.
Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. On Unsupervised Training of Multi-Class Regularized Least-Squares Classifiers. Journal of Computer Science and Technology (ICDM 2012 Special Issue) 29(1):90–104, 2014.
Fabian Gieseke. From Supervised to Unsupervised Support Vector Machines and Applications in Astronomy. KI – Künstliche Intelligenz (abstract of my PhD thesis) 27(3):281–285, 2013.
Oliver Kramer, Fabian Gieseke, and Kai Lars Polsterer. Learning Morphological Maps of Galaxies with Unsupervised Regression. Expert Systems with Applications 40(8):2841–2844, 2013.
Kai Lars Polsterer, Peter Zinn, and Fabian Gieseke. Finding New High-Redshift Quasars by Asking the Neighbours. Monthly Notices of the Royal Astronomical Society (MNRAS) 428(1):226–235, 2013.
Oliver Kramer, Fabian Gieseke, and Benjamin Satzger. Wind Energy Prediction and Monitoring with Neural Computation. Neurocomputing 109(0):84–93, 2013.
Fabian Gieseke, Gabriel Moruz, and Jan Vahrenhold. Resilient K-d Trees: K-Means in Space Revisited. Frontiers of Computer Science (ICDM 2010 Special Issue) 6(2):166–178, 2012.
Oliver Kramer and Fabian Gieseke. Evolutionary Kernel Density Regression. Expert Systems with Applications 39(10):9246–9254, 2012.
Fabian Gieseke, Oliver Kramer, Antti Airola, and Tapio Pahikkala. Efficient Recurrent Local Search Strategies for Semi- and Unsupervised Regularized Least-Squares Classification. Evolutionary Intelligence 5(3):189–205, 2012.
Fabian Gieseke, Joachim Gudmundsson, and Jan Vahrenhold. Pruning Spanners and Constructing Well-Separated Pair Decompositions in the Presence of Memory Hierarchies. Journal of Discrete Algorithms (JDA) 8(3):259–272, 2010.
Peer-Reviewed Conference and Workshop Contributions
Fabian Gieseke, Sabina Rosca, Troels Henriksen, Jan Verbesselt, and Cosmin Oancea. Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values. In 36th IEEE International Conference on Data Engineering (ICDE). 2020, accepted.
Stefan Oehmcke, Christoffer Thrysøe, Andreas Borgstad, Marcos Antonio Vaz Salles, Martin Brandt, and Fabian Gieseke. Detecting Hardly Visible Roads in Low-Resolution Satellite Time Series Data. In Proceedings of the IEEE BigData 2019 Conference (Special Session on Intelligent Data Mining). 2019.
Vinnie Ko, Stefan Oehmcke, and Fabian Gieseke. Magnitude and Uncertainty Pruning Criterion for Neural Networks. In Proceedings of the IEEE BigData 2019 Conference (Special Session on Intelligent Data Mining). 2019.
Fabian Gieseke, Cosmin Eugen Oancea, Ashish Mahabal, Christian Igel, and Tom Heskes. Bigger Buffer k-d Trees on Multi-Many-Core Systems. In High Performance Computing for Computational Science – VECPAR 2018. 2018, 202–214.
Fabian Gieseke and Christian Igel. Training Big Random Forests with Little Resources. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 2018, 1445–1454.
Malte Mehren, Fabian Gieseke, Jan Verbesselt, Sabina Rosca, Stéphanie Horion, and Achim Zeileis. Massively-Parallel Break Detection for Satellite Data. In Proceedings of the 30th International Conference on Scientific and Statistical Database Management (SSDBM). 2018, accepted.
Fabian Gieseke, Kai Polsterer, Ashish Mahabal, Christian Igel, and Tom Heskes. Massively-Parallel Best Subset Selection for Ordinary Least-Squares Regression. In IEEE Symposium Series on Computational Intelligence (SSCI). 2017, in press.
Ashish Mahabal, Kshiteej Sheth, Fabian Gieseke, Akshay Pai, George Djorgovski, Andrew Drake, and Matthew Graham. Deep-Learnt Classification of Light Curves. In IEEE Symposium Series on Computational Intelligence (SSCI). 2017, in press.
Corneliu Florea, Cosmin Toca, and Fabian Gieseke. Artistic Movement Recognition by Boosted Fusion of Color Structure and Topographic Description. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). 2017, 569–577.
Kai Lars Polsterer, Fabian Gieseke, Christian Igel, Bernd Doser, and Nikos Gianniotis. Parallelized rotation and flipping INvariant Kohonen maps (PINK) on GPUs. In Proceedings of the 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 2016, 405–410.
Fabian Gieseke, Tapio Pahikkala, and Tom Heskes. Batch Steepest-Descent-Mildest-Ascent for Interactive Maximum Margin Clustering. In Proceedings of the 14th International Symposium on Intelligent Data Analysis. Advances in Intelligent Data Analysis XIV 9385. 2015, 95–107.
Fabian Gieseke. An Efficient Many-Core Implementation for Semi-Supervised Support Vector Machines. In International Workshop on Machine Learning, Optimization, and Big Data (MOD 2015). 2015, 145–157.
Oliver Kramer, Fabian Gieseke, Justin Heinermann, Jendrik Poloczek, and Nils Treiber. A Framework for Data Mining in Wind Power Time Series. In Proceedings of the 2nd ECML/PKDD 2014 International Workshop on Data Analytics for Renewable Energy Integration (DARE'14), Lecture Notes in Computer Science 8817. 2014, 97–107.
Fabian Gieseke, Justin Heinermann, Cosmin Oancea, and Christian Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. In Proceedings of the 31st International Conference on Machine Learning (ICML) 32(1). 2014, 172–180.
Fabian Gieseke, Kai Lars Polsterer, Cosmin Oancea, and Christian Igel. Speedy Greedy Feature Selection: Better Redshift Estimation via Massive Parallelism. In Proceedings of the 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN). 2014, 87–92.
Fabian Gieseke, Tapio Pahikkala, and Christian Igel. Polynomial Runtime Bounds for Fixed-Rank Unsupervised Least-Squares Classification. In Proceedings of the 5th Asian Conference on Machine Learning (ACML). 2013, 62–71.
Justin Heinermann, Oliver Kramer, Kai Lars Polsterer, and Fabian Gieseke. On GPU-Based Nearest Neighbor Queries for Large-Scale Photometric Catalogs in Astronomy. In KI 2013: Advances in Artificial Intelligence. Lecture Notes in Computer Science series, volume 8077, Springer, 2013, pages 86–97.
Oliver Kramer, Nils Treiber, and Fabian Gieseke. Machine Learning in Wind Energy Information Systems. In EnviroInfo. 2013, 16–24.
Fabian Gieseke and Oliver Kramer. Towards Non-Linear Constraint Estimation for Expensive Optimization. In EvoApplications. 2013, 459–468.
Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585–594.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Sparse Quasi-Newton Optimization for Semi-Supervised Support Vector Machines. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (ICPRAM). 2012, 45–54.
Oliver Kramer and Fabian Gieseke. Short-Term Wind Energy Forecasting Using Support Vector Regression. In Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications. 2011, 271–280.
Fabian Gieseke, Oliver Kramer, Antti Airola, and Tapio Pahikkala. Speedy Local Search for Semi-Supervised Regularized Least-Squares. In Proceedings of the 34th Annual German Conference on Artificial Intelligence. 2011, 87–98.
Oliver Kramer and Fabian Gieseke. Variance Scaling for EDAs Revisited. In Proceedings of the 34th Annual German Conference on Artificial Intelligence. 2011, 169–178.
Oliver Kramer and Fabian Gieseke. Analysis of wind energy time series with kernel methods and neural networks. In Proceedings of the 7th International Conference on Natural Computation. 2011, 2381–2385.
Fabian Gieseke, Gabriel Moruz, and Jan Vahrenhold. Resilient K-d Trees: K-Means in Space Revisited. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). 2010, 815–820.
Fabian Gieseke, Kai Lars Polsterer, Andreas Thom, Peter Zinn, Dominik Bomans, Ralf-Jürgen Dettmar, Oliver Kramer, and Jan Vahrenhold. Detecting Quasars in Large-Scale Astronomical Surveys. In Proceedings of the 9th International Conference on Machine Learning and Applications (ICMLA). 2010, 352–357.
Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast Evolutionary Maximum Margin Clustering. In Proceedings of the 26th International Conference on Machine Learning (ICML). 2009, 361–368.
Fabian Gieseke and Jan Vahrenhold. Cache-Oblivious Construction of a Well-Separated Pair Decomposition. In Proceedings of the 25th European Workshop on Computational Geometry. 2009, 341–344.
Evgeni Tsivtsivadze, Fabian Gieseke, Tapio Pahikkala, Jorma Boberg, and Tapio Salakoski. Learning Preferences with Co-Regularized Least-Squares. In Proceedings of the ECML/PKDD Workshop on Preference Learning. 2008, 52–66.
Theses
Fabian Gieseke. From Supervised to Unsupervised Support Vector Machines and Applications in Astronomy. PhD thesis, Carl von Ossietzky Universität Oldenburg, 2011.
Fabian Gieseke. Algorithmen zur Konstruktion und Ausdünnung von Spanner-Graphen im Cache-Oblivious-Modell. Diplomarbeit, Westfälische Wilhelms-Universität Münster (in German), 2006.
Other Contributions
Michele Sasdelli, Emille O Ishida, R Vilalta, M Aguena, V C Busti, H Camacho, A M M Trindade, Fabian Gieseke, R S Souza, Y T Fantaye, and P A Mazzali. Exploring the spectroscopic diversity of type Ia supernovae with DRACULA: a machine learning approach. CoRR abs/1512.06810, 2015.
Fabian Gieseke, Cosmin E Oancea, Ashish Mahabal, Christian Igel, and Tom Heskes. Bigger Buffer k-d Trees on Multi-Many-Core Systems. CoRR abs/1512.02831, 2015.
Kristoffer Stensbo-Smidt, Fabian Gieseke, Christian Igel, Andrew Zirm, and Kim Steenstrup Pedersen. Simple, Fast and Accurate Photometric Estimation of Specific Star Formation Rate. CoRR abs/1511.05424, 2015.
Fabian Gieseke. Big Data Analytics in Astronomy... using the supercomputer under your desk! Big Data in a Transdisciplinary Perspective. 7th Herrenhausen Conference of the Volkswagen Foundation, 25–27 March 2015, Herrenhausen Palace, 2015.
Kai Polsterer, Fabian Gieseke, and Christian Igel. Automatic Classification of Galaxies via Machine Learning Techniques – Parallelized Rotation/Flipping Invariant Kohonen Map (PINK). In Proceedings of the 24th Annual Astronomical Data Analysis Software & Systems conference (ADASS). 2014.
Kai Polsterer, Fabian Gieseke, Christian Igel, and Tomotsugu Goto. Improving the Performance of Photometric Regression Models via Massive Parallel Feature Selection. In Proceedings of the 23rd Annual Astronomical Data Analysis Software & Systems conference (ADASS). 2013.
Fabian Gieseke. Von überwachten zu unüberwachten Support-Vektor-Maschinen und Anwendungen in der Astronomie. In Steffen Hölldobler et al. (ed.). Ausgezeichnete Informatikdissertationen 2012. GI-Edition Lecture Notes in Informatics (LNI), D-13 series, Bonner Köllen Verlag (in German), 2013.
Fabian Gieseke, Kai Polsterer, and Peter Zinn. Photometric Redshift Estimation of Quasars: Local versus Global Regression. In Proceedings of the 21st Annual Astronomical Data Analysis Software & Systems conference (ADASS). 2011.
Kai Polsterer, Fabian Gieseke, and Oliver Kramer. Galaxy Classification without Feature Extraction. In Proceedings of the 21st Annual Astronomical Data Analysis Software & Systems conference (ADASS). 2011.
Fabian Gieseke, Kai Polsterer, Andreas Thom, Peter Zinn, Dominik Bomans, Ralf-Jürgen Dettmar, Oliver Kramer, and Jan Vahrenhold. Identifying Quasars in Large-Scale Spectroscopic Surveys. Astroinformatics 2010, Poster, Pasadena, 2010.
Kai Polsterer, Fabian Gieseke, Andreas Thom, Oliver Kramer, Jan Vahrenhold, Dominik Bomans, and Ralf-Jürgen Dettmar. Discriminating Point and Extended Sources with k-Nearest Neighbors. Astroinformatics 2010, Poster, Pasadena, 2010.
Fabian Gieseke. Regularized LeastSquares for Maximum Margin Clustering and SemiSupervised Classification. Machine Learning Summer School, Cambridge, UK, 2009.
Fabian Gieseke. Constructing and Pruning Spanners in the CacheOblivious Model. Summer School on Algorithmic Data Analysis (SADA07), Helsinki, Finland, 2007.
Research
The field of data analysis (e.g., data mining, machine learning, algorithm engineering) has attracted increasing attention in recent years. One reason is that data volumes have grown dramatically over the last decade, leading to so-called big data problems. This is the case, for instance, in astronomy, where current and upcoming projects like the Sloan Digital Sky Survey (SDSS) or the Large Synoptic Survey Telescope (LSST) gather, or will gather, data in the tera- and petabyte range. For such projects, the sheer data volume renders a manual analysis impossible and necessitates automatic data analysis tools.
The corresponding data-rich scenarios often involve a large number of patterns (e.g., galaxy images) and/or a large number of dimensions (e.g., pixels per image). Further, labeled data are often scarce, since manual labeling by experts can be very time-consuming. Dealing with these situations usually requires adapting standard data analysis techniques, and this is part of my research. In particular, I am interested in the following research fields and projects:
LargeScale Data Science
In most cases, the sheer data volumes render a manual analysis impossible. Data science techniques aim at "extracting" knowledge automatically and have been identified as one of the key drivers for discoveries and innovation. My past and current research activities target both the development of scalable algorithms that can effectively deal with huge amounts of data and the application of such tools to real-world problems.
My current research aims at the use of "cheap" massively-parallel HPC systems to reduce the practical runtime of existing approaches. In contrast to conventional architectures, such modern systems are based on a huge number of small, specialized compute units, which are well-suited for massively-parallel implementations. The adaptation of data mining and machine learning techniques to such special hardware architectures has gained considerable attention in recent years. If an approach can be successfully adapted to the specific needs of such systems, one can often achieve a significant runtime reduction (at much lower cost compared to traditional parallel computing architectures).
Energy Systems
In recent years, there has been a significant increase in energy produced by sustainable resources like wind and solar power plants. This has led to a shift from traditional energy systems to so-called smart grids (i.e., distributed systems of energy suppliers and consumers). While sustainable energy resources are very appealing from an environmental point of view, their volatility renders their integration into the overall energy system difficult.
For this reason, short-term wind and solar energy prediction systems are essential for balancing authorities to schedule spinning reserves and reserve energy. This task can be formalized as a regression problem (with patterns based on, e.g., wind turbine measurements), and the resulting models are well-suited for short-term forecasting scenarios; see below for details.
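The regression formalization above can be sketched in a few lines: each pattern collects the previous measurements of a turbine, and the target is the next value. Everything here is an illustrative stand-in (a synthetic periodic signal, a plain least-squares model); real systems would use actual turbine or park measurements and, as in the publications below, models such as support vector regression.

```python
import numpy as np

def make_lagged(series, lag):
    """Turn a univariate time series into (patterns, targets): each
    pattern holds the `lag` previous measurements, the target is the
    next value."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

# Illustrative stand-in for turbine power measurements: a smooth
# periodic signal plus noise (not real wind data).
rng = np.random.default_rng(0)
t = np.arange(2000)
power = np.sin(2 * np.pi * t / 200) + 0.05 * rng.standard_normal(t.size)

X, y = make_lagged(power, lag=10)
X_train, y_train = X[:1500], y[:1500]
X_test, y_test = X[1500:], y[1500:]

# Linear autoregressive model fitted by least squares; a kernel-based
# regressor would plug into the same pattern/target setup.
coef, *_ = np.linalg.lstsq(np.c_[X_train, np.ones(len(X_train))],
                           y_train, rcond=None)
pred = np.c_[X_test, np.ones(len(X_test))] @ coef
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
```

The prediction error of such a model is then bounded below by the noise level of the measurements; richer feature sets (neighboring turbines, weather data) and non-linear models typically improve on the linear baseline.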
Big Data in Astronomy
Modern telescopes and satellites can gather huge amounts of data. Current catalogs, for instance, contain data in the terabyte range; upcoming projects will encompass petabytes of data. On the one hand, this data-rich situation offers the opportunity to make new discoveries, like detecting new, distant objects. On the other hand, managing such data volumes can be very difficult and usually leads to problem-specific challenges.
I am involved in the development of redshift estimation models (e.g., regression models) for so-called quasi-stellar radio sources (quasars), which are among the most distant objects that can be observed from Earth. To efficiently process the large data volumes, we make use of spatial data structures (like k-d trees), which can be applied to various other tasks as well. See the publications below for more details.
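As a minimal sketch of this combination, nearest-neighbor regression over a k-d tree can map photometric features to redshift estimates. The features and the color-redshift relation below are purely hypothetical; real inputs would come from a catalog such as SDSS, and production code would use tuned models rather than a plain k-NN average.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# Hypothetical stand-in for photometric features (e.g., color indices).
colors = rng.uniform(0, 2, size=(5000, 4))
# Hypothetical smooth relation between colors and redshift (sketch only).
redshift = 0.5 * colors[:, 0] + 0.25 * colors[:, 1] ** 2

train_X, test_X = colors[:4500], colors[4500:]
train_z, test_z = redshift[:4500], redshift[4500:]

# Build the k-d tree once over the training patterns; each query then
# avoids a linear scan over the whole catalog.
tree = cKDTree(train_X)
_, idx = tree.query(test_X, k=10)      # 10 nearest neighbors per query
pred_z = train_z[idx].mean(axis=1)     # average their redshifts

mae = float(np.mean(np.abs(pred_z - test_z)))
```

The same tree can be reused for other neighborhood-based tasks (density ratio estimation, outlier detection), which is what makes spatial data structures attractive in this setting.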
Semi- and Unsupervised Learning
The task of classifying patterns is among the most prominent ones in the field of machine learning. Support vector machines are state-of-the-art tools for this task and have been extended to various learning settings, including other supervised learning tasks (e.g., regression or preference learning) as well as so-called semi- and unsupervised scenarios.
Among these extensions are, for instance, semi-supervised support vector machines, which take additional unlabeled patterns into account. This additional information reveals more about the "structure" of the data and can lead to models with better performance. In some cases, no labeled patterns are given at all, which leads to the so-called maximum margin clustering problem. While very appealing from a practical point of view, both variants induce difficult combinatorial optimization problems, which renders their direct application difficult.
Developing efficient optimization schemes for these variants is part of my research; see below for corresponding publications or here for an implementation. Both support vector machines and their extensions can successfully be applied to, e.g., text data stemming from various application domains like e-commerce or social media.


Selected Publications
A complete list of my publications can be found here.
LargeScale Data Science
Fabian Gieseke and Christian Igel. Training Big Random Forests with Little Resources. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 2018, 1445–1454.
Malte Mehren, Fabian Gieseke, Jan Verbesselt, Sabina Rosca, Stéphanie Horion, and Achim Zeileis. Massively-Parallel Break Detection for Satellite Data. In Proceedings of the 30th International Conference on Scientific and Statistical Database Management (SSDBM). 2018, accepted.
Fabian Gieseke, Justin Heinermann, Cosmin Oancea, and Christian Igel. Buffer k-d Trees: Processing Massive Nearest Neighbor Queries on GPUs. In Proceedings of the 31st International Conference on Machine Learning (ICML) 32(1). 2014, 172–180.
Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585–594.
Fabian Gieseke, Gabriel Moruz, and Jan Vahrenhold. Resilient K-d Trees: K-Means in Space Revisited. In Proceedings of the 10th IEEE International Conference on Data Mining (ICDM). 2010, 815–820.
Big Data in Astronomy
Fabian Gieseke, Steven Bloemen, Cas Bogaard, Tom Heskes, Jonas Kindler, Richard A Scalzo, Valerio A R M Ribeiro, Jan Roestel, Paul J Groot, Fang Yuan, Anais Möller, and Brad E Tucker. Convolutional Neural Networks for Transient Candidate Vetting in Large-Scale Surveys. Monthly Notices of the Royal Astronomical Society (MNRAS) 472(3):3101–3114, 2017.
Jan Kremer, Kristoffer Stensbo-Smidt, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy. IEEE Intelligent Systems 32(2):16–22, 2017.
Jan Kremer, Fabian Gieseke, Kim Steenstrup Pedersen, and Christian Igel. Nearest Neighbor Density Ratio Estimation for Large-Scale Applications in Astronomy. Astronomy and Computing 12:62–72, 2015.
Kai Lars Polsterer, Peter Zinn, and Fabian Gieseke. Finding New High-Redshift Quasars by Asking the Neighbours. Monthly Notices of the Royal Astronomical Society (MNRAS) 428(1):226–235, 2013.
Fabian Gieseke, Kai Lars Polsterer, Andreas Thom, Peter Zinn, Dominik Bomans, Ralf-Jürgen Dettmar, Oliver Kramer, and Jan Vahrenhold. Detecting Quasars in Large-Scale Astronomical Surveys. In Proceedings of the 9th International Conference on Machine Learning and Applications (ICMLA). 2010, 352–357.
Energy Systems
Oliver Kramer, Nils Treiber, and Fabian Gieseke. Machine Learning in Wind Energy Information Systems. In EnviroInfo. 2013, 16–24.
Oliver Kramer, Fabian Gieseke, and Benjamin Satzger. Wind Energy Prediction and Monitoring with Neural Computation. Neurocomputing 109(0):84–93, 2013.
Oliver Kramer and Fabian Gieseke. Short-Term Wind Energy Forecasting Using Support Vector Regression. In Proceedings of the International Conference on Soft Computing Models in Industrial and Environmental Applications. 2011, 271–280.
Semi- and Unsupervised Learning
Fabian Gieseke. An Efficient Many-Core Implementation for Semi-Supervised Support Vector Machines. In International Workshop on Machine Learning, Optimization, and Big Data (MOD 2015). 2015, 145–157.
Fabian Gieseke, Tapio Pahikkala, and Christian Igel. Polynomial Runtime Bounds for Fixed-Rank Unsupervised Least-Squares Classification. In Proceedings of the 5th Asian Conference on Machine Learning (ACML). 2013, 62–71.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines. Neurocomputing (ICPRAM 2012 Special Issue) 123(10):23–32, 2014.
Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585–594.
Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast Evolutionary Maximum Margin Clustering. In Proceedings of the 26th International Conference on Machine Learning (ICML). 2009, 361–368.
Quasi-Newton Semi-Supervised Support Vector Machines
A quasi-Newton optimization framework implemented in Python using the NumPy and SciPy packages. This type of model is an extension of support vector machines to semi-supervised learning settings, where both labeled and unlabeled patterns are given in the training phase: in contrast to standard support vector machines, the model takes the additional unlabeled patterns into account to reveal more information about the structure of the data. The QN-S3VM framework can handle linear and non-linear kernels. In addition, the special case of sparse data (given a linear kernel) can also be handled efficiently.
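The flavor of the underlying optimization problem can be illustrated with a stripped-down sketch: a differentiable loss on the few labeled patterns, a smooth surrogate penalizing unlabeled patterns that fall close to the decision boundary, and a quasi-Newton solver (here SciPy's L-BFGS-B). The loss surrogates, parameters, and toy data are simplified stand-ins chosen for this sketch, not the released package's exact formulation; the package additionally handles kernels, sparse data, and further refinements.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Two Gaussian clusters; only three points per class carry a label,
# the rest are unlabeled (hypothetical toy data).
X_pos = rng.normal(loc=(+2.0, 0.0), scale=0.5, size=(50, 2))
X_neg = rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(50, 2))
X = np.vstack([X_pos, X_neg])
y_true = np.array([1] * 50 + [-1] * 50)
labeled = np.array([0, 1, 2, 50, 51, 52])            # indices with known labels
unlabeled = np.setdiff1d(np.arange(100), labeled)

lam, lam_u = 1e-3, 1.0   # regularization / unlabeled-term weights (illustrative)

def objective(theta):
    w, b = theta[:2], theta[2]
    f = X @ w + b
    # Squared hinge loss on the labeled patterns (differentiable).
    loss_l = np.sum(np.maximum(0.0, 1.0 - y_true[labeled] * f[labeled]) ** 2)
    # Smooth surrogate for the unlabeled loss: penalizes predictions
    # close to the decision boundary.
    loss_u = np.sum(np.exp(-3.0 * f[unlabeled] ** 2))
    return loss_l + lam * np.dot(w, w) + lam_u * loss_u

# Quasi-Newton minimization of the smooth (but non-convex) objective.
res = minimize(objective, x0=np.zeros(3), method="L-BFGS-B")
w, b = res.x[:2], res.x[2]
pred = np.sign(X @ w + b)
accuracy = float(np.mean(pred == y_true))
```

Because the objective is non-convex, practical implementations combine such quasi-Newton steps with strategies like annealing the unlabeled term and enforcing a class balance constraint to avoid poor local minima.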
Source Code
The code is free for scientific use. In case you are planning to use (parts of) the software for commercial purposes, please contact me. If you use the code for scientific work, please use the reference(s) below to cite us. The source code can be downloaded here.
The code contains three examples for sparse and nonsparse data set instances. If you find any bugs or if you have problems with the code, feel free to contact us via email.
History
August 2012: Initial Release
References
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines. Neurocomputing (ICPRAM 2012 Special Issue) 123(10):23–32, 2014.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Sparse Quasi-Newton Optimization for Semi-Supervised Support Vector Machines. In Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods (ICPRAM 2012). 2012, 45–54.
Disclaimer
The implementation is free for non-commercial use only. Redistribution of the software without permission of the authors is not allowed. Further, the authors are not responsible for any implications that stem from its use.
Support Vector Machines
The task of classifying patterns is among the most prominent ones in the field of machine learning. Support vector machines are state-of-the-art tools for this task and have been extended to various learning settings.
Among these extensions are, for instance, semi-supervised support vector machines that take additional unlabeled patterns into account. This additional information reveals more about the structure of the data and can lead to better models. In some cases, no labeled patterns are given at all, which leads to the so-called maximum margin clustering problem.
Fabian Gieseke. An Efficient Many-Core Implementation for Semi-Supervised Support Vector Machines. In International Workshop on Machine Learning, Optimization, and Big Data (MOD 2015). 2015, 145–157.
Fabian Gieseke, Tapio Pahikkala, and Christian Igel. Polynomial Runtime Bounds for Fixed-Rank Unsupervised Least-Squares Classification. In Proceedings of the 5th Asian Conference on Machine Learning (ACML). 2013, 62–71.
Fabian Gieseke, Antti Airola, Tapio Pahikkala, and Oliver Kramer. Fast and Simple Gradient-Based Optimization for Semi-Supervised Support Vector Machines. Neurocomputing (ICPRAM 2012 Special Issue) 123(10):23–32, 2014.
Tapio Pahikkala, Antti Airola, Fabian Gieseke, and Oliver Kramer. Unsupervised Multi-Class Regularized Least-Squares Classification. In Proceedings of the 12th IEEE International Conference on Data Mining (ICDM). 2012, 585–594.
Fabian Gieseke, Tapio Pahikkala, and Oliver Kramer. Fast Evolutionary Maximum Margin Clustering. In Proceedings of the 26th International Conference on Machine Learning (ICML). 2009, 361–368.