Tuesday, July 29, 2014

Tight convex relaxations for sparse matrix factorization - implementation -


Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. The formulation counts sparse PCA with multiple factors, subspace clustering and low-rank sparse bilinear regression as potential applications. We compute slow rates and an upper bound on the statistical dimension of the suggested norm for rank 1 matrices, showing that its statistical dimension is an order of magnitude smaller than the usual ℓ1-norm, trace norm and their combinations. Even though our convex formulation is in theory hard and does not lead to provably polynomial time algorithmic schemes, we propose an active set algorithm leveraging the structure of the convex problem to solve it and show promising numerical results.
Implementations of sparse PCA using various methods, and of sparse matrix factorization using the (k,q)-trace norm as a penalty, are on Emile Richard's software page. They will be added to the Advanced Matrix Factorization Jungle page shortly.
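The atoms underlying the (k,q)-trace norm are rank-1 matrices $uv^T$ with a k-sparse u and a q-sparse v. As a rough illustration of such atoms (our sketch, not the authors' active-set algorithm), here is a truncated power iteration that heuristically extracts a (k,q)-sparse rank-1 approximation of a matrix:

```python
import numpy as np

def truncate(x, s):
    """Keep the s largest-magnitude entries of x, zero out the rest."""
    y = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]
    y[idx] = x[idx]
    return y

def best_kq_atom(Z, k, q, iters=50, seed=0):
    """Heuristic (k,q)-sparse rank-1 approximation of Z via truncated power iteration."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    v = rng.standard_normal(m)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        u = truncate(Z @ v, k)      # k-sparse left factor
        u /= np.linalg.norm(u)
        v = truncate(Z.T @ u, q)    # q-sparse right factor
        v /= np.linalg.norm(v)
    return u, v
```

On an exactly (k,q)-sparse rank-1 input this recovers the supports of both factors; on noisy inputs it is only a heuristic, which is consistent with the hardness caveat in the abstract.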


Join the CompressiveSensing subreddit or the Google+ Community and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

Monday, July 28, 2014

Compressed Subspace Matching on the Continuum / Subspace Learning From Bits

Two different techniques to learn more about the subspace you're in.



Compressed Subspace Matching on the Continuum by William Mantzel, Justin Romberg

We consider the general problem of matching a subspace to a signal in R^N that has been observed indirectly (compressed) through a random projection. We are interested in the case where the collection of K-dimensional subspaces is continuously parameterized, i.e. naturally indexed by an interval from the real line, or more generally a region of R^D. Our main results show that if the dimension of the random projection is on the order of K times a geometrical constant that describes the complexity of the collection, then the match obtained from the compressed observation is nearly as good as one obtained from a full observation of the signal. We give multiple concrete examples of collections of subspaces for which this geometrical constant can be estimated, and discuss the relevance of the results to the general problems of template matching and source localization.
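To make the idea concrete, here is a minimal sketch (our illustration, with the continuously parameterized family discretized into a finite list of candidate bases): project the signal at random, then pick the subspace whose compressed image best explains the compressed observation.

```python
import numpy as np

def residual(y, AU):
    """Norm of the part of y lying outside the column span of AU."""
    Q, _ = np.linalg.qr(AU)
    return np.linalg.norm(y - Q @ (Q.T @ y))

def match_subspace(x, bases, m, seed=0):
    """Pick the candidate subspace (list of N x K basis matrices) whose
    compressed image best matches the compressed signal."""
    rng = np.random.default_rng(seed)
    N = x.shape[0]
    A = rng.standard_normal((m, N)) / np.sqrt(m)   # random projection
    y = A @ x                                      # compressed observation
    res = [residual(y, A @ U) for U in bases]
    return int(np.argmin(res))
```

The theorem in the paper concerns the continuum case; the point of the sketch is only that matching happens entirely in the m-dimensional compressed domain.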

Subspace Learning From Bits by Yuejie Chi

This paper proposes a simple sensing and estimation framework to faithfully recover the principal subspace of high-dimensional datasets or data streams from a collection of one-bit measurements from distributed sensors, based on comparing accumulated energy projections of their data samples of dimension n over pairs of randomly selected directions. By leveraging low-dimensional structures, the top eigenvectors of a properly designed surrogate matrix are shown to recover the principal subspace of rank $r$ as soon as the number of bit measurements exceeds the order of $nr^2 \log n$, which can be much smaller than the ambient dimension of the covariance matrix. The sample complexity to obtain reliable comparison outcomes is also obtained. Furthermore, we develop a low-complexity online algorithm to track the principal subspace that allows new bit measurements to arrive sequentially. Numerical examples are provided to validate the proposed approach.
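A rough paraphrase of the sensing scheme as described in the abstract (our sketch; the paper's actual design surely differs in detail): each bit records which of two random directions captures more of the samples' energy, and the top eigenvectors of a bit-weighted surrogate matrix estimate the principal subspace.

```python
import numpy as np

def one_bit_subspace(X, r, m, seed=0):
    """Estimate the top-r principal subspace of samples X (n x T)
    from m one-bit energy comparisons along random direction pairs."""
    rng = np.random.default_rng(seed)
    n, T = X.shape
    J = np.zeros((n, n))
    for _ in range(m):
        a = rng.standard_normal(n) / np.sqrt(n)
        b = rng.standard_normal(n) / np.sqrt(n)
        # one bit: which direction sees more accumulated energy?
        bit = np.sign(np.sum((a @ X) ** 2) - np.sum((b @ X) ** 2))
        J += bit * (np.outer(a, a) - np.outer(b, b))   # surrogate matrix
    w, V = np.linalg.eigh(J)
    return V[:, np.argsort(w)[-r:]]                    # top-r eigenvectors
```

Note that only signs cross the channel; the magnitudes of the energy differences are discarded, which is what makes the $nr^2 \log n$ scaling nontrivial.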


Mondrian Forests: Efficient Online Random Forests - implementation -

It looks like Mondrian inspired a few images in the minds of people (CS: The Boogie Woogie Grid). Here is a different version of a random forest, online this time:




Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterpart to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve competitive predictive performance comparable with existing online random forests and periodically re-trained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs accuracy tradeoff.

The GitHub repository for the implementation of the Mondrian Forest is here.
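The generative step that makes online growth possible is the Mondrian-process split rule: the split time is exponential with rate equal to the box's total side length, the split dimension is chosen with probability proportional to side length, and the cut location is uniform. A minimal sketch of that single step (our reading of the Mondrian-process construction, not the authors' code):

```python
import numpy as np

def sample_mondrian_split(lower, upper, budget, rng):
    """Sample one Mondrian split of the box [lower, upper].
    Returns (dim, location), or None if the lifetime budget is exhausted."""
    lengths = upper - lower
    rate = lengths.sum()                 # split time ~ Exp(sum of side lengths)
    t = rng.exponential(1.0 / rate)
    if t > budget:
        return None                      # box stays a leaf
    dim = rng.choice(len(lengths), p=lengths / rate)  # dim ∝ side length
    loc = rng.uniform(lower[dim], upper[dim])         # cut uniform along dim
    return dim, loc
```

Because the rule depends only on the box, not on the data order, extending a tree with new points leaves the distribution over trees unchanged, which is the "online = batch" property claimed in the abstract.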


Saturday, July 26, 2014

Saturday Morning Videos: 27th Annual Conference on Learning Theory (COLT), Barcelona 2014

 All the videos of the 27th Annual Conference on Learning Theory (COLT), Barcelona 2014 are here.


Here are a few:

Sequential Learning

Compressed Counting Meets Compressed Sensing, Ping Li

Invited Speakers

Unsupervised Learning; Dictionary Learning; Latent Variable Models

Unsupervised Learning; Mixture Models

Online Learning

Computational Learning Theory/Algorithmic Results

Computational Learning Theory/Lower Bounds



Friday, July 25, 2014

It's Friday afternoon, it's Hamming's time: Higher Order Antipode Maps?






Could the second map be roughly approximated by using correlations between the zeroth-, first- and higher-order harmonics on the sphere?


Last Call for Contributions - TCMM 2014 International Workshop on Technical Computing for Machine Learning and Mathematical Engineering, 8 - 12 September, 2014 - Leuven, Belgium


Marco just sent me the following:

Dear Igor,
could you be so kind as to post the attached CFP (last call for contributions - TCMM 2014) on Nuit Blanche?
kind regards
Marco
--
dr. Marco Signoretto
FWO research fellow,
ESAT - STADIUS,
Katholieke Universiteit Leuven,
Kasteelpark Arenberg 10, B-3001 LEUVEN - HEVERLEE (BELGIUM)
Homepage: http://homes.esat.kuleuven.be/~msignore/
Sure Marco, here is the call:
Last Call for Contributions - TCMM 2014 International Workshop on Technical Computing for Machine Learning and Mathematical Engineering
8 - 12 September, 2014 - Leuven, Belgium
Workshop homepage: http://www.esat.kuleuven.be/stadius/tcmm2014/
The workshop will provide a venue for researchers and practitioners to interact on the latest developments in technical computing in relation to machine learning and mathematical engineering problems and methods (including optimization, system identification, computational statistics, signal processing, data visualization, deep learning, compressed sensing and big data). The emphasis is especially on open-source implementations in high-level programming languages, including but not limited to Julia, Python, Scala and R. For further information see the workshop homepage.
The three-day main event (8-10 September) will consist of invited and contributed talks as well as poster presentations. It will be followed by a two-day additional event (11-12 September) including software demos and hands-on tutorials on selected topics.
Attendees can register for the main event only or for the full workshop. Submissions of extended abstracts are solicited for the main event; submissions of demo presentations are solicited for the two-day additional event. For further information (including Registration, Location and Venue) see details at the workshop website.
Important dates: 
Deadline extended abstract/demo submission: 31 July 2014
Deadline for registration: 1 September 2014
Confirmed invited speakers (talks and tutorials): 
James Bergstra, Center for Theoretical Neuroscience, University of Waterloo:
Theano and Hyperopt: Modelling, Training, and Hyperparameter Optimization in Python
Jeff Bezanson, MIT:
TBA
Luis Pedro Coelho, European Molecular Biology Laboratory (EMBL):
Large Scale Analysis of Bioimages Using Python
Steven Diamond, Stanford University
Convex Optimization in Python with CVXPY
Stefan Karpinski, MIT
TBA
Graham Taylor, School of Engineering, University of Guelph:
An Overview of Deep Learning and Its Challenges for Technical Computing
Ewout van den Berg, IBM T.J. Watson Research Center:
Tools and Techniques for Sparse Optimization and Beyond
Organizing committee: 
Marco Signoretto, Department of Electrical Engineering, KU Leuven
Johan Suykens, Department of Electrical Engineering, KU Leuven
Vilen Jumutc, Department of Electrical Engineering, KU Leuven
For further information (including Registration, Location and Venue) see http://www.esat.kuleuven.be/stadius/tcmm2014/



Fast matrix completion without the condition number - implementation -


Fast matrix completion without the condition number by Moritz Hardt, Mary Wootters

We give the first algorithm for Matrix Completion whose running time and sample complexity are polynomial in the rank of the unknown target matrix, linear in the dimension of the matrix, and logarithmic in the condition number of the matrix. To the best of our knowledge, all previous algorithms either incurred a quadratic dependence on the condition number of the unknown matrix or a quadratic dependence on the dimension of the matrix in the running time. Our algorithm is based on a novel extension of Alternating Minimization which we show has theoretical guarantees under standard assumptions even in the presence of noise.

The attendant implementation is on Mary Wootters' research page.
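For readers who want a feel for the baseline the paper extends, here is plain alternating least squares for matrix completion (a generic sketch, not the paper's noise-robust variant): fix one factor, solve a small least-squares problem per row for the other, and alternate.

```python
import numpy as np

def altmin_complete(M_obs, mask, r, iters=100, seed=0):
    """Complete a matrix from observed entries by alternating least squares.
    M_obs: matrix with observed entries (zeros elsewhere); mask: boolean array."""
    rng = np.random.default_rng(seed)
    n, m = M_obs.shape
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((m, r))
    for _ in range(iters):
        for i in range(n):                 # update each row of U from its observed columns
            cols = mask[i]
            if cols.any():
                U[i] = np.linalg.lstsq(V[cols], M_obs[i, cols], rcond=None)[0]
        for j in range(m):                 # update each row of V from its observed rows
            rows = mask[:, j]
            if rows.any():
                V[j] = np.linalg.lstsq(U[rows], M_obs[rows, j], rcond=None)[0]
    return U @ V.T
```

The paper's contribution is precisely that its variant of this scheme needs only logarithmically many samples and iterations in the condition number, where naive analyses pay quadratically.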

You can also watch a video of Mary at COLT:  Fast Matrix Completion Without the Condition Number, Mary Wootters







Thursday, July 24, 2014

The 200 Million Dollar Assumption

There used to be the Six Million Dollar Man, but now, with inflation and all, we have the 200 Million Dollar Assumption. In some of the Slides from the workshop on Science on the Sphere, you may have noticed this slide from Tom Kitching on weak lensing on the sphere:



I went ahead and asked Tom what he meant by mentioning a 200M$ assumption; here is his response:

Dear Igor,

Thank you for your email.

The plot on the slide is from this paper http://arxiv.org/pdf/1007.2953.pdf where we investigated the impact of the "Limber approximation" on weak lensing statistics and forecasted cosmological parameter error. The approximation replaces Bessel functions with delta functions to make calculations easier, and should be ok for small scales (where the functional approximation is asymptotically ok). What we found was that the predicted marginalised 1-sigma error bars using the Limber approximation are about 20% larger than when the approximation is not used.

Future experiments such as Euclid, LSST and SKA are to cost about 1 billion euros/dollars so a 20% increase in error bars on the key parameters is the equivalent cost of nearly 200M.

This was a slightly provocative statement aimed to stimulate discussion on how we can avoid this and other approximations to fully exploit these new experiments. In fact it reminds me of a sporting analogy where it is often said that the "aggregated effect of marginal gains" (e.g. http://www.bbc.co.uk/sport/0/olympics/19174302) can result in wins; where small differences can all add up to a significant difference.

I hope that helps, let me know if you have any further questions.

Best Regards
Tom
Thanks Tom !
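The arithmetic behind the post's title, as we read Tom's note (our paraphrase, not his calculation):

```python
# Back-of-envelope for the "200 Million Dollar Assumption": the Limber
# approximation inflates error bars by ~20% on experiments costing ~1 billion.
experiment_cost = 1e9          # Euclid / LSST / SKA, roughly, in euros/dollars
error_bar_inflation = 0.20     # penalty found in the Kitching et al. paper
equivalent_loss = error_bar_inflation * experiment_cost
print(equivalent_loss)         # about 200 million
```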


Euclid in a Taxicab: Sparse Blind Deconvolution with Smoothed $\ell_1/\ell_2$ Regularization

Laurent Duval just sent me the following:

Dear Igor
Nuit Blanche is a nest of choice for many sparsities. The ones of concern here are those approximated by an $l_1/l_2$, or Taxicab/Euclidean norm ratio, which was already covered in some of your posts: 
We propose in the following preprint a smoothed, parametrized penalty, termed SOOT for "Smoothed-One-Over-Two" norm ratio, with results on its theoretical convergence, and an algorithm based on proximal methods. It is applied to blind deconvolution, here for seismic data. We hope it could be of interest to your readership. 
[Abstract]
The $\ell_1/\ell_2$ ratio regularization function has shown good performance for retrieving sparse signals in a number of recent works, in the context of blind deconvolution. Indeed, it benefits from a scale invariance property much desirable in the blind context.
However, the $\ell_1/\ell_2$ function raises some difficulties when solving the nonconvex and nonsmooth minimization problems resulting from the use of such regularization penalties in current restoration methods.
In this paper, we propose a new penalty based on a smooth approximation to the $\ell_1/\ell_2$ function. In addition, we develop a proximal-based algorithm to solve variational problems involving this function and we derive theoretical convergence results. We demonstrate the effectiveness of our method through a comparison with a recent alternating optimization strategy dealing with the exact $\ell_1/\ell_2$ term, on an application to seismic data blind deconvolution.

[arXiv Link]
Best regards
Laurent


Thanks Laurent !

From the paper: "The code will be made available at http://www-syscom.univ-mlv.fr/ upon the paper acceptance"
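While waiting for the authors' code, here is one common way to smooth an $\ell_1/\ell_2$ ratio (our illustration only; the exact parametrization of the paper's SOOT penalty differs): replace $\ell_1$ by a hyperbolic surrogate and regularize the denominator, so the penalty is differentiable everywhere yet still approximately scale invariant.

```python
import numpy as np

def smoothed_l1(x, alpha):
    """Smooth surrogate for ||x||_1 (hyperbolic smoothing)."""
    return np.sum(np.sqrt(x**2 + alpha**2) - alpha)

def smoothed_ratio(x, alpha=1e-3, beta=1e-3):
    """A smooth stand-in for the l1/l2 ratio; illustration only,
    not necessarily the paper's exact SOOT penalty."""
    return smoothed_l1(x, alpha) / np.sqrt(np.sum(x**2) + beta**2)
```

Small alpha and beta keep the penalty close to the exact ratio (lower for sparse vectors, nearly invariant under rescaling), which is the property the blind setting needs.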

All the blind deconvolution blog entries are at: 


Wednesday, July 23, 2014

Slides: Summer School on Hashing: Theory and Applications, 2014



The webpage of the EADS Summer School on Hashing: Theory and Applications, which took place on July 14-17, 2014 at the University of Copenhagen, Denmark, now features the slides of the presentations made there:


OVERVIEW AND GOAL

Hashing is used everywhere in computing and is becoming increasingly important with the exploding amount of data. The summer school will provide an in-depth introduction to hashing, both theory and applications. The topics will range from the modern theory of hashing to actual implementations of hash functions that are both efficient and provide the necessary probabilistic guarantees. Application areas will be studied, from sketching and databases to similarity estimation and machine learning.
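As a small taste of the "high speed hashing for integers" material, here is the classic multiply-shift scheme (our sketch; see Thorup's slides for the actual constructions and their guarantees): multiply the key by a random odd 64-bit number and keep the top l bits.

```python
import random

def make_multiply_shift(l, seed=0):
    """Dietzfelbinger-style multiply-shift hash: 64-bit keys -> l-bit values.
    One random odd multiplier defines the whole function."""
    rng = random.Random(seed)
    a = rng.getrandbits(64) | 1          # random odd 64-bit multiplier
    mask = (1 << 64) - 1
    def h(x):
        return ((a * x) & mask) >> (64 - l)
    return h
```

The appeal is speed: one multiplication and one shift per key, with universality guarantees that suffice for many of the sketching applications in the school's program.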


Rasmus Pagh
Dictionaries with implicit keys (July 15)
Michael Mitzenmacher
Bloom Filters and Such (July 14)
Cuckoo hashing and balanced allocations (July 15)
Mikkel Thorup
High speed hashing for integers and strings (July 14)
Reliable hashing for complex applications (July 15)
Graham Cormode