## Monday, July 25, 2016

### Onsager-corrected deep learning for sparse linear inverse problems

Thomas let me know on Twitter that the Great Convergence continues, Today we find out how we go about changing the iterative process of AMP and then learn coefficients of that process as in Deep Learning. It looks like the Learned AMP beats LISTA. Looking back at the few COLT presentations earlier (Saturday videos), one wonders how these solvers change the rule of thumbs on model depth and size. To be continued....

Deep learning has gained great popularity due to its widespread success on many inference problems. We consider the application of deep learning to the sparse linear inverse problem encountered in compressive sensing, where one seeks to recover a sparse signal from a small number of noisy linear measurements. In this paper, we propose a novel neural-network architecture that decouples prediction errors across layers in the same way that the approximate message passing (AMP) algorithm decouples them across iterations: through Onsager correction. Numerical experiments suggest that our "learned AMP" network significantly improves upon Gregor and LeCun's "learned ISTA" network in both accuracy and complexity.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Saturday, July 23, 2016

### Saturday Morning Video: On the Expressive Power of Deep Learning: A Tensor Analysis, Nadav Cohen, Or Sharir, Amnon Shashua @ COLT2016

The preprint on which it relies is:

On the Expressive Power of Deep Learning: A Tensor Analysis by Nadav Cohen, Or Sharir, Amnon Shashua

It has long been conjectured that hypotheses spaces suitable for data that is compositional in nature, such as text or images, may be more efficiently represented with deep hierarchical networks than with shallow ones. Despite the vast empirical evidence supporting this belief, theoretical justifications to date are limited. In particular, they do not account for the locality, sharing and pooling constructs of convolutional networks, the most successful deep learning architecture to date. In this work we derive a deep network architecture based on arithmetic circuits that inherently employs locality, sharing and pooling. An equivalence between the networks and hierarchical tensor factorizations is established. We show that a shallow network corresponds to CP (rank-1) decomposition, whereas a deep network corresponds to Hierarchical Tucker decomposition. Using tools from measure theory and matrix algebra, we prove that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be realized (or even approximated) by a shallow network. Since log-space computation transforms our networks into SimNets, the result applies directly to a deep learning architecture demonstrating promising empirical performance. The construction and theory developed in this paper shed new light on various practices and ideas employed by the deep learning community.
other recent work by Nadav include:

Convolutional Rectifier Networks as Generalized Tensor Decompositions by Nadav Cohen, Amnon Shashua

Convolutional rectifier networks, i.e. convolutional neural networks with rectified linear activation and max or average pooling, are the cornerstone of modern deep learning. However, despite their wide use and success, our theoretical understanding of the expressive properties that drive these networks is partial at best. On the other hand, we have a much firmer grasp of these issues in the world of arithmetic circuits. Specifically, it is known that convolutional arithmetic circuits possess the property of "complete depth efficiency", meaning that besides a negligible set, all functions that can be implemented by a deep network of polynomial size, require exponential size in order to be implemented (or even approximated) by a shallow network. In this paper we describe a construction based on generalized tensor decompositions, that transforms convolutional arithmetic circuits into convolutional rectifier networks. We then use mathematical tools available from the world of arithmetic circuits to prove new results. First, we show that convolutional rectifier networks are universal with max pooling but not with average pooling. Second, and more importantly, we show that depth efficiency is weaker with convolutional rectifier networks than it is with convolutional arithmetic circuits. This leads us to believe that developing effective methods for training convolutional arithmetic circuits, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.

Inductive Bias of Deep Convolutional Networks through Pooling Geometry by Nadav Cohen, Amnon Shashua
Our formal understanding of the inductive bias that drives the success of convolutional networks on computer vision tasks is limited. In particular, it is unclear what makes hypotheses spaces born from convolution and pooling operations so suitable for natural images. In this paper we study the ability of convolutional arithmetic circuits to model correlations among regions of their input. Correlations are formalized through the notion of separation rank, which for a given input partition, measures how far a function is from being separable. We show that a polynomially sized deep network supports exponentially high separation ranks for certain input partitions, while being limited to polynomial separation ranks for others. The network's pooling geometry effectively determines which input partitions are favored, thus serves as a means for controlling the inductive bias. Contiguous pooling windows as commonly employed in practice favor interleaved partitions over coarse ones, orienting the inductive bias towards the statistics of natural images. In addition to analyzing deep networks, we show that shallow ones support only linear separation ranks, and by this gain insight into the benefit of functions brought forth by depth - they are able to efficiently model strong correlation under favored partitions of the input.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: Benefits of depth in neural networks, Matus Telgarsky @ COLT2016

Representation Benefits of Deep Feedforward Networks by Matus Telgarsky
This note provides a family of classification problems, indexed by a positive integer k, where all shallow networks with fewer than exponentially (in k) many nodes exhibit error at least 1/6, whereas a deep network with 2 nodes in each of 2k layers achieves zero error, as does a recurrent network with 3 distinct nodes iterated k times. The proof is elementary, and the networks are standard feedforward networks with ReLU (Rectified Linear Unit) nonlinearities.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: The Power of Depth for Feedforward Neural Networks by Ohad Shamir @ COLT2016

The attendant paper is here:

The Power of Depth for Feedforward Neural Networks by Ronen Eldan, Ohad Shamir

We show that there is a simple (approximately radial) function on $\reals^d$, expressible by a small 3-layer feedforward neural networks, which cannot be approximated by any 2-layer network, to more than a certain constant accuracy, unless its width is exponential in the dimension. The result holds for virtually all known activation functions, including rectified linear units, sigmoids and thresholds, and formally demonstrates that depth -- even if increased by 1 -- can be exponentially more valuable than width for standard feedforward neural networks. Moreover, compared to related results in the context of Boolean functions, our result requires fewer assumptions, and the proof techniques and construction are very different.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### Saturday Morning Video: Randomized Algorithms in Linear Algebra, Ravi Kannan @ COLT2016

As Sebastien pointed out the COLT 2016 videos are out. Here is one: Ravi Kannan on Randomized Algorithms in Linear Algebra

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Friday, July 22, 2016

### Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks - implementation -

Ivan, the person behind the Tensor Train tensor decomposition just sent me the following:

Dear Igor,
I have a new interesting paper to share. " Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks"
http://arxiv.org/abs/1607.04228
(accepted at ACM RecSys 2016).
A framework is also available: https://github.com/Evfro/polaraThe key idea is to introduce a tensor from user-item-rating, thus being able to recommend even from a negative feedback.

With best wishes,
Ivan.

Thanks Ivan !

Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks by Evgeny Frolov, Ivan Oseledets
Conventional collaborative filtering techniques treat a top-n recommendations problem as a task of generating a list of the most relevant items. This formulation, however, disregards an opposite - avoiding recommendations with completely irrelevant items. Due to that bias, standard algorithms, as well as commonly used evaluation metrics, become insensitive to negative feedback. In order to resolve this problem we propose to treat user feedback as a categorical variable and model it with users and items in a ternary way. We employ a third-order tensor factorization technique and implement a higher order folding-in method to support online recommendations. The method is equally sensitive to entire spectrum of user ratings and is able to accurately predict relevant items even from a negative only feedback. Our method may partially eliminate the need for complicated rating elicitation process as it provides means for personalized recommendations from the very beginning of an interaction with a recommender system. We also propose a modification of standard metrics which helps to reveal unwanted biases and account for sensitivity to a negative feedback. Our model achieves state-of-the-art quality in standard recommendation tasks while significantly outperforming other methods in the cold-start "no-positive-feedback" scenarios.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Wednesday, July 20, 2016

### Random projections of random manifolds

Looks like some bounds have been tightened. Funny enough, in a matter of a week, I am either talking to or featuring someone from SpaceX, which just happens to deliver today or tomorrow a nanopore sequencer to the international space station

Small worlds!

Anyway, getting back to the paper.

Random projections of random manifolds by Subhaneil Lahiri, Peiran Gao, Surya Ganguli
A ubiquitous phenomenon is that interesting signals or data concentrate on low dimensional smooth manifolds inside a high dimensional ambient Euclidean space. Random projections are a simple and powerful tool for dimensionality reduction of such signals and data. Previous, seminal works have studied bounds on how the number of projections needed to preserve the geometry of these manifolds, at a given accuracy, scales with the intrinsic dimensionality, volume and curvature of the manifold. However, such works employ definitions of volume and curvature that are inherently difficult to compute. Therefore such theory cannot be easily tested against numerical simulations to quantitatively understand the tightness of the proven bounds. We instead study the typical distortions arising in random projections of an ensemble of smooth Gaussian random manifolds. In doing so, we find explicitly computable, approximate theoretical bounds on the number of projections required to preserve the geometric structure of these manifolds to a prescribed level of accuracy. Our bounds, while approximate, can only be violated with a probability that is exponentially small in the ambient dimension, and therefore they hold with high probability in most cases of practical interest. Moreover, unlike previous work, we test our theoretical bounds against numerical experiments on the actual geometric distortions that typically occur for random projections of random smooth manifolds. Through this comparison, we find our bounds are tighter than previous results by several orders of magnitude.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Tuesday, July 19, 2016

### Our proposal for a NIPS workshop on "Mapping Machine Learning to Hardware" has been submitted !

You probably have noticed recently an acceleration of the offering when it comes to computing hardware in Machine Learning. You may have also noticed that some algorithms seem to take into account the constraints of being in the binary world or that new and innovative technologies were trying to get in that area without knowing for sure what primitives they were improving. You might even have worndered what sort of technology will bring us closer to better efficiencies while still being flexible in terms of algorithm design. Eventually, as algorithm designer, you mmight have wondered about the sort of constraints your algorithm should be fulfilling. All these questions and many others are the reasons for this recent announcement on Nuit Blanche ( A proposal for a NIPS Workshop or Symposium on "Mapping Machine Learning to Hardware").With a truly outstanding set of people, we have been able to get the commitment from the following people as plenary speakers so that they can enlighten us about what is going on when it comes to the mapping for Machine Learning to hardware and more importantly what is the State of the Possible:

Here is the final proposal we sent out last night to the gods of NIPS.

With the recent success of Deep Learning and related techniques, we are beginning to see new specialized hardware or extensions to existing architectures dedicated for making training and inference computations faster or energy efficient or both. These technologies use either traditional CMOS technology on conventional von-Neumann architectures such as CPUs or accelerators such as DSPs, GPUs, FPGAs, and ASICs or other novel and exotic technologies in research phase such as  Memristors, Quantum Chips,  and Optical/Photonics. The overarching goal being to address a specific trade-off in mapping machine learning algorithms in general and deep learning in particular, to a specific underlying hardware technology.

Conversely, there has been quite an effort on the empirical side at devising deep network architectures for efficient implementation on low complexity hardware via low-rank tensor factorizations, structured matrix approximations, lower bit-depth like binary coefficients, compression and pruning to name a few approaches. This also has implications on leveraging appropriate hardware technology for inferencing primarily with energy and latency as the primary design goals.

These efforts are finding some traction in the signal processing and sparse/compressive sensing community to map the first layers of sensing hardware with the first layers of models such as deep networks. This approach has the potential of changing the way  sensing hardware, image reconstruction, signal processing and image understanding will be performed in the future.

This workshop aims to tie these seemingly disparate themes of co-design, architecture, algorithms and signal processing and bring together researchers at the interface of machine learning, hardware implementation, sensing, physics and biology for discussing the state of the art and the state of the possible.

The goals of the workshop are
• To present how machine learning computations and algorithms are mapped and co-designed with new hardware technologies and architectures
• To stimulate discussions on how machine learning algorithms can be co-designed or co-optimized with the underlying hardware technology to best take advantage of the synergy
• To evaluate the different trade-offs in accuracy, computational complexity, hardware cost, energy efficiency and application throughput currently investigated in these approaches
• To understand how data acquisition frameworks may be transformed to better interface with machine Learning algorithms
• To evaluate the constraints put forth on recent deep learning architectures so as to reduce redundancy and enable a simpler mapping between computing hardware and models.
• To enable a wider discussion on how Machine Learning algorithms and their primitives may provide a path toward exotic technology insertion in industrial roadmaps.

Besides the presentations made by the plenary speakers, there will be a poster session, lightning talks selected from the posters and a roundtable at the conclusion of the workshop. The workshop will be taped and presentations slides will be made available online. A white paper describing the talks and the discussions that went on during the workshop will be made available after the meeting.
Thanks to the following co-organizers, this proposal became a reality:
and a few other friends who transmitted the call inside their organizations.   Thank you !
Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

### DeepBinaryMask: Learning a Binary Mask for Video Compressive Sensing

The Great Convergence continues in compressive sensing hardware and machine learning:

DeepBinaryMask: Learning a Binary Mask for Video Compressive Sensing by Michael Iliadis, Leonidas Spinoulas, Aggelos K. Katsaggelos
In this paper, we propose a novel encoder-decoder neural network model referred to as DeepBinaryMask for video compressive sensing. In video compressive sensing one frame is acquired using a set of coded masks (sensing matrix) from which a number of video frames is reconstructed, equal to the number of coded masks. The proposed framework is an end-to-end model where the sensing matrix is trained along with the video reconstruction. The encoder learns the binary elements of the sensing matrix and the decoder is trained to recover the unknown video sequence. The reconstruction performance is found to improve when using the trained sensing mask from the network as compared to other mask designs such as random, across a wide variety of compressive sensing reconstruction algorithms. Finally, our analysis and discussion offers insights into understanding the characteristics of the trained mask designs that lead to the improved reconstruction quality.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.

## Monday, July 18, 2016

### Single-shot diffuser-encoded light field imaging

Oh ! A new compressive sensing hardware, woohoo !

Single-shot diffuser-encoded light field imaging by Nicholas Antipa , Sylvia Necula , Ren Ng , Laura Waller

We capture 4D light field data in a single 2D sensor image by encoding spatio-angular information into a speckle field (causticpattern) through a phase diffuser. Using wave-optics theory and a coherent phase retrieval method, we calibrate the system by measuring the diffuser surface height from through-focus images. Wave-optics theory further informs the design of system geometry such that a purely additive ray-optics model is valid. Light field reconstruction is done using nonlinear matrix inversion methods, including l_1 minimization. We demonstrate a prototype system and present empirical results of 4D light field reconstruction and computational refocusing from a single diffuser-encoded 2D image.

Join the CompressiveSensing subreddit or the Google+ Community or the Facebook page and post there !
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email, explore the Big Picture in Compressive Sensing or the Matrix Factorization Jungle and join the conversations on compressive sensing, advanced matrix factorization and calibration issues on Linkedin.