Restricted Boltzmann machine
A restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986,[1] but only rose to prominence after Geoffrey Hinton and collaborators invented fast learning algorithms for them in the mid-2000s.

Restricted Boltzmann machine

From Wikipedia, the free encyclopedia
  (Redirected from Restricted Boltzmann Machine)
Diagram of a restricted Boltzmann machine with three visible units and four hidden units (no bias units).

restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs. RBMs were initially invented under the name Harmonium by Paul Smolensky in 1986,[1] but only rose to prominence after Geoffrey Hinton and collaborators invented fast learning algorithms for them in the mid-2000s. RBMs have found applications in dimensionality reduction,[2]classification,[3] collaborative filteringfeature learning[4] and topic modelling.[5] They can be trained in either supervised or unsupervised ways, depending on the task.

As their name implies, RBMs are a variant of Boltzmann machines, with the restriction that their neurons must form a bipartite graph: they have input units, corresponding to features of their inputs, hidden units that are trained, and each connection in an RBM must connect a visible unit to a hidden unit. (By contrast, "unrestricted" Boltzmann machines may have connections between hidden units, making them recurrent networks.) This restriction allows for more efficient training algorithms than are available for the general class of Boltzmann machines, in particular the gradient-based contrastive divergence algorithm.[6]

Restricted Boltzmann machines can also be used in deep learning networks. In particular, deep belief networks can be formed by "stacking" RBMs in a particular way, and the weights of these can in turn be used to initialize the weights of multilayer perceptrons trained with backpropagation.[7]

 

 

Structure[edit]

The standard type of RBM has binary-valued (Boolean) hidden and visible units, and consists of a symmetric matrix of weights W = (w_{i,j}) associated with the connection between hidden unit h_j and visible unit v_i, as well as bias weights (offsets) a_i for the visible units and b_j for the hidden units. Given these, the energy of a configuration (v,h) can be computed as

E(v,h) = -\sum_i a_i v_i - \sum_j b_j h_j -\sum_i \sum_j h_j w_{i,j} v_i

or, in vector form,

E(v,h) = -a^{\mathrm{T}} v - b^{\mathrm{T}} h -h^{\mathrm{T}} W v

This energy function is analogous to that of a Hopfield network. As in general Boltzmann machines, probability distributions over hidden and/or visible vectors are defined in terms of the energy function:[8]

P(v,h) = \frac{1}{Z} e^{-E(v,h)}

where Z is a partition function defined as the sum of e^{-E(v,h)} over all possible configurations (in other words, just a normalizing constant to ensure the probability distribution sums to 1). Similarly, the (marginal) probability of a visible (input) vector of booleans is the sum over all possible hidden layer configurations:[8]

P(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}

Since the RBM has the shape of a bipartite graph, with no intra-layer connections, the hidden unit activations are mutually independent given the visible unit activations and conversely, the visible unit activations are mutually independent given the hidden unit activations.[6] That is, for m visible units and n hidden units,

P(v|h) = \prod_{i=1}^m P(v_i|h)
P(h|v) = \prod_{j=1}^n P(h_j|v)

and the individual activation probabilities are given by

P(h_j=1|v) = \sigma(b_j + \sum_{i=1}^m w_{i,j} v_i)

and

P(v_i=1|h) = \sigma(a_i + \sum_{j=1}^n w_{i,j} h_j)

where \sigma denotes the logistic sigmoid.

Relation to other models[edit]

Restricted Boltzmann machines are a special case of Boltzmann machines and Markov random fields.[9][10] Their graphical model corresponds to that of factor analysis.[11]

Training algorithm[edit]

Restricted Boltzmann machines are trained to maximize the product of probabilities assigned to some training set V (a matrix, each row of which is treated as a visible vector v),

\arg\max_W \prod_{v \in V} P(v)

or equivalently, to maximize the expected log probability of V:[9][10]

\arg\max_W \mathbb{E} \left[\sum_{v \in V} \log P(v)\right]

The algorithm most often used to train RBMs, that is, to optimize the weight vector W, is the contrastive divergence (CD) algorithm due to Hinton, originally developed to train product of experts models.[12] The algorithm performs Gibbs sampling and is used inside a gradient descent procedure (similar to the way backpropagation is used inside such a procedure when training feedforward neural nets) to compute weight update.

The basic, single-step contrastive divergence (CD-1) procedure for a single sample can be summarized as follows:

  1. Take a training sample v, compute the probabilities of the hidden units and sample a hidden activation vector h from this probability distribution.
  2. Compute the outer product of v and h and call this the positive gradient.
  3. From h, sample a reconstruction v' of the visible units, then resample the hidden activations h' from this.
  4. Compute the outer product of v' and h' and call this the negative gradient.
  5. Let the weight update to w_{i,j} be the positive gradient minus the negative gradient, times some learning rate.

See also[edit]

References[edit]

  1. Jump up^ Smolensky, Paul (1986). "Chapter 6: Information Processing in Dynamical Systems: Foundations of Harmony Theory". In Rumelhart, David E.; McLelland, James L. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press. pp. 194–281. ISBN 0-262-68053-X.
  2. Jump up^ Hinton, G. E.; Salakhutdinov, R. R. (2006). "Reducing the Dimensionality of Data with Neural Networks"Science313 (5786): 504–507. doi:10.1126/science.1127647.PMID 16873662. edit
  3. Jump up^ Larochelle, H.; Bengio, Y. (2008). "Classification using discriminative restricted Boltzmann machines".Proceedings of the 25th international conference on Machine learning - ICML '08. p. 536.doi:10.1145/1390156.1390224ISBN 9781605582054.edit
  4. Jump up^ Coates, Adam; Lee, Honglak; Ng, Andrew Y. (2011). "An analysis of single-layer networks in unsupervised feature learning". International Conference on Artificial Intelligence and Statistics (AISTATS).
  5. Jump up^ Ruslan Salakhutdinov and Geoffrey Hinton (2010).Replicated softmax: an undirected topic modelNeural Information Processing Systems 23.
  6. Jump up to:a b Miguel Á. Carreira-Perpiñán and Geoffrey Hinton (2005). On contrastive divergence learning. Artificial Intelligence and Statistics.
  7. Jump up^ Deep Belief Networks in Deep Learning Tutorials.
  8. Jump up to:a b Geoffrey Hinton (2010). A Practical Guide to Training Restricted Boltzmann Machines. UTML TR 2010–003, University of Toronto.
  9. Jump up to:a b Sutskever, Ilya; Tieleman, Tijmen (2010). "On the convergence properties of contrastive divergence"Proc. 13th Int'l Conf. on AI and Statistics (AISTATS).
  10. Jump up to:a b Asja Fischer and Christian Igel. Training Restricted Boltzmann Machines: An Introduction. Pattern Recognition 47, pp. 25-39, 2014
  11. Jump up^ María Angélica Cueto; Jason Morton; Bernd Sturmfels (2010). "Geometry of the restricted Boltzmann machine".Algebraic Methods in Statistics and Probability (American Mathematical Society) 516arXiv:0908.4425.
  12. Jump up^ Hinton, G. E. (2002). "Training Products of Experts by Minimizing Contrastive Divergence"Neural Computation 14 (8): 1771–1800.doi:10.1162/089976602760128018PMID 12180402.edit

External links[edit]

CONTEXT(Help)
-
Machine Learning Methods & Algorithms »Machine Learning Methods & Algorithms
Supervised learning »Supervised learning
Artificial neural network »Artificial neural network
Restricted Boltzmann machine
+Comments (0)
+Citations (0)
+About