Abstract

Speech recognition is the translation, through various methodologies, of human speech into text by computers. In this research review we examine three different methods used in the speech recognition field and investigate the accuracy they achieve on different data sets. We analyze the state-of-the-art deep neural networks (DNNs), which have evolved into complex architectures and achieve significant results in many cases. Afterward, we explain convolutional neural networks (CNNs) and explore their potential in this field. Finally, we present the recent research on highway deep neural networks (HDNNs), which appear to be more flexible for resource-constrained platforms. Overall, we critically compare these methods and show their strengths and limitations.
We conclude that each method has its advantages as well as its weaknesses, and that they are used for different purposes.

I. Introduction

Machine Learning (ML) is a field of computer science that gives computers the ability to learn, through different algorithms and techniques, without being explicitly programmed. Automatic speech recognition (ASR) is closely related to ML because it uses ML methodologies and procedures [1], [2], [3].
ASR has been around for decades, but it was not until recently that it saw tremendous development, owing to advances in both machine learning methods and computer hardware. New ML techniques made speech recognition accurate enough to be useful outside of carefully controlled environments, so that it can now easily be deployed in many electronic devices (e.g. computers, smartphones).

Speech is the most important mode of communication between human beings, which is why, from the early part of the previous century, efforts have been made to make computers do what only humans could perceive. Research has been conducted throughout the past five decades, driven mainly by the desire to automate tasks using machines [2]. Different theories, such as probabilistic modeling and reasoning, pattern recognition, and artificial neural networks, motivated researchers and helped to advance ASR.

The first major advance in the history of ASR occurred in the mid-1970s with the introduction of the expectation-maximization (EM) algorithm [4] for training hidden Markov models (HMMs).
The EM technique made it possible to develop the first speech recognition systems using Gaussian mixture models (GMMs). Despite all their advantages, GMMs are statistically inefficient for modeling data that lie on or near a nonlinear manifold in the data space. This problem could be addressed by artificial neural networks, but the computer hardware of that era did not allow building complex neural networks. As a result, most speech recognition systems were based on HMMs, and later on the neural network/hidden Markov model (NN/HMM) hybrid architecture, first investigated in the early 1990s [5]. Since the 2000s, improvements in computer hardware and the invention of new machine learning algorithms have made the training of DNNs possible. DNNs with many hidden layers have been shown to outperform GMMs on a variety of speech recognition benchmarks [6]. Other, more complex neural architectures, such as recurrent neural networks with long short-term memory units (LSTM-RNNs) [7] and CNNs, appear to have their own benefits and applications.

In this literature review we present three types of artificial neural networks (DNNs, CNNs, and HDNNs).
We analyze each method, explain how each is used for training, and discuss their advantages and disadvantages. Finally, we compare these methods, identifying where each one is most suitable and what its limitations are. Furthermore, we draw some conclusions from these comparisons and carefully suggest some probable future directions.

II. Methods

A. Deep Neural Networks [6]
B. Convolutional Neural Networks [8]

C. Highway Deep Neural Networks

HDNNs are depth-gated feed-forward neural networks [9]. They are distinguished from conventional DNNs for two main reasons: first, they use far fewer model parameters, and second, they use two types of gate functions to facilitate the information flow through the different layers.
HDNNs are multi-layer networks with L hidden layers. In the first layer, the input is transformed by the first layer's parameters and passed through a nonlinear activation function; in each subsequent layer, the previous hidden layer's output is transformed by the current layer's parameters, again followed by a nonlinear activation function (e.g. the sigmoid function).
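Written out with notation we introduce here for illustration (not taken from the cited papers), where W^{(l)} and b^{(l)} denote the l-th layer's weight matrix and bias, \sigma a nonlinear activation, and x the input feature vector, this forward pass is:

    h^{(1)} = \sigma\big(W^{(1)} x + b^{(1)}\big), \qquad
    h^{(l)} = \sigma\big(W^{(l)} h^{(l-1)} + b^{(l)}\big), \quad l = 2, \ldots, L.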
The output layer is parameterized by its own weights and an output function, which is usually the softmax, to obtain the posterior probability of each class given the input features. Given target labels, the network is usually trained by gradient descent to minimize a loss function such as the cross-entropy.
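Concretely, and again in notation introduced here rather than taken from the cited papers, the softmax posterior and the cross-entropy objective over training pairs (x_t, y_t) can be written as:

    P(y = k \mid x) = \frac{\exp\big(w_k^{\top} h^{(L)} + b_k\big)}{\sum_j \exp\big(w_j^{\top} h^{(L)} + b_j\big)}, \qquad
    \mathcal{L}_{\mathrm{CE}} = -\sum_t \log P(y_t \mid x_t).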
However, as the number of hidden layers increases, the error surface becomes increasingly non-convex, and gradient-based optimization algorithms with random initialization become more likely to end up in a poor local minimum [23]. Furthermore, the variance of the back-propagated gradients may become small in the lower layers if the model parameters are not initialized properly [24]. Highway deep neural networks (HDNNs) [17] were proposed to enable very deep networks to be trained by augmenting the hidden layers with gate functions. These are the transform gate, which scales the original hidden activations, and the carry gate, which scales the input before passing it directly to the next hidden layer.
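As a minimal sketch of this gating mechanism (hypothetical NumPy code written for this review, not an implementation from the cited papers; note that the skip connection requires the input and output dimensions to match), a single highway layer combining a transform gate T and a carry gate C might look like:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def highway_layer(x, W_h, b_h, W_t, b_t):
        H = sigmoid(W_h @ x + b_h)   # ordinary hidden activation
        T = sigmoid(W_t @ x + b_t)   # transform gate: scales the new activation H
        C = 1.0 - T                  # carry gate: scales the input passed straight through
        # Tying C = 1 - T is a common simplification; the carry gate
        # can also be parameterized independently.
        return T * H + C * x

    # Toy usage: one 4-dimensional highway layer
    d = 4
    rng = np.random.default_rng(0)
    x = rng.standard_normal(d)
    W_h, b_h = rng.standard_normal((d, d)), np.zeros(d)
    # A negative transform-gate bias initially favors carrying the input,
    # which eases optimization in very deep stacks.
    W_t, b_t = rng.standard_normal((d, d)), np.full(d, -1.0)
    y = highway_layer(x, W_h, b_h, W_t, b_t)

Training…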