Many problems come up when optimizing a neural network model. One is how to set the learning rate: with exponential decay, the model can approach a good solution quickly at the start of training and then settle stably into the region of the optimum later on. Overfitting can be handled with regularization. The moving average model can make the final model more robust on unseen data.

**1. Setting the learning rate**

The learning rate should be neither too large nor too small. TensorFlow provides a flexible way to set it: exponential decay. First, a relatively large learning rate is used to reach a reasonably good solution quickly. Then the learning rate is reduced gradually as the iterations continue, which makes the model more stable in the later stages of training.

tf.train.exponential_decay decreases the learning rate exponentially. In each round of optimization, the learning rate actually used is

    decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)

where learning_rate is the initial learning rate, decay_rate is the decay coefficient, and decay_steps controls the decay speed. In the figure, one curve shows how the learning rate changes with staircase=False and the dark-colored curve shows staircase=True. With staircase=True, global_step / decay_steps is truncated to an integer, so the learning rate becomes a staircase (step) function. A common use of this setting is to reduce the learning rate once after each full pass over the training data.

Usage example: learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step, 100000, 0.96, staircase=True).
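To make the decay formula concrete, here is a minimal plain-Python sketch of the same arithmetic. It is not TensorFlow's implementation; the function below only mirrors the formula quoted above, and the starting rate of 0.1 is an assumed value chosen for illustration.

```python
def exponential_decay(learning_rate, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Evaluate learning_rate * decay_rate ^ (global_step / decay_steps)."""
    if staircase:
        # Integer division keeps the rate constant within each block of
        # decay_steps steps, which produces the step function.
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return learning_rate * decay_rate ** exponent


starter_learning_rate = 0.1  # assumed value, not taken from the article

for step in (0, 50000, 100000, 250000):
    smooth = exponential_decay(starter_learning_rate, step, 100000, 0.96)
    stepped = exponential_decay(starter_learning_rate, step, 100000, 0.96,
                                staircase=True)
    print(step, smooth, stepped)
```

At step 50000 the continuous version has already decayed a little, while the staircase version still equals the starting rate; the two agree whenever global_step is a multiple of decay_steps.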

**2. The overfitting problem**

1. The overfitting problem and its solution

Overfitting occurs when a model is too complex: it memorizes the random noise in the training data very well and fails to learn the general trend of the training data.

A commonly used method for avoiding overfitting is regularization. The idea is to add a measure of model complexity to the loss function, so that the optimization objective becomes $J(\theta) + \lambda R(w)$. Here $R(w)$ describes the complexity of the model; it involves the weights w but not the biases b, and $\lambda$ is the proportion of the model complexity loss in the total loss. In general, model complexity is determined only by the weights w. Two functions are commonly used for $R(w)$. One is L1 regularization:

$$R(w) = \|w\|_1 = \sum_i |w_i|$$

The other is L2 regularization:

$$R(w) = \|w\|_2^2 = \sum_i w_i^2$$

Whichever form is used, the basic idea is to limit the size of the weights so that the model cannot fit the random noise in the training data arbitrarily. The difference is that L1 regularization makes the parameters sparser and L2 regularization does not. Sparser means that more parameters become 0, which achieves an effect similar to feature selection. In practice, L1 and L2 regularization can also be used together:

$$R(w) = \sum_i \alpha |w_i| + (1 - \alpha) w_i^2$$

2. Solving the overfitting problem in TensorFlow

The following defines a loss function with L2 regularization:

    loss = tf.reduce_mean(tf.square(y_ - y)) + tf.contrib.layers.l2_regularizer(lambda_)(w)

The first part is the mean squared error loss, and the second part is the regularization term. The parameter lambda_ is the weight of the regularization term, that is, the $\lambda$ in $J(\theta) + \lambda R(w)$ (it is written lambda_ here because lambda is a reserved word in Python), and w is the parameter whose regularization loss is computed. The function tf.contrib.layers.l2_regularizer() computes the L2 regularization term of the given parameter; similarly, **tf.contrib.layers.l1_regularizer()** computes the L1 regularization term.

    import tensorflow as tf

    # Sample use of the L1 and L2 regularization functions
    w = tf.constant([[1.0, -2.0], [-3.0, 4.0]])
    with tf.Session() as sess:
        # 0.5 * (|1| + |-2| + |-3| + |4|) = 5.0
        print(sess.run(tf.contrib.layers.l1_regularizer(0.5)(w)))  # 5.0
        # 0.5 * [(1 + 4 + 9 + 16) / 2] = 7.5
        # TensorFlow divides the L2 term by 2 so that its derivative is simpler.
        print(sess.run(tf.contrib.layers.l2_regularizer(0.5)(w)))  # 7.5
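The arithmetic in those comments can be checked without TensorFlow. The sketch below is a plain-Python stand-in, not the library code: l1_penalty and l2_penalty are hypothetical helper names, and the division by 2 in the L2 case follows the convention described in the comment above.

```python
def l1_penalty(scale, weights):
    """scale * sum of absolute values of all weights."""
    return scale * sum(abs(v) for row in weights for v in row)


def l2_penalty(scale, weights):
    """scale * (sum of squared weights) / 2, the halved form noted above."""
    return scale * sum(v * v for row in weights for v in row) / 2


w = [[1.0, -2.0], [-3.0, 4.0]]
print(l1_penalty(0.5, w))  # 5.0
print(l2_penalty(0.5, w))  # 7.5
```

The L1 term grows linearly with each weight, while the L2 term grows with its square, which is why large weights are penalized much more heavily under L2.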

As the number of parameters in the network grows, defining the loss this way makes the definition very long and hard to read. In addition, when the network structure is complex, the code that defines the network structure and the code that computes the loss function may not be in the same function, so computing the loss directly from the variables becomes inconvenient. To solve this problem, you can use the collections provided by TensorFlow. The specific implementation is shown in the code section.

tf.add_to_collection() adds a value to the specified collection; tf.get_collection() returns a list of the elements stored in that collection.
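The collection pattern itself is simple, and a rough plain-Python analogy may help. The sketch below is not TensorFlow code: the dictionary and the two functions are hypothetical stand-ins that only imitate the behavior described above, and the numeric loss values are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for the named collections kept by the graph.
_collections = defaultdict(list)


def add_to_collection(name, value):
    """Append a value to the named collection."""
    _collections[name].append(value)


def get_collection(name):
    """Return a list copy of everything stored under the name."""
    return list(_collections[name])


# Each layer registers its own penalty where its weights are created.
add_to_collection('losses', 0.03)  # made-up penalty of layer 1
add_to_collection('losses', 0.01)  # made-up penalty of layer 2
# The data loss is registered later, possibly in a different function.
add_to_collection('losses', 0.25)  # made-up mean squared error

total_loss = sum(get_collection('losses'))
print(total_loss)
```

The point of the design is that the code building each layer does not need to know how the final loss is assembled; it only needs the collection name.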

**3. The moving average model**

The moving average model is another method that makes a model more robust on test data. When a neural network is trained with stochastic gradient descent, the moving average model can improve the performance of the final model on test data in many applications. Training with both GradientDescent and Momentum can benefit from the ExponentialMovingAverage method.

TensorFlow provides the class **tf.train.ExponentialMovingAverage** to implement the moving average model. When a tf.train.ExponentialMovingAverage object is initialized, you specify the decay rate decay and, optionally, the parameter num_updates, which controls the decay rate dynamically. tf.train.ExponentialMovingAverage maintains a shadow variable for each variable. The initial value of the shadow variable is the initial value of the corresponding variable, and each time the variable is updated, shadow_variable = decay * shadow_variable + (1 - decay) * variable. The formula shows that decay determines how fast the model is updated: the larger decay is, the more stable the model. In practice decay is usually set to a number close to 1. num_updates is None by default; if it is set, the decay rate used is min(decay, (1 + num_updates) / (10 + num_updates)).

The apply method of a tf.train.ExponentialMovingAverage object returns an operation that updates the moving averages of var_list. var_list must be a list of Variables or Tensors, and running the operation updates the shadow variables of the entries in var_list. The average method returns the moving-average value of a variable.
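The two formulas above are easy to trace by hand. The following plain-Python sketch is only an illustration of that arithmetic, not the TensorFlow class: MovingAverage, register, update and effective_decay are hypothetical names, and the inputs (5, then 10, with num_updates of 0 and 10000) are chosen to match the sample in the code section.

```python
def effective_decay(decay, num_updates=None):
    """Decay rate actually used, following the min(...) rule above."""
    if num_updates is None:
        return decay
    return min(decay, (1 + num_updates) / (10 + num_updates))


class MovingAverage:
    """Keeps one shadow value per named variable."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def register(self, name, initial_value):
        # The shadow value starts at the variable's initial value.
        self.shadow[name] = float(initial_value)

    def update(self, name, value, num_updates=None):
        decay = effective_decay(self.decay, num_updates)
        self.shadow[name] = decay * self.shadow[name] + (1 - decay) * value
        return self.shadow[name]


ema = MovingAverage(decay=0.99)
ema.register("v1", 0.0)

# Early in training the effective decay is small (0.1), so the shadow
# value moves quickly toward the variable.
first = ema.update("v1", 5.0, num_updates=0)
# Later the effective decay is capped at 0.99, so it moves slowly.
second = ema.update("v1", 10.0, num_updates=10000)
third = ema.update("v1", 10.0, num_updates=10000)
print(first, second, third)  # approximately 4.5, 4.555, 4.60945
```

This shows why num_updates is useful: with a fixed decay of 0.99 the shadow value would take many steps to leave its initial value, whereas the dynamic rule lets it catch up quickly at the start.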

**4. Code**

1. L2 regularization of the weights of a neural network with a complex structure

    import tensorflow as tf

    # Create the weights of one layer and add their L2 regularization term
    # to the collection named 'losses'.
    def get_weight(shape, lambda1):
        var = tf.Variable(tf.random_normal(shape), dtype=tf.float32)
        tf.add_to_collection('losses', tf.contrib.layers.l2_regularizer(lambda1)(var))
        return var

    x = tf.placeholder(tf.float32, shape=(None, 2))
    y_ = tf.placeholder(tf.float32, shape=(None, 1))

    # Number of nodes in each layer of the network
    layer_dimension = [2, 10, 5, 3, 1]
    n_layers = len(layer_dimension)

    # The current layer starts out as the input layer
    current_layer = x
    in_dimension = layer_dimension[0]

    # Build a 5-layer fully connected network in a loop
    for i in range(1, n_layers):
        out_dimension = layer_dimension[i]
        weight = get_weight([in_dimension, out_dimension], 0.003)
        bias = tf.Variable(tf.constant(0.1, shape=[out_dimension]))
        current_layer = tf.nn.relu(tf.matmul(current_layer, weight) + bias)
        in_dimension = layer_dimension[i]

    mse_loss = tf.reduce_mean(tf.square(y_ - current_layer))
    tf.add_to_collection('losses', mse_loss)

    # The total loss includes the regularization terms of all the weights
    loss = tf.add_n(tf.get_collection('losses'))

2. Using tf.train.ExponentialMovingAverage

    import tensorflow as tf

    # Sample use of tf.train.ExponentialMovingAverage
    v1 = tf.Variable(0, dtype=tf.float32)
    # step simulates the number of training iterations and is used to
    # control the decay rate dynamically
    step = tf.Variable(0, trainable=False)

    # Define the moving average object with decay=0.99 and num_updates=step
    ema = tf.train.ExponentialMovingAverage(0.99, num_updates=step)
    # apply() returns an operation that updates the moving averages of
    # var_list, which must be a list of Variables or Tensors. Each run of
    # the operation updates the shadow variables of var_list.
    maintain_averages_op = ema.apply(var_list=[v1])

    with tf.Session() as sess:
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        # average() returns the moving-average value of a variable
        print(sess.run([v1, ema.average(v1)]))  # [0.0, 0.0]

        sess.run(tf.assign(v1, 5))
        # decay = min{0.99, (1 + step) / (10 + step) = 0.1} = 0.1
        # The moving average of v1 becomes 0.1 * 0.0 + 0.9 * 5 = 4.5
        sess.run(maintain_averages_op)
        print(sess.run([v1, ema.average(v1)]))  # [5.0, 4.5]

        sess.run(tf.assign(step, 10000))
        sess.run(tf.assign(v1, 10))
        # decay = min{0.99, (1 + step) / (10 + step) = 0.999} = 0.99
        # The moving average of v1 becomes 0.99 * 4.5 + 0.01 * 10 = 4.555
        sess.run(maintain_averages_op)
        print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.5549998]

        # The moving average of v1 becomes 0.99 * 4.555 + 0.01 * 10 = 4.60945
        sess.run(maintain_averages_op)
        print(sess.run([v1, ema.average(v1)]))  # [10.0, 4.6094499]

That is all for this article. I hope it helps with your learning, and I hope you will continue to support Script Home.

Permanent link to this article: http://www.script-home.com/optimization-strategy-learning-of-tensorflow-neural-network.html | Script Home

When reprinting this article, please credit: Optimization strategy learning of TensorFlow neural network | Script Home