This post is a summary of the following course.

Custom and Distributed Training with TensorFlow (deeplearning.ai)
www.coursera.org
Review of the previous post
[Tensorflow 2][Keras][Custom and Distributed Training with TensorFlow] Week1 - Gradient Tape, Gradient Descent using Gradient Tape
mypark.tistory.com
Exercise on basics of Gradient Tape
Let's explore how you can use tf.GradientTape() to do automatic differentiation.
import tensorflow as tf

# Define a 2x2 array of 1's
x = tf.ones((2,2))

with tf.GradientTape() as t:
    # Record the actions performed on tensor x with `watch`
    t.watch(x)

    # Define y as the sum of the elements in x
    y = tf.reduce_sum(x)

    # Let z be the square of y
    z = tf.square(y)

# Get the derivative of z wrt the original input tensor x
dz_dx = t.gradient(z, x)

# Print our result
print(dz_dx)
tf.Tensor(
[[8. 8.]
 [8. 8.]], shape=(2, 2), dtype=float32)
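The 8's make sense: since z = (sum of x)^2, each element's partial derivative is 2 * sum(x) = 2 * 4 = 8. As a side note, here is a minimal variation of my own (not part of the lab): in TensorFlow 2, a trainable tf.Variable is watched automatically, so the explicit t.watch() call is only needed for plain tensors like the tf.ones constant above.

# A minimal sketch (my variation, not from the lab): trainable tf.Variable
# objects are watched automatically, so no t.watch() call is needed.
x = tf.Variable(tf.ones((2, 2)))

with tf.GradientTape() as t:
    y = tf.reduce_sum(x)   # y = sum(x) = 4
    z = tf.square(y)       # z = y^2

# dz/dx_i = 2 * sum(x) = 8 for every element, matching the output above
print(t.gradient(z, x))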
Gradient tape expires after one use, by default
If you want to compute multiple gradients, note that by default, GradientTape is not persistent (persistent=False). This means that the GradientTape will expire after you use it to calculate a gradient.
To see this, set up gradient tape as usual and calculate a gradient, so that the gradient tape will be 'expired'.
x = tf.constant(3.0)

# Notice that persistent is False by default
with tf.GradientTape() as t:
    t.watch(x)

    # y = x^2
    y = x * x

    # z = y^2
    z = y * y

# Compute dz/dx. 4 * x^3 at x = 3 --> 108.0
dz_dx = t.gradient(z, x)

print(dz_dx)
tf.Tensor(108.0, shape=(), dtype=float32)
Gradient tape has expired
See what happens if you try to calculate another gradient after you've already used gradient tape once.
# If you try to compute dy/dx after the gradient tape has expired:
try:
    dy_dx = t.gradient(y, x)  # 6.0
    print(dy_dx)
except RuntimeError as e:
    print("The error message you get is:")
    print(e)
The error message you get is:
GradientTape.gradient can only be called once on non-persistent tapes.
Make the gradient tape persistent
To make sure that the gradient tape can be used multiple times, set persistent=True.
x = tf.constant(3.0)

# Set persistent=True so that you can reuse the tape
with tf.GradientTape(persistent=True) as t:
    t.watch(x)

    # y = x^2
    y = x * x

    # z = y^2
    z = y * y

# Compute dz/dx. 4 * x^3 at x = 3 --> 108.0
dz_dx = t.gradient(z, x)

print(dz_dx)
tf.Tensor(108.0, shape=(), dtype=float32)
Now that it's persistent, you can still reuse this tape!
Try calculating a second gradient on this persistent tape.
# You can still compute dy/dx because of the persistent flag.
dy_dx = t.gradient(y, x)  # 6.0
print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)
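One housekeeping tip from the TensorFlow autodiff guide (not shown in the lab): a persistent tape holds on to its recorded resources until it is garbage collected, so you can drop the reference once you are done with it.

# Release the resources held by the persistent tape once you're done
del t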
Nested Gradient tapes
Now let's try computing a higher-order derivative by nesting gradient tapes:
Acceptable indentation of the first gradient calculation
Keep in mind that the first gradient calculation, dy_dx, must occur at least inside the outer with block.
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

    # The first gradient calculation should occur at least
    # within the outer with block
    dy_dx = tape_1.gradient(y, x)

d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
The first gradient calculation can also be inside the inner with block.
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        # The first gradient calculation can also be within the inner with block
        dy_dx = tape_1.gradient(y, x)

d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
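The same nesting pattern extends to even higher orders. Here is a sketch of my own (not part of the lab) that adds a third tape to get the third derivative of y = x^3, which is the constant 6. Each gradient call sits inside the with block of every tape that still needs to record it:

x = tf.Variable(1.0)

with tf.GradientTape() as tape_3:
    with tf.GradientTape() as tape_2:
        with tf.GradientTape() as tape_1:
            y = x * x * x
        # computed inside tape_2 (and tape_3) so it gets recorded
        dy_dx = tape_1.gradient(y, x)        # 3x^2
    # computed inside tape_3 so it gets recorded
    d2y_dx2 = tape_2.gradient(dy_dx, x)      # 6x

d3y_dx3 = tape_3.gradient(d2y_dx2, x)        # constant: 6.0

print(d3y_dx3)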
Where not to indent the first gradient calculation
If the first gradient calculation is OUTSIDE of the outer with block, it won't persist for the second gradient calculation.
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

# The first gradient call is outside the outer with block
# so the tape will expire after this
dy_dx = tape_1.gradient(y, x)

# The tape is now expired and the gradient output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
None
Notice how the d2y_dx2 calculation is now None. The tape has expired: tape_2 stopped recording the moment its with block ended, so the computation of dy_dx was never traced and there is nothing to differentiate. Also note that this still won't work even if you set persistent=True for both gradient tapes.
x = tf.Variable(1.0)

# Setting persistent=True still won't work
with tf.GradientTape(persistent=True) as tape_2:
    # Setting persistent=True still won't work
    with tf.GradientTape(persistent=True) as tape_1:
        y = x * x * x

# The first gradient call is outside the outer with block
# so the tape will expire after this
dy_dx = tape_1.gradient(y, x)

# the output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
None
Proper indentation for the second gradient calculation
The second gradient calculation d2y_dx2 can be indented as much as the first calculation of dy_dx but not more.
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

        # this is acceptable
        d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
This is also acceptable
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

    # this is also acceptable
    d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
This is also acceptable
x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)

# this is also acceptable
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
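To wrap up, here is a small helper of my own (not part of the lab) that packages the nested-tape pattern above into a reusable second-derivative function:

# My own wrapper around the nested-tape pattern shown above (not from the lab)
def second_derivative(f, x0):
    x = tf.Variable(x0)
    with tf.GradientTape() as tape_2:
        with tf.GradientTape() as tape_1:
            y = f(x)
        # computed inside the outer with block so tape_2 records it
        dy_dx = tape_1.gradient(y, x)
    return tape_2.gradient(dy_dx, x)

print(second_derivative(lambda v: v * v * v, 1.0))  # expect 6.0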