
[Tensorflow 2][Keras][Custom and Distributed Training with TensorFlow] Week1 - Gradient Tape Basics

by 개발자J의일상 2021. 7. 27.

This post is a summary of the following course.

 

Custom and Distributed Training with TensorFlow

https://www.coursera.org/learn/custom-distributed-training-with-tensorflow?specialization=tensorflow-advanced-techniques 

 


 

Review of the previous post

2021.07.27 - [Artificial Intelligence/Keras] - [Tensorflow 2][Keras][Custom and Distributed Training with TensorFlow] Week1 - Gradient Tape, Gradient Descent using Gradient Tape

 


 

Exercise on basics of Gradient Tape

Let's explore how you can use tf.GradientTape() to do automatic differentiation.

 

import tensorflow as tf

# Define a 2x2 array of 1's
x = tf.ones((2,2))

with tf.GradientTape() as t:
    # x is a constant tensor, so it is not watched automatically;
    # `watch` tells the tape to record the operations performed on x
    t.watch(x)

    # Define y as the sum of the elements in x
    y = tf.reduce_sum(x)

    # Let z be the square of y
    z = tf.square(y) 

# Get the derivative of z wrt the original input tensor x
dz_dx = t.gradient(z, x)

# Print our result
print(dz_dx)
tf.Tensor(
[[8. 8.]
 [8. 8.]], shape=(2, 2), dtype=float32)
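Each entry of the gradient is 8: since y = Σx = 4 and z = y², the chain rule gives ∂z/∂x_ij = 2y · ∂y/∂x_ij = 2 · 4 · 1 = 8 for every element of x.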

Gradient tape expires after one use, by default

 

If you want to compute multiple gradients, note that by default GradientTape is not persistent (persistent=False). This means the tape is released after a single call to gradient(), and any further call on it raises an error.

 

To see this, set up gradient tape as usual and calculate a gradient, so that the gradient tape will be 'expired'.

 

x = tf.constant(3.0)

# Notice that persistent is False by default
with tf.GradientTape() as t:
    t.watch(x)
    
    # y = x^2
    y = x * x
    
    # z = y^2
    z = y * y

# Compute dz/dx. 4 * x^3 at x = 3 --> 108.0
dz_dx = t.gradient(z, x)
print(dz_dx)
tf.Tensor(108.0, shape=(), dtype=float32)
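This checks out analytically: z = y² with y = x², so z = x⁴ and dz/dx = 4x³ = 4 · 27 = 108 at x = 3.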

Gradient tape has expired

 

See what happens if you try to calculate another gradient after you've already used the gradient tape once.

# If you try to compute dy/dx after the gradient tape has expired:
try:
    dy_dx = t.gradient(y, x)  # 6.0
    print(dy_dx)
except RuntimeError as e:
    print("The error message you get is:")
    print(e)
The error message you get is:
GradientTape.gradient can only be called once on non-persistent tapes.

 

Make the gradient tape persistent

To make sure that the gradient tape can be used multiple times, set persistent=True.

 

x = tf.constant(3.0)

# Set persistent=True so that you can reuse the tape
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    
    # y = x^2
    y = x * x
    
    # z = y^2
    z = y * y

# Compute dz/dx. 4 * x^3 at x = 3 --> 108.0
dz_dx = t.gradient(z, x)
print(dz_dx)

 

tf.Tensor(108.0, shape=(), dtype=float32)

Now that the tape is persistent, you can reuse it!

 

Try calculating a second gradient on this persistent tape.

 

# You can still compute dy/dx because of the persistent flag.
dy_dx = t.gradient(y, x)  # 6.0
print(dy_dx)
tf.Tensor(6.0, shape=(), dtype=float32)
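One caveat: a persistent tape holds on to its recorded resources until the tape object is garbage collected, so it is good practice to drop the reference once you have computed all the gradients you need. A minimal sketch of the same computation with explicit cleanup:

x = tf.constant(3.0)

with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = x * x  # y = x^2
    z = y * y  # z = x^4

dz_dx = t.gradient(z, x)  # 108.0
dy_dx = t.gradient(y, x)  # 6.0

# Drop the reference so the tape's resources can be released
del t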

 

 

Nested Gradient tapes

Now let's try computing a higher-order derivative by nesting GradientTapes:

 

Acceptable indentation of the first gradient calculation

 

Keep in mind that the first gradient calculation of dy_dx must occur inside the outer with block (either at the outer or the inner level), so that tape_2 records it.

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x
    
    # The first gradient calculation must occur at least
    # within the outer with block
    dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
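These match the analytic results: for y = x³, dy/dx = 3x² = 3 and d²y/dx² = 6x = 6 at x = 1.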

 

The first gradient calculation can also be inside the inner with block.

 

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x
    
        # The first gradient calculation can also be within the inner with block
        dy_dx = tape_1.gradient(y, x)
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)

 

Where not to indent the first gradient calculation

 

If the first gradient calculation is OUTSIDE of the outer with block, it won't persist for the second gradient calculation.

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

# The first gradient call is outside the outer with block,
# so tape_2 never records the dy_dx computation
dy_dx = tape_1.gradient(y, x)

# tape_2 has no record of dy_dx, so the gradient output will be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
None

Notice how the d2y_dx2 calculation is now None. Because dy_dx was computed outside of tape_2's with block, tape_2 never recorded that computation, so it has nothing to differentiate. Also note that this still won't work even if you set persistent=True for both gradient tapes: the problem is what the tape recorded, not how long it lives.

x = tf.Variable(1.0)

# Setting persistent=True still won't work
with tf.GradientTape(persistent=True) as tape_2:
    # Setting persistent=True still won't work
    with tf.GradientTape(persistent=True) as tape_1:
        y = x * x * x

# The first gradient call is still outside the outer with block,
# so tape_2 has no record of it; persistence doesn't change that
dy_dx = tape_1.gradient(y, x)

# the output will still be `None`
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
None

 

 

Proper indentation for the second gradient calculation

The second gradient calculation d2y_dx2 can be indented as deeply as the first calculation of dy_dx, but no deeper.

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)
        
        # this is acceptable
        d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)

This is also acceptable

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)
        
    # this is also acceptable
    d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)

This is also acceptable

x = tf.Variable(1.0)

with tf.GradientTape() as tape_2:
    with tf.GradientTape() as tape_1:
        y = x * x * x

        dy_dx = tape_1.gradient(y, x)
        
# this is also acceptable
d2y_dx2 = tape_2.gradient(dy_dx, x)

print(dy_dx)
print(d2y_dx2)
tf.Tensor(3.0, shape=(), dtype=float32)
tf.Tensor(6.0, shape=(), dtype=float32)
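The same nesting pattern extends to higher orders. As a sketch of my own (not from the course notebook), triple-nesting the tapes yields the third derivative of y = x³; note that each intermediate gradient call sits inside every enclosing with block so the outer tapes can record it:

x = tf.Variable(1.0)

with tf.GradientTape() as tape_3:
    with tf.GradientTape() as tape_2:
        with tf.GradientTape() as tape_1:
            y = x * x * x                    # y = x^3
        dy_dx = tape_1.gradient(y, x)        # 3x^2
    d2y_dx2 = tape_2.gradient(dy_dx, x)      # 6x
d3y_dx3 = tape_3.gradient(d2y_dx2, x)        # 6

print(dy_dx)    # tf.Tensor(3.0, shape=(), dtype=float32)
print(d2y_dx2)  # tf.Tensor(6.0, shape=(), dtype=float32)
print(d3y_dx3)  # tf.Tensor(6.0, shape=(), dtype=float32)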

 

