tutorials/sphinx-tutorials
1 file changed, +1 -1 lines changed
@@ -57,7 +57,7 @@
 #
 # This type of algorithms is usually trained *on-policy*. This means that, at every learning iteration, we have a
 # **sampling** and a **training** phase. In the **sampling** phase of iteration :math:`t`, rollouts are collected
-# form agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
+# from agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
 # In the **training** phase, all the collected rollouts are immediately fed to the training process to perform
 # backpropagation. This leads to updated policies which are then used again for sampling.
 # The execution of this process in a loop constitutes *on-policy learning*.
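
As a rough illustration of the sampling/training loop described in the comment block touched by this change, here is a minimal pure-Python sketch. It is not the tutorial's code: the 1-D task, the function names `collect_rollouts`, `train_on_rollouts`, and `on_policy_loop`, and the REINFORCE-style update are all assumptions made for the example, and a single scalar stands in for the policies :math:`\mathbf{\pi}_t`.

```python
import random


def collect_rollouts(policy_param, n_steps, rng):
    """Sampling phase: act with the *current* policy and record the outcomes."""
    rollouts = []
    for _ in range(n_steps):
        # Hypothetical 1-D task: the action is the policy mean plus exploration noise.
        action = policy_param + rng.gauss(0.0, 0.5)
        reward = -(action - 3.0) ** 2  # reward peaks when the action equals 3
        rollouts.append((action, reward))
    return rollouts


def train_on_rollouts(policy_param, rollouts, lr=0.05):
    """Training phase: one update computed only from the freshly collected rollouts."""
    baseline = sum(r for _, r in rollouts) / len(rollouts)
    # Score-function (REINFORCE-style) gradient estimate for the policy mean;
    # the constant 1 / sigma^2 factor is folded into the learning rate.
    grad = sum((r - baseline) * (a - policy_param) for a, r in rollouts) / len(rollouts)
    return policy_param + lr * grad


def on_policy_loop(n_iters=200, n_steps=64, seed=0):
    """Alternate sampling and training; rollouts are never reused across iterations."""
    rng = random.Random(seed)
    policy_param = 0.0
    for _ in range(n_iters):
        rollouts = collect_rollouts(policy_param, n_steps, rng)   # sample with pi_t
        policy_param = train_on_rollouts(policy_param, rollouts)  # produces pi_{t+1}
        # The rollouts go out of scope here, so the next iteration samples
        # with the updated policy instead of replaying old data.
    return policy_param
```

The point of the sketch is the data flow rather than the update rule: each batch of rollouts is generated by the policy that is about to be updated and is dropped right after the update, which is what distinguishes this loop from an off-policy setup with a replay buffer.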