Reinforcement Learning

A paradigm of ML: how an intelligent agent should take actions to accumulate maximum reward

Given an action by the agent,

Environment changes state
Environment gives reward
Agent can take next action based on knowledge
Physically based
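
The loop above (action, state change, reward, next action based on knowledge) can be sketched in a few lines. This is only an illustration under stated assumptions: `LineWorld`, `train` and `greedy_policy` are hypothetical names, the five-cell environment is made up, and tabular Q-learning is swapped in as one simple way for an agent to store "knowledge"; the notes do not prescribe any of these.

```python
import random


class LineWorld:
    """Hypothetical toy environment: cells 0..4, reward 1.0 on reaching cell 4."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=300, epsilon=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Run the agent-environment loop and return the learned Q-table."""
    rng = random.Random(seed)
    env = LineWorld()
    q = [[0.0, 0.0] for _ in range(env.size)]  # q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(200):  # cap on steps per episode
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                # Exploit current knowledge; ties are broken at random.
                action = max((0, 1), key=lambda a: (q[state][a], rng.random()))
            # Environment changes state and gives a reward.
            next_state, reward, done = env.step(action)
            # Agent updates its knowledge (Q-learning update).
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best known action for every non-terminal cell."""
    return [max((0, 1), key=lambda a: q[s][a]) for s in range(len(q) - 1)]


q_table = train()
print(greedy_policy(q_table))
```

After training, the greedy policy should choose action 1 (move right, towards the reward) in every non-terminal cell.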

UnityML

Agent:

Collection of observations from environment
Action execution

Brain:

Decision making for linked agents
Linked to an Academy, which tracks iterations, sets the simulation speed and resets the environment.

Academy:

Tracks iterations
Global environment variables

Proximal Policy Optimization:

A policy-gradient reinforcement learning algorithm, used to train advanced models
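
The notes do not spell out how PPO works. As a rough sketch, its core is the clipped surrogate objective from the original PPO paper: the probability ratio between the new and old policy is clipped to $[1-\epsilon, 1+\epsilon]$ so that a single update cannot move the policy too far. The function name, the list-based inputs and the value $\epsilon = 0.2$ are assumptions for illustration, and this is not how UnityML implements it.

```python
def ppo_clipped_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    ratios:     new_policy_prob / old_policy_prob for each sampled action
    advantages: advantage estimate for each sampled action
    epsilon:    clip range (0.2 is an assumed, commonly used value)
    """
    terms = []
    for ratio, advantage in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * advantage, clipped_ratio * advantage))
    return sum(terms) / len(terms)


# A ratio of 1.5 with a positive advantage is clipped to 1.2.
print(ppo_clipped_objective([1.5], [1.0]))
```

Training maximizes this objective; the `min` means that pushing the ratio beyond the clip range brings no extra gain.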

Model Locomotion

Force applied is based on acceleration: $F = ma$
Each limb is subjected to a force proportional to the limb’s mass $m$
For a velocity change from $v_0$ to $v_1$ over time $t$: $F = m(v_1 - v_0)/t = ma$
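
As a small worked example of the formula, the sketch below computes the force needed to change a limb's velocity over a given time. The function name and the numbers (a 2 kg limb going from 0 to 3 m/s in 0.5 s) are made up for illustration, and velocities are treated as scalars for simplicity.

```python
def limb_force(mass, v0, v1, t):
    """Force from F = m * (v1 - v0) / t, in newtons for SI inputs."""
    if t <= 0:
        raise ValueError("t must be positive")
    acceleration = (v1 - v0) / t  # a = (v1 - v0) / t
    return mass * acceleration    # F = m * a


# 2 kg limb, 0 -> 3 m/s in 0.5 s: a = 6 m/s^2, so F = 12 N
print(limb_force(2.0, 0.0, 3.0, 0.5))
```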