author     bd-912 <bdunahu@gmail.com>    2023-12-13 19:50:51 -0700
committer  bd-912 <bdunahu@gmail.com>    2023-12-13 19:50:51 -0700
commit     d2c5b3ce4bccefeef40d7cadbbe14b350960475d (patch)
tree       0b593a43dd4fa7957007d4a1e482ab260f4be8c0 /one_revised_snake_q_table.ipynb
parent     b1137269b269eed1207005828b7939efc9f557c2 (diff)
Minor rewrites of various descriptions
Diffstat (limited to 'one_revised_snake_q_table.ipynb')
-rw-r--r--  one_revised_snake_q_table.ipynb  36
1 file changed, 17 insertions(+), 19 deletions(-)
diff --git a/one_revised_snake_q_table.ipynb b/one_revised_snake_q_table.ipynb
index e827ae4..6d1c114 100644
--- a/one_revised_snake_q_table.ipynb
+++ b/one_revised_snake_q_table.ipynb
@@ -143,7 +143,7 @@
{
"data": {
"text/plain": [
- "[<CollisionType.GOAL: 1>]"
+ "[<CollisionType.NONE: 2>]"
]
},
"execution_count": 6,
@@ -169,7 +169,7 @@
"metadata": {},
"source": [
"### State-sensing methods, creating and reading a q-table\n",
- "Now, we can start redesigning some functions used to allow the snake to play intelligently. We'll use a multi-dimensional numpy array to store the rewards corresponding to each state and action. This is called a q-function, or a q-table in this case.\n",
+ "Now, we can start redesigning some functions used to allow the snake to play intelligently. We'll use a multi-dimensional numpy array to store the expected rewards corresponding to each state and action. This is called a q-function, or a q-table in this case, and represents one of the most fundamental methods of reinforcement learning. More on this later...\n",
"\n",
"How many states do I need? Seeing how the new **get_viable_actions** method already prevents the snake from choosing life-ending moves, the snake is no longer tasked with learning or memorizing it.\n",
"\n",
@@ -263,9 +263,7 @@
{
"data": {
"text/plain": [
- "([Point(x=160, y=80)],\n",
- " [Point(x=160, y=80), Point(x=160, y=0)],\n",
- " Point(x=0, y=0))"
+ "([Point(x=400, y=160)], [Point(x=400, y=160)], Point(x=0, y=80))"
]
},
"execution_count": 9,
@@ -282,7 +280,7 @@
"id": "d3fd47ce-55fe-4d2f-9147-8848193f7ca1",
"metadata": {},
"source": [
- "Now to make a function to index our expected reward-to-go given a state using sense_goal:"
+ "Now to make a function to index our expected reward-to-go given a state using sense_goal. Because we only have one state-sensing function, this function really only serves as a neat interface to sense_goal:"
]
},
{
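A hypothetical sketch of the indexing function described in this cell, assuming sense_goal returns an integer row index into the q-table; get_state_rewards is an illustrative name, not the notebook's:

    def get_state_rewards(q_table, game):
        # sense_goal is the notebook's state-sensing method; it is assumed
        # here to map the game state to a row index of the q-table
        state = sense_goal(game)
        return q_table[state]   # expected reward-to-go for each action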
@@ -339,7 +337,7 @@
"id": "6a2ef7f7-f6f7-4610-8e98-1d389327f3e8",
"metadata": {},
"source": [
- "In our learning agent, these actions will obviously be associated with different expected rewards. Essentially, we have a function that, given a state, tells us the expected utility of each action. Should we just choose the best one?\n",
+ "In my learning agent, these actions will obviously be associated with different expected rewards. Essentially, I have a function that, given a state, tells me the expected utility of each action. Should I just choose the best one?\n",
"\n",
"There are two problems with a greedy approach...\n",
"\n",
@@ -510,9 +508,9 @@
"id": "5c36ab97-2ca0-4468-8d4c-ebd1e4deec23",
"metadata": {},
"source": [
- "And the snake already plays optimally, no learning required. This implementation might be more similar to a passive learning agent, in the sense I already told the snake what policy I want it to follow.\n",
+ "And the snake already plays optimally, no learning required! This implementation might be more similar to a passive learning agent, in the sense I already told the snake what policy I want it to follow.\n",
"\n",
- "Now that we have these methods, I will create functions to allow the snake to learn by its own, and then pair it off against the q-table I just built."
+ "Now that I have these methods, I will create functions to allow the snake to learn by its own, and then pair it off against the q-table I just built."
]
},
{
@@ -528,7 +526,7 @@
"id": "ce537e44-ac8c-4f09-b89d-a330f13277da",
"metadata": {},
"source": [
- "A good agent prioritizes actions that leads to the highest expected reward. The Q-function assigns an expected utility to each state-action pair, usually the expected reward-to-go.\n",
+ "A rational agent prioritizes actions that leads to the highest expected reward. The Q-function assigns an expected utility to each state-action pair, usually the expected reward-to-go.\n",
"\n",
"A popular method of adjusting this state-value function is a version of the temporal difference equation, which adjusts the utility associated with each input to agree with the maximum utility of its successor:\n",
"\n",
@@ -538,7 +536,7 @@
"\n",
"The discount factor $\\gamma$ can be used to weight immediate reward higher than future reward, though will be kept as 1 in my solution, which means we consider all future actions equally. All I need to do is assign an enticing enough reinforcement to goal-attaining actions, and use the temporal difference equation to update all other state transitions.\n",
"\n",
- "In order to implement this equation, I simply need a function that takes the q-table to be updated, the old state-action pair, and then ew state-action pair, and the outcome as returned by the game engine so we can assign a reward.\n",
+ "In order to implement this equation, I simply need a function that takes the q-table to be updated, the old state-action pair, and then ew state-action pair, and the outcome as returned by the game engine so I can assign a reward.\n",
"\n",
"When the agent does reach the goal, I will manually set that state and action to the best reward, 0. Remember that the q-table is initialized with zeros, meaning untravelled actions are pre-assigned good rewards. Both this and epsilon will encourage exploration.\n",
"\n",
@@ -620,14 +618,14 @@
{
"data": {
"text/plain": [
- "array([[-7.98101961, -2.63365542, -0.40046043, -1.66295111],\n",
- " [-6.79790104, -8.97312148, -2.76629639, -2.01841064],\n",
- " [-2.5668375 , -7.76913115, -5.25510457, -1.60454875],\n",
- " [-2.53858287, -4.87135085, -7.27897488, -3.55392954],\n",
- " [-1.11332388, -1.16738974, -8.00673287, -3.76512078],\n",
- " [-4.80299325, -1.82240999, -4.36261659, -7.78143806],\n",
- " [-3.74031239, -1.81917483, -2.55794318, -8.59533619],\n",
- " [-7.27706114, -5.22216365, -2.79252452, -3.34047701]])"
+ "array([[-7.67491582, -2.33111962, -0.60015858, -1.37540041],\n",
+ " [-5.27152969, -7.98969718, -3.87453687, -1.24435724],\n",
+ " [-6.31827426, -5.34343005, -5.15269717, -3.14416104],\n",
+ " [-4.03006198, -7.7299521 , -5.56781757, -2.25634664],\n",
+ " [-2.21238723, -3.2719937 , -7.19183517, -1.12966867],\n",
+ " [-4.51471731, -3.0205144 , -7.82635816, -5.67783926],\n",
+ " [-3.65731728, -1.23278448, -1.3241749 , -7.94001044],\n",
+ " [-7.61199203, -2.0832036 , -2.27008676, -6.21290761]])"
]
},
"execution_count": 22,