CS 234 Winter 2021
Assignment 1
Due: January 22 at 6:00 pm (PST)
For submission instructions, please refer to the course website. For all problems, if you use an existing result
from either the literature or a textbook to solve the exercise, you need to cite the source.
1 Flappy Karel MDP [25 pts]
There is a hot new mobile game on the market called Flappy Karel, where Karel the robot must
dodge the red pillars of doom and flap its way to the green pasture. Consider the following two grid
environments (Flappy World 1 and Flappy World 2). Starting from any unshaded square, Karel can
either move right & up or right & down (e.g., from state 4 you can move to state 10 or 12; think
checkers). Actions are deterministic and always succeed unless they would cause Karel to run into a
wall. The thicker edges indicate walls, and attempting to move in the direction of a wall results in
falling down one square (e.g., going in any direction from state 30 leads to falling into state 31). A
successful run by Karel in Flappy World 1 is shown in Figure 1b. Taking any action from the green
target square (no. 32) earns a reward of rg and ends the episode. Taking any action from the red
squares of doom (no. 1, 7, 8, 12, 13, ...) earns a reward of rr and ends the episode. Otherwise, from
every other square, taking any action is associated with a reward rs. Assume the discount factor
γ = 0.9, rg = +5, and rr = −5 unless otherwise specified. Note that the horizon is technically infinite
in both worlds.
[Figure 1: (a) Flappy World 1, a 7-row × 5-column grid with squares numbered 1–35 down the columns; (b) a successful run by Karel in Flappy World 1.]
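Parts (b) and (c) ask for optimal value functions and optimal actions, which can be checked with tabular value iteration. The sketch below is our own illustration: the three-state chain (A, B, G) and its rewards are made-up placeholders, not the actual 35-square Flappy World grid, which you would have to encode from Figure 1.

# Sketch: tabular value iteration for a deterministic MDP in the style of Flappy World.
# The states, transitions, and rewards below are placeholders, NOT the real grid.
gamma = 0.9
step_reward = 0.0                 # r_s for ordinary squares
exit_reward = {"G": 5.0}          # r_g: taking any action from a green square ends the episode
next_state = {                    # deterministic "right & up" / "right & down" moves
    "A": {"up": "B", "down": "B"},
    "B": {"up": "G", "down": "G"},
}

V = {"A": 0.0, "B": 0.0, "G": exit_reward["G"]}   # V*(green) = r_g, since the episode then ends
for _ in range(1000):
    delta = 0.0
    for s in next_state:          # back up only the non-terminal squares
        best = max(step_reward + gamma * V[next_state[s][a]] for a in next_state[s])
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-10:             # stop once the Bellman backups have converged
        break
print(V)                          # {'A': 4.05, 'B': 4.5, 'G': 5.0} with these placeholder numbers

For the actual assignment, next_state, step_reward, and exit_reward would be replaced by the 35 squares, the wall/fall dynamics, and rg = +5, rr = −5 from the problem statement.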
(a) Let rs ∈ {−4, −1, 0, 1}. Starting in square 2, for each of the possible values of rs, briefly
explain what the optimal policy would be in Flappy World 1. In each case, is the optimal policy
unique, and does the optimal policy depend on the value of the discount factor γ? Explain
your answer. [5 pts]
(b) What value of rs would cause the optimal policy to return the shortest path to the green
target square? Using this value of rs, find the optimal value function for each square in Flappy
World 1. What is the optimal action from square 27? [5 pts]
Now consider Flappy World 2. It is the same as Flappy World 1, except there are no walls on the
right and left sides. Going past the right end of Flappy World 2 simply loops you around to the
left-hand side. See Figure 2b for a successful run by Karel in Flappy World 2.
[Figure 2: (a) Flappy World 2; (b) a successful run by Karel in Flappy World 2.]
(c) Let rs ∈ {−4, −1, 0, 1}. Starting in square 3, for each of the possible values of rs, briefly
explain what the optimal policy would be in Flappy World 2. Using the value of rs that
would cause the optimal policy to return the shortest path to the green target square, find the
optimal value function for each square in Flappy World 2. What is the optimal action from
square 27? [5 pts]
(d) Consider a general MDP with rewards and transitions, and a discount factor γ. For
this case, assume that the horizon is infinite (so there is no termination). A policy π in this
MDP induces a value function V^π (let's refer to this as V^π_old). Now suppose we have the same
MDP where all rewards have a constant c added to them and have then been scaled by a
constant a (i.e., r_new = a(c + r_old)). Can you come up with an expression for the new value
function V^π_new induced by π in this second MDP, in terms of V^π_old, c, a, and γ? (One possible
derivation is sketched after part (e).) [5 pts]
(e) Can scaling all the rewards by a fixed amount change the optimal policy of an MDP? If so,
describe how different ranges of the constant a (where r_new = a · r_old) would change the
optimal policy of the MDP from part (c). [5 pts]
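A possible derivation sketch for part (d) (our own working, not the official solution): expand the definition of the value function under the new rewards and pull the constants out, using the geometric series Σ_{t≥0} γ^t = 1/(1 − γ) for the infinite horizon:

V^π_new(s) = E_π[ Σ_{t≥0} γ^t · a(c + r_t) | s_0 = s ]
           = a · E_π[ Σ_{t≥0} γ^t r_t | s_0 = s ] + a·c · Σ_{t≥0} γ^t
           = a · V^π_old(s) + a·c / (1 − γ).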
(a) rs = 1: take the longest possible path (2, 10, 18, 24, 30, 31, 32) to the target square.
rs = 0: take the shortest possible path to the target square.
rs = −1: take the shortest possible path to the target square.
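As a quick check of the rs = 1 claim (our own arithmetic, not part of the original notes): if Karel survives k steps at reward rs = 1 each and then earns rg = 5 from the green square, the discounted return is

Σ_{t=0}^{k−1} (0.9)^t · 1 + (0.9)^k · 5 = 10(1 − 0.9^k) + 5 · 0.9^k = 10 − 5 · (0.9)^k,

which is increasing in k, so postponing the terminal reward as long as possible is indeed better, consistent with taking the longest path.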