Now let's define the model that will learn to play Catch using Q-Learning. We use Keras as a frontend to TensorFlow. Our baseline model is a simple three-layer dense network, which works well on the simplified version of Catch. You can find the full implementation on GitHub.

You can also experiment with more complex models to see whether they achieve better performance.
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import sgd

num_actions = 3    # [move_left, stay, move_right]
hidden_size = 100  # Size of the hidden layers
grid_size = 10     # Size of the playing field

def baseline_model(grid_size, num_actions, hidden_size):
    # setting up the model with keras
    model = Sequential()
    model.add(Dense(hidden_size, input_shape=(grid_size**2,), activation='relu'))
    model.add(Dense(hidden_size, activation='relu'))
    model.add(Dense(num_actions))
    model.compile(sgd(lr=.1), "mse")
    return model
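Note that the network takes the flattened game screen as input: the grid_size × grid_size playing field becomes a vector of length grid_size**2. A quick sketch of that shape contract (the fruit and basket positions below are purely illustrative, not taken from the game code):

```python
import numpy as np

grid_size = 10
screen = np.zeros((grid_size, grid_size))  # the rendered playing field
screen[0, 3] = 1    # fruit position (illustrative)
screen[-1, 5] = 1   # basket position (illustrative)

# Flatten to a (1, grid_size**2) row vector, the input shape
# the dense network above expects
input_t = screen.reshape((1, -1))
assert input_t.shape == (1, grid_size**2)
```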
Exploration

The last ingredient of Q-Learning is exploration. Everyday experience tells us that sometimes you have to do something odd, or act at random, to find out whether there is anything better than your usual routine.

The same goes for Q-Learning. Always picking the best-known option means you may miss paths that have never been explored. To avoid this, the learner sometimes adds a random element instead of always taking the best action. We can define the training method as follows:
def train(model, epochs):
    # Reset the win counter
    win_cnt = 0
    # We want to keep track of the progress of the AI over time,
    # so we save its win count history
    win_hist = []
    # Epochs is the number of games we play
    for e in range(epochs):
        loss = 0.
        # Resetting the game
        env.reset()
        game_over = False
        # get initial input
        input_t = env.observe()
        while not game_over:
            # The learner is acting on the last observed game screen
            # input_t is a vector representing the game screen
            input_tm1 = input_t
            # Take a random action with probability epsilon
            if np.random.rand() <= epsilon:
                # Eat something random from the menu
                action = np.random.randint(0, num_actions, size=1)
            else:
                # Choose yourself
                # q contains the expected rewards for the actions
                q = model.predict(input_tm1)
                # We pick the action with the highest expected reward
                action = np.argmax(q[0])
            # apply action, get rewards and new state
            input_t, reward, game_over = env.act(action)
            # If we managed to catch the fruit we add 1 to our win counter
            if reward == 1:
                win_cnt += 1
            # Uncomment this to render the game here
            # display_screen(action, 3000, inputs[0])
            """
            The experiences < s, a, r, s' > we make during gameplay
            are our training data. Here we first save the last experience,
            and then load a batch of experiences to train our model.
            """
            # store experience
            exp_replay.remember([input_tm1, action, reward, input_t], game_over)
            # Load batch of experiences
            inputs, targets = exp_replay.get_batch(model, batch_size=batch_size)
            # train model on experiences
            batch_loss = model.train_on_batch(inputs, targets)
            # sum up loss over all batches in an epoch
            loss += batch_loss
        win_hist.append(win_cnt)
    return win_hist
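The training loop relies on an exp_replay object whose implementation is not shown here. Below is a minimal sketch of what such an experience-replay buffer could look like, assuming states are (1, grid_size**2) row vectors and the model exposes a Keras-style predict method; get_batch builds training targets with the standard Q-learning rule r + γ · max Q(s', a'):

```python
import numpy as np

class ExperienceReplay:
    # Stores transitions <s, a, r, s'> and turns random samples of them
    # into (state, target Q-values) training batches.
    def __init__(self, max_memory=500, discount=0.9):
        self.max_memory = max_memory  # oldest experiences are dropped first
        self.discount = discount      # gamma: how much future reward counts
        self.memory = []

    def remember(self, experience, game_over):
        # experience = [s, a, r, s']; game_over marks terminal states
        self.memory.append([experience, game_over])
        if len(self.memory) > self.max_memory:
            del self.memory[0]

    def get_batch(self, model, batch_size=10):
        len_memory = len(self.memory)
        env_dim = self.memory[0][0][0].shape[1]  # size of a state vector
        num_actions = model.predict(self.memory[0][0][0]).shape[1]
        inputs = np.zeros((min(len_memory, batch_size), env_dim))
        targets = np.zeros((inputs.shape[0], num_actions))
        for i, idx in enumerate(
                np.random.randint(0, len_memory, size=inputs.shape[0])):
            state_t, action_t, reward_t, state_tp1 = self.memory[idx][0]
            game_over = self.memory[idx][1]
            inputs[i] = state_t
            # Start from the model's current predictions so that only the
            # taken action's target is changed by the update below
            targets[i] = model.predict(state_t)[0]
            if game_over:
                # terminal state: no future reward
                targets[i, action_t] = reward_t
            else:
                # Q-learning target: r + gamma * max_a' Q(s', a')
                q_sa = np.max(model.predict(state_tp1)[0])
                targets[i, action_t] = reward_t + self.discount * q_sa
        return inputs, targets
```

Training on targets that mostly equal the model's own predictions means the mean-squared-error loss only pushes on the one action actually taken, which is exactly what the Q-learning update calls for.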
I trained this game bot for 5,000 epochs, and it performed quite well!
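Since win_hist stores the cumulative win count after each game, one way to judge progress is to convert it into a sliding win rate. A small helper sketch (moving_win_rate is an illustrative name, not part of the original code):

```python
import numpy as np

def moving_win_rate(win_hist, window=100):
    # win_hist[i] is the cumulative win count after game i, so the
    # per-game difference is 1 for a win and 0 for a loss
    wins = np.diff(win_hist, prepend=0)
    # Average the wins over a sliding window to smooth out noise
    kernel = np.ones(window) / window
    return np.convolve(wins, kernel, mode='valid')
```

Plotting this curve (for example with matplotlib) shows whether the agent's catch rate is actually climbing over the 5,000 games.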
[Animation: the Catch bot in action]