DRL 01: A Gentle Introduction to Deep Reinforcement Learning (2023)

Deep Reinforcement Learning Explained - 01

Learn the fundamentals of reinforcement learning


This is the first post in the series "Deep Reinforcement Learning Explained", an introductory series that presents the basic concepts and methods of modern Deep Reinforcement Learning to the reader in a practical, step-by-step manner.


Deep Reinforcement Learning (DRL), a very fast-moving field, is the combination of Reinforcement Learning and Deep Learning. It is also one of the most popular areas of machine learning, because it can solve a wide range of complex decision-making tasks that were previously out of reach for a machine solving real-world problems with human-like intelligence.

Today I'm starting a series on Deep Reinforcement Learning that aims to bring the reader closer to the subject. The goal is to survey the territory, from technical terms and jargon to basic concepts and classic algorithms, so that newcomers don't get lost when delving into this wonderful field.

My first serious contact with Deep Reinforcement Learning was in Cádiz (Spain), during the Machine Learning Summer School in 2016. I attended the three-day seminar on Deep Reinforcement Learning given by John Schulman (then at UC Berkeley and co-founder of OpenAI).


It was great, but I must admit that I found John's explanations extremely hard to follow. Much time has passed since then, and working with Xavier Giró and Ph.D. students such as Víctor Campos and Míriam Bellver has allowed me to go deeper into the subject and enjoy it.

Although several years have passed since then, I sincerely believe that the taxonomy of reinforcement learning approaches he presented is still a good framework for organizing knowledge for beginners.

Most reinforcement learning textbooks and courses actually start with dynamic programming. I will cover it too, but first, as John did in his seminar, I will introduce the Cross-Entropy method, a kind of evolutionary algorithm that most books do not cover. It works really well as a first method for bringing deep learning into reinforcement learning (that is, deep reinforcement learning), because it is easy to implement and works surprisingly well.

With this method, we can conveniently review how deep learning and reinforcement learning work together before delving into the more classical approaches to an RL problem that do not involve DL, such as Dynamic Programming, Monte Carlo, and Temporal Difference Learning, following the order of the vast majority of academic books on the subject. We will then devote the last part of this series to the most fundamental DL+RL algorithms (not the state of the art, which is far too extensive to cover), such as Policy Gradient methods.

Specifically, in this first post we briefly present what Deep Reinforcement Learning is and the basic terms used in this area of research and innovation.

I think deep reinforcement learning is one of the most exciting fields in artificial intelligence. It combines the power and ability of deep neural networks to represent and understand the world with the ability to act on that understanding. Let's see if I can share that enthusiasm. Here we go!

A lot of exciting news about Artificial Intelligence (AI) has appeared in recent years. For example, AlphaGo defeated the best professional human player at the game of Go. Last year, our friend Oriol Vinyals and his team at DeepMind showed that the AlphaStar agent beats professional players at StarCraft II. A few months later, OpenAI's Dota 2-playing bot became the first AI system to beat the world champions in an e-sports game. What all these systems have in common is that they use Deep Reinforcement Learning (DRL). But what are AI and DRL?

1.1 Artificial Intelligence

We need to take a step back and look at the types of learning, because sometimes the terminology itself can confuse us about the basics. Artificial Intelligence, the main field of computer science into which Reinforcement Learning (RL) falls, is a discipline concerned with creating computer programs that display human-like "intelligence".

What do we mean when we talk about artificial intelligence? Artificial intelligence (AI) is a broad field. Even an authoritative AI textbook, Artificial Intelligence: A Modern Approach, written by Stuart Russell and Peter Norvig, does not give a precise definition and discusses definitions of AI from different perspectives:

Artificial Intelligence: A Modern Approach (AIMA) 3rd edition, Stuart J. Russell and Peter Norvig, Prentice Hall, 2009. ISBN 0-13-604259-7


Undoubtedly, this book is the best starting point for an overview of the subject. But for a more general approach (the purpose of this series), we could accept a simple definition in which artificial intelligence means the intelligence displayed by machines, in contrast to the natural intelligence of humans. In this sense, a concise and general definition of AI could be the effort to automate intellectual tasks normally performed by humans.

As such, artificial intelligence is a vast scientific field that covers many areas of knowledge related to machine learning, as well as many other approaches, not always classified as machine learning, that are embraced by my university colleagues who are experts in the subject. Moreover, over time, as computers have become increasingly capable of "doing things", the tasks or technologies considered "smart" have changed.

In addition, since the 1950s, AI has gone through several waves of optimism, followed by disappointment and loss of funding and interest (periods known as AI winters), followed in turn by new approaches, success, and funding. Moreover, for most of its history, AI research has been divided into subfields based on technical considerations or specific mathematical tools, with research communities that sometimes did not communicate sufficiently with one another.

1.2 Machine learning

Machine learning (ML) is itself a huge field of research and development. In particular, machine learning can be defined as the branch of artificial intelligence that gives computers the ability to learn without being explicitly programmed, that is, without the programmer having to specify the rules the computer must follow to accomplish its task; instead, the computer works them out automatically.

Generalizing, we can say that machine learning consists of developing a prediction "algorithm" for a particular use case of each problem. These algorithms learn from data to find patterns or trends, understand what the data tell us, and thus build a model to predict and classify the elements.

Given the maturity of the machine learning research field, there are many well-established approaches to machine learning. Each of them uses a different algorithmic framework to optimize predictions based on incoming data. Machine learning is a broad field with a complex taxonomy of algorithms that generally fall into three main categories:

  • Supervised learning is the task of learning from labeled data, and its goal is to generalize. We say that learning is supervised when the data we use for training includes the desired solution, called a "label". Some of the most popular machine learning algorithms in this category are linear regression, logistic regression, support vector machines, decision trees, random forests, and neural networks.
  • Unsupervised learning is the task of learning from unlabeled data, and its goal is to compress. When the training data does not include labels, we speak of unsupervised learning, and it is the algorithm itself that tries to classify the information. Some of the most popular algorithms in this category are clustering (K-means) and principal component analysis (PCA).
  • Reinforcement learning is the task of learning through trial and error, and its goal is to act. This category of learning can be combined with the others and is a very active research area today, as we will see in this series.

1.3 Deep learning

Orthogonal to this categorization, we can consider a powerful ML approach called Deep Learning (DL), a topic we have covered extensively in previous posts. Remember that deep learning algorithms are based on artificial neural networks, whose algorithmic structures allow models composed of multiple processing layers to learn representations of data with multiple levels of abstraction.

DL is not a separate branch of ML, so it is not a different task from those described above. DL is a collection of techniques and methods for using neural networks to solve ML tasks, be it supervised learning, unsupervised learning or reinforcement learning. We can represent it graphically in Figure 1.

[Figure 1]

1.4 Deep reinforcement learning

Deep learning is one of the best tools we have today for dealing with unstructured environments: deep networks can learn from large amounts of data and discover patterns. But that is not decision-making; it is a recognition problem. Reinforcement learning provides the decision-making capability.

Reinforcement learning can solve problems using a variety of ML methods and techniques, from decision trees to SVMs and neural networks. In this series, however, we only use neural networks; after all, that is what the "deep" in DRL refers to. Neural networks are not necessarily the best solution to every problem: for example, they are very data-hungry and hard to interpret. Nevertheless, they are arguably one of the most powerful techniques available today, and they often deliver the best performance.

In this section, we offer a brief first approach to RL, since it is necessary for a good understanding of Deep Reinforcement Learning, a particular type of RL that uses deep neural networks for state representation and/or for approximating functions such as the value function, the policy, etc.

2.1 Learning through interaction

Learning through interaction with our environment is probably the first approach that comes to mind when we think about the nature of learning. That's how we learn intuitively, like a baby learns. And we know that such interactions are undoubtedly a fundamental source of knowledge about our environment and about ourselves, throughout life, not just from childhood. For example, when we learn to drive a car, we are acutely aware of how the environment reacts to what we are doing, and we also try to influence what is happening in our environment through our actions. Learning from interaction is a fundamental concept that underlies almost all theories of learning and is the basis of reinforcement learning.

The reinforcement learning approach focuses much more on goal-directed learning from interaction than other machine learning approaches do. The learner is not told which actions to take; instead, it must discover for itself which actions yield the greatest reward, its goal, by trying them out through "trial and error". Furthermore, these actions can affect not only the immediate reward but also future, "delayed" rewards, because current actions determine future situations (as happens in real life). These two characteristics, "trial-and-error" search and "delayed reward", are two distinguishing features of reinforcement learning that we will cover in this series.

2.2 Key elements of reinforcement learning

Reinforcement learning (RL) is a field influenced by a variety of other well-established fields that tackle decision-making problems under uncertainty. For example, control theory studies ways to control complex dynamical systems; however, in control theory the dynamics of the systems we try to control are usually known in advance, unlike in DRL, where they are not. Another such field is operations research, which also studies decision-making under uncertainty but often considers problems of much larger scope than those commonly seen in RL.

This creates a synergy between these fields, which is undoubtedly positive for scientific progress. But it also introduces some inconsistencies in terminology, notation, etc. For this reason, in this section we provide a detailed introduction to the terminology and notation we will use throughout the series.

Reinforcement learning is essentially a mathematical formalization of a decision problem that we will introduce later in this series.

Agent and environment

There are two main components to reinforcement learning:

  • An agent, which represents the "solution": a computer program whose only role is to make decisions (take actions) to solve complex decision-making problems under uncertainty.
  • An environment, which represents the "problem": everything that comes after the agent's decision. The environment responds with the consequences of those actions, which are observations or states, and rewards, sometimes also called costs.

For example, in tic-tac-toe, we can consider the agent to be one of the players and the environment to include the game board and the other player.

These two main components interact continuously: the agent tries to influence the environment through actions, and the environment reacts to the agent's actions. How the environment reacts to certain actions is defined by a model that may or may not be known to the agent, which gives rise to two situations:

  • If the agent knows the model, we call this situation model-based RL. In this case, knowing the environment fully allows us to find the optimal solution with dynamic programming. This is not the focus of this post.
  • If the agent does not know the model, it must make decisions with incomplete information: it can do model-free RL, or try to learn the model explicitly as part of the algorithm.
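A toy example may make the distinction more concrete. The sketch below assumes a made-up two-state model stored as `P[state][action]`, a list of `(probability, next_state, reward)` tuples; this layout and all the names are hypothetical, chosen only for illustration. A model-based agent can read the model and compute the expected reward of an action exactly, whereas a model-free agent only receives sampled outcomes from the environment (simulated here by `sample_step`) and has to estimate the same quantity from experience.

```python
import random

# Hypothetical toy model: P[state][action] -> list of (probability, next_state, reward)
P = {
    0: {"left": [(1.0, 0, 0.0)],
        "right": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)],
        "right": [(1.0, 1, 1.0)]},
}

def expected_reward(model, state, action):
    """Model-based: the agent knows the model, so it computes the expectation exactly."""
    return sum(p * r for p, _, r in model[state][action])

def sample_step(state, action, rng):
    """Plays the role of the environment: it returns one sampled outcome, not the model."""
    outcomes = P[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward

def estimated_reward(state, action, n_samples=10_000, seed=0):
    """Model-free: the agent only sees samples, so it estimates the expectation from experience."""
    rng = random.Random(seed)
    total = sum(sample_step(state, action, rng)[1] for _ in range(n_samples))
    return total / n_samples

print(expected_reward(P, 0, "right"))  # exact value from the known model
print(estimated_reward(0, "right"))    # sample-based estimate, close to the exact value
```

With enough samples the estimate approaches the exact value, but the model-free agent pays for it with many interactions.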


The environment is represented by a set of variables related to the problem (very dependent on the type of problem we want to solve). This set of variables, together with all the possible values they can take, is called the state space. A state is an instance of the state space, that is, a set of values taken by the variables.


Since we usually assume that the agent does not have access to the full, actual state of the environment, the part of the state that the agent can observe is called an observation. However, observations and states are often used interchangeably in the literature, and we will do the same in this series of posts.

The action and transition function

In each state, the environment makes available a set of actions, from which the agent chooses one. The agent influences the environment through these actions, and the environment may change state in response to the agent's action. The function responsible for this mapping is called, in the literature, the transition function or the transition probabilities between states.


The environment usually has a well-defined task and may provide the agent with a reward signal as a direct response to its actions. This reward is feedback on how good the last action was in contributing to the task the environment is meant to accomplish. The function responsible for this mapping is called the reward function. As we will see later, the agent's goal is to maximize the total reward it receives, so rewards are the motivation the agent needs to exhibit the desired behavior.



Let's summarize the concepts presented earlier in the reinforcement learning cycle in the figure below:

[Figure: the reinforcement learning cycle]

In general, reinforcement learning essentially consists of turning this figure into a mathematical formalism.

The loop starts when the agent observes the environment (step 1) and receives a state and a reward. The agent uses this state and reward to decide the next action to take (step 2). The agent then sends an action to the environment in an attempt to control it in a beneficial way (step 3). Finally, the environment transitions, and its internal state changes as a result of the previous state and the agent's action (step 4). Then the cycle repeats.
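The four steps of the cycle map naturally onto a loop. The sketch below uses a hypothetical toy environment (`ToyEnv`, a five-cell corridor where reaching the rightmost cell gives a reward of +1 and ends the episode) and an agent that simply picks random actions; neither is part of any library, and the `reset`/`step` method names only loosely mirror the Gym convention used later in this post.

```python
import random

class ToyEnv:
    """Hypothetical corridor of 5 cells; reaching the last cell gives +1 and ends the episode."""
    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Step 4: the state changes as a result of the previous state and the agent's action.
        move = 1 if action == "right" else -1
        self.state = min(max(self.state + move, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def run_episode(env, max_steps=100, seed=0):
    rng = random.Random(seed)
    state, reward = env.reset(), 0.0             # Step 1: observe the initial state
    for t in range(max_steps):
        # Step 2: decide. This random agent ignores `state`; a learning agent would use it.
        action = rng.choice(["left", "right"])
        state, reward, done = env.step(action)   # Step 3: act; the new state and reward come back
        if done:
            return t + 1, reward
    return max_steps, reward

steps, final_reward = run_episode(ToyEnv())
print(f"episode finished after {steps} steps with final reward {final_reward}")
```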


The task the agent is trying to solve may or may not have a natural ending. Tasks that have a natural ending, such as a game, are called episodic tasks. Tasks that do not, on the other hand, are called continuing tasks, for example, learning to move forward. The sequence of time steps from the beginning to the end of an episodic task is called an episode.


As we will see, agents may take several time steps and episodes to learn how to solve a task. The sum of the rewards collected in a single episode is called the return. Agents are generally designed to maximize the return.

One complication is that rewards may not be revealed to the agent until the end of an episode, which we earlier called "delayed reward". For example, in tic-tac-toe, the reward for each move (action) is not known until the end of the game. It is a positive reward if the agent wins the game (because it achieved the desired overall outcome) or a negative reward (a penalty) if it loses.
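As a minimal illustration, the return of an episode can be computed as the plain sum of its rewards (discounting, which a later post may introduce, is left out here). The reward lists are made-up tic-tac-toe games in which, as described above, every move gets 0 and only the final move reveals whether the agent won (+1) or lost (-1).

```python
def episode_return(rewards):
    """Return of an episode: the (undiscounted) sum of the rewards collected in it."""
    return sum(rewards)

# Hypothetical tic-tac-toe games: intermediate moves give no reward (delayed reward).
won_game = [0, 0, 0, 0, 1]
lost_game = [0, 0, 0, -1]

print(episode_return(won_game))   # 1
print(episode_return(lost_game))  # -1
```

Because the only non-zero reward arrives at the end, the reward signal alone does not tell the agent which of the earlier moves were good or bad.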

Exploitation vs. exploration

Another important characteristic and challenge of reinforcement learning is the trade-off between "exploration" and "exploitation". To obtain a lot of reward, an agent should favor actions it has tried in the past and knows to be effective at producing reward. But to discover such actions, paradoxically, it has to try actions it has never selected before.

In short, an agent must exploit what it has already experienced to obtain as much reward as possible, but at the same time it must also explore in order to select better actions in the future. The exploration-exploitation dilemma is a crucial issue and still an open research topic. We will discuss this trade-off later in this series.
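One common and simple way to balance the two, shown here only as an illustration since the series covers this trade-off later, is an ε-greedy rule: with a small probability ε the agent explores by choosing a random action, and otherwise it exploits the action with the highest value estimated so far. The value estimates below are made-up numbers, and `epsilon_greedy` is a hypothetical helper, not a library function.

```python
import random

def epsilon_greedy(action_values, epsilon, rng=random):
    """With probability epsilon pick a random action (explore), else the best-known one (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))
    return max(range(len(action_values)), key=lambda a: action_values[a])

# Hypothetical value estimates for 4 actions
estimates = [0.1, 0.5, 0.2, 0.0]
rng = random.Random(42)
choices = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
print(choices.count(1) / len(choices))  # most choices exploit action 1, a few explore
```

Setting ε = 0 gives a purely greedy agent that never explores, while ε = 1 gives a purely random agent.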

Let's reinforce our understanding of reinforcement learning with a simple example: a frozen (and very slippery) lake where our agent will skate:

[Figure: the Frozen Lake example]

The Frozen Lake environment we will use as an example is an ice rink divided into 16 cells (4x4), and, as shown in the image below, the ice is broken in some of the cells. The skater, called the agent, starts skating at the top-left position, and its goal is to reach the bottom-right position without falling into any of the four holes in the rink.

The example described is implemented as the Frozen-Lake environment in Gym. With this example environment, we will review and clarify the RL terminology introduced so far. Having this example at hand will also be useful for future posts in this series.

3.1 The Gym toolkit

OpenAI is an artificial intelligence (AI) research organization that provides a well-known toolkit called Gym for training reinforcement learning agents and for developing and comparing RL algorithms. Gym offers a variety of environments for training an RL agent, from classic control tasks to Atari game environments, and we can train our RL agent with different RL algorithms in these simulated environments. Throughout the series, we will use the Gym toolkit to build and test reinforcement learning algorithms on a variety of classic control tasks, such as balancing a pole on a cart (CartPole) or driving a car up a mountain (MountainCar).

Gym also offers 59 Atari game environments, including Pong, Space Invaders, Air Raid, Asteroids, Centipede, Ms. Pac-Man, etc. Training a reinforcement learning agent to play Atari games is an interesting and challenging task. Later in this series, we will train a DQN agent to play in the Atari Pong environment.

As an example, let's take one of the simplest environments, called Frozen-Lake.

3.2 The frozen lake environment

The Frozen-Lake environment belongs to the so-called grid-world category, in which the agent lives in a grid, here of size 4x4 (16 cells). This means a state space of 16 states (0-15), corresponding to the i, j coordinates of the grid world.
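In Gym's Frozen Lake the 16 states are numbered row by row, so state s corresponds to row i = s // 4 and column j = s % 4. The two helpers below (hypothetical names, not part of Gym) make that mapping explicit:

```python
N_COLS = 4  # the 4x4 Frozen Lake grid

def to_coords(state):
    """State index (0-15) -> (i, j) grid coordinates, numbering cells row by row."""
    return divmod(state, N_COLS)

def to_state(i, j):
    """(i, j) grid coordinates -> state index (0-15)."""
    return i * N_COLS + j

print(to_coords(0), to_coords(15))  # (0, 0) (3, 3): the start and goal corners
print(to_state(1, 2))               # 6
```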

In Frozen-Lake, the agent always starts at the top-left corner, and its goal is to reach the bottom-right position of the grid. There are four holes in fixed cells of the grid; if the agent falls into one of them, the episode ends and the reward obtained is zero. When the agent reaches the goal cell, it receives a reward of +1 and the episode ends. The following figure shows a visual representation of the Frozen Lake environment:

[Figure: visual representation of the Frozen Lake environment]
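These reward and termination rules can be sketched in a few lines, assuming the default 4x4 map that Gym uses for Frozen Lake, written with the letters `S`, `F`, `H` and `G` described in the rendering section of this post. The `outcome` helper is hypothetical, not part of Gym:

```python
# Default 4x4 map used by Gym's FrozenLake: S=start, F=frozen, H=hole, G=goal
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

def outcome(state):
    """Return (reward, episode_done) for arriving at a given state (0-15)."""
    i, j = divmod(state, 4)
    cell = LAKE[i][j]
    if cell == "G":
        return 1.0, True   # reached the goal: reward +1, episode ends
    if cell == "H":
        return 0.0, True   # fell into a hole: reward 0, episode ends
    return 0.0, False      # start or frozen cell: no reward, keep going

holes = [s for s in range(16) if LAKE[s // 4][s % 4] == "H"]
print(holes)        # [5, 7, 11, 12]
print(outcome(15))  # (1.0, True)
```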

To reach the goal, the agent has an action space consisting of moves in four directions: up, down, left, and right. We also know that there is a fence around the lake, so if the agent tries to leave the grid world, it simply stays in the cell it tried to leave.

Because the lake is frozen, the world is slippery, so the agent's actions do not always turn out as expected: the agent moves in the intended direction only with a 33% chance, and otherwise slips to one side or the other (33% each). For example, if we want the agent to move left, there is a 33% chance that it will indeed move left, a 33% chance that it will end up in the cell above, and a 33% chance that it will end up in the cell below.
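The slippery dynamics and the fence can be sketched as a function that returns the probability of each next state. The action numbering (0 = left, 1 = down, 2 = right, 3 = up) follows Gym's FrozenLake convention, but the function is an illustrative reimplementation rather than Gym's own code, and it ignores the fact that the episode ends in holes and at the goal.

```python
from collections import defaultdict

N = 4                                   # 4x4 grid
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3      # Gym's FrozenLake action numbering
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}

def move(state, direction):
    """Move one cell; the fence keeps the agent in place if it tries to leave the grid."""
    i, j = divmod(state, N)
    di, dj = MOVES[direction]
    ni, nj = i + di, j + dj
    if not (0 <= ni < N and 0 <= nj < N):
        return state
    return ni * N + nj

def transition_probs(state, action):
    """P(next_state | state, action): intended or either perpendicular direction, 1/3 each."""
    probs = defaultdict(float)
    for direction in ((action - 1) % 4, action, (action + 1) % 4):
        probs[move(state, direction)] += 1 / 3
    return dict(probs)

print(transition_probs(0, LEFT))  # from the start: stays put (up/left blocked) or moves down
```

From the start cell, trying to go left leaves the agent where it is with probability 2/3 (both "up" and "left" hit the fence) and takes it down one cell with probability 1/3.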

This behavior of the environment is captured by the transition function, or transition probabilities, introduced earlier. However, we do not need to go into the details of this function at this point; we will leave them for later.


In summary, we could visualize all this information in the following figure:

[Figure: summary of the Frozen Lake environment]

3.3 Coding the environment

Let's see how this environment is represented in Gym. I suggest using the Colab notebooks offered by Google to run the code described in this post (the Gym package is already installed there). If you prefer to use your own Python programming environment, you can install Gym by following the steps in the Gym documentation.

The first step is to import Gym:

import gym

Then specify the gym game you want to use. We will use the Frozen Lake game:

env = gym.make('FrozenLake-v0')

The game environment can be reset to its initial state with:

env.reset()

And to see the state of the game we can use:

env.render()

The surface drawn by render() is shown as a grid like the following:

[Figure: the Frozen Lake grid as printed by render()]

Here, the highlighted character indicates the agent's position at the current time step, and:

  • "S" indicates the starting cell (safe position)
  • "F" indicates a frozen surface (safe position)
  • "H" indicates a hole
  • "G" indicates the goal

The official Gym documentation describes the usage of the toolkit in detail.

3.4 Agent coding

For now, let's create the simplest agent we can build: one that just takes random actions. For this we use action_space.sample(), which samples a random action from the action space.

Suppose we allow a maximum of 10 iterations; the following code could be our "dumb" agent:

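import gym

# Classic Gym API: step() returns (observation, reward, done, info)
env = gym.make("FrozenLake-v0")
env.reset()
for t in range(10):
    print("\nTime step {}".format(t))
    env.render()                      # show the action taken and the grid
    a = env.action_space.sample()     # random action
    ob, r, done, _ = env.step(a)
    if done:
        print("\nEpisode terminated early")
        break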

Running this code produces output like the following, where we can see the time step, the action, and the state of the environment:

[Figure: sample output of the random agent]

In general, it is difficult, if not nearly impossible, to find an episode in which our "dumb" agent, choosing actions at random, manages to avoid the obstacles and reach the goal cell. So how could we build an agent that does? We will see this in the next part of this series, where we will formalize the problem further and build a new version of the agent that can learn to reach the goal cell.
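To get a rough feel for how rarely random actions succeed, the sketch below simulates the random agent without needing Gym. It reimplements the slippery 4x4 map and dynamics described earlier, under the assumption that they match Gym's default Frozen Lake (this is not Gym's code), and `max_steps=10` mirrors the 10-iteration limit used above. The function names are hypothetical.

```python
import random

# Default 4x4 Frozen Lake map: S=start, F=frozen, H=hole, G=goal
LAKE = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N = 4
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # 0=left, 1=down, 2=right, 3=up

def slippery_step(state, action, rng):
    """One step: intended or either perpendicular direction (1/3 each); the fence blocks exits."""
    direction = (action + rng.choice([-1, 0, 1])) % 4
    i, j = divmod(state, N)
    di, dj = MOVES[direction]
    ni, nj = i + di, j + dj
    if 0 <= ni < N and 0 <= nj < N:
        state = ni * N + nj
    cell = LAKE[state // N][state % N]
    reward = 1.0 if cell == "G" else 0.0
    done = cell in "GH"                  # the episode ends at the goal or in a hole
    return state, reward, done

def random_agent_success_rate(episodes=10_000, max_steps=10, seed=0):
    """Fraction of episodes in which an agent taking uniformly random actions reaches the goal."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        state = 0                        # every episode starts in the top-left cell
        for _ in range(max_steps):
            state, reward, done = slippery_step(state, rng.randrange(4), rng)
            if done:
                if reward == 1.0:
                    successes += 1
                break
    return successes / episodes

print(random_agent_success_rate())               # 10 steps: a tiny fraction of episodes
print(random_agent_success_rate(max_steps=100))  # 100 steps: still only a small fraction
```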

To wrap up this post, let's briefly review the basics of reinforcement learning and compare it to other learning methods.

4.1 Reinforcement learning versus supervised learning

In supervised learning, the system learns from training data consisting of labeled pairs of inputs and outputs. We train the model (agent) on this training data so that it can generalize what it has learned to new, unseen data (the labeled input-output pairs guide the model in learning the given task).

Let's understand the difference between supervised and reinforcement learning with an example. Imagine that we want to teach a model to play chess using supervised learning. In this case, we train the model on a training dataset that contains all the moves a player can make in each state, along with labels indicating whether each move is good or not. In RL, by contrast, the agent is not given any training data; instead, we simply give it a reward for each action it takes. The agent then learns by interacting with the environment and chooses its actions based on the rewards it receives.

4.2 Reinforcement learning versus unsupervised learning

Similar to supervised learning, in unsupervised learning, we train the model using the training data. However, in unsupervised learning, the training data does not contain labels. And this leads to a common misconception that RL is some kind of unsupervised learning, as we don't have labels as input data. But it is not. In unsupervised learning, the model learns the hidden structure in the input data, whereas in RL, the model learns by maximizing the reward.

A classic example is a movie recommendation system that wants to recommend a new movie to a user. With unsupervised learning, the model (agent) finds movies similar to those the user (or users with a similar profile) has watched and recommends new movies to the user. With reinforcement learning, in contrast, the agent receives continuous feedback from the user. This feedback acts as the reward (a reward could be the time spent watching a movie, the time spent watching trailers, how many movies were watched in a row, etc.). Based on these rewards, an RL agent learns the user's movie preferences and suggests new movies accordingly. It is worth noting that an RL agent can detect when the user's movie preferences change and dynamically suggest new movies that match them.

4.3 Where are the data in reinforcement learning?

We might think that in reinforcement learning we have no data, as we do in supervised or unsupervised learning. However, the data actually comes from the environment: by interacting with it, the agent generates data (trajectories), which are sequences of observations and actions. We can learn from these trajectories, and this is basically the core of reinforcement learning.


Sometimes we can use additional data from humans or from existing trajectories, for example, in imitation learning. In fact, we might simply watch some people play, without needing to know exactly how the environment works. Sometimes we explicitly provide a dataset, as a kind of supervised dataset, but in the pure reinforcement learning setting, the data is the environment.

Reinforcement learning has developed rapidly in recent years with a wide range of applications. One of the main reasons for this development is the combination of reinforcement learning and deep learning. For this reason, in this series, we focus on presenting the basic state-of-the-art algorithms for Deep Reinforcement Learning (DRL).

5.1 Real DRL applications

The media tends to focus on applications where DRL beats humans at games, such as the examples I mentioned at the beginning of this post: AlphaGo beat the best professional human player at Go; AlphaStar beat professional players at StarCraft II; OpenAI's Dota 2-playing bot defeated the world champions in an e-sports game.

Fortunately, there are many real-world applications of DRL. One of the most prominent is in the field of driverless cars. In manufacturing, intelligent robots are trained with DRL to place objects correctly, reducing labor costs and increasing productivity. Another popular RL application is dynamic pricing, which allows the price of products to change based on supply and demand. RL is also used to build recommender systems in settings where user behavior changes continuously.

In business today, DRL is widely used in supply chain management, demand forecasting, inventory management, warehouse operations, etc. DRL is also widely used in financial portfolio management, forecasting, and trading in financial markets. DRL has been used extensively in various NLP (Natural Language Processing) tasks as well, such as abstractive text summarization, chatbots, etc.

Much recent research suggests DRL applications in healthcare, education, and smart cities, among others. In short, DRL leaves no sector untouched.

5.2 DRL Security

DRL agents may sometimes control hazardous real-world systems, such as robots or cars, which raises the cost of making wrong decisions. There is an important field called safe RL that tries to manage this risk, for example by learning a policy that maximizes reward while operating within predefined safety constraints.

Also, like any other software system, DRL agents are vulnerable to attacks. But DRL adds some new attack vectors beyond those of traditional machine learning systems, because we are generally dealing with systems that are much more complex to understand and model.

A discussion of DRL safety and security is beyond the introductory scope of this post. However, I want to draw the reader's attention to it: if you plan to deploy a DRL system in the future, keep in mind that this topic needs to be studied in more depth.

5.3 We cannot escape our responsibility

Artificial intelligence is definitely spreading through society, much as electricity did. What can we expect? The future we "invent" is a choice we make together, not something that just happens. We are in a position of power: with DRL, we have the power and authority to automate entire decisions and strategies.

That is good! But as with most things in life, where there is light there can be shadow, and DRL technology is dangerous in the wrong hands. As engineers, when we think about what we are building, we should ask ourselves: Could our DRL system inadvertently introduce bias? How does it affect individuals? How does our solution affect the climate through its energy consumption? Could our DRL solution be put to unintended uses? Or, according to our ethics, could it have uses that we would consider nefarious?

We need to think about the coming adoption of artificial intelligence and its implications. If we keep building artificial intelligence without regard to our responsibility for preventing its misuse, we can never expect it to bring prosperity to humanity.

All of us who work or want to work with these issues cannot escape our responsibility, otherwise we will regret it in the future.

We started this post by introducing the basic idea of RL. We learned that RL is a trial-and-error learning process and that learning in RL is driven by reward. We presented the differences between RL and the other ML paradigms. Finally, we looked at some real-world uses of RL and reflected on DRL safety and ethics.

In the next post we will look at the Markov Decision Process (MDP) and how an RL environment can be modeled as an MDP. Then we will review some important fundamental concepts related to RL. See you in the next post!

Post updated on 8/12/2020

By UPC Barcelona Tech and Barcelona Supercomputing Center

A relaxed introductory series that gradually, and with a practical approach, introduces the reader to this exciting technology, the real enabler of the latest disruptive advances in the field of artificial intelligence.

I started writing this series in May, during the lockdown in Barcelona. Honestly, writing these posts in my spare time helped me to #StayAtHome during the lockdown. Thank you for reading this post during those days; it justifies my effort.

Disclaimers: These posts were written during this period of lockdown in Barcelona as a personal distraction and to disseminate scientific knowledge, in case it could be of help to someone, but without the aim of being an academic reference document in the DRL area. If the reader needs a more rigorous document, the last post in the series offers an extensive list of academic resources and books for the reader to consult. The author is aware that this series of posts may contain some errors and that the English text would need revision if it were intended as an academic document. But although the author would like to improve the content in quantity and quality, his professional commitments do not leave him free time to do so. However, the author undertakes to correct any errors that readers may report as soon as possible.
