Most people follow a similar process for learning new things: receive the information, process it, try it out yourself, and receive feedback on how it went. A lot of this process is also bolstered by rewards and punishment: if you answer correctly, you get a gold star, an extra point, or a higher grade. If you answer incorrectly, you lose points, leave the competition, or must repeat the exercise.
As artificial intelligence becomes increasingly prevalent and competent, programmers are using the same processes in a popular form of machine learning called reinforcement learning (RL). With this technology, businesses are able to optimize, control, and monitor their workflows with a previously impossible level of accuracy and finesse.1 As reinforcement learning evolves, its potential and benefits only grow stronger.
Keep reading to find out more about the origins and applications of RL, many of which you might have already experienced today.
What is reinforcement learning?
Reinforcement learning is about as close to human learning as digital systems and machines can get. Through this training, machine learning models can be taught to follow instructions, conduct tests, operate equipment, and much more.2
Reinforcement learning is centered around a digital agent that is placed in a specific environment to learn. Similar to the way that we learn new things, the agent faces a game-like situation and must make a series of decisions to try to achieve the correct outcome.3 Through trial and error, the agent learns what to do (and what not to do) and is rewarded or punished accordingly. Every reward reinforces the behavior that earned it and signals the agent to employ the same tactics next time.
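To make the reward loop concrete, here is a minimal sketch in Python of one common RL algorithm, tabular Q-learning, running on a made-up environment: a five-cell corridor where the agent starts in cell 0 and is rewarded for reaching cell 4. The environment, the reward values, and the parameter names (`alpha`, `gamma`, `epsilon`) are illustrative assumptions, not a description of any particular production system.

```python
import random

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1  # small penalty per move, reward at the goal
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally; otherwise pick the best-known action.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted future value.
            future = 0.0 if done else gamma * max(q[next_state])
            q[state][a] += alpha * (reward + future - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-goal cell."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]
```

Calling `greedy_policy(train())` should return a step to the right for every cell, since moving toward the goal is the behavior the rewards reinforce.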
History & Background
The foundations for reinforcement learning were laid over 100 years ago, and it is said to have a two-pronged origin. The first is rooted in animal learning and the “Law of Effect,” coined by Edward Thorndike. Thorndike described the Law of Effect in 1911 as the notion that an animal will repeat actions that produce satisfaction and will be deterred from actions that produce discomfort. Furthermore, the greater the pleasure or pain, the stronger the pursuit of or deterrence from the action.4 The Law of Effect combines both selectional and associative learning. With selectional learning, the animal tries a few different options and selects among them based on how they went. With associative learning, the animal ties the options it chooses to particular situations, depending on whether the outcomes there were positive or negative.
Although Thorndike established the essence of reinforcement learning, the term “reinforcement” wasn’t formally used until 1927 by Ivan Pavlov. He described reinforcement as “the strengthening of a pattern of behavior due to an animal receiving a stimulus—a reinforcer— in a time-dependent relationship with another stimulus or with a response.”4 In other words, when animals receive a reaction to something they’ve done shortly after they’ve done it, it affects whether or not they’ll do it again, in the same way, in the future.
The second origin, optimal control, is more rooted in mathematics and algorithms than animal learning. Starting in the 1950s, researchers began to define optimization methods to derive control policies in continuous time control problems. Building on this, Richard Bellman developed dynamic programming, an approach that uses a dynamic system’s state and a value function to define a functional equation (commonly referred to as the Bellman equation). Bellman then went on to introduce the Markov decision process (MDP), described as “a discrete stochastic version of the optimal control problem.”4,1 MDPs helped create solution methods that gradually reach the correct answer through successive approximations, much like modern reinforcement learning.
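As a rough illustration of how successive guesses can converge on an answer, the sketch below applies value iteration, a classic dynamic programming method, to a tiny MDP. The two states, their actions, the probabilities, and the rewards are all invented for this example; only the update rule (replace each state's value with the best expected reward plus discounted future value) reflects the Bellman equation.

```python
# Hypothetical MDP: transitions[state][action] is a list of
# (probability, next_state, reward) outcomes. All numbers are made up.
transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "charge": [(0.8, "high", -1.0), (0.2, "low", -1.0)],
    },
    "high": {
        "work": [(0.7, "high", 2.0), (0.3, "low", 2.0)],
        "wait": [(1.0, "high", 0.0)],
    },
}


def backup(outcomes, values, gamma):
    """Expected reward plus discounted value of where the action leads."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Repeat the Bellman update until the values stop changing."""
    values = {s: 0.0 for s in transitions}  # first guess: everything is worth 0
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(backup(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, for each state, the action with the highest backed-up value."""
    return {
        s: max(actions, key=lambda a: backup(actions[a], values, gamma))
        for s, actions in transitions.items()
    }


values = value_iteration(transitions)
policy = greedy_policy(transitions, values)
```

Each pass through the loop is one "guess"; the guesses stop changing once they satisfy the Bellman equation to within `tol`.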
Applications of Reinforcement Learning
Reinforcement learning is on the rise and its future is just as vibrant. Here, we’ll take a look at some of the current ways RL is working in the real world.
1. Automated Robots
While most robots don’t look like pop culture has led us to believe, their capabilities are just as impressive. The more robots learn using RL, the more accurate they become, and the quicker they can complete a previously arduous task. They can also perform duties that would be dangerous for people, with far fewer consequences. For these reasons, aside from requiring some oversight and regular maintenance, robots can be a cost-effective and efficient alternative to manual labor.
For example, some restaurants use robots to deliver food to tables. Grocery stores are using robots to identify where shelves are low and order more product. In common settings, automated robots have been used thus far to assemble products; inspect for defects; count, track, and manage inventory; deliver goods; travel long and short distances; input, organize, and report on data; and grasp and handle objects of all different shapes and sizes. As we continue to test robotic abilities, new features are being introduced to expand their potential.
2. Natural Language Processing
Predictive text, text summarization, question answering, and machine translation are all examples of natural language processing (NLP) tasks that can use reinforcement learning. By studying typical language patterns, RL agents can mimic and predict how people speak to each other every day. This includes the actual language used, as well as syntax (the arrangement of words and phrases) and diction (the choice of words).
In 2016, researchers from Stanford University, Ohio State University, and Microsoft Research used this learning to generate dialogue, like what’s used for chatbots. Using two virtual agents, they simulated conversations and used policy gradient methods to reward important attributes such as coherence, informativity, and ease of answering.5 This research was unique in that it didn’t only focus on the question at hand, but also on how an answer could influence future outcomes. This approach to reinforcement learning in NLP is now widely adopted and used by customer service departments in many major organizations.
3. Marketing and Advertising
Both brands and consumers can benefit from reinforcement learning. Brands selling to target audiences can use real-time bidding platforms, A/B testing, and automatic ad optimization. This means that they can place a series of advertisements in the marketplace and the host will automatically serve the best-performing ads in the best spots for the lowest prices.2,5 Although brands post and set up the campaigns themselves, marketing and advertising platforms also learn which types of ads resonate with audiences and display those ads more frequently and prominently.
From a consumer perspective, you might notice that the ads you receive are usually from companies whose websites you’ve visited, from which you’ve made a purchase, or that are in the same industry as a company you’ve bought from. That’s because marketing and advertising platforms can use reinforcement learning to associate similar companies, products, and services and prioritize them for certain customers. If a platform tries certain options and receives a click or other engagement, it signals that the choice was ‘correct’ and that the same strategy should be employed again.2
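One simple way to picture "serve the ads that earn clicks more often" is an epsilon-greedy multi-armed bandit, sketched below. This is an illustrative stand-in, not the method any specific ad platform is known to use; the click-through rates, the function name `run_campaign`, and the `epsilon` setting are all assumptions made for the example.

```python
import random


def run_campaign(true_ctr, rounds=20000, epsilon=0.1, seed=0):
    """Simulate serving ads, treating each click as a reward.

    true_ctr holds hypothetical click-through rates that the platform
    cannot see; it only observes which impressions get clicked.
    """
    rng = random.Random(seed)
    shows = [0] * len(true_ctr)   # impressions served per ad
    clicks = [0] * len(true_ctr)  # clicks observed per ad
    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally try a random ad.
            ad = rng.randrange(len(true_ctr))
        else:
            # Exploit: serve the ad with the best observed click rate so far.
            rates = [c / s if s else 0.0 for c, s in zip(clicks, shows)]
            ad = rates.index(max(rates))
        shows[ad] += 1
        if rng.random() < true_ctr[ad]:
            clicks[ad] += 1
    return shows, clicks


# Three made-up ads; the third is clearly the strongest performer.
shows, clicks = run_campaign([0.05, 0.10, 0.30])
```

Over enough rounds, the ad with the highest click rate should end up with most of the impressions, while the small exploration rate keeps testing the others in case audience preferences shift.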
4. Image Processing
Have you ever taken a security test that asked you to identify objects in frames, such as “Click on the photos that have a street sign in them”? This is similar to what learning machines can do, although they approach it in a different way.
When asked to process an image, RL agents will take the entire image as their starting point, then identify objects sequentially until everything is registered. Artificial vision systems also use deep convolutional neural networks, trained on large, labeled datasets, to map images to human-generated scene descriptions from simulation engines.2
Some more examples of reinforcement learning in image processing include:2
- Robots equipped with visual sensors to learn about their surrounding environment
- Scanners to understand and interpret text
- Image pre-processing and segmentation of medical images, like CT scans
- Traffic analysis and real-time road processing by video segmentation and frame-by-frame image processing
- CCTV cameras for traffic and crowd analytics
5. Recommendation Systems
The “Frequently Bought Together” section on Amazon, a “Customers Also Liked” tab online at Target, and the “Recommended Reading” articles from news outlets all utilize learning machines to generate recommendations. Specifically for news reading, RL agents can track the types of stories, topics, and even author names someone prefers so that the system can queue the next story it predicts the reader would enjoy. The system also tracks exactly how readers interact with the content, e.g., clicks and shares, along with aspects such as the timing and freshness of the news. A reward is then defined based on these user behaviors.5
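As a hedged sketch of what "a reward defined from user behaviors" could look like, the function below scores a single reader interaction from clicks, shares, and article freshness. The field names, the weights, and the 24-hour half-life are hypothetical choices for illustration; a real recommender would tune or learn them.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One reader's response to one recommended article (hypothetical fields)."""
    clicked: bool
    shared: bool
    article_age_hours: float


def reward(interaction, click_weight=1.0, share_weight=2.0,
           freshness_half_life_hours=24.0):
    """Combine engagement and freshness into a single reward signal."""
    # Shares count for more than clicks in this made-up weighting.
    engagement = (click_weight * interaction.clicked
                  + share_weight * interaction.shared)
    # Older stories earn less: the multiplier halves every half-life.
    freshness = 0.5 ** (interaction.article_age_hours
                        / freshness_half_life_hours)
    return engagement * freshness
```

With these assumed weights, a fresh article that is clicked but not shared earns a reward of 1.0, and an article that is ignored earns 0 regardless of its age.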
Recommendation systems also analyze past behaviors to try to predict future ones. So if, for example, a hundred people who bought ski pants then went on to buy ski boots, a company’s system learns to send ads for ski boots to anyone who just bought ski pants. If the ads are unsuccessful, it might try displaying ads for ski jackets instead and see how the results compare.
6. Gaming
From creating a new game, to testing it for bugs, to beating its levels, RL is an efficient and relatively easy resource on which programmers can rely. Traditional video games require complex behavior trees to craft the logic of the game; training an RL model can be much simpler. Here, the agent learns by itself in the simulated game environment through navigation, defense, attack, and strategizing.2 Through trial and error, it begins to perform the actions needed to reach the desired goal.
RL agents are also used in bug detection and game testing, thanks to their ability to run a large number of iterations without human input, stress-test the game, and create situations that expose potential bugs.2
7. Energy Conservation
As much of the world works to lower its impact on the climate, reducing energy consumption is at the top of the list. A prime example is the partnership between DeepMind and Google to cool Google’s massive and essential data centers. With a fully functioning AI system, the centers saw a reported 40% reduction in the energy used for cooling without the need for human intervention, though data center experts still supervise the system.5,6
The system works in the following way:5
- Taking snapshots of data from the data centers every five minutes and feeding them to deep neural networks
- Predicting how different combinations of potential actions will affect future energy consumption
- Identifying actions that will lead to minimal power consumption while satisfying a set of safety criteria
- Sending these actions back to the data center
- Having the local control system verify the actions before they are implemented
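The steps above can be sketched as a simple selection routine: predict the energy use of each candidate action, discard any that fail the safety check, and pick the cheapest of what remains. Everything in this example is a toy assumption, including the snapshot fields, the candidate cooling setpoints, and the two formulas standing in for the neural network predictions and the safety criteria.

```python
def choose_action(snapshot, candidates, predict_energy, is_safe):
    """Return the safe candidate with the lowest predicted energy use."""
    safe = [a for a in candidates if is_safe(snapshot, a)]
    if not safe:
        return None  # no safe option: leave control to the existing system
    return min(safe, key=lambda a: predict_energy(snapshot, a))


# Toy stand-in for the trained networks: a warmer setpoint needs less cooling.
def predict_energy(snapshot, setpoint_c):
    return 100.0 * snapshot["server_load"] + 5.0 * (30.0 - setpoint_c)


# Toy safety criterion: estimated server inlet temperature must stay <= 30 C.
def is_safe(snapshot, setpoint_c):
    return setpoint_c + 10.0 * snapshot["server_load"] <= 30.0


snapshot = {"server_load": 0.5}          # hypothetical five-minute snapshot
candidates = [18.0, 20.0, 22.0, 24.0, 26.0]  # hypothetical setpoints in Celsius
action = choose_action(snapshot, candidates, predict_energy, is_safe)
```

In this toy setup the routine settles on the warmest setpoint that still passes the safety check; returning `None` when nothing is safe mirrors the idea that a separate control system has the final say.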
Another example may be an Eco setting on your thermostat, or motion-activated lights that offer different settings based on the level of light already in the room.
8. Traffic Control
Civil engineers have long struggled with traffic, and reinforcement learning is working to help solve that. Continuous traffic monitoring in complex urban networks helps build a literal and figurative “map” of traffic patterns and vehicle behavior. Because the approach is data-driven, RL agents can start to learn when traffic is heaviest, which directions it’s coming from, and how quickly cars move through each signal phase.2 Then, they adapt accordingly and continue to test and learn across times, climates, and seasons.
9. Healthcare
Healthcare employs machine learning and artificial intelligence in much of its work, and RL is no exception. It has been used in automated medical diagnosis, resource scheduling, drug discovery and development, and health management.5
One important avenue for deploying reinforcement learning is in dynamic treatment regimes (DTRs). To create a DTR, someone must input a set of clinical observations and assessments of a patient. Using previous outcomes and patient medical history, the learning system will then output a suggestion on treatment type, drug dosages, and appointment timing for every stage of the patient’s journey. This helps clinicians make time-dependent decisions about the best treatment for a patient at a specific point without spending as much time and effort consulting multiple parties.2
Learn RL & More to Advance in Analytics
Behind every successful reinforcement learning scenario is a team of data scientists, programmers, and business analysts who make it all possible. But RL requires a specific set of skills, one that a Master’s in Business Analytics or Data Science is guaranteed to give you.
The Online MSBA program at Santa Clara University’s Leavey School of Business offers courses on RL Algorithms, Temporal Difference Learning, Q-Learning, Deep Learning Neural Networks, and much more. The popularity of and demand for these skills are apparent: jobs for professionals trained in data science and analytics increased by 50% across a number of sectors in the past few years, and the U.S. Bureau of Labor Statistics has listed data science as one of the top 20 fastest growing occupations. If you're already a student of business analytics, or a prospective student looking to advance your career, consider how an MSBA degree could help.
1. Retrieved on September 19, 2022, from towardsdatascience.com/reinforcement-learning-fda8ff535bb6#757c
2. Retrieved on September 19, 2022, from v7labs.com/blog/reinforcement-learning-applications#h8
3. Retrieved on September 19, 2022, from deepsense.ai/what-is-reinforcement-learning-the-complete-guide/
4. Retrieved on September 19, 2022, from researchdatapod.com/history-reinforcement-learning/
5. Retrieved on September 20, 2022, from neptune.ai/blog/reinforcement-learning-applications
6. Retrieved on September 20, 2022, from deepmind.com/blog/safety-first-ai-for-autonomous-data-centre-cooling-and-industrial-control