Training Manual Part II
By Dolphin Encounters
Edited by NAPPA
Dolphin Encounters generously provided NAPPA with their professional “Animal Training Manual” to be revised for the training of the Potbellied Pig. We have edited this manual to apply to the training of our pet pigs. NAPPA wishes to express its appreciation to Dolphin Encounters for their generous donation.
Animal Training Manual
The North American Potbellied Pig Association
A target enables a trainer to tell the animal where and how to do a behavior. It is an object toward which the animal must move in order to perform the correct behavior, for example a hand, a ramp, or a toy. Because the animal “operates” on its environment, physically choosing to move toward the target, the training technique that uses targets is called operant conditioning.
This method differs from the type of training typically used with dogs and horses, in which leashes and reins are used. When an animal is physically pulled or pushed to complete a task, it is called manipulation. We rarely use manipulation with the pigs because the outcome is much more positive when the animal chooses to participate than when it is physically forced to do the same thing.
In most interactive behaviors, the pig must touch particular part(s) of the person with whom it is to interact in order to do a behavior correctly. These are real targets, solid objects which the animal is to touch in a specific way. However, there is another target that can be used to train a pig: an imaginary target. This is a target that used to be there, but as the animal learned the behavior the target was no longer needed and was extinguished. Let’s look at this in more detail.
Once a behavior is trained, the target originally used to train the behavior is usually removed. It is phased out as the animal learns the behavior. For example, a trainer may use a target to train a pig to play a piano with its nose, defining the motion of the behavior step by step until the animal follows the target without hesitation. By this time in the training, the animal should associate the signal with the behavior, and the trainer need only use the signal with a partial motion of the target to have the animal perform the correct behavior. The pig anticipates the behavior and needs only a reminder of the target to do the behavior correctly.
When training a new behavior, anticipation is a positive tool because it helps the animal make a “mental leap” in understanding the behavior without the original target. It is literally the animal figuring out what the trainer wants. Anticipation occurs because the trainer asks for the same behavior repeatedly, intentionally using repetition to help the pig predict the behavior in advance. In a behavior from which the target is completely removed, anticipation helps the animal make the mental leap from following a real target to following an imaginary target, i.e. where the real target would be if it were still included in the behavior.
When trainers ask for behaviors that are already learned, they generally avoid repetition to make their interactions with the animals less predictable and more interesting. If trainers are predictable, anticipation becomes a motivational problem because the animal does not want to pay attention to the trainer. In fact, the pig will spend more time and energy guessing what the trainer wants instead of waiting for the signal of the next behavior. Thus, anticipation is a tool to be used carefully.
In order to communicate with the animals when they have done something correctly we use a signal called a bridge, which for us is the whistle. (This could be any sound that the pig only hears in training. Some people say the letter “X” because we do not use the sound of “X” in our speech very often.) The bridge pinpoints in time when the behavior is correct. It also effectively “bridges” the gap in time between when the behavior has been done correctly and when the animal receives its reward.
For example, when a pig circles and the circle meets the expectations of the trainer, the trainer sounds the whistle at the peak of the behavior. When the animal returns to the trainer, it knows that the reward to be given applies to the behavior performed the instant the whistle was blown, not just to the action of returning to the trainer. Still, it should be pointed out that the gap between the correct behavior and the reward should not be longer than absolutely necessary. In other words, the animal should return to the trainer immediately. The longer the gap, the less effectively the reward reinforces the intended behavior.
Not only does the bridge tell the animal the behavior is correct and that a reward is likely, but it also terminates the behavior at that instant and (generally) requests that the animal return to the trainer promptly. If behaviors are chained together in a sequence, the bridge applies to the correctness of all the behaviors in the chain. This allows a trainer to reinforce a group of behaviors rather than reward each individual event.
However, trainers beware: a chain is only as strong as its weakest link. Too many behaviors in a chain, or a chain accepted with one wrong behavior, weaken the effectiveness of the reward for all the behaviors. For example, a trainer wants to chain a circle followed by a sit-up, a kiss, and finally another sit-up before bridging. If each component of the chain is performed well, the bridge is appropriate.
However, if the kiss was done poorly and the whole chain is still accepted, not only has the trainer lowered the criteria of the chain but he/she has taught the pig that behaviors in the chain need not be correct to get a reward. In fact, in most cases such as this, only the last one really needs to be correct. Imagine the outcome of a day’s worth of poor chains!@#$!!!
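The rule that a chain earns a bridge only when every link meets criteria can be written as a simple check. The following is a minimal sketch in Python and is not part of the original manual; the function name `should_bridge` and the pass/fail record for each behavior are assumptions made purely for illustration.

```python
def should_bridge(chain_results):
    """Return True only if every behavior in the chain met its criteria.

    chain_results is a list of (behavior_name, met_criteria) pairs.
    This representation is hypothetical, chosen for illustration.
    """
    return all(met for _, met in chain_results)


# The chain from the example: circle, sit-up, kiss, sit-up.
good_chain = [("circle", True), ("sit-up", True), ("kiss", True), ("sit-up", True)]
poor_chain = [("circle", True), ("sit-up", True), ("kiss", False), ("sit-up", True)]

print(should_bridge(good_chain))  # True
print(should_bridge(poor_chain))  # False
```

The sloppy kiss alone is enough to withhold the bridge, even though the final sit-up was correct.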
Finally, a trainer needs rewards or reinforcers, things to offer the animal that the animal likes and that will cause the animal to want to repeat the behavior in the future. These reinforcers can be anything, such as food, toys, games, attention, another animal, your voice or presence. As long as the animal is motivated by the stimulus chosen at that particular time, it is a reward. Because an animal’s motivation for a reward changes, a trainer must know which rewards will work at any given time.
For example, after Christmas dinner, everyone is stuffed and cannot even eat one more morsel. The mother of the family wants help to get the dishes washed. Does she offer them food to do the dishes? No, with their motivation for food satiated, they won’t work for food. But, if she were to offer an opportunity to watch a special football game or the keys to the car for later use that night, perhaps her children may do the dishes. It is up to the mother to pick the appropriate reward for the desired outcome.
One of the most important rules to remember regarding reinforcers: what you get is what you have trained. In other words, whatever you reward is what you will be getting in the future. If you find the animal repeatedly offering less than what is required, that is probably the behavior that has been rewarded in the past. Therefore, if a behavior is not correct, don’t reward it! In order to use rewards effectively, a trainer must give a reward only when a behavior is done correctly, meeting the criteria 100%.
By creating a broad repertoire of rewards to choose from with each animal, a trainer has more flexibility to use each of those rewards where they will be most effective. Not only is the type of reward important in value, but so are the quantity and the delivery. By varying the amount given (a 30 second belly rub versus a 2 minute belly rub), the reward is less predictable and the animal will work harder, not knowing how much of any given reward will be issued.
The delivery of the reward is another source of variety. For example, a trainer gives one large handful for a correct behavior or chooses to give the treats individually, one right after the other. The positive impact of receiving a large amount in one instant is different from the impact of receiving the same number of individual treats over a longer period of time, even if the total amount given is the same. It is also essential that the delivery is not sloppy. If the food is important, the trainer should make sure the animal receives each and every treat intended. Do not give handfuls if they cannot be delivered with accuracy.
In psychology terms, rewards can be “positive” or “negative,” referring to whether a stimulus the animal likes is added to, or a stimulus the animal does not like is removed from, the animal’s environment. This is different from the everyday use of these terms to describe the value of an object or event, i.e. good or bad. For example, a pig who likes balls can be given a ball as a positive reinforcer for a behavior performed correctly.
Another animal is trained to go up a ramp using a board held behind it. Once the pig walks up the ramp, the board is removed, negatively reinforcing the behavior of going up the ramp. There are actually very few times when we as trainers use aversive stimuli as negative reinforcers. In most cases, we would rather control a stimulus by being able to add something the animal likes to its environment, such as treats, toys and the trainer’s presence.
There are two types of reinforcers, primary and secondary. Primary reinforcers are rewards that are innately reinforcing — the animal is born with a natural desire for them. Some examples would be food, water, sex, shade from excessive heat or warmth from extreme cold. Because these rewards are innate, we as trainers begin our relationship with the animals using a primary reinforcer, such as food, to reward behaviors.
Once we have established our relationship, we mix in many other rewards which we train the animal to like – secondary reinforcers. These rewards have become reinforcing over time because they have been paired with a primary reinforcer, taking on some of the positive traits of that reward. Secondary reinforcers may be objects, sounds, actions, or any form of stimulus that the animal enjoys — even the trainer’s presence.
Behaviors can also become secondary reinforcers and be used to reward other behaviors. For example, a trainer asks a pig to sit and receive a scratch behind the ear and after bridging, feeds the animal for the behavior. Over time, the animal may like the scratch behind the ear by itself and the scratch behind the ear may be used as a reward for another behavior in place of the treat. The scratch behind the ear has become a secondary reinforcer.
At any time there are several ways in which an animal can and should be paid in order to attain a high level of performance. The different ways of using rewards are called programs of rewarding or schedules of reinforcement. Understanding these different ways to deliver rewards is essential in understanding how high levels of behavioral performance are sustained and why they fail in time.
The first schedule of reinforcement you will encounter is continuous reinforcement: the delivery of a reinforcer each and every time an animal attempts a behavior. It is a technique generally used only when starting a new behavior. Unfortunately, you will use up the allotted diet in a hurry! The animal will also tire of this quickly, realizing that it must merely try the behavior in order to earn a reward. It need not try hard, just try. The animal is therefore not motivated to improve, and the behavior you are working on reaches a point where no further progress is made. It is time to move on to another schedule of reinforcement.
Fixed interval reinforcing is a schedule in which a behavior is rewarded after a specific amount of time. Example: the circle is always terminated after five seconds. The number of repetitions of the movement and the distance traveled do not matter, only the completion of the specified period of time. A limitation of this schedule is that the animal predicts the time required, and it is very difficult to increase the time required for a behavior once the shorter time criterion becomes ingrained in the animal. Another example: a pig stops short of the complete circle the trainer wants. The animals stop there because the trainers fall into the habit of requesting the same time (inadvertently or not), so the animals no longer listen for the bridge and instead terminate the behavior on their own at the usual time. Predictable reinforcement will create problems for a trainer in the long run.
Fixed ratio reinforcing is a schedule in which a behavior is paid after a specific number of repetitions. Example: the bows are to be performed in a series of three dips of the head. Once again, the way the reinforcement is delivered places a limit on the behavior, which makes the behavior very difficult to modify. Being predictable with your reinforcement creates problems. In this case, it is challenging to increase the number of repetitions past what the animal is used to. Also, the quality of each head bow is difficult to maintain. How many times can you remember seeing the first head bow in a series of three being substandard but the chain was still reinforced by the trainer?
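To see why the two fixed schedules are so easy for an animal to predict, they can be sketched as simple rules. This is an illustrative Python sketch only, not part of the original manual; the function names are hypothetical, and the defaults of five seconds and three repetitions are taken from the examples in the text.

```python
def fixed_interval_bridge(elapsed_seconds, required_seconds=5):
    """Fixed interval: bridge once the behavior has lasted the set time."""
    return elapsed_seconds >= required_seconds


def fixed_ratio_bridge(repetitions_done, required_repetitions=3):
    """Fixed ratio: bridge once the set number of repetitions is done."""
    return repetitions_done >= required_repetitions


def first_bridged_repetition(required_repetitions=3, max_repetitions=10):
    """Count repetitions until the fixed ratio schedule bridges."""
    for done in range(1, max_repetitions + 1):
        if fixed_ratio_bridge(done, required_repetitions):
            return done
    return None


# Every trial is bridged at the same point, so the animal can predict it.
bridge_points = [first_bridged_repetition() for _ in range(4)]
print(bridge_points)  # [3, 3, 3, 3]
```

Because the bridge point never changes from trial to trial, the animal has no reason to keep working past it.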
In general, it is much more beneficial to stay away from predictable programs of reinforcement. Your ability to vary the time or repetitions of any particular behavior will cause the animal to work harder because it will not know how long it will have to perform a behavior or how many times. A good trainer, though, must know the history of each behavior with each animal in order to know the animal’s “limits” and challenge them.
For a given behavior, you will learn what your pig or pigs are capable of by watching them carefully. Then, you will learn at what level your animal performs. Finally, you will plan small steps to improve your animal’s performance and raise it to higher standards. The next programs of reinforcement will help you achieve these goals.
Variable interval reinforcing is a schedule in which a behavior is paid after an arbitrary amount of time. Example: a general vocalization from an animal is paid one time after two seconds and another time after ten seconds. The animal is signaled to vocalize until the trainer thinks the behavior has gone on long enough. As long as the criteria of the behavior are maintained, specific lengths or repetitions are not important. By using variable interval reinforcement, the trainer can bridge the behavior the instant a pig commits to performing the behavior correctly or wait for longer intervals. Keep in mind, too, that some behaviors require physical stamina that may take many sessions to build up. The reinforcement schedule must accommodate the animal’s physical capacity just as much as its motivational limits. Key point: know your animal’s history.
Variable ratio reinforcing is a schedule in which a behavior is paid after an arbitrary number of repetitions. Example: a pig is asked to perform a circle, spinning 360 degrees continuously. The animal is paid for the correct circle when the trainer decides that the number of rotations is enough. Once again, the animal will work harder on this variable schedule because it is not sure how many repetitions will earn the desired reward.
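The two variable schedules can be sketched by letting the requirement change from one trial to the next. This is a hedged Python illustration, not part of the original manual: the function names are hypothetical, the two to ten second range comes from the vocalization example, and the one to six repetition range is an arbitrary assumption. In practice the trainer, not a random number generator, chooses the limits based on the animal’s history and stamina.

```python
import random


def variable_ratio_targets(trials, min_reps=1, max_reps=6, seed=None):
    """Pick a fresh repetition requirement for each trial.

    The 1 to 6 range is an assumption made for illustration.
    """
    rng = random.Random(seed)
    return [rng.randint(min_reps, max_reps) for _ in range(trials)]


def variable_interval_targets(trials, min_seconds=2.0, max_seconds=10.0, seed=None):
    """Pick a fresh duration requirement (in seconds) for each trial."""
    rng = random.Random(seed)
    return [round(rng.uniform(min_seconds, max_seconds), 1) for _ in range(trials)]


# A seed makes the sketch repeatable; a real session would not be seeded.
reps = variable_ratio_targets(5, seed=1)
secs = variable_interval_targets(5, seed=1)
print(reps)  # repetitions required before the bridge on each trial
print(secs)  # seconds required before the bridge on each trial
```

Since the requirement differs on every trial, the animal cannot tell in advance when the bridge will come.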
Over the years, trainers have used many different schedules of reinforcement, and the most effective ones involve mixing when, how often, how much and what types of rewards are given for correct behaviors. This schedule is called random interrupted reinforcement, or RIR. Once a behavior is established at performance level, the reinforcement is entirely variable. The animal may receive one treat, five treats or none at all for a correct response. The animal will perform consistently in hopes of hitting a jackpot.
Example: A lady sits at a slot machine and puts one quarter in each time in hope of hitting the jackpot of hundreds of quarters. Though she puts in a quarter each time, not every effort is rewarded. Randomly, she may win one or even dozens of quarters. However, her motivation to continue playing will stay high due to the unpredictability of the payoff. If a second lady were to win one quarter after every four she put in, she would soon quit because the game had become predictable. The first lady may lose more money in the slots than she wins, but her motivation would stay elevated longer and her behavior would continue longer than the second lady’s. A trainer needs to adopt RIR into his/her schedule and periodically review his/her habits to avoid predictable reinforcement histories.
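The contrast between the two slot machine players can be sketched in a few lines of Python. This is only an illustration and not part of the original manual: the function names are hypothetical, the reward sizes (none, one treat, five treats) follow the text, and how often each size is chosen is an assumption. The sketch varies only the amount of the reward, whereas RIR as described also varies the type and timing.

```python
import random


def rir_reward(rng, amounts=(0, 0, 1, 1, 5)):
    """Pick a reward size at random for one correct response.

    The amounts follow the example in the text; the relative
    frequency of each amount is an assumption.
    """
    return rng.choice(amounts)


def predictable_reward(response_number, every=4):
    """Contrast case: one treat on every fourth correct response."""
    return 1 if response_number % every == 0 else 0


rng = random.Random(7)  # seeded only so the sketch is repeatable
rir_rewards = [rir_reward(rng) for _ in range(12)]
fixed_rewards = [predictable_reward(n) for n in range(1, 13)]

print(rir_rewards)    # no pattern to learn
print(fixed_rewards)  # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
```

The second list repeats the same pattern every four responses, which is exactly the predictability the text warns against.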
© Dolphin Encounters, as edited by the North American Potbellied Pig Association.
NAPPA will be offering the complete training manual over a period of months. So check your News next month for information about “Time Out”.