14.2.9 RBp Implementation Details

The implementation of RBp was designed to be flexible, modular, and robust. To that end, units keep track of their own sense of time and record their own history of activations. The history is stored in a CircBuffer object, which is derived from an Array type (see section 8.3 Arrays). Values in this buffer can wrap around, so it can be shifted without any memory copying. After a backpropagation, the unit activation records can therefore be shifted quickly to make room for new states. The trial process shifts the buffers just far enough that they will be full again the next time a backpropagation occurs (i.e., they are shifted by the bp_gap value converted into tick units).

The buffers are also robust to changes in the bp_gap parameter during processing: if a buffer is already full, new states are simply added and the buffer grows dynamically. Indeed, when a unit first starts processing, the buffer is filled by this same mechanism--it is always full until some number of values have been shifted off the end.
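The wrap-around, shift, and grow-when-full behavior described above can be illustrated with a small sketch. The RingHistory class and all of its member names are hypothetical and are not the CircBuffer interface; the sketch only assumes that a shift moves a start index rather than the stored values, and that storage grows when a value is added to a full buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a wrap-around buffer; NOT the PDP++ CircBuffer API.
class RingHistory {
public:
  // Append a value; grow the storage only when every slot is in use.
  void Add(float val) {
    if (length_ == data_.size()) Grow();
    data_[Physical(length_)] = val;
    ++length_;
  }

  // Drop the n oldest values by advancing the start index; nothing is copied.
  void ShiftLeft(std::size_t n) {
    assert(n <= length_);
    if (n == length_) { start_ = 0; length_ = 0; return; }
    start_ = Physical(n);
    length_ -= n;
  }

  // Logical access: index 0 is always the oldest value still stored.
  float At(std::size_t i) const {
    assert(i < length_);
    return data_[Physical(i)];
  }

  std::size_t Length() const { return length_; }
  std::size_t Capacity() const { return data_.size(); }

private:
  // Map a logical index onto the underlying storage, wrapping at the end.
  std::size_t Physical(std::size_t i) const { return (start_ + i) % data_.size(); }

  // Allocate larger storage and unwrap the existing values into it.
  void Grow() {
    std::vector<float> bigger(data_.empty() ? 4 : data_.size() * 2);
    for (std::size_t i = 0; i < length_; ++i) bigger[i] = data_[Physical(i)];
    data_.swap(bigger);
    start_ = 0;
  }

  std::vector<float> data_;
  std::size_t start_ = 0;
  std::size_t length_ = 0;
};
```

In this sketch a shift costs the same regardless of how many values are stored, and copying happens only on the comparatively rare occasions when the buffer has to grow.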

The units record only their activation values in the buffers. A StoreState function takes a snapshot of the current activation state; it is called at the end of the Compute_Act function. During backpropagation, the StepBack function is called to take one step back in time: the activation state recorded in the prv_ versions of the unit variables is copied into the current variables, and the new previous values are loaded from the buffers at the given tick.
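As a rough illustration of this snapshot and step-back scheme, the following sketch uses a hypothetical SketchUnit with a single activation value. The method names follow the text, but the struct itself, its fields, and the use of a plain vector in place of the CircBuffer are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified unit; names mirror the text, but this is NOT the
// PDP++ unit class.
struct SketchUnit {
  float act = 0.0f;                // current activation
  float prv_act = 0.0f;            // activation one tick earlier
  std::vector<float> act_history;  // stands in for the CircBuffer

  // Snapshot the current activation (called after activations are computed).
  void StoreState() { act_history.push_back(act); }

  // Move one step back in time: the previous value becomes current, and the
  // new previous value is loaded from the history at the given tick.
  void StepBack(std::size_t tick) {
    assert(tick < act_history.size());
    act = prv_act;
    prv_act = act_history[tick];
  }
};
```

If a unit holds the states of ticks 2 (current) and 1 (previous), calling StepBack(0) leaves it holding ticks 1 and 0, so only one value is read from the history per step.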

The backpropagation loop looks as follows:

PerformBP(): {
  InitForBP();                  // clear current and prev error vals
  int buf_sz = GetUnitBufSize(); // get unit buffer size (states in units)
  for(int i=buf_sz-2; i>=0; i--) { // loop backwards through unit states
    Compute_dEdA_dEdNet();      // backpropagate based on current state
    Compute_dWt();              // compute weight changes from error
    if(i > 0)                   // if not all the way back yet
      StepBack(i-1);            // step back to previous time
  }
  RestoreState(buf_sz-1);       // restore activations to end values
  if(real_time)
    ShiftBuffers();             // don't shift unless real time
}

Thus, error derivatives are computed from the current and prv_ activation state variables, the state is then stepped back by one tick, and this repeats over the entire length of stored activation values. The above routine is called by the trial process whenever the buffer size of the units is equal to or greater than the bp time window.
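To make the order of the backward pass concrete, the sketch below lists which pair of stored states the loop works on at each iteration. BackpropVisitOrder is a hypothetical helper written for this illustration. It assumes that, on entry to the loop, the units hold the last stored state in their current variables and the second-to-last in their prv_ variables.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: returns (previous tick, current tick) pairs in the
// order the backpropagation loop would visit them for a given buffer size.
std::vector<std::pair<int, int>> BackpropVisitOrder(int buf_sz) {
  std::vector<std::pair<int, int>> visits;
  int cur = buf_sz - 1;  // assumed: units start at the most recent state
  int prv = buf_sz - 2;  // assumed: prv_ variables hold the state before it
  for (int i = buf_sz - 2; i >= 0; --i) {
    visits.emplace_back(prv, cur);  // derivatives are computed on this pair
    if (i > 0) {                    // mirrors StepBack(i-1)
      cur = prv;
      prv = i - 1;
    }
  }
  return visits;
}
```

Under these assumptions, a buffer of four states yields the pairs (2,3), (1,2), (0,1): one error computation per stored transition, ending at the initial state.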

During backpropagation, the prv_dEdA and prv_dEdNet values are retained and used to time-average these derivatives, in much the same way that activations or net inputs are time-averaged during the activation computation phase.
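The exact averaging rule is not given in this section. As a sketch only, the following assumes a first-order average in which the step size dt sets how far the stored value moves toward the newly computed one. TimeAverage and its parameter names are hypothetical.

```cpp
#include <cassert>

// Hypothetical first-order time average: move the previous value toward the
// newly computed one by a fraction dt. This form is an assumption, not a
// transcription of the RBp code.
float TimeAverage(float prv_val, float raw_val, float dt) {
  assert(dt > 0.0f && dt <= 1.0f);
  return prv_val + dt * (raw_val - prv_val);
}
```

With dt = 1 this form reduces to using the new value directly, and smaller values of dt give progressively smoother changes from one tick to the next.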

The following chart describes the flow of processing in the RBp algorithm. It starts with the epoch process, since higher levels do not interact with the details of particular algorithms, and it assumes that sequences are being used:

SequenceEpoch: {
  Init: {                              // at start of epoch
    environment->InitEvents();          // init events (if dynamic)
    event_list.Add() 0 to environment->GroupCount(); // get list of groups
    if(order == PERMUTE) event_list.Permute(); // permute list if necessary
    GetCurEvent();                      // get pointer to current group
  }
  Loop (trial): {                      // loop over trials
    SequenceProcess: {                 // sequence process (one sequence)
      Init: {                          // at start of sequence
        tick.max = cur_event_gp->EventCount(); // set max no of ticks
        event_list.Add() 0 to tick.max; // get list of events from group
        if(order == PERMUTE) event_list.Permute(); // permute if necessary
        GetCurEvent();                  // get pointer to current event
        InitNetState() {               // initialize net state at start
          if(sequence_init == INIT_STATE) network->InitState();
        }
      }
      Loop (tick): {                   // loop over ticks (sequence events)
        RBpTrial: {                    // trial process (one event)
          Init: {                      // at start of trial
            cur_event = epoch_proc->cur_event; // get event from sequence
          }
          Loop (once): {               // process this once per trial
            network->InitExterns();     // init external input to units
            cur_event->ApplyPatterns(network); // apply patterns from event 
            if(unit buffer size == 0) { // units were just reset, time starting
              time = 0;                 // reset time
              StoreState();             // store initial state at t = 0
            }
            Compute_Act(): {           // compute acts (synchronous)
              network->Compute_Net();   // first get net inputs
              network->Compute_Act();   // then update acts based on nets
            }
            if(unit buffer size > time_win_ticks) // if act state buffers full
              PerformBP();              // backpropagate through states
            time += dt;                 // time has advanced..
          }
        }
        if(wt_update == ON_LINE) network->UpdateWeights(); // after trial
      }
    }
    if(wt_update == SMALL_BATCH)        // end of sequence
      network->UpdateWeights();         // update weights after sequence
    GetCurEvent();                      // get next event group
  }
  Final:                                // at end of epoch
    if(wt_update == BATCH)  network->UpdateWeights(); // batch mode update
}
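
The interaction between the buffer size test in the trial process and the shift performed after backpropagation can be traced with a small counting sketch. BackpropTrials is a hypothetical function. It assumes the real_time case (buffers are shifted after every backpropagation), a shift of bp_gap_ticks states, and one stored state per trial plus the initial state at t = 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical trace of the trial loop: returns the trial indices at which
// backpropagation would run, tracking only the unit buffer size.
std::vector<int> BackpropTrials(int n_trials, int time_win_ticks, int bp_gap_ticks) {
  assert(time_win_ticks > 0 && bp_gap_ticks > 0 && bp_gap_ticks <= time_win_ticks);
  std::vector<int> fired;
  int buf_sz = 0;                // units were just reset
  for (int trial = 0; trial < n_trials; ++trial) {
    if (buf_sz == 0) ++buf_sz;   // StoreState(): initial state at t = 0
    ++buf_sz;                    // Compute_Act() ends by storing a state
    if (buf_sz > time_win_ticks) {
      fired.push_back(trial);    // PerformBP()
      buf_sz -= bp_gap_ticks;    // ShiftBuffers(), assumed real_time case
    }
  }
  return fired;
}
```

For example, with a window of 4 ticks and a gap of 2 ticks, this sketch runs backpropagation on trials 3, 5, 7, and 9 of a ten-trial sequence: once the window first fills, it refills every bp_gap_ticks trials.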