CRL itself is an approximate approach to decentralized online reinforcement learning. It has similarities with population-based techniques such as ACO, swarm intelligence (Kennedy and Eberhart, 2001) and evolutionary computation: the system takes a variety of DOPs as input and reinforces the choices of agents that have successfully solved DOPs given the state of the system's environment. This process improves the system's performance in a stable environment and can also adapt the system to a changing environment. Instead of letting agents die and be replaced by new agents, CRL agents discard their solutions to purge outdated information from the system and use collective feedback to learn new solutions together. An alternative to selecting joint actions is to let agents carry out individual actions such that the collective behaviour of the agents minimizes the system's cost function (Kok and Vlassis, 2007). Experiments conducted by Claus and Boutilier with groups of independent Q-learning agents have shown that cooperation among agents is necessary to ensure that locally good actions are also good globally (Claus and Boutilier, 1998). Agents that are unaware of other agents may choose actions that are suboptimal for the system, because they use local Q-values that are assumed to be independent of the actions selected and the rewards received by the other agents. Another approach to building an independent learning model, in which agents are unaware of each other, is Wolpert's Collective Intelligence (COIN) model (Lawson and Wolpert, 2002). In the COIN model, problems are structured so that the local cost models of the independent agents align with the system's cost model, ensuring that actions that are good at the local level are also good at the global level. However, the applicability of this approach is limited.
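To make the miscoordination problem concrete, the following minimal Python sketch shows two independent Q-learners in a stateless cooperative game. The payoff matrix follows the climbing game studied by Claus and Boutilier; the hyperparameters and function names are illustrative assumptions, not taken from the cited papers.

```python
import random

# Climbing-game payoffs (rows: agent 0's action, cols: agent 1's action),
# as studied by Claus and Boutilier (1998). Joint action (0, 0) is optimal.
PAYOFF = [[11, -30,   0],
          [-30,  7,   6],
          [  0,  0,   5]]

ALPHA, EPSILON, EPISODES = 0.1, 0.2, 5000  # illustrative hyperparameters

# Each independent learner keeps Q-values over its OWN actions only,
# implicitly assuming the reward does not depend on the other agent.
q = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

def choose(qi):
    """Epsilon-greedy selection over an agent's local Q-values."""
    if random.random() < EPSILON:
        return random.randrange(3)
    return max(range(3), key=lambda a: qi[a])

for _ in range(EPISODES):
    a0, a1 = choose(q[0]), choose(q[1])
    r = PAYOFF[a0][a1]                  # shared global reward
    q[0][a0] += ALPHA * (r - q[0][a0])  # local update ignores a1
    q[1][a1] += ALPHA * (r - q[1][a1])  # local update ignores a0

print("agent 0 Q:", q[0])
print("agent 1 Q:", q[1])
# Because each agent's Q-values average over the other agent's exploration,
# the greedy joint action often settles on a safe but suboptimal equilibrium.
```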

Although Guestrin's CG model was developed for the online approach, Kok and Vlassis adapted it to Q-learning. In both models, the global Q-function is represented by a CG. In Figure 1, we can see that Guestrin decomposes the global Q-function using the sets of each agent's neighbours, called agent-based decomposition (Guestrin et al., 2002), while Kok and Vlassis decompose the global Q-function using the links between pairs of agents, called edge-based decomposition (Kok and Vlassis, 2006). In edge-based decomposition, an edge between agent i and agent j is represented as a local Q-function q_ij, and the sum over all edges (local Q-functions) defines the global Q-function. Local Q-functions are updated based on the local Q-functions of the pair of agents that form the edge. This contrasts with agent-based decomposition, where local Q-functions are updated based on the local Q-functions of all neighbours. To compute the best joint action, agents in the Kok and Vlassis model use an approximate algorithm called max-plus, while in Guestrin's model agents use an exact algorithm called variable elimination. The edge-based decomposition approach scales linearly with the width of the CG, while the agent-based decomposition approach scales exponentially (Kok and Vlassis, 2006).
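In symbols, and with notation assumed here for illustration rather than copied from the cited papers (writing Γ(i) for the neighbours of agent i in the CG and E for its set of edges), the two decompositions can be sketched as:

```latex
% Agent-based decomposition (Guestrin et al., 2002): one local term per agent,
% depending on that agent's action and the actions of its neighbours.
Q(s, a) \;=\; \sum_{i=1}^{n} Q_i\bigl(s_i, a_i, a_{\Gamma(i)}\bigr)

% Edge-based decomposition (Kok and Vlassis, 2006): one local term per edge,
% depending only on the actions of the two agents that the edge connects.
Q(s, a) \;=\; \sum_{(i,j) \in E} q_{ij}\bigl(s_{ij}, a_i, a_j\bigr)
```

In the edge-based form, an update to q_ij involves only the two agents on that edge, which is what underlies the favourable scaling noted above.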

Figure 3 shows how a CDM can be launched either by an application on the agent's host or by a neighbour that delegates the DOP to it. In distributed systems, delegation actions correspond to the transmission of messages over a network, and investigation actions map onto some kind of underlying discovery service or host protocol. An investigation operation that finds a new neighbour adds a new delegation action for that neighbour and a new Q-value entry in the dependency table for the corresponding state (e).
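A minimal sketch of this bookkeeping follows; all names here (Agent, discover_neighbour, q_table) are hypothetical illustrations, not identifiers from the source system.

```python
# Hypothetical sketch of the neighbour-discovery bookkeeping described above.
class Agent:
    def __init__(self):
        self.delegation_actions = set()  # one delegation action per known neighbour
        self.q_table = {}                # dependency table: (state, action) -> Q-value

    def discover_neighbour(self, neighbour_id, state_e, initial_q=0.0):
        """Investigation operation: a newly found neighbour yields a new
        delegation action and a new Q-value entry for the current state e."""
        action = ("delegate", neighbour_id)
        self.delegation_actions.add(action)
        # Initialise the Q-value entry for the corresponding state (e).
        self.q_table.setdefault((state_e, action), initial_q)

agent = Agent()
agent.discover_neighbour(neighbour_id="host-B", state_e="e")
print(agent.delegation_actions)  # {('delegate', 'host-B')}
print(agent.q_table)             # {('e', ('delegate', 'host-B')): 0.0}
```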