
Limitations of the Apriori algorithm:
- It needs several iterations over the data.
- It uses a uniform minimum support threshold.
- It has difficulty finding rarely occurring events.

Alternative methods (other than Apriori) can address this by using a non-uniform minimum support threshold. Some competing approaches focus instead on partitioning and sampling.

Apriori, while historically significant, suffers from a number of inefficiencies or tradeoffs, which have spawned other algorithms. Candidate generation generates large numbers of subsets (the algorithm attempts to load up the candidate set with as many as possible before each scan). Bottom-up subset exploration (essentially a breadth-first traversal of the subset lattice) finds any maximal subset S only after all of its proper subsets. Later algorithms such as Max-Miner[2] try to identify the maximal frequent itemsets without enumerating their subsets, and perform "jumps" in the search space rather than a purely bottom-up approach.
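As a concrete illustration of these costs, the sketch below is a minimal, hypothetical Apriori implementation in Python (not any particular library's version): note the one full scan of the data per itemset size, and the single uniform min_support applied at every level. Subset-based candidate pruning is omitted for brevity.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori sketch: one full pass over the data per itemset size,
    with one uniform minimum support threshold at every level."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: all single items seen in the data.
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # One full scan of the database per candidate size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Bottom-up candidate generation: join frequent k-itemsets into (k+1)-itemsets.
        current = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

# Toy run: with min_support = 0.5, rare combinations such as {milk, beer}
# are pruned even if they would be interesting.
demo = [{"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}, {"milk"}]
```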

Sets of the database:
- Transactional database D
- All products form an itemset I = {i1, i2, …, im}
- A unique shopping event (transaction) T ⊆ I
- T contains itemset X iff X ⊆ T
- Based on itemsets X and Y, an association rule can be X ⇒ Y
- It is required that X ⊆ I, Y ⊆ I and X ∩ Y = ∅
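Given these definitions, the standard support and confidence measures for a rule X ⇒ Y follow directly. A small Python sketch (the toy database D is illustrative):

```python
def support(itemset, transactions):
    """Support of an itemset: the fraction of transactions T with itemset ⊆ T."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X ⇒ Y: support(X ∪ Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Toy transactional database D: each transaction is an itemset over I.
D = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "beer"}, {"bread", "milk", "beer"}, {"milk"},
)]
```

For example, support({bread}) = 3/4 = 0.75 here, and the rule {bread} ⇒ {milk} has confidence 0.5/0.75 ≈ 0.67.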

Genetic Algorithm Flowchart


Now, with the knowledge of how to interpret the gene values, we can discuss how the genetic algorithm functions. Let us have a closer look at the classical representation of the genetic algorithm flowchart.

1. Initialize the start time, t = 0. Randomly form the initial population of k units: B0 = {A1, A2, …, Ak}.
2. Calculate the fitness of each unit, FAi = fit(Ai), i = 1…k, and the fitness of the entire population, Ft = fit(Bt). The value of this function determines how well the unit described by this chromosome suits the problem.
3. Select a unit Ac from the population: Ac = Get(Bt).
4. With a certain probability (the crossover probability Pc), select a second unit from the population, Ac1 = Get(Bt), and perform the crossover operator: Ac = Crossing(Ac, Ac1).
5. With a certain probability (the mutation probability Pm), perform the mutation operator: Ac = mutation(Ac).
6. With a certain probability (the inversion probability Pi), perform the inversion operator: Ac = inversion(Ac).
7. Place the obtained new chromosome into the new population: insert(Bt+1, Ac).
8. Repeat steps 3 to 7 k times.
9. Increase the current epoch number: t = t + 1.

10. If the stop condition is met, terminate the loop. Otherwise, go to step 2.

Some stages of the algorithm need closer consideration. Steps 3 and 4, the selection of the parent chromosomes, play the most important role in the successful functioning of the algorithm, and there are various possible alternatives here. The most frequently used selection method is called roulette: the probability that a given chromosome is selected is proportional to its fitness, i.e., PGet(Ai) ~ fit(Ai)/fit(Bt). This method increases the probability that attributes belonging to the fittest units will be propagated to the offspring. Another frequently used method is tournament selection: several units (two, as a rule) are randomly chosen from the population, and the fittest of them is selected as the winner. Besides, some implementations of the algorithm use the so-called elitism strategy, in which the best-adjusted units are guaranteed to enter the new population. This approach usually accelerates the convergence of the genetic algorithm; its disadvantage is an increased probability of the algorithm getting stuck in a local minimum.

The determination of the stop criterion is another important point. Typical criteria are either a limit on the number of epochs, or detection of convergence (normally by comparing the population fitness over several epochs and stopping when this parameter has stabilized).
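The loop above, with roulette selection and elitism, can be sketched in Python. This is a minimal illustration rather than the notes' exact formulation: the fitness function here is the toy OneMax problem (count of 1-bits), crossover is one-point, and the inversion operator (step 6) is omitted for brevity.

```python
import random

def roulette(population, fitnesses):
    """Roulette selection: P(select Ai) ~ fit(Ai) / fit(Bt)."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for unit, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return unit
    return population[-1]

def genetic_algorithm(fit, length=20, k=30, pc=0.9, pm=0.02, max_epochs=100):
    # Step 1: t = 0, random initial population B0 of k units (bitstrings here).
    population = [[random.randint(0, 1) for _ in range(length)] for _ in range(k)]
    for t in range(max_epochs):
        fitnesses = [fit(a) for a in population]           # step 2: evaluate
        best = max(population, key=fit)
        if fit(best) == length:                            # step 10: stop condition
            return best
        new_population = [best[:]]                         # elitism strategy
        while len(new_population) < k:                     # step 8: repeat k times
            a = roulette(population, fitnesses)[:]         # step 3: select Ac
            if random.random() < pc:                       # step 4: crossover, prob Pc
                b = roulette(population, fitnesses)
                cut = random.randrange(1, length)
                a = a[:cut] + b[cut:]
            for i in range(length):                        # step 5: mutation, prob Pm
                if random.random() < pm:
                    a[i] = 1 - a[i]
            new_population.append(a)                       # step 7: insert(Bt+1, Ac)
        population = new_population                        # step 9: t = t + 1
    return max(population, key=fit)
```

Usage: `best = genetic_algorithm(sum)` evolves a bitstring toward all ones; thanks to elitism, the best fitness never decreases across epochs.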
Financial markets conclusion Mahfoud and Mani 1996 used a genetic algorithm to predict the future performance of 1600 publicly traded stocks. Specifically, the GA was tasked with forecasting the relative return of each stock, defined as that stock's return minus the average return of all 1600 stocks over the time period in question, 12 weeks (one calendar quarter) into the future. As input, the GA was given historical data about each stock in the form of a list of 15 attributes, such as price-to-earnings ratio and growth rate, measured at various past points in time; the GA was asked to evolve a set of if/then rules to classify each stock and to provide, as output, both a recommendation on what to do with regards to that stock (buy, sell, or no prediction) and a numerical forecast of the relative return. The GA's results were compared to those of an established neural net-based system which the authors had been using to forecast stock prices and manage portfolios for three years previously. Of course, the stock market is an extremely noisy and nonlinear system, and no predictive mechanism can be correct 100% of the time; the challenge is to find a predictor that is accurate more often than not. In the experiment, the GA and the neural net each made forecasts at the end of the week for each one of the 1600 stocks, for twelve consecutive weeks. Twelve weeks after each prediction, the actual performance was compared with the predicted relative return. Overall, the GA significantly outperformed the neural network: in one trial run, the GA correctly predicted the direction of one stock 47.6% of the time, made no prediction 45.8% of the time, and made an incorrect prediction only 6.6% of the time, for an overall predictive accuracy of 87.8%. 
Although the neural network made definite predictions more often, it was also wrong in its predictions more often (in fact, the authors speculate that the GA's greater ability to make no prediction when the data were uncertain was a factor in its
success; the neural net always produces a prediction unless explicitly restricted by the programmer). In the 1600-stock experiment, the GA produced a relative return of +5.47%, versus +4.40% for the neural net - a statistically significant difference. In fact, the GA also significantly outperformed three major stock market indices - the S&P 500, the S&P 400, and the Russell 2000 - over this period; chance was excluded as the cause of this result at the 95% confidence level. The authors attribute this compelling success to the ability of the genetic algorithm to learn nonlinear relationships not readily apparent to human observers, as well as the fact that it lacks a human expert's "a priori bias against counterintuitive or contrarian rules" (p.562). Similar success was achieved by Andreou, Georgopoulos and Likothanassis 2002, who used hybrid genetic algorithms to evolve neural networks that predicted the exchange rates of foreign currencies up to one month ahead. As opposed to the last example, where GAs and neural nets were in competition, here the two worked in concert, with the GA evolving the architecture (number of input units, number of hidden units, and the arrangement of the links between them) of the network which was then trained by a filter algorithm. As historical information, the algorithm was given 1300 previous raw daily values of five currencies - the American dollar, the German deutsche mark, the French franc, the British pound, and the Greek drachma - and asked to predict their future values 1, 2, 5, and 20 days ahead. The hybrid GA's performance, in general, showed a "remarkable level of accuracy" (p.200) in all cases tested, outperforming several other methods including neural networks alone. Correlations for the one-day case ranged from 92 to 99%, and though accuracy decreased over increasingly greater time lags, the GA continued to be "quite successful" (p.206) and clearly outperformed the other methods. 
The authors conclude that "remarkable prediction success has been achieved in both a one-step ahead and a multistep predicting horizon" (p.208) - in fact, they state that their results are better by far than any related predictive strategies attempted on this data series or other currencies. The uses of GAs on the financial markets have begun to spread into real-world brokerage firms. Naik 1996 reports that LBS Capital Management, an American firm headquartered in Florida, uses genetic algorithms to pick stocks for a pension fund it manages. Coale 1997 and Begley and Beals 1995 report that First Quadrant, an investment firm in California that manages over $2.2 billion, uses GAs to make investment decisions for all of their financial services. Their evolved model earns, on average, $255 for every $100 invested over six years, as opposed to $205 for other types of modeling systems.
Pseudocode for optimization of rule generation
1. while (t <= no_of_gen)
2.   M_Selection(Population(t))
3.   ACO_MetaHeuristic
       while (not_termination)
         generateSolutions()
         pheromoneUpdate()
         daemonActions()
       end while
     end ACO_MetaHeuristic
4.   M_Recombination_and_Mutation(Population(t))
5.   Evaluate Population(t) in each objective.
6.   t = t + 1
7. end while
8. Decode the individuals obtained from the population with high fitness function values.
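The ACO inner loop of step 3 can be sketched on a toy problem. This is an illustrative sketch only; the option set, ant count, and evaporation rate are assumptions, not the source's parameters. Ants sample options in proportion to pheromone, pheromone evaporates everywhere, and deposits reinforce the chosen options; daemonActions() is left as a placeholder.

```python
import random

def aco_metaheuristic(pheromone, n_ants=5, evaporation=0.5, iterations=3):
    """Inner loop of step 3: generateSolutions(), pheromoneUpdate(), daemonActions().
    Toy problem: each ant picks one of len(pheromone) discrete options."""
    for _ in range(iterations):
        # generateSolutions(): each ant samples an option proportionally to pheromone.
        solutions = [random.choices(range(len(pheromone)), weights=pheromone)[0]
                     for _ in range(n_ants)]
        # pheromoneUpdate(): evaporate everywhere, then deposit on chosen options.
        pheromone = [p * (1 - evaporation) for p in pheromone]
        for s in solutions:
            pheromone[s] += 1.0
        # daemonActions(): placeholder for centralized bookkeeping,
        # e.g. an extra elitist deposit on the best solution found so far.
    return pheromone
```

In the hybrid above, this loop runs inside each GA generation, so the rule candidates produced by the GA operators are refined by pheromone-guided search before evaluation.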

Pseudocode: Parallel MOGA for association rule mining (Am. J. Appl. Sci., 3 (11): 2086-2095):

1. t = 0
2. Initialize P(t) in G_Master and distribute it equally to each available processor.
3. Evaluate Pi(t) in each processor i, based on the local dataset, by a round-robin scheme.
4. while (t <= no_of_gen)
5.   Send_String(Pi(t), Master)
6.   P(t) ← M_Selection(P(t))
7.   P(t) ← M_Recombination_and_Mutation(P(t))
8.   Distribute P(t) to each available processor.
9.   Evaluate Pi(t) in each processor based on the local dataset.
10.  t = t + 1
11.  Pi(t) ← Pi(t)
12. end while
13. Decode the individuals obtained from the masters of each group into IF-THEN rules.
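The distribution and evaluation steps (2, 3, 8, 9) can be simulated in plain Python, with list shards standing in for processors; a real implementation would use message passing between the master and the workers, as Send_String suggests. The round-robin split and the fitness signature below are illustrative assumptions.

```python
def distribute(population, n_procs):
    """Steps 2 and 8: split a population equally (round-robin) across processors."""
    return [population[i::n_procs] for i in range(n_procs)]

def parallel_evaluate(population, dataset, n_procs, fitness):
    """Steps 3 and 9: each (simulated) processor evaluates its sub-population
    against its local slice of the dataset; the master then gathers the
    results (the Send_String step)."""
    shards = distribute(population, n_procs)
    local_data = distribute(dataset, n_procs)
    results = {}
    for shard, data in zip(shards, local_data):
        for individual in shard:
            results[individual] = fitness(individual, data)
    return results
```

Because each processor only sees its local data partition, the fitness values are partial; the master-side selection step (6) is what reconciles them across the whole population.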
