John Koza's Publications: Year Index:
Koza, John R. 1995a. Evolving the architecture of a multi-part program in
genetic programming using architecture-altering operations. In Sebald, A. V.
and Fogel, L. J. (editors). Proceedings of the Fourth Annual Conference on
Evolutionary Programming.
This paper describes six new architecture-altering operations that provide a way to dynamically determine the architecture of a multi-part program during a run of genetic programming. The new operations are patterned after the naturally occurring operations of gene duplication and gene deletion and are motivated by Ohno's provocative book Evolution by Means of Gene Duplication. The new operations are branch duplication, argument duplication, branch creation, argument creation, branch deletion, and argument deletion. These operations dynamically change the architecture of various programs during a run of genetic programming. The new operations can also be interpreted as providing an automated way to specialize and generalize programs. The paper demonstrates that problems can be solved while the architecture is being evolved.
Click here for PDF file of this EP-1995 conference paper
Koza, John R. 1995b. Gene duplication to enable genetic programming to
concurrently evolve both the architecture and work-performing steps of a
computer program. Proceedings of the 14th International Joint Conference on
Artificial Intelligence.
Susumu Ohno's provocative book Evolution by Gene Duplication proposed that the creation of new proteins in nature (and hence new structures and new behaviors in living things) begins with a gene duplication and that gene duplication is "the major force of evolution." This paper describes six new architecture-altering operations for genetic programming that are patterned after the naturally-occurring chromosomal operations of gene duplication and gene deletion. When these new operations are included in a run of genetic programming, genetic programming can dynamically change, during the run, the architecture of a multi-part program consisting of a main program and a set of hierarchically-called subprograms. These on-the-fly architectural changes occur while genetic programming is concurrently evolving the work-performing steps of the main program and the hierarchically-called subprograms. The new operations can be interpreted as an automated way to change the representation of a problem while solving the problem. Equivalently, these operations can be viewed as an automated way to decompose a problem into an non-pre-specified number of subproblems of non-pre-specified dimensionality; solve the subproblems; and assemble the solutions of the subproblems into a solution of the overall problem. These operations can also be interpreted as providing an automated way to specialize and generalize.
Click here for PDF file of this IJCAI-1995 conference paper
Koza, John R. 1995c. Two ways of discovering the size and shape of a
computer program to solve a problem. In Eshelman, L. (editor). Genetic
Algorithms: Proceedings of the Fifth International Conference.
The requirement that the user predetermine the size and shape of the ultimate solution to a problem has been a bane of automated machine learning from the earliest times. This paper compares two techniques for automatically discovering, during a run of genetic programming, the architecture of a multi-part computer program while concurrently solving the problem. In the first technique, called evolutionary selection, the initial random population is architecturally diverse and there is a competition during the run among the various architectures while they are trying to solve the problem. The second technique, called evolution of architecture, employs six new architecture-altering operations that provide a way to evolve the architecture of a multi-part program in the sense of actually changing the architecture of the program dynamically during the run. The new architecture-altering operations are motivated by the naturally occurring operation of gene duplication, as described in Susumu Ohno's provocative book Evolution by Means of Gene Duplication.
Click here for PDF file of this ICGA-1995 conference paper
Koza, John R. 1995d. Genetic Programming for econometric modeling. In
Goonatilaje, Susan and Treleaven, P. (editors). Intelligent Systems for
Finance and Business.
The problem of finding an econometric model expressing the mathematical relationship between empirically observed econometric variables can be viewed as a search of the space of possible computer programs for a program that produces the desired output for given inputs. This computer program can be found via the recently developed genetic programming paradigm which breeds populations of computer programs in a Darwinian competition using genetic operations such as fitness proportionate reproduction and crossover (sexual recombination). In this paper, we rediscover the well-known non-linear "exchange equation" P=F(MV,Q) relating the money supply, price level, gross national product, and velocity of money in an economy.
Click here for a PDF file of this chapter in book edited by Goonatilaje and Treleaven
Koza, John R. 1995e. Survey of genetic algorithms and genetic programming. Proceedings
of 1995 WESCON Conference.
This paper provides an introduction to genetic algorithms and genetic programming and lists sources of additional information, including books and conferences as well as e-mail lists and software that is available over the Internet.
Click here for PDF file of this WESCON-1995 conference paper
Andre, David and Koza, John R. 1995a. Parallel genetic programming on a
network of transputers. In Rosca, Justinian (editor). Proceedings of the
Workshop on Genetic Programming: From Theory to Real-World Applications.
This paper describes the parallel implementation of genetic programming in the C programming language using a PC 486 type computer running Windows acting as a host and a network of transputers acting as processing nodes. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers at a cost that is intermediate between the two. A comparison is made of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with migration rate at 5%; however, there was little difference in performance for migration rates between 1% and 8%. Moreover, this computational effort required for these migration rates was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, parallelization delivered more than linear speed-up in solving the problem using genetic programming.
Click here for PDF file of the long version of this ML-1995 workshop paper on parallelization of genetic programming
Click here for PDF file of the short version of this ML-1995 workshop paper on parallelization of genetic programming
Koza, John R. and Andre, David. 1995a. Parallel Genetic Programming on a Network of Transputers. Stanford University Computer Science Department technical report STAN-TR-CS-95-1542. January 30, 1995. (1995)
This report describes the parallel implementation of genetic programming in the C programming language using a PC 486 type computer (running Windows) acting as a host and a network of transputers acting as processing nodes. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers at a cost that is intermediate between the two.
A comparison is made of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with an 8% migration rate. Moreover, this computational effort was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, parallelization delivered more than linear speed-up in solving the problem using genetic programming.
Click here for PDF file of version on Stanford University web site
Click here for Stanford University web site containing this paper in several other formats
Koza, John R. and Andre, David. 1995b. Automated discovery of protein motifs
with genetic programming. In Siegel, Eric (editor). Proceedings of AAAI-95
Fall Symposium Series - Genetic Programming.
Automated methods of machine learning may prove to be useful in discovering biologically meaningful information hidden in the rapidly growing databases of DNA sequences and protein sequences.
Genetic programming is an extension of the genetic algorithm in which a population of computer programs is bred, over a series of generations, in order to solve a problem. Genetic programming is capable of evolving complicated problem-solving expressions of unspecified size and shape. Moreover, when automatically defined functions are added to genetic programming, genetic programming becomes capable of efficiently capturing and exploiting recurring sub-patterns.
This chapter describes how genetic programming with automatically defined functions successfully evolved motifs for detecting the D-E-A-D box family of proteins and for detecting the manganese superoxide dismutase family. Both motifs were evolved without prespecifying their length. Both evolved motifs employed automatically defined functions to capture the repeated use of common subexpressions. When tested against the SWISS-PROT database of proteins, the two genetically evolved consensus motifs detect the two families either as well, or slightly better than, the comparable human-written motifs found in the PROSITE database.
Click here for PDF file of AAAI-1995 Fall Symposium paper on protein motifs
Koza, John R. and Andre, David. 1995e. Automatic discovery using genetic
programming of an unknown-sized detector of protein motifs containing
repeatedly-used subexpressions. In Rosca, Justinian (editor). Proceedings of
the Workshop on Genetic Programming: From Theory to Real-World Applications.
Automated methods of machine learning may be useful in discovering biologically meaningful patterns that are hidden in the rapidly growing databases of genomic and protein sequences. However, almost all existing methods of automated discovery require that the user specify, in advance, the size and shape of the pattern that is to be discovered. Moreover, existing methods do not have a workable analog of the idea of a reusable subroutine to exploit the recurring sub-patterns of a problem environment.
Genetic programming can evolve complicated problem-solving expressions of unspecified size and shape. When automatically defined functions are added to genetic programming, genetic programming becomes capable of efficiently capturing and exploiting recurring sub-patterns.
This paper describes how genetic programming with automatically defined functions successfully evolved motifs for detecting the D-E-A-D box family of proteins and for detecting the manganese superoxide dismutase family. Both motifs were evolved without prespecifying their length. Both evolved motifs employed automatically defined functions to capture the repeated use of common subexpressions. When tested against the SWISS-PROT database of proteins, the two genetically evolved consensus motifs detect the two families either as well, or slightly better than, the comparable human-written motifs found in the PROSITE database.
Click here for PDF file of this ML-1995 workshop paper on protein motifs
Koza, John R. and Andre, David. 1995d. Evolution of both the architecture
and the sequence of work-performing steps of a computer program using genetic
programming with architecture-altering operations. In Siegel, Eric (editor). Proceedings
of AAAI-95 Fall Symposium Series - Genetic Programming.
The goal of automatic programming is to create, in an automated way, a computer program that enables a computer to solve a problem. Ideally, an automatic programming system should require that the user pre-specify as little as possible about the problem environment. In particular, it is desirable that the user not be required to prespecify the architecture of the ultimate solution to his problem.
The question of how to automatically create the architecture of the overall program in an evolutionary approach to automatic programming, such as genetic programming, has a parallel in the biological world: how new structures and behaviors are created in living things. This corresponds to the question of how new DNA that encodes for a new protein is created in more complex organisms.
This chapter describes how the biological theory of gene duplication described in Susumu Ohno's provocative book, Evolution by Means of Gene Duplication, was brought to bear on the problem of architecture discovery in genetic programming. The resulting biologically-motivated approach uses six new architecture-altering operations to enable genetic programming to automatically discover the architecture of the solution at the same time as genetic programming is evolving a solution to the problem.
Genetic programming with the architecture-altering operations is used to evolve a computer program to classify a given protein segment as being a transmembrane domain or non-transmembrane area of the protein (without biochemical knowledge, such as the hydrophobicity values used in human-written algorithms for this task). The best genetically-evolved program achieved an out-of-sample error rate that was better than that reported for other previously reported human-written algorithms. This is an instance of an automated machine learning algorithm matching human performance on a non-trivial problem.
Click here for PDF file of this AAAI-1995 Fall Symposium paper on architecture-altering operations and the transmembrane segment identification problem
Koza, John R. (editor). 1995h. Genetic Algorithms and Genetic Programming
at Stanford 1995.
Click here for information on obtaining a copy of book of student papers for 1995.
Koza, John R. (editor). 1995f. Course Reader for Computer Science 426
(Genetic Algorithms and Genetic Programming) for Fall Quarter 1995.
Koza, John R. (editor).
This book was published by the Stanford University Bookstore in 1995. Click here for updated information on university courses on genetic and evolutionary computation
· The home page of Genetic Programming Inc. at www.genetic-programming.com.
· For information about the field of genetic programming and the field of genetic and evolutionary computation, visit www.genetic-programming.org
· The home page of John R. Koza at Genetic Programming Inc. (including online versions of most published papers) and the home page of John R. Koza at Stanford University
· For information about John Koza’s course on genetic algorithms and genetic programming at Stanford University
· Information about the 1992
book Genetic
Programming: On the Programming of Computers by Means of Natural Selection,
the 1994 book Genetic
Programming II: Automatic Discovery of Reusable Programs, the 1999
book Genetic
Programming III: Darwinian Invention and Problem Solving, and the
2003 book Genetic
Programming IV: Routine
Human-Competitive Machine Intelligence. Click here to read chapter 1 of Genetic
Programming IV book in PDF format.
· 3,440
published papers on genetic programming (as of November 28, 2003) in a
searchable bibliography (with many on-line versions of papers) by over 880
authors maintained by William Langdon’s and Steven M. Gustafson.
· For information on the Genetic Programming and Evolvable Machines journal published by Kluwer Academic Publishers
· For information on the Genetic Programming book series from Kluwer Academic Publishers, see the Call For Book Proposals
· For information about the
annual 2005 Genetic and
Evolutionary Computation (GECCO) conference (which includes the annual
GP conference) to be held on June 25–29, 2005 (Saturday – Wednesday) in
Last updated on August 21, 2004