
Original Plan
A General Description and Proposal
Prepared April 30, 1998
Last edited July 26, 1998

Contents

Introduction
A. Background
B. Organization and Direction
C. Logistics and Support
D. Personnel
E. Timeline
F. Budget

A. Background

The impulse for the research proposed here can be traced back over the past 35 years, during which time excellent experiments have demonstrated that human consciousness can interact with delicately balanced physical systems, in particular random event generators (REGs). During the past five years, these experiments have been brought out of the laboratory and into "field" situations, where the REG devices show characteristic behavior in the presence of people participating in deeply engaging activities. Most recently, the experiments have been extended to gather data during extraordinary global events such as the funeral ceremonies for Princess Diana, with results indicating a measurable effect correlated with the widely shared emotional resonance. This research suggests that further explorations might be very productive.

A.1. Implementation

The Global Consciousness Project can be conceptualized as an international network designed to record subtle, direct interactions of globally coherent consciousness with a world-spanning array of sensitive detectors. The core of the GCP is a network of REGs placed to record data at sites around the world and report it via the Internet to central computers, where the data will be automatically archived, analyzed, and displayed on a dedicated website. For this experiment, a refined protocol has been developed on the foundation of earlier, rigorous anomalies research. The data will be integrated in a variety of standard, well-tested analyses summarizing local and global statistics, with particular attention to anomalous deviations from chance expectation and unexpected structure in the data arrays, correlated with predictions made prior to data acquisition. Correlations with major events that engage the world population will be the primary vehicle for tests of coherence and patterns in the globally distributed REG data.

A.2. Technology

The technological heart of the system will be the central servers, the computers where all data will be gathered and redundantly archived. These machines will run sophisticated programs to create and maintain the growing database, generated continuously at a rate of one or more data-points per second at each node in the EGG network. Here also will reside the analytical software and the integrative algorithms designed to reveal patterns and structure in the data. Common statistics will be calculated, along with a variety of refined measures developed in recent years to extract subtle information from backgrounds of random noise. The analyses will be augmented by graphical displays designed to serve both scientific and aesthetic purposes. Data-driven music may be generated, which may allow perception of inherent patterns that escape conventional analysis.

A.3. Three Phases

The GCP is designed in three phases, the first of which is focused primarily on building and testing system components in a small network. Phase 1 and part of the subsequent phase will be conducted with considerable attention to confidentiality, to allow the acquisition of as large a database as possible prior to broader public engagement. This will allow an assessment of the effects of growing public awareness due to publicity, which might -- if the fundamental EGG hypothesis is correct -- create perturbations observable in the data. During this period, preparations will proceed for Phase 2, intended to grow to as many as 28 Eggs hosted at opportune university, commercial, and individual sites. Phase 2 will permit the implementation of more complex analysis procedures derived from computational technologies such as quantitative electroencephalography, which require input from a certain minimum number of sources. Assuming the results of the second phase show structure in the REG data warranting deeper exploration, Phase 3 will increase the number of sites to 100 or more and will ask more incisive scientific questions. This phase will address issues generated from first principles as well as questions raised by previous results. It will explore the effects of simple factors such as the relative density of the network's spatial array, and more complex issues including potential applications. We expect to implement web-based forms to provide public access (qualified by suitable automatic filtering) for people who wish to download data for well-defined analyses.

B. Organization and Direction

The project has been designed to be both interdisciplinary and inter-institutional. It operates under the aegis of the Institute of Noetic Sciences, a 501(c)(3) research entity, which will manage the funding for the project. A Planning Group (see below) will have responsibility for the design of the project, its implementation, and the subsequent analysis and interpretation. The Project Director will have day-to-day operational control of the project and will be the central communications nexus for the project. All members of the Planning Group participate in the Global Consciousness Project as individuals, and not as representatives of their affiliated institutions.

B.1. Principles and Policies

This is a voluntary, participatory undertaking. We believe the talents and time needed for all the work will be given freely by people interested in the concepts and willing to help. Because this is a project to assess interactions which, while often defined and debated, are poorly understood, a significant effort has been made to:

- Assure that, while all of us have personal ego needs that will affect our engagement in the GCP, the project is protected from undue and avoidable influences from this source.
- Assure that the project is protected from destructive influences and from individuals whose motivations are inimical to resonant, cooperative interactions.
- Assure that during the planning and development period there will be no public descriptions of the project.

More generally, we will be quiet about grand intentions and expectations, and sensitive to implications. We do not know whether there is a global consciousness, but we recognize that science is not about what we know will work; rather, it illuminates what we believe is worth exploring. We acknowledge that we implicitly assume some form of global consciousness, and we will therefore be respectful and caring when we describe the project and our expectations. We do know that the Earth is beautiful and that nature's systems are elegant, with an inherent integrity. The Global Consciousness Project intends to emulate this character. The project is committed to the highest standards of integrity and scientific excellence. The members of the project are pledged to act with mutual respect, honesty, and dignity, and to work toward public presentations which are elegant and principled.

B.2. Intellectual Property

The Global Consciousness Project is a product of efforts and intellectual contributions from many people, all of whom share in the intellectual property rights according to the type and degree of their contribution. The descriptions, documents, plans, website pages, and other materials comprising the project are the property of the Global Consciousness Project, and are considered to be copyrighted by the members of the Planning Group.

C. Logistics and Support

This is a project in the research portfolio of the Institute of Noetic Sciences (IONS). The project will be administered by Roger Nelson, the Project Director, who will have responsibility for the content, quality, and style of the GCP. The Institute of Noetic Sciences is an internationally known, 501(c)(3), not-for-profit research organization. All donations are tax-deductible. Checks may be made out to IONS-GCP, earmarking the donation for The Global Consciousness Project, and addressed to:

Institute of Noetic Sciences
101 San Antonio Road
Petaluma, CA 94952

The project will seek support from foundations, businesses, and individuals. Contributors of more than $100 are entitled to membership in an IONS circle group and a subscription to the Institute's regular publications. There is also a sponsor page on the Global Consciousness website, giving the names of all contributors and sponsors who wish their names or organizations to be listed. If you need more information about the Project, contact the GCP.

D. Personnel

D.1. Planning Group

The creative and operational staff of the Global Consciousness Project have full responsibility for the content and operation of the network. The members of the Planning Group are the primary designers for the project and the main source of expertise for implementation, analysis, and display. All members of the Planning Group participate in the EGG Project as individuals, and not as representatives of their affiliated institutions.

Roger Nelson, Project Director, is Research Coordinator of Princeton Engineering Anomalies Research (PEAR) at Princeton University. He serves as producer, art director, and science manager for the GCP.

Dick Bierman, Professor of Psychology (albeit a physicist), University of Amsterdam, is a pioneer of World Wide Web experiments in anomalies. He is the editor of eJAP, the electronic Journal of Anomalous Phenomena.

Greg Nelson, American Nuclear Systems, Knoxville, TN, is a computer scientist (Artificial Intelligence) with a major interest in systems integration. He is chief architect for the EGG network backbone.

John Walker, near Neuchatel, Switzerland, is founder and former CEO of Autodesk, Inc., developer of the AutoCAD computer-aided design software. His current focus is a Web site (http://www.fourmilab.ch/) which hosts, among other resources, The RetroPsychoKinesis Project.

Rick Berger, Innovative Software Design/Innovative Product Marketing, San Antonio, Texas, has a long history and deep interest in anomalies research. He is chiefly responsible for the logic and aesthetics of the GCP's website.

George deBeaumont is a data analyst with a major utilities firm, and has a deep interest in consciousness and its possible effects in the world. He is the creator of most of the detailed graphical displays exploring the current results.

Charles Overby, Lifebridge Foundation, New York, is a computer engineer with experience in secure network communications. He is a shaman and scholar, investigating subtle energies that may emanate from humans.

Dean Radin, Interval Research, Palo Alto, CA, author of "The Conscious Universe", is an early explorer of effects of group consciousness. He has a long-standing interest in integrating multivariate data complexes.

Marilyn Schlitz is Director of Research at the Institute of Noetic Sciences. She is a medical anthropologist, with major interests in research on consciousness and the interactions of intentionality and living systems.

Stephan Schwartz, various US locations, formerly Research Director of the Mobius Society, is an entrepreneurial pioneer in applications of psi to problem solving and discovery in archaeological, medical, and other fields.

Jiri Wackermann wears three hats: Neuroscience Technology Research, Prague; the KEY Institute for Brain/Mind Research, Zurich; and the Psychophysiology of Consciousness Laboratory, IGPP, Freiburg, Germany. He explores complex, hyperdimensional representations of states of mind.

D.2. Friends of the Project

A somewhat larger group of friends sharing a strong resonance with the concepts and purposes of the project provide spiritual and practical support. Some may have more explicit roles, contributing expertise or advice, or building specialized aspects of the project such as refined analytical software or aesthetic displays on the website. Other friends, especially those in far-flung locations, will provide the host sites for the remote REG machines, and will also expand the interpersonal connectedness that is the core of the hypothesis engaged by the GCP. New contributors -- friends of friends -- are welcome.

E. Timeline

The timeline of development for the GCP is approximately determined, although considerable flexibility is necessary to accommodate situational requirements. The following description is general, and reflects the status in April 1999. The project began as a result of conversations at a conference in Freiburg in November 1997. Some precedent for the concept can be found earlier, in data taken in continuously running REG experiments in Princeton and more recently in Amsterdam. In addition, a few global events have been recorded by organizing a temporary collaboration of investigators using related technology and protocols. Active planning began early in 1998, and preliminary work on specifications culminated in a stable network architecture by April. Work on clear descriptions of methodology and related documentation proceeded during this time, and orders were placed in June for the primary server and for the REG devices to be used in Phase 1. Software installation was tested in mid-July, and formal data acquisition began in August 1998. At the end of 1998, 10 host-sites were running, with more sites to be added as quickly as time and resources permit. The GCP website was designed as a semi-public resource, intended to be opened for full public access after network stabilization with 20 eggs or more. This was originally expected by the end of 1998, but has taken longer. The semi-public website was professionally redesigned in March and April 1999, for public release in late April. A Prediction Registry is regularly updated with descriptions of events and situations that are expected to correlate with indications of global consciousness. For example, various end-of-year celebrations will be given special attention, to develop protocols and experience for the following year with its millennial transition. When the EGG network is active, the registry of explicit predictions will be used to define and categorize global events, and these will constitute a timeline for the project's analytical framework.

F. Budget

We will start with a small implementation using a few sites -- one for each member of the EGG planning group -- to build and test the primary acquisition and communication software. The expected cost for this Phase 1 will be approximately $12,000. Phase 2, with the growth of the network to approximately 30 eggs, will cost an additional $25,000. We will determine the viability of Phase 3 before committing to a specific budget. A number of factors can alter the budget greatly, including, for example, the possibility that equipment or Internet connections might be donated, or the possibility that the Planning Group may determine that it is essential to have continuous, and hence more expensive, connections at all sites. If you need further information to consider offering financial support, we will provide details.

PART 1: DESCRIPTION AND OVERVIEW OF ISSUES

INTRODUCTION

This document describes some of the technical issues being discussed about the Random Event Generator Global Network, or REGG-Net, also known as EGG, an acronym for ElectroGaiaGram. It is an edited version of a comprehensive architecture discussion and specification. Part 1 presents the general picture and some background discussions. Part 2, which overlaps Part 1, details more specific proposals. Part 3 is a glossary for the many technical terms and acronyms.

Note added 98-09-07: A full understanding of the issues generated in creating a real-time network of this sort comes through the actual experience, and this leads to additions and revisions. A great deal of work has now been done, and some revised specifications have been generated by John Walker. They supplement this document, which provides the basic outlines for the technical instantiation of the network. The new specifications have been implemented, and as of 98-09-06, the new versions of the data collection and archiving programs are running.

The specifications provide considered options and suggestions to focus discussion. This is a working draft intended to help finalize enough details that we can begin the implementation of the network's "backbone" and possibly some of the other aspects. It incorporates valuable contributions from many of the project members, only some of which have been properly acknowledged below. The items that most require input are noted, but other discussion or suggestions are welcome. Many of the default specifications will be apparent, but are quite malleable at this point (and some will remain settable options).

The network will consist of one or more centralized servers ("Baskets"), acting as clearing houses for data collected at numerous client sites ("Eggs"). We plan at this point for no more than a hundred Eggs, and likely either two or three Baskets. The connections between the various Eggs and the Internet are likely to differ from case to case. Some Eggs will be connected directly to Ethernet drops at commercial or university sites, while others will require dial-up connections. Since some of the dial-up connections may be expensive, we plan to allow for both permanent and on-demand connections.

There is an expressed interest in eventually bringing in data from other types of sites running with different hardware or experimental parameters (in Jiri's terminology, "Cuckoo Eggs"). Although we are open to this possibility, there must be mechanisms for synchronizing and co-analyzing the data with that taken from the native Eggs, in both temporal and statistical terms. By creating a multi-layer protocol, it will be much easier to incorporate such data. The following set of protocol layers allows for a great deal of flexibility and reasonable independence of certain choices:

1) Hardware-dependent data acquisition
2) Data encoding (timestamps, content type information, etc.)
3) Data transmission (packet organization, ports, protocol, encryption...)
4) Analytical techniques
5) Presentation of data

To incorporate data from Cuckoo Eggs, the first three layers would be replaced by any desired methodology of getting the data to the Basket (though not including throwing out the original Eggs, as Cuckoos do...). However, the complexity of the analytical techniques will increase as more disparate data needs to be incorporated, so as an initial design we will assume uniformity of everything but the first layer. (Layer 3 might also change as a function of increasing security requirements.)

Each Egg may have a set of options, some potentially remotely configurable while others might be set at compile or installation time, or only settable at the site. These options apply primarily to the first two layers, since the third layer is intended to be a uniform interface essentially independent of the settings at the Egg-site. In the remainder of the document we discuss the layers in order, followed by some more general issues that are layer-independent. At the present time, this document has very little to say about the fourth and fifth layers, since these are still very much open-ended issues.

DATA ACQUISITION

At this time the acquisition software is being designed primarily around the PEAR "Bradish box" or "micro" REGs. This protocol level can be replaced by a different set of code to support other devices such as "Bierman boxes." At the present time, we do not have sufficient specifications to build the layer appropriate to these devices. An option can be provided to select the type of REG/RNG device, though this is probably fixed at compile time or only selectable at the Egg-site, rather than remotely settable.
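As a concrete illustration of how this first layer might be isolated, here is a minimal Python sketch of a swappable device interface with a clock-referenced acquisition loop. The class and function names, trial size, and rate are illustrative assumptions, not part of the specification; a real driver would read a Bradish-box serial port rather than a pseudorandom stand-in.

```python
import random
import time
from abc import ABC, abstractmethod

class RegDevice(ABC):
    """Layer 1 abstraction: any REG/RNG source that can supply raw bits."""
    @abstractmethod
    def read_bits(self, n):
        """Return a list of n bits (0 or 1) from the device."""

class MockReg(RegDevice):
    """Stand-in driver; a real one would read the Bradish-box serial port."""
    def read_bits(self, n):
        return [random.getrandbits(1) for _ in range(n)]

def acquire(device, bits_per_trial=100, trials_per_sec=1.0, n_trials=5):
    """Collect bit-sum trials on an absolute schedule referenced to the
    system clock, so the average rate cannot drift over long runs."""
    interval = 1.0 / trials_per_sec
    deadline = time.monotonic()
    for _ in range(n_trials):
        trial = sum(device.read_bits(bits_per_trial))  # bit-sum per trial
        print(int(time.time()), trial)                 # timestamp, trial value
        deadline += interval                           # no cumulative drift
        time.sleep(max(0.0, deadline - time.monotonic()))

acquire(MockReg())
```

Supporting a "Bierman box" would then mean writing one new RegDevice subclass, leaving the rest of the stack untouched.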

The data acquisition should be as close as possible to a "real-time" process (or "isochronous," if possible), to guarantee that the selected sampling rates (see below) are met exactly, without any sort of systematic drift. Although there might be slight variations in the spacing of the samples (differential non-linearity), we expect that the average number of samples per second (integral non-linearity) will be controlled quite precisely, and therefore synchronized among systems. This should be easily accomplished using the system clock as a reference, which has near-microsecond resolution on Linux machines. Synchronization of these clocks across machines is discussed below in the section "Broader Networking and Protocol Issues."

DATA ENCODING AND EGG-SITE OPTIONS

Given a low-level acquisition layer, a number of choices remain about what data to collect, how much to collect, when to collect it, how to represent it, and so on. Furthermore, in order to analyze the data effectively, a variety of information is required in addition to the raw sample or trial values. In particular, for the data to be comparable across sites, some form of uniform timestamp information is required. The second layer of the protocol is designed to address these issues. Many of the choices to be made are quite arbitrary, and it seems desirable to leave them as options that can be changed later. Some may even be usefully changed at run-time, to change the nature of the experimental setup globally by a single administrative choice. We discuss these issues in terms of a set of options, with certain practical limitations.

After some discussion, it seems fairly well agreed that the data should be transmitted in the form of "trials" which collect some number of bits. Although the raw bits may be of interest, the basic Egg hardware design will not have the storage capacity, and often will lack the necessary bandwidth, to communicate them. Further, to keep the communication protocol between Egg and Basket simple, it is preferable to base it always on the notion of a "trial" rather than allowing both trials and raw bitwise data to be communicated. Thus we consider our other options in terms of "trials". (An option to use bitwise data collection at some Eggs is briefly discussed below.)

Trial type

After some discussion, it seems agreed that either Z-scores or bit-sums are a reasonable mechanism for representing the trials. Given a known trial size, either can be transformed into the other. For this reason, we believe the technical advantages of bit-sums (reduced local computation, storage, and communication) suggest implementing only the bit-sum method.
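The equivalence is straightforward: a trial of N unbiased bits has expected bit-sum N/2 with standard deviation sqrt(N)/2. A minimal sketch of the two-way conversion (the function names are ours, for illustration):

```python
import math

def bitsum_to_z(s, n):
    """Z-score of bit-sum s over n unbiased bits (mean n/2, sd sqrt(n)/2)."""
    return (s - n / 2.0) / (math.sqrt(n) / 2.0)

def z_to_bitsum(z, n):
    """Inverse transform, rounded back to an integer count of 1-bits."""
    return round(z * math.sqrt(n) / 2.0 + n / 2.0)

print(bitsum_to_z(60, 100))   # 60 ones in a 100-bit trial -> z = 2.0
print(z_to_bitsum(2.0, 100))  # -> 60
```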
Trial size

The number of bits accumulated into a trial should be variable, since there is no consensus on this at the present time. It may even be desirable to have the trial size differ across Egg-sites, though from a technical and analytical standpoint this seems undesirable, and we have found no strong argument in favor of it. The range (in bits/trial) should be set as needed to give reasonable statistical information within the trial, meaning a minimum of 32 and preferably 50 or more. Keeping the number below 256 has the technical advantage of efficient storage and transmission.

Sample rate

The sample rate (in trials/second) should also probably be uniform across all Eggs at any given time. However, it should be made easy to change this rate if the consensus is that the data are too sparse or more voluminous than necessary. Any number less than about 10 trials/second seems reasonable, with a maximum for the time between trials set at 5 minutes. An initial number between 3 trials/second and 3 seconds/trial seems a good starting point. The sample rate and trial size combine to give bits/second, and there are technical limitations on certain devices that preclude very high bit rates. In the established configuration, the PEAR "Bradish box" REGs can produce about 488 bits/second, which limits the output to less than 2 trials/second at 255 bits/trial, or 10 trials/second at 48 bits/trial (this arithmetic is checked in the sketch at the end of this section).

Sample spacing

Two major possibilities exist for how to turn bits into trials when the device produces more bits/second than are being included in trials. In any case, all the bits produced by the device must be read and some discarded; otherwise (at least for serial devices) there will be a significant lag between the production of the data and the time at which it is read out of the buffer, because of the FIFO nature of the serial communication. One possibility is to read all the bits required for a trial as rapidly as possible at the beginning of the trial time-interval; after this point, data is discarded until the next trial time-interval begins. Another possibility is to decimate the input stream at an appropriate rate, so that the bits for a trial are (approximately) evenly spaced throughout the trial time-interval. There is probably no need to implement both methods, but we should come to a consensus on which is preferred.

We encourage an agreement to use the same trial size and sample rate at all Egg-sites. This should completely mask any differences in the underlying device types. Devices which are doing bit-collection would simply do this in addition, as a parallel task, and that data would presumably flow through a different stream at its own rate. Thus, even in this case, the server will see the same number and size of summed-bit trials. This will make it possible to use QEEG-type computations; these are made (when the source is a brain) from the output of electrodes of the same type placed at an array of locations. A strict analogy is required in order to apply similar computations, and this requires uniform data.

Once we have characterized the data in these terms, it can be encapsulated in an appropriate format for network transmission. Throughout, we continue to discuss binary transmission and storage formats, primarily for efficiency reasons. It is understood that it may be desirable to build analytical tools that rely on more "human-readable" forms of the data; this can be accommodated through simple binary-to-hex, or more likely binary-to-text, conversion utilities, which can be provided as needed.
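The device-rate arithmetic above is easy to check; a small sketch (the 100-bit decimation example is our illustration, not a specified value):

```python
DEVICE_BPS = 488                 # raw bits/second from a PEAR "Bradish box"

for bits_per_trial in (255, 48):
    print(bits_per_trial, "bits/trial ->",
          round(DEVICE_BPS / bits_per_trial, 2), "trials/sec max")
# 255 bits/trial -> 1.91 trials/sec max  (i.e., under 2 per second)
# 48 bits/trial  -> 10.17 trials/sec max (i.e., about 10 per second)

# Even spacing by decimation: keep every k-th bit so a trial's bits span
# the whole interval instead of bunching at its start.
k = DEVICE_BPS // 100            # keep every 4th bit for 100-bit, 1-sec trials
print("decimation stride:", k)
```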

DATA TRANSMISSION/NETWORKING

Regardless of the choices made above, a uniform packet format should be implemented. We propose transmitting the data in packets built out of a variable number of records. The current proposal is for each record to contain timestamp and checksum information, and to be essentially independent of the rest of the packet. The packetization would primarily be used to increase the efficiency of the communication protocol, although other ancillary information (current settings for encoding options, for example) should probably be contained in the packet. There are some other choices that impact the packet design, often because they may vary from Egg-site to Egg-site.

Networking protocol

TCP and UDP over IP are the only real choices. TCP offers some advantages, including built-in acknowledgement, segment size optimization, and streaming data organization. UDP has the advantages of being very low-overhead in both bandwidth and implementation, and of being packet-oriented. The protocol proposed below doesn't rely on any particular aspects of either, so we intend to postpone this part of the implementation until the later part of the development cycle, to allow for more feedback and discovery during the implementation process.

Communication mode

Some sites will be connected all the time ("permanent" connections, ignoring network problems) while others will need to disconnect between data transfers (which we call "dial-and-drop" connections). In the latter case, it must be the responsibility of the Egg to let the Basket know it is available for data transfer. There seems to be no good reason why the same procedure should not be used for permanent connections. Although the Basket could contact the Egg in these cases, doing so creates unnecessary complexity. There is an analogy here to the "server-push" and "client-pull" capabilities in HTTP, which accomplish essentially the same purpose with slightly different performance results. (Note that the public website modes are not determined here. For updates of analytical displays, both modes may be available, at least for Netscape browsers.)

Communication frequency

The frequency with which an Egg-Basket session is initiated depends on several factors: the cost of such a session, the cost of the connect time, the availability of data from the acquisition layer, and the desire for an interactive (and reactive) central analysis and display. At a minimum, some sites with high connect costs may wish to call infrequently (perhaps once per hour or even less often) and transfer larger blocks of data. At a maximum, directly connected sites not being charged for bandwidth may wish to transfer the data essentially as soon as it becomes available. There is probably a maximum rate beyond which there is a greatly diminishing data/protocol ratio, so we might limit the updates to minimally include a full packet's worth of data.

Record size

Each record might contain one trial or more than one. The number is directly related to the frequency of the timestamps versus the frequency of the trials. The record size should probably be fixed, or computed by a fixed algorithm like the one described below. A sketch of one possible record layout follows.
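To make the record idea concrete, here is a minimal Python sketch of one possible layout: a 4-byte Unix-second timestamp, the trial bytes, and a 16-bit CRC. All field sizes and the CRC-CCITT choice are illustrative assumptions, not settled values.

```python
import binascii
import struct
import time

def pack_record(trials, t=None):
    """Frame one record: 4-byte timestamp + trial bytes + 16-bit CRC."""
    t = int(time.time()) if t is None else t
    body = struct.pack("!I", t) + bytes(trials)   # network byte order
    crc = binascii.crc_hqx(body, 0)               # CRC-CCITT over the body
    return body + struct.pack("!H", crc)

# Ten one-byte bit-sum trials from one second, framed for transmission.
rec = pack_record([101, 97, 105, 99, 100, 95, 102, 110, 104, 99])
print(len(rec), rec.hex())                        # 16 bytes: 4 + 10 + 2
```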

Packet size

If the packet contains configuration information, its size will be greater than the sum of the records. Minimally it must contain at least one record, unless we add the complexity of two different kinds of packets, one with configuration and one with data. For efficiency, the size of the packet should be as large as possible while still fitting within the typical MTU of the net connections being used. The packet size could easily differ across sites, if necessary.

By creating our own protocol for this interaction, we can optimize the efficiency of operation and at the same time perhaps reduce design effort. Most standard protocols are either difficult to implement (FTP, HTTP), inefficient (GOPHER, SMTP), insecure (TFTP), or inappropriate (NFS) for this sort of communication. Writing our own protocol also allows us to leave open the choice of TCP versus UDP (even potentially at run-time, although implementing both seems unnecessary effort). It also leaves open the option of enhancing the security of the entire protocol, if the effort were deemed useful.

ANALYTICAL TECHNIQUES

Even if the devices operate in perfect synchronization, differences in local site costs (primarily the connection type) or communication difficulties may require an analysis to "go without" data from some set of Eggs at any given time. Therefore, we will also require some flexibility in the analytical mechanism. The matrix will often be incomplete at the instantaneous level, but more complete if the display is computed from older data. It is likely that there will often be "holes" in the data array. Some of the calculations (the "complete" calculation using all active sites) will probably be delayed for hours. This should be no problem with good analytical software design, but it reinforces the need for an accurate way of synchronizing data which arrives long after the fact.

There has been some discussion of applying QEEG calculations as one analytical technique. At this point we are unsure of the time-scale of the consciousness under study as compared to the time-scale of a human brain. In fact, there is of course much uncertainty about whether we are likely to measure a "coating of animal consciousness" or the consciousness of Gaia as an entity, which would likely operate on very different time scales. Further details of the analytical techniques remain to be developed; one illustrative possibility is sketched below.
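As one example of hole-tolerant analysis (our illustration; the project's actual statistics remain to be chosen), Stouffer's method combines whatever per-Egg Z-scores are present for a time slice and simply omits the missing ones:

```python
import math

def network_z(egg_zscores):
    """Stouffer combination of per-Egg Z-scores for one time slice.
    Missing Eggs are passed as None and left out, so the statistic
    degrades gracefully when the data matrix has holes."""
    zs = [z for z in egg_zscores if z is not None]
    return sum(zs) / math.sqrt(len(zs)) if zs else None

print(network_z([0.5, -1.2, None, 2.0, None]))  # combines the 3 live Eggs
```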

DATA PRESENTATION

Most of the details of the data presentation remain to be developed. However, it has been agreed that it should be possible for an interested viewer to go back in time to review the data for specified past time periods, in addition to being able to view the "present" state.

PART 2: SUGGESTED SPECIFIC VALUES

PROPOSED PACKETS AND THROUGHPUT ANALYSIS

Given these constraints, we propose the following data packet. Each packet would include:

- Egg ID (2 bytes)
- Option information (8-20 bytes, including sample/trial size, etc.)
- Up to 60 records, each containing:
  - a beginning-of-second timestamp (referenced to Jan 1, 1970 UTC)
  - up to 10 bytes of trial data (because <= 10 trials/sec)
  - a 16-bit CRC checksum

This gives us a maximum packet size of less than 1000 bytes, which should work well with most network MTUs (and the number of records can be reduced to create a packet as small as 38 bytes, which still contains over 25% data; see the sketch below). Limiting the number of records to 60 and using seconds as a timestamp means that we will need to transfer a maximum of one packet per minute. The first data record sent would be the one indicated by the last-data-received value from the Basket (this allows some cross-verification that is probably not necessary), and the final record sent would correspond to the most recently acquired data. An optional connect-time limit could be set to help hold down local costs (but it would have to be set high enough that all the data would eventually get through).

We next consider the latencies and bandwidths of various connection types, and their implications for throughput of data. Here are some initial estimates:

Conn type         Conditions     Latency (msec)      Bandwidth (bps)
WWW               routing        30-3000 (100 typ)   high
Analog modem      new connect    20k                 14.4k
Analog modem      est. connect   300                 14.4k
ISDN modem        new connect    5k                  128k
ISDN modem        est. connect   100                 128k
Direct Ethernet                  <5                  6M
Application       turnaround     1000

A full communication scenario might look like the following. The Egg would dial the server and announce that it is online. The server (with some application-level latency) would provide the Egg with a packet that included both configuration information and last-record-received (timestamp) information. The Egg would then begin sending data packets.
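The packet-size claims above follow directly from the field sizes, assuming a 4-byte timestamp and the option block at its 20-byte maximum (a minimal check, not part of the spec):

```python
EGG_ID, OPTIONS = 2, 20      # bytes; option block at its stated maximum
RECORD = 4 + 10 + 2          # timestamp + up to 10 trial bytes + 16-bit CRC

full = EGG_ID + OPTIONS + 60 * RECORD
tiny = EGG_ID + OPTIONS + 1 * RECORD
print(full)                  # 982 bytes -- under typical ~1.5k Ethernet MTUs
print(tiny, 10 / tiny)       # 38 bytes, of which ~26% is trial data
```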

We can now get a fair estimate of the actual cost in telephone time using this scenario. We model an Egg taking data at 10 trials per second and dialing in hourly via a 14.4 kbps modem. Counting in the various expected latency terms, and ignoring data errors, retries, and compression, we get the following rough estimates:

Total records sent:              3600
Total data packets sent:         60 (982 bytes or 750 msec each)
Total ctrl packets sent:         2
Total ack packets sent:          62
Round-trip packet+ack latency:   1550 msec
Dial time:                       20 sec
Control conversation:            5.13 sec
Data transfer:                   93.01 sec
Total:                           118.14 sec, 2835.27 sec/day

(Making the same estimate based purely on bandwidth would be off by over a factor of four.) Generating the same numbers for other dial-in frequencies gives:

once/minute        27 sec/call    640 min/day
once/5 minutes     33 sec/call    158 min/day
twice/hour         72 sec/call     57 min/day
once/hour         118 sec/call     47 min/day
four/day          397 sec/call     39 min/day
once/day         2257 sec/call     37 min/day

By dropping the sampling rate to one trial/sec, the hourly connection time goes down to less than one third:

once/hour (1 trial/sec)    34 sec/call    14 min/day

These numbers should help give some idea of the costs likely to be experienced by the remote dial-and-drop sites. Clearly we need to be sensitive to any costs being imposed on the Egg-site hosts by "administrative changes"!

BROADER NETWORKING AND PROTOCOL ISSUES

As part of the networking process, all the Egg-sites must share a common clock. This is exactly the purpose of the standard NTP protocol, and its implementation on the expected platform (Linux) appears to compensate for both clock value and clock *drift*, so that the resulting uniformity of clock time is better than the typical one-second resolution transferred in the protocol. We recommend using NTP, with the Baskets serving as second-tier servers from some other canonical source and with all of the Eggs periodically synchronizing themselves to the Basket time. There may be a need to designate one Basket as primary, however, in case of a discrepancy.

I recommend the following relationship between the permanent and dial-and-drop scenarios. In the case of dial-and-drop, the Egg will send a packet to the Basket which simply serves to communicate, "I'm online now." At this point, the Basket will reply with a packet describing the options and indicating the last successfully received index value. Once this packet arrives at the Egg, the Egg begins sending data from this point until all its data has been sent. (A toy sketch of this exchange follows.)
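A toy sketch of the session logic just described, with an in-memory stand-in for the Basket side; the message names and fields are our illustration, not a wire format:

```python
class FakeBasket:
    """In-memory stand-in for the Basket end of a dial-and-drop session."""
    def __init__(self, last_received=2):
        self.last_received = last_received
    def send(self, msg):
        print("egg ->", msg)
    def recv(self):
        return {"options": {}, "last_received": self.last_received}

def egg_session(basket, buffered_records):
    """Announce presence, learn what the Basket already holds, then
    stream every newer record. A permanently connected Egg would run
    the same logic on whatever schedule it chooses."""
    basket.send({"type": "ONLINE", "egg_id": 42})
    reply = basket.recv()                        # options + last index held
    start = reply["last_received"] + 1
    for i in range(start, len(buffered_records)):
        basket.send({"type": "DATA", "index": i,
                     "record": buffered_records[i]})

egg_session(FakeBasket(), ["r0", "r1", "r2", "r3", "r4"])  # sends r3, r4
```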

In the case of a permanent connection, the Egg can elect to send an "I'm online now" message at whatever interval it desires, and the protocol continues as above. The server need never know the difference between the two types of Eggs. (However, if we prefer, the Basket can send the "last-received" packet whenever it wishes to collect data. It must then know not to ask dial-and-drop Eggs, or to expect no response from them.) In either case, the body of the protocol is identical at both ends, and only the initiation changes. The Basket should still have responsibility for monitoring the "aliveness" (fertility?) of particular Eggs, and for notifying human administrators if a particular Egg seems to be down or partitioned off from the network.

It is probably desirable to be able to set at least some of the options from the Basket, so that the administration of the Eggs does not require extensive involvement of the personnel at each Egg-site. However, we need some method of ensuring that the settings are authenticated. As a simple security mechanism, we can assume that the Eggs will only accept updates from the Basket they have contacted in the "I'm online now" phase, using known (fixed?) IP addresses, and assuming the security of the routing tables against corruption. This is essentially an IP "dial-back" approach. If this is inadequate, it is possible to implement something like a shared-secret DES encryption scheme (a la CHAP) for authentication. This requires substantially greater sophistication, and may not be necessary.

To help offset the impact of ever-more-frequent network partitions, I think it is important for each Egg to know about all the Baskets. Each Egg may even be configured to prefer a different Basket, on the assumption that communications within a continent are cheaper, or at least more reliable, than transcontinental ones. Thus, a Scandinavian Egg might report to a Dutch Basket, and a Californian Egg to a New Jersey Basket, and in the end only the Baskets would need to exchange information (presumably over higher-bandwidth links) to get the whole picture. In the event of a transatlantic partition, each Basket would still receive data, and each Egg would still be able to report to its first choice of Basket. In the event of Dutch Basket down-time (if, say, it were being borrowed by the Easter Bunny), the Scandinavian Egg would contact the NJ Basket directly, after noticing the missing Dutch Basket.

LOCAL EGG ISSUES

The Eggs generally should not need any display as Eggs. However, many of the Egg hosts (people) may want to know what is going on, and indeed it may be worthwhile to have at least a status display. It could be just a text report, with indicators of time on, amount of data reported, grand deviation (as a check whether all is well), etc. In general we should avoid any aspect that requires maintenance at the Egg-site. However, some Egg-site maintainers may be comfortable with extra features that are not appropriate for everyone. These features can be set locally at the Egg-site, with an appropriate interface that warns the maintainer of the extra burden being taken on. These features should also be made robust in the event of inattention.

For example, if a local site wishes to have a data backup, one possibility is a floppy disk. However, the amount of data generated at the maximum speed of ten trials/second would roughly fill one floppy disk per day. This puts a high maintenance burden on the Egg-site maintainer. In contrast, running at one trial/second would extend the life of a floppy to over ten days, which is probably a reasonable maintenance burden for most sites. At this rate, this sort of backup adds less than $20 per site per year in media costs. If the local site maintainer forgets to change disks, the system should recognize this and either (1) discard the data on the disk and start from scratch when the disk fills up, (2) stop writing data and discard further data until a new disk is available, or (3) stop writing data and queue further writes until a new disk is available.
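The rough arithmetic behind those floppy estimates, assuming the 16-byte-per-second record framing sketched earlier and (our assumption) batching into one record per minute at the slower rate:

```python
FLOPPY = 1_440_000                 # nominal 1.44 MB capacity, in bytes
DAY = 86_400                       # seconds per day

fast = DAY * (4 + 10 + 2)          # 10 trials/sec: one 16-byte record/sec
slow = (DAY // 60) * (4 + 60 + 2)  # 1 trial/sec batched as one record/min
print(FLOPPY / fast)               # ~1.04 days per floppy -> one disk/day
print(FLOPPY / slow)               # ~15.2 days per floppy -> over ten days
```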

We would discourage non-data-acquisition uses of the Egg machine, such as the installation of a web browser, because they potentially increase the hardware requirements, compete for bandwidth with the required data transfers, and interact in somewhat unpredictable ways with the dialup scheme. Although most of these are not concerns for permanently-connected Egg-sites, those sites are the most likely to already have browsing capabilities. Furthermore, keeping the hardware and software platforms uniform allows for easier "hot spare" replacement.

LOCAL BASKET ISSUES

Some sort of utilities (probably software on the Baskets, or perhaps even a private web area) need to be built to help view the performance of the network, rather than just the results. Things like a global view of connectivity, down-time ratios, and Egg type information would be useful. Using SNMP to some extent is certainly possible, but although I would like to encourage the use of IETF standards as much as possible, it may be quicker to roll our own details for this capability.

SOCIAL IMPLICATIONS

If and when our Eggs hatch, we may need to open certain cans of worms to feed the hatchlings. One might divide the issues into those related to the project "not working" and those related to it "working," but of course the terms themselves need to be defined. For now we take "working" to mean that the system detects some sort of global consciousness structure. If it doesn't work, there will be a need to explain why its results differ from those of preliminary studies like the Diana and Theresa work.

What if it does work? It seems that discovering and being able to measure something like global cohesion would be a huge breakthrough, and we should consider how to communicate the discovery properly. Is there also a moral significance to demonstrating the power of group-think? What do we do if we discover that the mechanism measures other things of significance? Jiri has alluded to the fact that it could equally well pick up on the consciousness of animals other than humans. If it notices solar eclipses, it certainly has the potential to notice other significant astronomical or geological events, or our reactions to them. One thought in particular, given that animals are often sensitive to things that people miss, is that the system might detect phenomena such as earthquakes before they actually occur. This possibility alone, if it came true, would make the project extremely significant to humankind.

ACKNOWLEDGEMENTS

This document evolved in response to input from many individuals, some of whom must have psychically known what input was needed, since they hadn't yet seen the document. In particular, Jiri Wackermann's comments on the layered protocol suggested a much better organization for the yet-unseen document, and his comments on mass-storage backups helped convince Roger of the importance of this issue. Dick Bierman's comments on synchronizing the processing using timestamps reinforced our own belief in the necessity of this process, and his discussions of Z-score versus bit-sum representations forced Greg to review his thought processes on this matter. Charles Overby reminded us that we need to keep connection costs firmly in mind. Further feedback not specifically mentioned or visible in the final document was still greatly appreciated, in many cases forcing us to clarify our own reasoning about the issues involved.

PART 3: GLOSSARY

client pull - technology which allows a browser to periodically request an update of the page currently being viewed. The responsibility for the update lies with the client rather than the server, and since data flows from server to client, this is referred to as the client "pulling" data from the server.

HTML - Hyper-Text Mark-up Language, the basic language used for defining web pages. An SGML derivative.

HTTP - Hyper-Text Transport Protocol, a protocol used over TCP/IP to transfer WWW documents, often HTML.

ICMP - Internet Control Message Protocol, a protocol used over IP for link management; the basis of the "ping" program, for example.

IETF - Internet Engineering Task Force, responsible for Internet standards and protocols.

IP - Internet Protocol, the basic underlying packet protocol of the Internet. This can be overlaid on various transmission media (such as Ethernet or PPP), and can contain various higher-level protocols such as TCP, UDP, ICMP, etc.

MTU - Maximum Transmission Unit, the size (in bytes) of the largest packet that a particular connection (such as a SLIP or Ethernet connection) can handle. Typical MTUs range from 200 bytes for slow serial connections up to approximately 1.5k for Ethernet. Any larger packets must be broken into pieces to cross the particular link. (I believe) TCP handles this automatically, while UDP just drops oversized packets in the bit bucket.

NTP - Network Time Protocol, a standard mechanism allowing computers to check in with central servers for date and time information. Machines are defined as being at the n-th level depending on their distance from one of a few primary reference machines connected directly to atomic clock sources.

server push - technology which allows a server to continue updating the contents of the page currently being viewed even after the initial download is complete. The responsibility for the update lies with the server rather than the client, and since data flows from server to client, this is referred to as the server "pushing" data out to the client. FYI, this is one area in which Netscape's browser supports a capability not supported by Microsoft's.

SGML - Standard Generalized Mark-up Language, a general extensible language for marking up textual documents. Several modern document languages, notably HTML and XML, can be described as derivatives of SGML.

SNMP - Simple Network Management Protocol, a methodology developed for remote management of IP network devices.

TCP - Transmission Control Protocol, a protocol used for reliable, long-term data connections such as telnet or HTTP. TCP/IP refers specifically to TCP used over IP.

UUCP - Unix-to-Unix Copy Program, a now mostly outdated protocol for transferring data between machines which communicate only infrequently. Probably inappropriate for our protocols.

UDP - User Datagram Protocol (aka Unreliable Datagram Protocol), a protocol used for quick, connectionless transfers of information between machines. SNMP, NFS, talk, rwho, routed, and a variety of other protocols use UDP for many if not all of their communications.
