Asmaa H. Rashid
Arab Academy for Science and Technology, College of Computing and Information Technology, Sheraton Heliopolis, Cairo, Egypt
Rashid.asmaa@yahoo.com

Prof. Dr. Abd-Fatth Hegazy
Arab Academy for Science and Technology, College of Computing and Information Technology, Sheraton Heliopolis, Cairo, Egypt
abdheg@yahoo.com
Abstract: While there is an increasing need to share medical information for public health research, such data sharing must preserve patient privacy without disclosing any information that can be used to identify a patient. A considerable amount of research in the data privacy community has been devoted to formalizing the notion of identifiability and to developing anonymization techniques, but this work focuses exclusively on structured data. On the other hand, efforts on de-identifying medical text documents in the medical informatics community rely on simple identifier removal or grouping techniques, without taking advantage of the research developments in the data privacy community. This paper attempts to fill these gaps and presents a framework and prototype system for de-identifying health information that includes both structured and unstructured data. We empirically study a simple Bayesian classifier, a Bayesian classifier with a sampling-based technique, and a conditional random field based classifier for extracting identifying attributes from unstructured data. We deploy a k-anonymization based technique for de-identifying the extracted data while preserving maximum data utility. We present a set of preliminary evaluations showing the effectiveness of our approach.

Keywords: Anonymization - Medical text - Named entity recognition - Conditional random fields - Cost-proportionate sampling - Data linkage
1. Introduction

Current information technology enables many organizations to collect, store, and use various types of information about individuals. The government and organizations are increasingly recognizing the critical value of sharing such a wealth of information. However, individually identifiable information is protected under the Health Insurance Portability and Accountability Act (HIPAA).¹
¹ Health Insurance Portability and Accountability Act (HIPAA), http://www.hhs.gov/ocr/hipaa/. State law or institutional policy may differ from the HIPAA standard and should be considered as well.
2. Existing and Potential Solutions

Currently, investigators or institutions wishing to use medical records for research purposes have three options: obtain permission from the patients, obtain a waiver of informed consent from their Institutional Review Boards (IRB), or use a data set that has had all or most of the identifiers removed. The last option can be generalized into the problem of de-identification or anonymization (the two terms are used interchangeably throughout this paper), where a data custodian distributes to a data recipient an anonymized view of the data that does not contain individually identifiable information. It provides a scalable way to share medical information in large-scale environments while preserving the privacy of patients.

The general problem of data anonymization has been extensively studied in recent years in the data privacy community [1]. The seminal work by Sweeney et al. shows that a dataset that simply has identifiers removed is subject to linking attacks [2]. Since then, a large body of work has contributed to data anonymization that transforms a dataset to meet a privacy principle such as k-anonymity, using techniques such as generalization, suppression (removal), permutation, and swapping of certain data values, so that the result does not contain individually identifiable information [3,4,5,1,6,7,8,9,10,11].

While the research on data anonymization has made great progress, its practical utilization in medical fields lags behind. An overarching complexity of medical data, often overlooked in data privacy research, is data heterogeneity. A considerable amount of medical data resides in unstructured text forms such as clinical notes, radiology and pathology reports, and discharge summaries. While some identifying attributes can be clearly defined in structured data, an extensive set of identifying information is often hidden or has multiple, differing references in the text. Unfortunately, the bulk of data privacy research focuses exclusively on structured data. On the other hand, efforts on de-identifying medical text documents in the medical informatics community [12,13,14,15,16,17,18,19] are mostly specialized for specific document types or a subset of HIPAA identifiers. Most importantly, they rely on simple identifier removal techniques without taking advantage of research developments from the data privacy community that guarantee a more formalized notion of privacy while maximizing data utility.
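To make the linking attack of Sweeney et al. [2] concrete, the following minimal sketch (in Python with pandas; all names, attributes, and records are hypothetical) joins a table that merely had names removed with a public voter list on a quasi-identifier, recovering the identity behind a diagnosis:

# Hypothetical illustration of a linking attack: a "de-identified" medical
# table is joined with a public voter list on the quasi-identifier
# {date_of_birth, zip, sex}. All records are toy data.
import pandas as pd

medical = pd.DataFrame([
    {"date_of_birth": "1965-07-22", "zip": "07028", "sex": "F", "diagnosis": "pharyngitis"},
    {"date_of_birth": "1971-03-04", "zip": "07029", "sex": "M", "diagnosis": "asthma"},
])
voter_list = pd.DataFrame([
    {"name": "Alice Smith", "date_of_birth": "1965-07-22", "zip": "07028", "sex": "F"},
])

# Joining on the quasi-identifier restores the identity that simple
# identifier removal was supposed to hide.
reidentified = medical.merge(voter_list, on=["date_of_birth", "zip", "sex"])
print(reidentified[["name", "diagnosis"]])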
3. Contributions
Our work attempts to fill these gaps and bridge the data privacy and medical informatics communities by developing a framework and prototype system, HIDE, for Health Information DE-identification of both structured and unstructured data. The contributions of our work are twofold. First, our system advances the medical informatics field by adopting information extraction (also referred to as attribute extraction) and data anonymization techniques for de-identifying heterogeneous health information. Second, the conceptual framework of our system advances the data privacy field by integrating the anonymization process for both structured and unstructured data. The specific components and contributions of our system are as follows.

Identifying and sensitive information extraction. We leverage and empirically study existing named entity extraction techniques [20,21], in particular a simple Bayesian classifier, a Bayesian classifier with sampling-based techniques, and conditional random field based techniques, to effectively extract identifying and sensitive information from unstructured data (a toy sketch of this step appears after this list of components).

Data linking. In order to preserve privacy for individuals and apply advanced anonymization techniques in the heterogeneous data space, we propose a structured identifier view with identifying attributes linked to each individual.
Anonymization. We perform data suppression and generalization on the identifier view to anonymize the data, with options including full de-identification, partial de-identification, and statistical anonymization based on k-anonymization.

While we utilize off-the-shelf techniques for some of these components, the main contribution of our system is that it bridges research on data privacy and text management and provides an integrated framework that allows the anonymization of heterogeneous data for practical applications.
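As an illustration of the extraction component, the following toy sketch trains a simple Bayesian token classifier to tag identifying attributes; the features, labels, and single training sentence are simplified assumptions for illustration, not HIDE's actual configuration:

# Toy sketch of identifier extraction: label each token of a note as an
# identifier type (NAME, DATE) or OTHER with a simple Bayesian classifier.
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

def token_features(tokens, i):
    """Contextual features for token i (a stand-in for richer NER features)."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_capitalized": tok[0].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

# One toy labeled sentence serves as "training data" here.
tokens = ["Patient", "John", "Doe", "was", "admitted", "on", "03/04/2007", "."]
labels = ["OTHER", "NAME", "NAME", "OTHER", "OTHER", "OTHER", "DATE", "OTHER"]

vec = DictVectorizer()
X = vec.fit_transform([token_features(tokens, i) for i in range(len(tokens))])
clf = MultinomialNB().fit(X, labels)

# Tag an unseen fragment; with realistic training data this generalizes far better.
new_tokens = ["Mary", "Jones", "seen", "on", "05/06/2008"]
X_new = vec.transform([token_features(new_tokens, i) for i in range(len(new_tokens))])
print(list(zip(new_tokens, clf.predict(X_new))))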
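The anonymization options can likewise be sketched. The following hypothetical example contrasts full de-identification (suppressing identifying columns) with partial de-identification by generalization (truncating zip codes and binning ages); the column names and records are illustrative only:

# Toy sketch of anonymization options applied to a structured identifier view.
import pandas as pd

def full_deidentification(df, identifier_cols):
    """Fully suppress (drop) every identifying column."""
    return df.drop(columns=identifier_cols)

def generalize(df, zip_digits=3, age_bin=10):
    """Partial de-identification by generalization: truncate zip codes and
    replace exact ages with ranges, so records blend into larger groups."""
    out = df.copy()
    out["zip"] = out["zip"].str[:zip_digits] + "*" * (5 - zip_digits)
    lo = out["age"] // age_bin * age_bin
    out["age"] = lo.astype(str) + "-" + (lo + age_bin - 1).astype(str)
    return out

view = pd.DataFrame({"name": ["J. Doe", "M. Roe"],
                     "zip": ["07028", "07029"],
                     "age": [34, 37],
                     "diagnosis": ["asthma", "flu"]})
# Drop direct identifiers, then generalize the quasi-identifiers.
print(generalize(full_deidentification(view, ["name"])))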
4. k-Anonymization Approach
Organizations such as the Census Bureau or hospitals collect large amounts of personal information. This data has high value for the public, for example to study social trends or to find cures for diseases. However, careless publication of such data poses a danger to the privacy of the individuals who contributed it. There has been much research over the last decades on methods for limiting disclosure in data publishing; in particular, the computer science community has made important contributions over the last ten years. The research in this area has investigated various adversary models and proposed different anonymization techniques that provide rigorous guarantees against attacks. However, to the best of our knowledge, none of these techniques had so far been implemented as part of a usable tool. This is mainly due to the non-interactive nature of these techniques: the only interface they provide to data publishers is a set of parameters that controls the degree of privacy protection to be enforced in the anonymized data. Publishers, however, seldom have enough knowledge to decide appropriate values for these parameters; setting them requires not only a deep understanding of the underlying privacy model but also a thorough understanding of possible adversaries. Furthermore, even a data publisher with such knowledge would much prefer an interactive anonymization process to fixing the algorithm and its parameters before seeing the anonymized output. The data publisher will select the final anonymized version of the data only after she has explored the space of anonymization parameters and adversary models. Existing anonymization techniques have not been put into such a progressive, user-centric anonymization process.

In this demonstration, we bring the theory of data anonymization to practice. We developed CAT, the Cornell Anonymization Toolkit, which not only incorporates state-of-the-art formal privacy protection methods but also provides an intuitive interface that can interactively guide users through the data publishing workflow. CAT was designed with two objectives in mind. First, the toolkit should help users acquire an intuitive understanding of the disclosure risk in the anonymized data, so that they can make educated decisions about releasing appropriate data. Second, the toolkit should offer users full control of the anonymization process, allowing them to adjust various parameters and to examine the quality of the anonymized data (in terms of both privacy and utility) in a convenient manner. To the best of our knowledge, this is the first effort that employs existing anonymization techniques to provide a practical tool for data publication [22,23,24,25].
A release of data is said to provide k-anonymity protection if the information for each person contained in the release cannot be distinguished from at least k-1 individuals whose information also appears in the release. This paper also examines re-identification attacks that can be realized on releases that adhere to k-anonymity unless accompanying policies are respected. The k-anonymity protection model is important because it forms the basis on which the real-world systems known as Datafly, µ-Argus, and k-Similar provide guarantees of privacy protection.

In today's information society, given the unprecedented ease of finding and accessing information, protection of privacy has become a very important concern. In particular, large databases that include sensitive information (e.g., health information) have often been made available to public access, frequently with identifiers stripped off in an attempt to protect privacy. However, if such information can be associated with the corresponding people's identifiers, perhaps using other publicly available databases, then privacy can be seriously violated. For example, Sweeney [36] pointed out that one can find out who has what disease using a public database and voter lists. To solve such problems, Samarati and Sweeney [37] proposed a technique called k-anonymization.

In this paper, we study how to enhance privacy in carrying out the process of k-anonymization. Consider a table that provides health information of patients for medical studies, as shown in Table 1. Each row of the table consists of a patient's date of birth, zip code, allergy, and history of illness. Although the identifier of each patient does not explicitly appear in this table, a dedicated adversary may be able to derive the identifiers of some patients using the combinations of date of birth and zip code. For example, he may be able to find that his roommate is the patient of the first row, who has an allergy to penicillin and a history of pharyngitis.
In this example, the set of attributes {date of birth, zip code} is called a quasi-identifier [38,39], because these attributes in combination can be used to identify an individual with a significant probability. In this paper, we say an attribute is a quasi-identifier attribute if it is in the quasi-identifier. Attributes like allergy and history of illness are called sensitive attributes. (There may be other attributes in a table besides the quasi-identifier attributes and the sensitive attributes; we ignore them in this paper since they are not relevant to our investigation.) The privacy threat we consider here is that an adversary may be able to link the sensitive attributes of some rows to the corresponding identifiers using the information provided in the quasi-identifiers. A proposed strategy to solve this problem is to make the table k-anonymous [40].
In a k-anonymous table, each value of the quasi-identifier appears at least k times. Therefore, if the adversary only uses the quasi-identifiers to link sensitive attributes to identifiers, then each involved entity (a patient in our example) is "hidden" among at least k peers. The procedure of making a table k-anonymous is called k-anonymization. It can be achieved by suppression (i.e., replacing some entries with "*") or generalization (e.g., replacing some or all occurrences of "07028" and "07029" with "0702*"). Table 2 shows the result of 2-anonymization on Table 1.

Several algorithmic methods have been proposed describing how a central authority can k-anonymize a table before it is released to the public. In contrast to that line of research, we consider a related but different scenario: distributed customers holding their own data interact with a miner and use k-anonymization in this process to protect their own privacy. For example, imagine that the health data mentioned above are collected from customers by a medical researcher. The customers will feel comfortable if the medical researcher does not need to be trusted and only sees a k-anonymized version of their data. To solve this problem, we show methods by which k-anonymization can be jointly performed by the involved parties in a private manner, such that no single participant, including the miner, learns extra information that could be used to link sensitive attributes to corresponding identifiers.
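The k-anonymity condition above translates directly into a check that every quasi-identifier combination appears at least k times. The following sketch illustrates it on hypothetical data whose zip codes have already been generalized to "0702*" in the spirit of Table 2:

# Sketch of the k-anonymity check: every combination of quasi-identifier
# values must occur in at least k rows. Records are toy data.
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """True iff each quasi-identifier combination appears in >= k rows."""
    return bool((df.groupby(quasi_identifiers).size() >= k).all())

table = pd.DataFrame({
    "date_of_birth": ["1970", "1970", "1982", "1982"],
    "zip": ["0702*", "0702*", "0702*", "0702*"],
    "allergy": ["penicillin", "none", "latex", "none"],
})
print(is_k_anonymous(table, ["date_of_birth", "zip"], k=2))  # True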
Data Storage. The dataset to be anonymized is kept in main memory, and all algorithms run against this main-memory resident data. In addition to the anonymizer, we have a risk analyzer for evaluating the disclosure risks of records in anonymized data, based on user-specified assumptions about the adversary's background knowledge, which can be specified through the user interface. Following the l-diversity model, we consider that the adversary may have information on the non-sensitive attributes of every individual, as well as several pieces of additional knowledge about the sensitive attributes. Each of these pieces of knowledge is modeled as a negated atom, i.e., a statement declaring that an individual is not associated with a certain sensitive value, such as "Alice does not have diabetes" or "Bob does not have cancer". We quantify the disclosure risk of an individual as the adversary's posterior probability of inferring the correct values of the sensitive attributes of the individual, after combining the anonymized data with the background knowledge.

In the steps that follow, we evaluate the quality of the anonymized table in terms of both privacy and utility. If the table is unsatisfactory, we refine it by adjusting the values of l and c.
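Under a simplifying counting assumption, this risk computation can be sketched as follows: within the target's equivalence class, negated atoms eliminate candidate sensitive values, and the risk is the posterior probability of the true value among those remaining. This is an illustrative model, not CAT's exact implementation:

# Simplified disclosure-risk sketch based on counting within an equivalence class.
from collections import Counter

def disclosure_risk(class_sensitive_values, true_value, negated=()):
    """Adversary's posterior probability for the target's sensitive value,
    given the sensitive values in the target's equivalence class and negated
    atoms such as "Alice does not have diabetes"."""
    counts = Counter(class_sensitive_values)
    for v in negated:              # background knowledge removes candidates
        counts.pop(v, None)
    total = sum(counts.values())
    return counts[true_value] / total if total else 0.0

# Equivalence class of 25 records; the adversary knows one negated atom.
cls = ["flu"] * 12 + ["diabetes"] * 12 + ["cancer"]
print(disclosure_risk(cls, "cancer", negated=["flu"]))  # 1/13, about 0.077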
To get an understanding of the utility of an anonymization, we first click the Contingency Tables tab in the lower-left panel to compare the contingency tables that correspond to the original and anonymized data, respectively. Specifically, a contingency table shows the frequencies of combinations of two attributes. For example, Table 1 illustrates a contingency table of gender and marital status. Intuitively, contingency tables show correlations between pairs of attributes. By examining the changes in the contingency tables before and after anonymization, we can get an idea of how the anonymization affects the characteristics of the data beyond looking at individual attributes. The two combo boxes at the top of the lower-left panel enable us to specify the two dimensions of the contingency tables. After that, we click on the Density Graphs tab, and the system depicts two density graphs that correspond to the contingency tables, as shown in Figure 3. This gives us a more intuitive way to evaluate the differences between the original and anonymized data. In general, the more similar the graphs are, the more useful information is retained in the anonymized table.
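A contingency table of this kind can be computed directly, for example with pandas.crosstab; the data frames and attribute values below are hypothetical:

# Sketch of the contingency-table comparison described above.
import pandas as pd

original = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "marital_status": ["single", "married", "single", "married", "married", "single"],
})
anonymized = original.copy()  # stand-in for the table produced by the anonymizer

ct_orig = pd.crosstab(original["gender"], original["marital_status"])
ct_anon = pd.crosstab(anonymized["gender"], anonymized["marital_status"])

# One rough utility measure: total absolute change in cell frequencies;
# the smaller the change, the more pairwise correlation is retained.
print((ct_orig - ct_anon).abs().to_numpy().sum())  # 0 for identical tables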
We can use the slider at the bottom of the panel to define the number of negated atoms that the adversary may have about the sensitive attribute. Once the background knowledge of the adversary is decided, we click the Evaluate Risk button, which triggers an update in the upper-left panel. The system first calculates the disclosure risk of every record in the dataset based on the background knowledge, making the risk of each tuple available. For example, in Figure 3 the first tuple has a 4% disclosure risk, which means that an adversary with the specified background knowledge would have 4% confidence in inferring the income of the individual corresponding to the first tuple. In addition, the system plots a histogram in the upper-right panel that illustrates the distribution of the disclosure risks of all individuals in the dataset. For the case in Figure 3, the histogram shows that the adversary has less than 20% confidence in inferring the incomes of most individuals. After inspecting the disclosure risks of the tuples, we have an intuitive understanding of the amount of privacy that is guaranteed by the anonymized table. If both the privacy guarantee and the utility of the table are deemed sufficient, we request the system to output the table. Otherwise, we move on to the next step.
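The histogram view can be approximated by binning the per-record risks, as in the following sketch (toy risk values, e.g. as produced by the disclosure_risk function above):

# Sketch of the risk histogram: bin per-record disclosure risks to show
# their distribution across the dataset.
import numpy as np

risks = np.array([0.04, 0.08, 0.12, 0.05, 0.19, 0.03, 0.15, 0.31])

counts, edges = np.histogram(risks, bins=np.arange(0.0, 1.1, 0.1))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    if c:
        print(f"{lo:.0%}-{hi:.0%}: {c} records")

# Fraction of individuals whose risk stays below 20%:
print((risks < 0.2).mean())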
Previously generated anonymizations can be restored whenever necessary. We can now return to Step 1 and re-adjust the parameters in the middle-right panel to generate a new anonymized table. We apply this process iteratively until we obtain a satisfactory anonymization.
5. Conclusion

Our objective in this paper is to protect and enhance the privacy of medical data using the k-anonymization model. Sharing medical information makes it possible to derive knowledge, make the right decisions in the cases of living patients, and obtain the best results through cooperation, while maintaining the privacy of patients. We also compare two leading privacy models, k-anonymity and l-diversity.
References

[1] B.C.M. Fung, K. Wang, R. Chen, P.S. Yu, Privacy-preserving data publishing: a survey on recent developments, ACM Computing Surveys, 2010.
[2] L. Sweeney, k-Anonymity: a model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5) (2002).
[3] V.S. Iyengar, Transforming data to satisfy privacy constraints, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 279-288.
[4] K. Wang, P.S. Yu, S. Chakraborty, Bottom-up generalization: a data mining solution to privacy protection, in: Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), November 2004.
[5] B.C.M. Fung, K. Wang, P.S. Yu, Top-down specialization for information and privacy preservation, in: Proceedings of the 21st IEEE International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, 2005, pp. 205-216.
[6] I. Bhattacharya, L. Getoor, Iterative record linkage for cleaning and integration, in: DMKD'04: Proceedings of the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2004.
[7] S. Zhong, Z. Yang, R.N. Wright, Privacy-enhancing k-anonymization of customer data, in: Proceedings of the Twenty-fourth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2005.
[8] K. LeFevre, D. DeWitt, R. Ramakrishnan, Incognito: efficient full-domain k-anonymity, in: ACM SIGMOD International Conference on Management of Data, 2005.
[9] K. LeFevre, D. DeWitt, R. Ramakrishnan, Mondrian multidimensional k-anonymity, in: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), 2006.
[10] X. Xiao, Y. Tao, Anatomy: simple and effective privacy preservation, in: Thirty-second International Conference on Very Large Data Bases (VLDB), 2006, pp. 139-150.
[11] Q. Zhang, N. Koudas, D. Srivastava, T. Yu, Aggregate query answering on anonymized tables, in: Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007), 2007, pp. 116-125.
[12] L. Sweeney, Replacing personally-identifying information in medical records: the Scrub system, Journal of the American Medical Informatics Association (1996) 333-337.
[13] L. Sweeney, Guaranteeing anonymity when sharing medical data: the Datafly system, in: Proceedings of the AMIA Annual Fall Symposium, 1997.
[14] S.M. Thomas, B. Mamlin, G.S. Adn, C. McDonald, A successful technique for removing names in pathology reports, in: Proceedings of the AMIA Symposium, 2002, pp. 777-781.
[15] R.K. Taira, A.A. Bui, H. Kangarloo, Identification of patient name references within medical documents using semantic selectional restrictions, in: Proceedings of the AMIA Symposium, 2002, pp. 757-761.
[16] D. Gupta, M. Saul, J. Gilbertson, Evaluation of a de-identification (De-Id) software engine to share pathology reports and clinical documents for research, American Journal of Clinical Pathology (2004) 176-186.
[17] T. Sibanda, O. Uzuner, Role of local context in de-identification of ungrammatical, fragmented text, in: North American Chapter of the Association for Computational Linguistics/Human Language Technology, 2006.
[18] R.M.B.A. Beckwith, U.J. Balis, F. Kuo, Development and evaluation of an open source software tool for deidentification of pathology reports, BMC Medical Informatics and Decision Making 6 (12) (2006).
[19] O. Uzuner, Y. Luo, P. Szolovits, Evaluating the state-of-the-art in automatic de-identification, Journal of the American Medical Informatics Association 14 (5) (2007).
[20] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
[21] D. Nadeau, S. Sekine, A survey of named entity recognition and classification, Linguisticae Investigationes 30 (7) (2007).
[22] K. LeFevre, D.J. DeWitt, R. Ramakrishnan, Incognito: efficient full-domain k-anonymity, in: SIGMOD, 2005, pp. 49-60.
[23] A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, l-Diversity: privacy beyond k-anonymity, TKDD 1 (1) (2007).
[24] A. Machanavajjhala, D. Kifer, J.M. Abowd, J. Gehrke, L. Vilhuber, Privacy: theory meets practice on the map, in: ICDE, 2008, pp. 277-286.
[25] P. Samarati, Protecting respondents' identities in microdata release, TKDE 13 (6) (2001) 1010-1027.
[26] X. Xiao, Y. Tao, Anatomy: simple and effective privacy preservation, in: VLDB, 2006, pp. 139-150.
[27] X. Xiao, Y. Tao, Dynamic anonymization: accurate statistical analysis with privacy preservation, in: SIGMOD, 2008, pp. 107-120.
[28] X. Xiao, Y. Tao, N. Koudas, Title suppressed due to double-blind review requirements, submitted to TODS.
[29] X. Xiao, K. Yi, Y. Tao, The hardness and approximation algorithms for l-diversity, submitted to VLDB Journal.
[30] X. Xiao, Y. Tao, Personalized privacy preservation, in: SIGMOD, 2006, pp. 229-240.
[31] X. Xiao, Y. Tao, m-Invariance: towards privacy preserving re-publication of dynamic datasets, in: SIGMOD, 2007, pp. 689-700.
[32] X. Xiao, K. Yi, Y. Tao, The hardness and approximation algorithms for l-diversity.
[33] X. Xiao, Y. Tao, N. Koudas, Title suppressed due to double-blind review requirements, submitted to TODS.
[34] X. Xiao, Y. Tao, Anatomy: simple and effective privacy preservation, in: VLDB, 2006, pp. 139-150.
[35] X. Xiao, Y. Tao, Dynamic anonymization: accurate statistical analysis with privacy preservation, in: SIGMOD, 2008, pp. 107-120.
[36] L. Sweeney, k-Anonymity: a model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5) (2002) 557-570.
[37] P. Samarati, L. Sweeney, Generalizing data to provide anonymity when disclosing information (abstract), in: Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, 1998, p. 188.
[38] T. Dalenius, Finding a needle in a haystack or identifying anonymous census records, Journal of Official Statistics 2 (3) (1986) 329-336.
[39] L. Sweeney, k-Anonymity: a model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5) (2002) 557-570.
[40] P. Samarati, L. Sweeney, Generalizing data to provide anonymity when disclosing information (abstract), in: Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, ACM Press, 1998, p. 188.
[41] L. Sweeney, k-Anonymity: a model for protecting privacy, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 10 (5) (2002).
[42] A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, l-Diversity: privacy beyond k-anonymity, in: Proceedings of the 22nd International Conference on Data Engineering (ICDE'06), 2006, p. 24.
[43] T.M. Truta, B. Vinay, Privacy protection: p-sensitive k-anonymity property, in: Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDE 2006), 2006, p. 94.
[44] N. Li, T. Li, t-Closeness: privacy beyond k-anonymity and l-diversity, in: Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007), 2007.
[45] X. Xiao, Y. Tao, m-Invariance: towards privacy preserving re-publication of dynamic datasets, in: SIGMOD Conference, 2007, pp. 689-700.
[46] S.M. Thomas, B. Mamlin, G.S. Adn, C. McDonald, A successful technique for removing names in pathology reports, in: Proceedings of the AMIA Symposium, 2002, pp. 777-781.
[47] D. Gupta, M. Saul, J. Gilbertson, Evaluation of a de-identification (De-Id) software engine to share pathology reports and clinical documents for research, American Journal of Clinical Pathology (2004) 176-186.
[48] R.M.B.A. Beckwith, U.J. Balis, F. Kuo, Development and evaluation of an open source software tool for deidentification of pathology reports, BMC Medical Informatics and Decision Making 6 (12) (2006).
[49] R.K. Taira, A.A. Bui, H. Kangarloo, Identification of patient name references within medical documents using semantic selectional restrictions, in: Proceedings of the AMIA Symposium, 2002, pp. 757-761.
[50] S.M. Thomas, B. Mamlin, G.S. Adn, C. McDonald, A successful technique for removing names in pathology reports, in: Proceedings of the AMIA Symposium, 2002, pp. 777-781.
[51] T. Sibanda, O. Uzuner, Role of local context in de-identification of ungrammatical, fragmented text, in: North American Chapter of the Association for Computational Linguistics/Human Language Technology, 2006.
[52] C. Manning, H. Schutze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
[53] D. Nadeau, S. Sekine, A survey of named entity recognition and classification, Linguisticae Investigationes 30 (7) (2007).
[54] R.M.B.A. Beckwith, U.J. Balis, F. Kuo, Development and evaluation of an open source software tool for deidentification of pathology reports, BMC Medical Informatics and Decision Making 6 (12) (2006).
[55] A. Meyerson, R. Williams, On the complexity of optimal k-anonymity, in: Proc. 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Paris, France, June 2004.
[56] L. Sweeney, Guaranteeing anonymity when sharing medical data: the Datafly system, in: Proceedings of the AMIA Annual Fall Symposium, 1997.
[57] C. Dwork, Differential privacy, in: Proc. of the 33rd International Colloquium on Automata, Languages and Programming (ICALP), Venice, Italy, 2006, pp. 1-12.