SEQUENT CALCULUS: A LOGIC AND A LANGUAGE FOR COMPUTATION AND DUALITY

by

PAUL DOWNEN

A DISSERTATION

Presented to the Department of Computer and Information Science and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy

June 2017

DISSERTATION APPROVAL PAGE

Student: Paul Downen

Title: Sequent Calculus: A Logic and a Language for Computation and Duality

This dissertation has been accepted and approved in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Computer and Information Science by:

Zena M. Ariola, Chairperson
Michal Young, Core Member
Boyana Norris, Core Member
Mark Lonergan, Institutional Representative

and

Scott L. Pratt, Dean of the Graduate School

Original approval signatures are on file with the University of Oregon Graduate School.

Degree awarded June 2017

© 2017 Paul Downen

DISSERTATION ABSTRACT

Paul Downen
Doctor of Philosophy
Department of Computer and Information Science
June 2017
Title: Sequent Calculus: A Logic and a Language for Computation and Duality

Truth and falsehood, questions and answers, construction and deconstruction; most things come in dual pairs. Duality is a mirror that reveals the new from the old via opposition. This idea appears pervasively in logic, where duality inverts “true” with “false” and “and” with “or.” However, even though programming languages are closely connected to logics, this kind of strong duality is not so apparent in practice. Sum types (disjoint tagged unions) and product types (structures) are dual concepts, but in the realm of programming, natural biases obscure their duality.

To better understand the role of duality in programming, we shift our perspective. Our approach is based on the Curry-Howard isomorphism, which says that programs following a specification are the same as proofs for mathematical theorems. This thesis explores Gentzen’s sequent calculus, a logic steeped in duality, as a model for computational duality. By applying the Curry-Howard isomorphism to the sequent calculus, we get a language that combines dual programming concepts as equal opposites: data types found in functional languages are dual to co-data types (interface-based objects) found in object-oriented languages, control flow is dual to information flow, induction is dual to co-induction. This gives a duality-based semantics for reasoning about programs via orthogonality: checking safety and correctness based on a comprehensive test suite.

We use the language of the sequent calculus to apply ideas from logic to issues relevant to program compilation. The idea of logical polarity reveals a symmetric basis of primitive programming constructs that can faithfully represent all user-defined data and co-data types. We reflect the lessons learned back into a core language for functional languages, at the cost of symmetry, via the relationship between the sequent calculus and natural deduction. This relationship lets us derive a pure λ-calculus with user-defined data and co-data, which we further extend by bringing out the implicit control flow in functional programs. Explicit control flow lets us share and name control the same way we share and name data, enabling a direct representation of join points, which are essential for tractable optimization and compilation.

This dissertation includes previously published co-authored material.
CURRICULUM VITAE

NAME OF AUTHOR: Paul Downen

GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR
Lawrence Technological University, Southfield, MI

DEGREES AWARDED:
Doctor of Philosophy in Computer Science, 2017, University of Oregon
Bachelor of Science in Mathematics, 2010, Lawrence Technological University
Bachelor of Science in Computer Science, 2010, Lawrence Technological University
Bachelor of Science in Computer Engineering, 2010, Lawrence Technological University

AREAS OF SPECIAL INTEREST:
Programming Language Theory
Type Theory
Compilers

PROFESSIONAL EXPERIENCE:
Graduate Teaching Fellow, University of Oregon, Eugene, Oregon, September 2010 – June 2011
Graduate Research Fellow, University of Oregon, Eugene, Oregon, June 2011 – Present
Research Intern, Université Paris Diderot, INRIA, PPS, Paris, France, June 2011 – August 2011
Visiting Researcher, Université Paris Diderot, INRIA, PPS, Paris, France, November 2012 – June 2013
Visiting Researcher, Microsoft Research, Cambridge, UK, July 2015 – August 2015

GRANTS, AWARDS AND HONORS:
Oregon Doctoral Research Fellowship, University of Oregon Computer and Information Science Department, 2017
Oregon Doctoral Research Fellowship Nomination, University of Oregon Computer and Information Science Department, 2016
Upsilon Pi Epsilon Honors Society, Inducted by University of Oregon Computer and Information Science Department, 2015
Gurdeep Pall Graduate Student Fellowship, University of Oregon, 2015
Erwin & Gertrude Juilfs Scholarship in Computer and Information Science, University of Oregon, 2014
Erwin & Gertrude Juilfs Scholarship in Computer and Information Science, University of Oregon, 2012
Best GTF Award, University of Oregon Computer and Information Science Department, 2011

PUBLICATIONS:

Maurer, Luke, Downen, Paul, Ariola, Zena M., & Peyton Jones, Simon. (2017). Compiling without continuations. Pages 482–494 of: Proceedings of the 38th ACM SIGPLAN conference on programming language design and implementation. PLDI ’17. New York, NY, USA: ACM. Distinguished Paper Award.

Johnson-Freyd, Philip, Downen, Paul, & Ariola, Zena M. (2017). Call-by-name extensionality and confluence. Journal of functional programming, 27, e12.

Downen, Paul, Maurer, Luke, Ariola, Zena M., & Peyton Jones, Simon. (2016). Sequent calculus as a compiler intermediate language. Pages 74–88 of: Proceedings of the 21st ACM SIGPLAN international conference on functional programming. ICFP ’16. New York, NY, USA: ACM.

Johnson-Freyd, Philip, Downen, Paul, & Ariola, Zena M. (2016). First class call stacks: Exploring head reduction. Proceedings of the workshop on continuations, WoC 2016, London, UK, April 12th 2015. EPTCS, vol. 212.

Downen, Paul, Johnson-Freyd, Philip, & Ariola, Zena M. (2015). Structures for structural recursion. Pages 127–139 of: Proceedings of the 20th ACM SIGPLAN international conference on functional programming. ICFP ’15. New York, NY, USA: ACM.

Downen, Paul, & Ariola, Zena M. (2014a). Compositional semantics for composable continuations: From abortive to delimited control. Pages 109–122 of: Proceedings of the 19th ACM SIGPLAN international conference on functional programming. ICFP ’14. New York, NY, USA: ACM.

Downen, Paul, & Ariola, Zena M. (2014b). Delimited control and computational effects. Journal of functional programming, 24, 1–55.

Downen, Paul, & Ariola, Zena M. (2014c). The duality of construction.
Pages 249–269 of: Shao, Zhong (ed), Programming languages and systems: 23rd European symposium on programming, ESOP 2014, held as part of the European joint conferences on theory and practice of software, ETAPS 2014. Lecture Notes in Computer Science, vol. 8410. Springer Berlin Heidelberg.

Downen, Paul, Maurer, Luke, Ariola, Zena M., & Varacca, Daniele. (2014). Continuations, processes, and sharing. Pages 69–80 of: Proceedings of the 16th international symposium on principles and practice of declarative programming. PPDP ’14. New York, NY, USA: ACM.

Ariola, Zena M., Downen, Paul, Herbelin, Hugo, Nakata, Keiko, & Saurin, Alexis. (2012). Classical call-by-need sequent calculi: The unity of semantic artifacts. Pages 32–46 of: Schrijvers, Tom, & Thiemann, Peter (eds), Functional and logic programming: 11th international symposium. Lecture Notes in Computer Science, vol. 7294. Berlin, Heidelberg: Springer Berlin Heidelberg.

Downen, Paul, & Ariola, Zena M. (2012). A systematic approach to delimited control with multiple prompts. Pages 234–253 of: Seidl, Helmut (ed), Programming languages and systems: 21st European symposium on programming, ESOP 2012, held as part of the European joint conferences on theory and practice of software, ETAPS 2012. Lecture Notes in Computer Science, vol. 7211. Springer Berlin Heidelberg. Best Paper Award Nominee.

ACKNOWLEDGEMENTS

First of all, I would like to thank those that have supported me financially during my studies. To the National Science Foundation who supported me through the grants 0917329 “A Foundation for Effects” and 1423617 “SEQUBE: A Sequent Calculus Foundation for High-Level and Intermediate Programming Languages.” And to the University of Oregon and donors Gurdeep Pall and John Juilfs who supported me through fellowships and scholarships awarded by the university and Computer and Information Science department.

I can’t thank my advisor Zena Ariola enough for the countless hours she has dedicated to mentoring me. I could not have hoped for a more attentive and encouraging advisor, and our frequent discussions and collaborations throughout my time at the University of Oregon have shaped this thesis in so many ways, both big and small. I would also like to thank other professors at the University of Oregon—Michal Young, Daniel Lowd, Boyana Norris, Hank Childs, Kathleen Freeman and others—who have given their time to advise me in matters outside research during my time here.

I would like to thank my office mates and co-members of the Oregon programming languages group, Luke Maurer and Philip Johnson-Freyd, for our collaborations and for making my years at Oregon all the better. Through our joint work, they have both had their direct influence on this thesis, pulling it in different directions than I would have gone on my own, and helped paint a fuller and brighter picture through their own work.

I am grateful for the gracious hosts that have welcomed me during my studies. To Alexis Saurin, Hugo Herbelin, and Pierre-Louis Curien at INRIA, I would like to thank them for inviting me to Paris and expanding my horizons. Much of what I learned about logic I owe to them, which has shaped this thesis. To Simon Peyton Jones, Andrew Kennedy, and others at Microsoft Research, I would like to thank them for hosting Luke Maurer and me at Cambridge while we worked on applying our ideas to the Glasgow Haskell Compiler (GHC). Simon Peyton Jones’ invaluable advice and positive attitude helped us turn the theories into real benefits for Haskell programmers.
And to Iavor Diatchki and others at Galois, I would like to thank them for graciously lending their time to help kick off the work on GHC.

There have been many visitors to the University of Oregon while I was a student who enriched my experience here: Olivier Danvy, Keiko Nakata, Alexis Saurin, Pierre-Louis Curien, Marco Gaboardi, Kenichi Asai, and Jacob Johannsen. To each, I would like to say thank you for taking the time to share your ideas with me and others here at Oregon.

And finally, I would like to thank my family. To my parents Dale and Terry Downen for giving their support while I moved across the country to pursue these studies. And to my husband Chris Hoffman for his tremendous patience, support, and encouragement that made this thesis possible.

TABLE OF CONTENTS

I. Introduction
    Overview

II. Natural Deduction
    Gentzen’s NJ
    The λ-Calculus
    Proofs as Programs
    A Critical Look at the λ-Calculus

III. Sequent Calculus
    Gentzen’s LK
    The Core Calculus
    The Dual Calculi

IV. Polarity
    Additive and Multiplicative LK
    Pattern Matching and Extensionality
    Polarizing the Fundamental Dilemma
    Focusing and Polarity
    Self-Duality

V. Data and Co-Data
    The Essence of Evaluation: Substitutability
    The Essence of Connectives: Data and Co-Data
    Evaluating Data and Co-Data
    Combining Strategies in Connectives
    Combining Strategies in Evaluation
    Duality of Connectives and Evaluation
    A (De-)Construction of the Dual Calculi

VI. Induction and Co-Induction
    Programming with Structures and Duality
    Polymorphism and Higher Kinds
    Well-Founded Recursion Principles
    Indexed Recursion in the Sequent Calculus
    Encoding Recursive Programs via Structures

VII. Parametric Orthogonality Models
    Poles, Spaces, and Orthogonality
    Computation, Worlds, and Types
    Models
    Adequacy
    Applications
VIII. The Polar Basis for Types
    Polarizing User-Defined (Co-)Data Types
    Type Isomorphisms
    A Syntactic Theory of (Co-)Data Type Isomorphisms
    Laws of the Polarized Basis
    The Faithfulness of Polarization

IX. Representing Functional Programs
    Pure Data and Co-Data in Natural Deduction
    Natural Deduction versus Sequent Calculus
    Multiple Consequences

X. Conclusion
    Future Work

REFERENCES CITED

LIST OF FIGURES

2.1. The NJ natural deduction system for second-order propositional logic.
2.2. NJ (natural deduction) proof of ⊢ ((A ∧ B) ∧ C) ⊃ (B ∧ A).
2.3. The simply typed λ-calculus.
2.4. Typing derivation of the λ-calculus term λx.(π₂(π₁(x)), π₁(π₁(x))).
2.5. The polymorphic λ-calculus (i.e. system F).
3.1. Truth tables for conjunction, disjunction, and implication.
3.2. The orientation of deductions for conjunction.
3.3. The orientation of deductions for disjunction.
3.4. The orientation of deductions for implication.
3.5. The LK sequent calculus for second-order propositional logic.
3.6. Duality in the LK sequent calculus.
3.7. µµ̃: The core language of the sequent calculus.
3.8. The call-by-value (V) rewriting rules for the core µµ̃V-calculus.
3.9. The call-by-name (N) rewriting rules for the core µµ̃N-calculus.
3.10. Scoping rules for (co-)variables in commands, terms, and co-terms.
3.11. Implicit (co-)variable scope in the core µµ̃ typing.
3.12. The syntax and types for the dual calculi.
3.13. The β laws for the call-by-value (V) half of the dual calculi.
3.14. The β laws for the call-by-name (N) half of the dual calculi.
3.15. LKQ: The focused sub-syntax and types for the call-by-value dual calculus.
3.16. LKT: The focused sub-syntax and types for the call-by-name dual calculus.
3.17. The focusing ς laws for the call-by-value half of the dual calculi.
3.18. The focusing ς laws for the call-by-name half of the dual calculi.
3.19. The Q-focusing translation to the LKQ sub-syntax.
3.20. The T-focusing translation to the LKT sub-syntax.
3.21. The duality relation between the dual calculi.
4.1. An additive and multiplicative LK sequent calculus.
4.2. The positive/negative and additive/multiplicative classification of binary connectives.
4.3. The syntax and types for system L.
4.4. The polarized extensional η laws for system L.
4.5. The polarized core µµ̃P-calculus: its static and dynamic semantics.
4.6. The syntax for polarized system L.
4.7. Logical typing rules for polarized system L.
4.8. The operational β laws for polarized system L.
4.9. Focused sub-syntax and core typing rules for polarized system L.
4.10. Focused logical typing rules for polarized system L.
4.11. The focusing ς laws for polarized system L.
4.12. Extending polarized system L with subtraction.
4.13. The self-duality of system L types.
4.14. The self-duality of system L programs.
5.1. A parametric theory, µµ̃S, for the core µµ̃-calculus.
5.2. Call-by-value (V) and call-by-name (N) strategies for the core µµ̃-calculus.
5.3. “Lazy-call-by-value” (LV) strategy for the core µµ̃-calculus.
5.4. “Lazy-call-by-name” (LN) strategy for the core µµ̃-calculus.
5.5. Nondeterministic (U) strategy for the core µµ̃-calculus.
5.6. Declarations of the basic data and co-data types.
5.7. Adding data and co-data to the core µµ̃ sequent calculus.
5.8. Types of declared (co-)data in the parametric µµ̃ sequent calculus.
5.9. Call-by-value (V) and call-by-name (N) substitution strategies extended with arbitrary (co-)data types.
5.10. “Lazy-call-by-value” (LV) and “lazy-call-by-name” (LN) substitution strategies extended with arbitrary (co-)data types.
5.11. The βη laws for declared data and co-data types.
5.12. The parametric βSςS laws for arbitrary data and co-data.
5.13. Declarations of the basic single-strategy data and co-data types.
5.14. Declarations of basic mixed-strategy data and co-data types.
5.15. Kinds of multi-strategy (co-)data declarations and types.
5.16. Types of multi-strategy (co-)data in the parametric µµ̃ sequent calculus.
5.17. Type-agnostic kind system for the core µµ̃ sequent calculus.
5.18. Type-agnostic kind system for multi-kinded (co-)data.
5.19. Composite S⃗ strategy.
5.20. Composite core polarized strategy P = V, N.
5.21. Composite core LV and LN strategy.
5.22. The duality of types of the parametric µµ̃-calculus.
5.23. The duality of programs of the parametric µµ̃-calculus.
5.24. Translation between the call-by-value half of the simply-typed dual calculi and µµ̃.
5.25. Translation between the call-by-name half of the simply-typed dual calculi and µµ̃.
5.26. The η laws for the dual calculi and extended (co-)values (V′, N′).
6.1. The syntax of types and programs in the higher-order µµ̃-calculus.
6.2. The kind system for the higher-order parametric µµ̃ sequent calculus.
6.3. Types of higher-order (co-)data in the parametric µµ̃ sequent calculus.
6.4. βη conversion of higher-order types.
6.5. The βη laws for higher-order data and co-data types.
6.6. The parametric βSςS laws for arbitrary higher-order data and co-data.
6.7. Type-agnostic kind system for higher-order multi-kinded (co-)data.
6.8. The syntax of recursion in the higher-order µµ̃-calculus.
6.9. The kind system for size-indexed higher-order µµ̃ sequent calculus.
6.10. Rewriting theory for recursion in the parametric µµ̃-calculus.
6.11. Type erasure for the higher-order parametric µµ̃-calculus.
7.1. Core parallel conversion rules.
7.2. Parallel conversion rules for (co-)data types.
8.1. Declarations of the primitive polarized data and co-data types.
8.2. Declarations of the shifts between strategies as data and co-data types.
8.3. A polarizing translation from G into P.
8.4. A theory for structural laws of data type declaration isomorphisms.
8.5. A theory for structural laws of co-data type declaration isomorphisms.
8.6. Isomorphism laws of positively polarized data sub-structures.
8.7. Isomorphism laws of negatively polarized co-data sub-structures.
8.8. Algebraic laws of the polarized basis of types.
8.9. De Morgan duality laws of the polarized basis of types.
8.10. Identity laws of the redundant self-shift connectives.
8.11. Derived laws of polarized functions.
9.1. Untyped syntax for a natural deduction language of data and co-data.
9.2. A natural deduction language for the core calculus.
9.3. Natural deduction typing rules for simple (co-)data.
9.4. Natural deduction typing rules for higher-order (co-)data.
9.5. A core parametric theory for the natural deduction calculus.
9.6. The untyped parametric βς laws for arbitrary data and co-data types.
9.7. The typed βη laws for declared data and co-data types.
9.8. Call-by-value (V) strategy in natural deduction.
9.9. Call-by-name (N) strategy in natural deduction.
9.10. Call-by-need (LV) strategy in natural deduction.
9.11. Type-agnostic kind system for multi-kinded natural deduction terms.
9.12. The pure, recursive size abstractions in natural deduction.
9.13. The β and ν laws for recursion.
9.14. Translations between λlet and single-consequence µµ̃.
9.15. λµ: adding multiple consequences to natural deduction.
9.16. The laws of control in λµ.
9.17. Translations between natural deduction and the sequent calculus with many consequences.

CHAPTER I

Introduction

Truth and falsehood, questions and answers, construction and deconstruction; as Alcmaeon (510BC) once said, most things come in dual pairs. Duality is a guiding force, a mirror that reveals the new from the old via opposition. This idea appears pervasively in logic, where duality is expressed by negation that inverts “true” with “false” and “and” with “or.” However, even though the theory of programming languages is closely connected to logic, this kind of strong duality is not so apparent in the practice of programming. For example, sum types (disjoint tagged unions) and pair types (structures) are related to dual concepts. But in the realm of programming, the duality between these two features is not easy to see, much less use for any practical purpose.

The situation is even worse for more complicated language features, where two concepts, both important to the theory and practice of programming, are connected by duality but one is well understood while the other is enigmatic and underdeveloped. In the case of recursion and looping, inductive data types (like lists and trees of arbitrary, but finite, size) are known to be dual to co-inductive infinite processes (like streams of input or servers that are indefinitely available) (Hagino, 1987).¹ However, while proof assistants like Coq (Coq 8.4, 2012) have a sophisticated treatment of induction, their treatment of co-induction is problematic (Giménez, 1996; Oury, 2008). The bias towards induction and inadequate treatment of co-induction in type theory and proof assistants is a road block for program verification and correctness.

Our main philosophy for approaching these questions is known as the Curry-Howard isomorphism or proofs-as-programs paradigm (Curry et al., 1958; Howard, 1980; de Bruijn, 1968). The Curry-Howard isomorphism reveals a deep and profound connection between logic and programming wherein mathematical proofs are algorithmic programs. The canonical example of the isomorphism is the correspondence between Gentzen’s (1935a) natural deduction, a system that formalizes common mathematical reasoning by laying down the rules of intuitionistic logic, and Church’s (1932) λ-calculus, one of the first models of computation and the foundation for functional programming languages.

¹ In general, adding the prefix “co-” to a term or concept means “the dual of that thing,” and we use the shorthand “(co-)thing” to mean “both thing and co-thing.”
The rules for justifying proofs in intuitionistic logic correspond exactly to the rules for writing programs in functional languages, and simplifying proofs corresponds to running programs. This connection has let technical advances flow both ways: not only can we use mathematics to help write programs in functional languages, but we can also write programs to help develop mathematics with proof assistants.

However, the λ-calculus is not an ideal setting for studying duality in computation. Dualities that are simple in other settings, like the De Morgan laws in logic, are far from obvious in the λ-calculus. The root of the problem is related to a lack of symmetry: natural deduction is only concerned with verifying truth and the λ-calculus is only concerned with producing results.

Natural deduction is not the only logic, however. In fact, natural deduction has a twin sibling called the sequent calculus, born at the same time within the seminal paper of Gentzen (1935a). Whereas the rules of natural deduction more closely mimic the reasoning that might occur in the minds of mathematicians, the rules of the sequent calculus are themselves easier to reason about, for example, if we want to show that the logic is consistent. Furthermore, unlike natural deduction’s presentation of intuitionistic logic, Gentzen’s sequent calculus provides a native language for classical logic which admits additional reasoning principles like proof by contradiction: if a logical statement cannot be false, then it must be true. As a consequence, the sequent calculus clarifies and reifies the many dualities of classical logic as pleasant symmetries baked into the very structure of its rules. In this formal system of logic, equal attention is given to falsity and truth, to assumptions and conclusions, such that there is perfect symmetry. Yet, even though these two systems look very different from each other and have their own distinct advantages and limitations, they are closely connected and give us different perspectives into the underlying phenomena of logic.

When interpreted as a programming language, the natural symmetries of the sequent calculus reveal hidden dualities in programming—input and output, production and consumption, construction and deconstruction, structure and pattern—and make them a prominent part of the computational model. Fundamentally, the sequent calculus expresses computation as an interaction between two opposed entities: a producer representing a program that creates information, and a consumer representing an environment or context that observes information. Computation then occurs as a communication protocol allowing a producer and consumer to speak to one another. This two-party method of computation gives a different view of computation than the one shown by the λ-calculus. In particular, programs in the sequent calculus can also be seen as configurations of an abstract machine (Ariola et al., 2009a), in which the evaluation context is reified as a syntactic object that may be directly manipulated. And due to the connection between classical logic (Griffin, 1990) and control operators like Scheme’s (Kelsey et al., 1998) callcc or Felleisen’s (1992) C, the built-in classicality of the sequent calculus also gives an effectful language for manipulating control flow. The computational interpretation of the sequent calculus is not just an intellectual curiosity.
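To make the two-party view concrete, the following is a minimal sketch in Haskell of an abstract machine whose configurations pair a producer term with a reified evaluation context. All the names here are illustrative, and the language shown is an ordinary call-by-name λ-calculus rather than the sequent calculus itself; it is meant only to suggest the flavor of producer/consumer interaction described above.

```haskell
-- A sketch of computation as producer/consumer interaction:
-- machine configurations pair a term with a reified evaluation
-- context (a call stack), in the style of a Krivine machine.

data Term
  = Var String
  | Lam String Term
  | App Term Term
  deriving Show

-- The consumer side: an evaluation context made syntactic.
data Context
  = Done              -- the empty observer
  | Arg Term Context  -- an argument waiting to be consumed
  deriving Show

type Command = (Term, Context)

-- One interaction step between producer and consumer.
step :: Command -> Maybe Command
step (App f a, k)       = Just (f, Arg a k)     -- producer pushes its argument
step (Lam x b, Arg a k) = Just (subst x a b, k) -- consumer supplies the argument
step _                  = Nothing               -- no further interaction

-- Run a configuration until no interaction remains.
run :: Command -> Command
run c = maybe c run (step c)

-- Substitution that is safe for closed argument terms (a full
-- capture-avoiding version would rename bound variables).
subst :: String -> Term -> Term -> Term
subst x a (Var y)   = if x == y then a else Var y
subst x a (Lam y b) = if x == y then Lam y b else Lam y (subst x a b)
subst x a (App f b) = App (subst x a f) (subst x a b)
```

For instance, `run (App (Lam "x" (Var "x")) (Lam "y" (Var "y")), Done)` steps the identity function applied to another function down to a final configuration, with the pending argument living in the reified context rather than in an implicit host-language stack.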
Thanks to the relationship between natural deduction and the sequent calculus as sibling logics (Gentzen, 1935b), the sequent calculus gives us another angle for investigating real issues that arise in the λ-calculus and functional programming, from source languages down to the machine. For example, McBride (Singh et al., 2011) points out how the poor foundation for the computational interpretation of co-induction is a road block for program verification and correctness, which is in contrast to the robust and powerful treatment of induction in functional languages and proof assistants. However, we show here how the symmetries of the sequent calculus reveal that both induction and co-induction can be represented as equal and opposite reasoning principles under the unifying umbrella of structural recursion, for both ordinary recursive types and generalized algebraic datatypes (a.k.a. GADTs). This computational symmetry between induction and co-induction is based on the duality between data types in functional languages and co-data types as objects, and gives a more robust way for proof assistants to handle recursion in infinite objects.

Moving down into the intermediate representation of programs that exists within optimizing compilers, the logic of the sequent calculus shows how compilers can use continuations in a more direct way with a “strategically defunctionalized” (Reynolds, 1998) continuation-passing style (CPS). This compromise between continuation-passing and direct style makes it possible to transfer techniques between CPS (Appel, 1992) and static single assignment (SSA) (Cytron et al., 1991) compilers, like SML/NJ, and direct-style compilers, like the Glasgow Haskell Compiler. For example, CPS can faithfully represent join points in control flow (Kennedy, 2007), whereas direct style can use arbitrary transformations expressed in terms of the original program (Peyton Jones et al., 2001). Finally, the sequent calculus can also be interpreted as an even lower-level, machine-like language for functional programs (Ohori, 1999), which can be used to reason about fine details like manual memory management (Ohori, 2003). Therefore, the computational interpretation of the sequent calculus acts like a beacon illuminating murky areas in both the design and implementation of functional languages.

Overview

The structure of this dissertation can be broken down into three major parts. First, Chapters II to IV review the background on the Curry-Howard isomorphism for logics and languages based on natural deduction and sequent calculus. Second, Chapters V and VI give the design and semantics of programming language features in the setting of the sequent calculus based on an analysis of the background in the first part. Third, Chapters VII to IX study the theory and application of the language features in the second part for the purpose of reasoning about and implementing programs. Chapters II to VI have a linear dependency order: Chapter III depends on Chapter II, Chapter IV depends on Chapter III, and so on. After that, Chapters VII to IX depend on the preceding Chapters II to VI, but not on each other, and can be read in any order.

Chapter V is extended and rewritten material from a previous publication (Downen & Ariola, 2014c) which I co-authored with Zena M. Ariola, Chapter VI is a revised version of (Downen et al., 2015) which I co-authored with Philip Johnson-Freyd and Zena M.
Ariola, and Chapter VII uses some ideas from the supporting materials in the appendix of (Downen et al., 2015) that I developed in collaboration with Philip Johnson-Freyd.

Background

Chapter II reviews the logical system NJ of natural deduction, the core programming language represented by the simply typed λ-calculus, and the Curry-Howard correspondence between them. After considering the strength of their correspondence and its application to functional programming, the chapter concludes with some criticisms of issues in programming that are not readily addressed by these two corresponding systems.

Chapter III is about how the idea behind the Curry-Howard isomorphism leads to a foundational programming language based on the LK sequent calculus, which is an alternative view of logic from natural deduction. A core calculus—called µµ̃—is introduced, which lies at the heart of all the languages of the sequent calculus to follow in the dissertation. The µµ̃-calculus brings up the fundamental dilemma of computation in classical logic as corresponding to the need to fix an evaluation order (like eager or lazy evaluation) for programming languages. The rest of LK’s logical features are layered on top of this core, which lets us talk about how ideas from logic—such as de Morgan duality and focusing—translate to important concepts in programming.

Chapter IV is about the application of polarity from logic to programming. In logic, polarity tells us that types have one of two fundamental orientations—positive or negative—which can be observed from the nature of their rules and impact their meaning both in proof theory and computation. This brings into focus the connection between pattern matching (from functional programming languages) and extensionality (i.e. the idea that the only thing that can be observed about objects is how they react to stimuli), and tells us how to combine both call-by-value and call-by-name evaluation orders within a single program.

Language design

Chapter V presents a general framework that captures the previous interpretations of the sequent calculus as a programming language (from Chapters III and IV), and separates several independent concepts that were previously entangled. The main ideas of this chapter are:

– All the individual logical connectives considered previously in the dissertation can be represented by either data or co-data, which are dual programming constructs to one another and represent the mechanisms that both functional and object-oriented languages use to let programmers declare new custom types (a rough illustration is sketched after this list).

– The impact of evaluation strategies on the behavior of programs can be described by a discipline on substitution (i.e. what could a variable in a program possibly stand for?), which lets us abstract away the differences caused by evaluation orders out of the syntactic semantics of programming languages. This abstract view of evaluation strategies encompasses the simple and canonical strategies, namely call-by-value and call-by-name, as well as more complex and nuanced strategies like call-by-need or radically non-deterministic evaluation.

– Programs can make use of multiple evaluation strategies by combining many substitution disciplines (from the previous point) which are kept separate by specifying a particular evaluation strategy for each type, so there are several distinct kinds of types with each kind corresponding to a specific strategy. This corresponds to the way that the Glasgow Haskell Compiler uses unboxed types (Peyton Jones & Launchbury, 1991) to distinguish the different evaluation orders of (necessarily strict) machine numbers and arrays from the otherwise lazy Haskell programs.
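As a rough illustration of the first point in the list above: in Haskell-like terms, a data type is given by its constructors (how values are built), whereas a co-data type is given by its observations (how values are used). This is only a suggestive sketch under lazy evaluation; the dissertation's (co-)data declarations live in the sequent calculus, and a Haskell record is only an approximation of true co-data.

```haskell
-- Data: defined by how values are constructed;
-- consumers take them apart by pattern matching on structure.
data Pair a b = MkPair a b

firstOf :: Pair a b -> a
firstOf (MkPair x _) = x

-- Co-data: defined by how values are observed; a producer must
-- answer every observation in its interface, like an object.
data Stream a = Stream
  { headS :: a         -- observation: the current element
  , tailS :: Stream a  -- observation: the rest of the stream
  }

-- An infinite producer, fine under laziness: it answers each
-- observation on demand rather than building a finished structure.
countFrom :: Integer -> Stream Integer
countFrom n = Stream { headS = n, tailS = countFrom (n + 1) }
```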
Chapter VI extends Chapter V with well-founded induction and co-induction, giving a fair treatment of co-induction by representing both as just specific use-cases of structural recursion that can’t loop forever. The main ideas of this chapter are:

– Type abstraction (i.e. generics and modules) can be achieved by generalizing the language of types with type functions, and letting (co-)data types quantify over private type parameters that are not externally visible in their interface.

– Recursion in types (i.e. recursive types like lists or trees) can be achieved by recursive data or co-data declarations using both the primitive recursion and noetherian recursion principles from mathematics, where the recursive argument is an index that tracks the “size” of the type (like the length of the list or height of the tree).

– Recursion in programs (i.e. loops which must terminate on all inputs) can be achieved by abstracting over the size index to recursive types, so that the program cannot loop forever since the statically-known size must always decrease each cycle.

Theory and application

Chapter VII develops a semantics for the programming language designed in Chapters V and VI based on the idea of orthogonality (also known as bi-orthogonality, ⊤⊤-closure, or classical realizability). This gives a model connecting compile-time types to run-time behavior, useful for confirming language-wide safety properties in the style of exhaustive testing: the collection of safe programs of a type are selected from a pool of potential candidate implementations by checking them against a test suite of observations; or dually, a collection of safe observations of a type are selected from the possible ones that might be considered by checking them against the blessed specification programs. The chapter begins with a general introduction to orthogonality and a comparison to negation in intuitionistic logic, and then builds a specific model for the sequent-based language which is parameterized by both the declared (co-)data types and the evaluation strategy(ies) used to interpret programs. The adequacy of the model—that is, the fact that the syntactic typing rules imply their semantic equivalent—is then applied to confirm several safety properties of the language, including type safety, strong normalization, and the soundness of (typed) extensionality laws with respect to the (untyped) operational semantics.

Chapter VIII applies the ideas from Chapter V to the problem of how polarity (from Chapter IV) informs us of a small, finite collection of data and co-data types which are capable of faithfully encoding every other (simple) type that a programmer could possibly come up with. The emphasis here is on the “faithfulness” of encodings, which requires that some care is taken about which evaluation strategy is used at each point in the program, so that the encodings don’t accidentally introduce the possibility of rogue behavior that the programmer’s original type disallowed.
To that point, this chapter gives a formal verification, based on a theory of type isomorphisms, of the common folklore from polarized logic that complex types from both call-by-value and call-by-name functional programming languages can be represented with the primitive polarized types by sprinkling the special polarity shift connectives in the appropriate places. However, the broader view of evaluation strategies and (co-)data types taken here lets us consider how to encode types from call-by-need languages as well, which uses four (rather than just the normal two) different shifts to and from the canonical call-by-value and call-by-name strategies.

Chapter IX goes full circle, and relates back to natural deduction and the λ-calculus, demonstrating how languages from Chapters V and VI based on the sequent calculus can impact functional programming. The canonical relationship between natural deduction and the sequent calculus gives a strong, bi-directional correspondence between the intuitionistic restriction of the µµ̃-calculus and the λ-calculus family of languages. This correspondence can be applied to functional programming languages, which are based on the λ-calculus, in one of two ways: (1) in the one direction, functional programs can be compiled down to a machine-like representation based on the sequent calculus, and (2) in the other direction, theories and ideas from the sequent calculus can be translated back to the λ-calculus and the functional paradigm. Afterward, the intuitionistic restriction is lifted, and the correspondence is generalized to cover the full classical µµ̃-calculus by generalizing the λ-calculus with first-class control. This generalization gives us a foundational language and a starting point for talking about join points—a general technique for efficiently representing shared control flow in programs—in direct style.

CHAPTER II

Natural Deduction

The foundations of mathematics and computation have connections that took root in the early 1900s, when Hilbert posed the decision problem: Is there an effectively calculable procedure that can decide whether a logical statement is true or false? This question, and its negative answer, prompted an investigation into the rigorous meaning of what is “effectively computable” by Church (1936), Turing (1936), and Gödel (1934). Later on, a much deeper connection between models of computation and formalizations of logic was independently discovered and rediscovered many times (Curry et al., 1958; Howard, 1980; de Bruijn, 1968). The most typical form of this amazing coincidence, now known as the Curry-Howard isomorphism or the proofs-as-programs paradigm, gives a structural isomorphism between Church’s (1932) λ-calculus, a system for computing with functions, and Gentzen’s (1935a) natural deduction, a system for formalizing mathematical logic.

To illustrate the connection between logic and programming, we will review the two systems and show how they both reveal similar core concepts in different ways. In particular, two principles important for characterizing the meaning of various structures, which we call β and η from the tradition of the λ-calculus, arise independently in both fields of study.

Gentzen’s NJ

In 1935, Gentzen formalized an intuitive model of logical reasoning called natural deduction, as it aimed to symbolically model the “natural” way that mathematicians reason about proofs. A proof in natural deduction is a tree-like structure made up of several inferences:

$$\frac{\begin{matrix}\vdots\\ H_1\end{matrix} \quad \begin{matrix}\vdots\\ H_2\end{matrix} \quad \cdots \quad \begin{matrix}\vdots\\ H_n\end{matrix}}{J}$$
where we infer the conclusion J from proofs of the premises H₁, H₂, …, Hₙ. The conclusion J and premises Hᵢ are all judgments that make a statement about logical propositions (which we denote by the variables A, B, C, …) that may be true or false, such as “0 is greater than 1.” For example, we can make the basic judgment that a proposition A is true, which we will write as ⊢ A. Proof trees are built by stacking together compatible inferences of the above form; we say that a proof tree is closed if all leaves of the tree end with an axiom—that is, the special case of an inference with zero premises—otherwise it is open. Open proof trees represent (partial) proofs that rely on unsubstantiated assumptions, whereas closed proof trees represent self-contained (complete) proofs.

Syntax and rules

The propositions that we deal with in logics like natural deduction are meant to represent falsifiable or verifiable claims in a particular domain of study, such as “1 + 1 = 2.” However, in their simplest form, these systems don’t account for domain-specific knowledge and leave such basic propositions as atoms or uninterpreted variables. Instead, the primary interest of the logic is to characterize the meaning of connectives that combine (zero or more) existing propositions, which are the logical glue for putting together the basic building blocks. These connectives become the central focus in Gentzen’s NJ, whose syntax and rules are given in Figure 2.1.

X, Y, Z ∈ PropVariable ::= …
A, B, C ∈ Proposition ::= X | ⊤ | ⊥ | A ∧ B | A ∨ B | A ⊃ B | ∀X.A | ∃X.A
H, J ∈ Judgement ::= ⊢ A

$$\frac{}{\vdash \top}\;\top I \qquad \text{(no $\top E$ rule)} \qquad\qquad \text{(no $\bot I$ rule)} \qquad \frac{\vdash \bot}{\vdash C}\;\bot E$$

$$\frac{\vdash A \quad \vdash B}{\vdash A \land B}\;{\land}I \qquad \frac{\vdash A \land B}{\vdash A}\;{\land}E_1 \qquad \frac{\vdash A \land B}{\vdash B}\;{\land}E_2$$

$$\frac{\vdash A}{\vdash A \lor B}\;{\lor}I_1 \qquad \frac{\vdash B}{\vdash A \lor B}\;{\lor}I_2 \qquad \frac{\vdash A \lor B \quad \begin{matrix}{\vdash A}^{\,x}\\ \vdots\\ \vdash C\end{matrix} \quad \begin{matrix}{\vdash B}^{\,y}\\ \vdots\\ \vdash C\end{matrix}}{\vdash C}\;{\lor}E^{x,y}$$

$$\frac{\begin{matrix}{\vdash A}^{\,x}\\ \vdots\\ \vdash B\end{matrix}}{\vdash A \supset B}\;{\supset}I^{x} \qquad \frac{\vdash A \supset B \quad \vdash A}{\vdash B}\;{\supset}E$$

$$\frac{\begin{matrix}\vdots\;(X \notin FV(*))\\ \vdash A\end{matrix}}{\vdash \forall X.A}\;\forall I^{X} \qquad \frac{\vdash \forall X.A}{\vdash A\{B/X\}}\;\forall E$$

$$\frac{\vdash A\{B/X\}}{\vdash \exists X.A}\;\exists I \qquad \frac{\vdash \exists X.A \quad \begin{matrix}{\vdash A}^{\,x}\\ \vdots\;(X \notin FV(*))\\ \vdash C\end{matrix}}{\vdash C}\;\exists E^{X,x}\;(X \notin FV(C))$$

FIGURE 2.1. The NJ natural deduction system for second-order propositional logic: with truth (⊤), falsehood (⊥), conjunction (∧), disjunction (∨), implication (⊃), and both universal (∀) and existential (∃) propositional quantification.

For example, the idea of logical conjunction is expressed formally as a connective, written A ∧ B in NJ and read “A and B,” along with some associated rules of inference for building proofs involving conjunction. On the one hand, in order to deduce that A ∧ B is true we may use the introduction rule ∧I:

$$\frac{\vdash A \quad \vdash B}{\vdash A \land B}\;{\land}I$$

That is to say, if we have a proof that A is true and a proof that B is true, then we have a proof that A ∧ B is true. On the other hand, in order to use the fact that A ∧ B is true we may use either one of the elimination rules ∧E₁ or ∧E₂:

$$\frac{\vdash A \land B}{\vdash A}\;{\land}E_1 \qquad \frac{\vdash A \land B}{\vdash B}\;{\land}E_2$$

That is to say, if we have a proof that A ∧ B is true, then it must be the case that A is true and also that B is true.
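Anticipating the Curry-Howard reading developed later in this chapter, these rules already have a familiar computational face: ∧I is pairing and ∧E₁/∧E₂ are the projections. A small Haskell illustration (the function names here are mine, not NJ's):

```haskell
-- ∧I: from a proof of A and a proof of B, a proof of A ∧ B.
conjI :: a -> b -> (a, b)
conjI x y = (x, y)

-- ∧E1: from a proof of A ∧ B, a proof of A.
conjE1 :: (a, b) -> a
conjE1 = fst

-- ∧E2: from a proof of A ∧ B, a proof of B.
conjE2 :: (a, b) -> b
conjE2 = snd
```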
NJ also gives an account of logical implication as a connective in natural deduction, written A ⊃ B and read “A implies B” or “if A then B,” in a similar fashion. In order to deduce that A ⊃ B is true we may use the introduction rule ⊃I for implication:

$$\frac{\begin{matrix}{\vdash A}^{\,x}\\ \vdots\\ \vdash B\end{matrix}}{\vdash A \supset B}\;{\supset}I^{x}$$

Notice that the introduction rule for implication has a more complex form than the one for conjunction. In particular, the single premise of the ⊃I rule introduces a local assumption that is only visible in the proof tree of that premise. This premise says that if we can prove that B is true by assuming that A is true, then we can conclude that A ⊃ B is true without the extra free assumption about A. As a matter of bookkeeping, the identifier x is used to mark the local axiom whose scope within the overall proof is delimited by the corresponding ⊃I^x introduction rule for proving the truth of an implication. Note that this local axiom x may be used as many times as necessary in the sub-proof—be it zero times or several times—so long as it is not used outside the scope created by the ⊃I^x rule.

Once we have a proof of A ⊃ B, we may make use of it with the elimination rule ⊃E for implication:

$$\frac{\vdash A \supset B \quad \vdash A}{\vdash B}\;{\supset}E$$

This is a formulation of the traditional reasoning principle modus ponens: if we believe that A implies B is true and that A is true as well, then we must believe B is true.

The last binary connective in NJ, written A ∨ B and read “A or B,” formalizes logical disjunction. There are two different ways to prove that A ∨ B is true, which corresponds to two different introduction rules ∨I for disjunction:

$$\frac{\vdash A}{\vdash A \lor B}\;{\lor}I_1 \qquad \frac{\vdash B}{\vdash A \lor B}\;{\lor}I_2$$

If we have a proof that A is true or a proof that B is true, then we have a proof that A ∨ B is true. Notice how the introduction rules for ∨ are like upside-down versions of the elimination rules for ∧. Unfortunately, making use of a proof that A ∨ B is true is awkward in natural deduction, compared to connectives like conjunction and implication. The elimination rule ∨E for disjunction is the most complex one of the binary connectives of NJ:

$$\frac{\vdash A \lor B \quad \begin{matrix}{\vdash A}^{\,x}\\ \vdots\\ \vdash C\end{matrix} \quad \begin{matrix}{\vdash B}^{\,y}\\ \vdots\\ \vdash C\end{matrix}}{\vdash C}\;{\lor}E^{x,y}$$

This elimination rule assumes three premises: that A ∨ B is true, that assuming A is true lets us prove that C is true, and that assuming B is true lets us prove that C is true. The conclusion of the rule asserts that C must be true because we know how to prove it in either possible case where A or B is true. Note that the ∨E elimination rule relies (twice) on the same mechanism of local assumptions for the two sub-proofs of C that was also used in the ⊃I introduction rule. Hence, we use the same bookkeeping identifiers connecting both local axioms x and y with the rule ∨E^{x,y} that delimits their scope in the overall proof.
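Continuing the same anticipated Curry-Howard reading in Haskell: ⊃I is λ-abstraction (the local axiom x becomes the bound variable), ⊃E is function application, the two ∨ introductions are the injections into a sum type, and ∨E is case analysis. Again, the names are illustrative:

```haskell
-- ⊃E (modus ponens): apply a proof of A ⊃ B to a proof of A.
impE :: (a -> b) -> a -> b
impE f x = f x

-- ∨I1 and ∨I2: the two ways to prove A ∨ B.
disjI1 :: a -> Either a b
disjI1 = Left

disjI2 :: b -> Either a b
disjI2 = Right

-- ∨E: prove C from A ∨ B, given a proof of C under each local
-- assumption; the local axioms x and y become bound variables.
disjE :: Either a b -> (a -> c) -> (b -> c) -> c
disjE (Left x)  f _ = f x
disjE (Right y) _ g = g y
```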
In the degenerate case, connectives that join zero propositions together serve as logical constants. For example, consider a connective that internalizes the notion of truth or validity into the system, written ⊤ and pronounced “true.” By its intuitive meaning, we may always deduce that ⊤ is true with no additional premise, as described by the introduction rule ⊤I:

$$\frac{}{\vdash \top}\;\top I$$

However, we can do nothing interesting with a proof that ⊤ is true. In other words, “nothing in, nothing out.” Notice how ⊤ can be understood as the nullary version of the binary connective ∧ for conjunction: ⊤ has a single introduction rule with zero premises similar to ∧’s two-premise introduction rule, and ⊤ has no elimination rules compared with ∧’s two eliminations.

We can also consider a connective for internalizing the notion of falsehood, written ⊥ and pronounced “false.” In contrast to ⊤, we should never be able to prove that ⊥ is true in any sensible context since that would be, well, false. In other words, there is no valid introduction rule ⊥I. But if we are in some context where ⊥ is true for some reason, then for all intents and purposes any proposition C might as well be true, as described by the elimination rule ⊥E:

$$\frac{\vdash \bot}{\vdash C}\;\bot E$$

Again, notice how ⊥ can be understood as the nullary version of the binary connective ∨ for disjunction: ⊥ has no introduction rules compared to ∨’s two introductions, and ⊥ has a single elimination rule with ⊢ ⊥ as the only premise compared to ∨’s elimination rule that assumes two premises in addition to ⊢ A ∨ B.

Using the connectives described above, we can also define a derived connective for negation, written ¬A and pronounced “not A,” which can be used to (indirectly) state that a proposition is not true. For example, we should intuitively expect to be able to prove ⊢ ¬⊥ (“false is not true”) in NJ but be unable to derive ⊢ ¬⊤ (“true is not true”). In lieu of treating ¬ as a proper connective,¹ it can be defined in terms of implication (⊃) and falsehood (⊥):

$$\neg A \triangleq A \supset \bot$$

so that the derived rules for negation that come from this encoding are:

$$\frac{\begin{matrix}{\vdash A}^{\,x}\\ \vdots\\ \vdash \bot\end{matrix}}{\vdash \neg A}\;{\supset}I^{x} \qquad \frac{\vdash \neg A \quad \vdash A}{\vdash \bot}\;{\supset}E$$

¹ Although Gentzen (1935a) did originally treat negation as a proper connective in NJ, it was defined in terms of the ⊥ connective so that the associated introduction and elimination rules for negation are identical to the ones given here.
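This encoding of negation has a direct Haskell rendering via the empty type Void, whose eliminator absurd plays the role of ⊥E. A standard illustration (not the dissertation's notation):

```haskell
import Data.Void (Void, absurd)

-- ¬A encoded as A ⊃ ⊥: a function into the empty type.
type Not a = a -> Void

-- ⊢ ¬⊥ is provable: falsehood refutes itself.
notFalse :: Not Void
notFalse = id

-- The derived elimination: from ¬A and A we reach ⊥, and then
-- ⊥E (absurd) lets us conclude any C at all.
notE :: Not a -> a -> c
notE k x = absurd (k x)
```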
Finally, the most complex form of propositions in NJ are the quantifiers: logical connectives which abstract over a proposition variable (denoted by X, Y, Z, and of which there are countably many) that occurs inside of a proposition.² The first such quantifier is the universal quantifier, written ∀X.A and pronounced “for all X, A,” which codifies when the quantified proposition variable X may stand for any proposition. For example, NJ has the property that for any proposition A, ⊢ A ⊃ A is provable. This fact can be represented more formally by proving ⊢ ∀X.X ⊃ X, where X is the universally quantified proposition variable. The second quantifier is the existential quantifier, written ∃X.A and pronounced “there exists an X such that A,” which codifies when the quantified proposition variable X stands for a specific but unknown proposition. For example, there are propositions in NJ that are provably true (such as the aforementioned A ⊃ A or simply the trivial truth ⊤), which can be represented formally by proving ⊢ ∃X.X.

² For simplicity, we limit the presentation of NJ to second-order propositional logic. That is to say, the quantifiers ∀ and ∃ abstract over propositions themselves, as opposed to objects of some particular domain of interest like numbers.

Since both of these quantifiers bind variables in propositions, all the usual subtleties in programming languages involving static variables apply. In summary, an occurrence of a proposition variable X in a proposition A is bound if it is within the context of an ∀X or an ∃X and free otherwise (FV denotes the function that computes the set of free variables of a proposition), and A{B/X} denotes the usual capture-avoiding substitution operation where all free occurrences of X in A are replaced with B, such that all free occurrences of variables within B are still free after substitution. We also do not distinguish propositions based on the choice of bound variable names, commonly known as α-equivalence, as stated by the two equalities for quantifiers:

$$\forall X.A\{X/Z\} =_{\alpha} \forall Y.A\{Y/Z\} \qquad \exists X.A\{X/Z\} =_{\alpha} \exists Y.A\{Y/Z\}$$

where X and Y must not be free in A. The important property of α-equivalence and capture-avoiding substitution is that they commute with one another, so that renaming bound variables does not affect the result of substitution up to α-equivalence. Stated more formally, for all propositions A, B, and C, if A =α B then A{C/X} =α B{C/X}. A more thorough introduction to static variables and substitution is given by Barendregt (1985) and Pierce (2002). In general, throughout this thesis we will take α-equivalence for granted whenever static variable binders are present, without belaboring the formalities.

Establishing universal truths is a delicate matter, and requires the proper discipline when crafting well-formed proofs. This subtlety rears its head in the universal introduction rule ∀I for proving ∀X.A, which requires a new form of constraint on its premise:

$$\frac{\begin{matrix}\vdots\;(X \notin FV(*))\\ \vdash A\end{matrix}}{\vdash \forall X.A}\;\forall I^{X}$$

The side condition X ∉ FV(∗) on the proof in the premise of the ∀I rule means that the variable X cannot appear free in any of the propositions in the open leaves of the sub-proof tree. Intuitively, this side condition on the variable X ensures that X is totally generic in the sub-proof, so that we do not accidentally assume anything about X that could leak into another part of the overall proof. Therefore, the ∀I rule can be understood as stating that if we prove A is true where X is generic, then ∀X.A must also be true. In contrast, the universal elimination rule ∀E has no such side condition and can apply to any premise:

$$\frac{\vdash \forall X.A}{\vdash A\{B/X\}}\;\forall E$$

In other words, from a proof that ∀X.A is true, any instance of A with an arbitrary B substituted for X is also true.

In contrast to the universal quantifier, establishing existential truths is easy. We may deduce that ∃X.A is true by using the introduction rule ∃I:

$$\frac{\vdash A\{B/X\}}{\vdash \exists X.A}\;\exists I$$

which says that if A is true for some choice of B substituted for X, then it must be that ∃X.A is true. Notice that the introduction rule for ∃ is like an upside-down version of the elimination rule for ∀; neither of the two rules impose any special criteria on their premise. However, it is harder to use the fact that ∃X.A is true with the corresponding elimination rule ∃E:

$$\frac{\vdash \exists X.A \quad \begin{matrix}{\vdash A}^{\,x}\\ \vdots\;(X \notin FV(*))\\ \vdash C\end{matrix}}{\vdash C}\;\exists E^{X,x}\;(X \notin FV(C))$$

The same side condition X ∉ FV(∗) that appeared in the premise of ∀I also appears in the second premise of ∃E, so that X cannot appear free in any open leaves (besides uses of the axiom x) of the sub-proof, but additionally the existential elimination rule must also ensure that X is not free in the conclusion C. Intuitively, both of these side conditions ensure that both the result ⊢ C as well as its sub-proof is generic in the choice of X. Therefore, the ∃E rule can be understood as stating that if we can prove that ∃X.A is true and that C can be proved true from assuming A is true with a generic X, then C must be true in general.
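The side conditions and substitutions above all rest on the FV and A{B/X} operations. The following is a minimal Haskell sketch of both, for a hypothetical fragment of NJ's propositions (the Prop datatype and helper names are mine, not the dissertation's):

```haskell
import           Data.Set (Set)
import qualified Data.Set as Set

-- A fragment of NJ's propositions: variables, implication, ∀.
data Prop
  = PVar String
  | Imp Prop Prop
  | Forall String Prop
  deriving (Eq, Show)

-- FV: the set of free proposition variables.
fv :: Prop -> Set String
fv (PVar x)     = Set.singleton x
fv (Imp a b)    = fv a `Set.union` fv b
fv (Forall x a) = Set.delete x (fv a)

-- A{B/X}: capture-avoiding substitution, renaming a bound
-- variable whenever it would capture a free variable of B.
subst :: String -> Prop -> Prop -> Prop
subst x b (PVar y)
  | x == y    = b
  | otherwise = PVar y
subst x b (Imp p q) = Imp (subst x b p) (subst x b q)
subst x b (Forall y a)
  | y == x              = Forall y a                 -- X is shadowed: stop
  | y `Set.member` fv b = Forall y' (subst x b a')   -- rename, then substitute
  | otherwise           = Forall y (subst x b a)
  where
    avoid = Set.insert x (fv b `Set.union` fv a)
    y'    = freshFrom y avoid
    a'    = subst y (PVar y') a

-- Generate a variable name not in the avoided set.
freshFrom :: String -> Set String -> String
freshFrom y avoid
  | y `Set.member` avoid = freshFrom (y ++ "'") avoid
  | otherwise            = y
```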
We are still obligated to fill in the missing gap between Ax and ⊃I, but our job is now a bit easier, since we have gotten rid of the ⊃ connective from the consequence in the goal. Next, we can try to simplify the goal again by applying the conjunction introduction rule to get rid of the ∧ in the goal: ` (A ∧B) ∧ C x....` B ` (A ∧B) ∧ C x....` A ` B ∧ A ∧I ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃Ix We now have two sub-proofs to complete: a deduction concluding B and a deduction concluding A from our local hypothesis (A∧B)∧C. At this point, the consequences of our goals are as simple as they can be—they no longer contain any connectives for us to work with. Therefore, we instead switch to work “top down” from our assumptions. We are allowed to assume (A∧B)∧C, so let’s eliminate the unnecessary proposition C using a conjunction elimination rule in both sub-proofs: ` (A ∧B) ∧ C x ` A ∧B ∧E1....` B ` (A ∧B) ∧ C x ` A ∧B ∧E1....` A ` B ∧ A ∧I ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃Ix We can now finish off the entire proof by using conjunction elimination “top down” in both sub-proofs, closing the gap between assumptions and conclusions as shown in Figure 2.2. Since there are no unjustified branches at the top of the tree (every leaf is 17 an axiom provided by the ⊃I introduction rule) and there are no longer any gaps in the proof, we have completed the deduction of our goal. End example 2.1. Remark 2.1. The bookkeeping that keeps track of the scope of local axioms introduced by the ⊃I, ∨E, and ∃E rules is important for ruling out bogus proofs that appear to be closed but manage to deduce something like ` ⊥ that should be impossible. For example, we could build a closed proof of ` ⊥ by using the ⊃I rule incorrectly as follows: ` > >I ` ⊥ ⊃ > ⊃Ix ` ⊥ x ` (⊥ ⊃ >) ∧ ⊥ ∧I ` ⊥ ∧E2 Notice how the local axiom x that is introduced by the ⊃Ix rule in the left sub-proof has been improperly “leaked” into the right sub-proof. This leak goes against the constraints of the ⊃Ix rule and so the above proof tree is not well-formed. Likewise, we can build another proof of ` ⊥ by incorrectly applying the ∨E rule as follows: ` > >I ` > ∨ ⊥ ∨I1 ` ⊥ y ` ⊥ y ` ⊥ ∨Ex,y Again, the above proof is not well-formed because the constraints of the ∨Ex,y rule are not met: the local axiom y has been used in the middle premise but its scope is limited to only the right premise. The use of identifiers for local axiom bookkeeping is more explicit than many other presentations of natural deduction systems, but every system of natural deduction must enforce equivalent restrictions on these kinds of rules with local axioms. End remark 2.1. ` (A ∧B) ∧ C x ` A ∧B ∧E1 ` B ∧E2 ` (A ∧B) ∧ C x ` A ∧B ∧E1 ` A ∧E1 ` B ∧ A ∧I ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃Ix FIGURE 2.2. NJ (natural deduction) proof of ` ((A ∧B) ∧ C) ⊃ (B ∧ A). 18 Remark 2.2. The side conditions on free proposition variables in the ∀I and ∃E rules are perhaps the most complex ones to understand, but are nonetheless crucial for the overall logic to make sense. For example, it makes intuitive sense that if A is true for all choices of X, then there is some choice of X such that A is true. Stated formally, this intuition can be encoded into the proposition (∀X.A) ⊃ (∃X.A), which can be proved in NJ as follows: ` ∀X.X y ` Y ∀E ` ∃X.X ∃I ` (∀X.X) ⊃ (∃X.X) ⊃Iy The converse implication (∃X.A) ⊃ (∀X.A)—that if A is true for some X then it must be true for all X—does not intuitively make sense, and indeed is not provable in NJ. 
However, we can prove such a statement if we are sloppy with the side conditions in ∀I and ∃E as follows:

` ∃X.X y ` X z ` X ∃EX,z ` ∀X.X ∀IX ` (∃X.X) ⊃ (∀X.X) ⊃Iy

This proof is not well-formed because the conclusion of the ∃EX,z rule is ` X, which contains a free occurrence of X (as just plainly itself). It is fortunate that the restrictions on free proposition variables prevent a proof of ` (∃X.X) ⊃ (∀X.X) in NJ since that leads to clearly wrong conclusions like ` ⊥, similar to Remark 2.1, as follows:

` ∃X.X y ` X z ` X ∃EX,z ` ∀X.X ∀IX ` (∃X.X) ⊃ (∀X.X) ⊃Iy ` > >I ` ∃X.X ∃I ` ∀X.X ⊃E ` ⊥ ∀E

End remark 2.2.

Logical harmony

Now that we know about some connectives and their rules of inference in our system of natural deduction, we would like to have some assurance that what we have defined is sensible in some way. To this end, we can insist on logical harmony, an idea that has roots in arguments by Dummett (1991), to justify that the inference rules are meaningful. Just like Goldilocks, we want rules that are neither too strong (leading to an inconsistent logic) nor too weak (leading to gaps in our knowledge), but are instead just right. Logical harmony for a particular connective can be broken down into two properties of that connective's inference rules: local soundness and local completeness (Pfenning & Davies, 2001).

For a single logical connective, we need to check that its inference rules are not too strong, meaning that they are locally sound, so that the results of the elimination rules are always justified. In other words, we cannot get out more than what we put in. Local soundness is expressed in terms of proof manipulations: a (potentially open) proof in which an introduction is immediately followed by an elimination can be simplified to a more direct proof. On the one hand, in the case of conjunction, if we follow ∧I with ∧E1, then we can perform the following reduction on the proof tree:

.... D1` A .... D2` B ` A ∧B ∧I ` A ∧E1  .... D1` A

where D1 and D2 stand for proofs that deduce ` A and ` B, respectively. If we had forgotten to include the first premise ` A in the ∧I rule, then this soundness reduction would have no proof to justify its conclusion. On the other hand, if we follow ∧I with ∧E2, then we have a similar reduction:

.... D1` A .... D2` B ` A ∧B ∧I ` B ∧E2  .... D2` B

Additionally, we should ensure that the rules are not too weak, so that all the information that goes into a proof can still be accessed somehow. In this respect, we say that the inference rules for a logical connective are locally complete if they are strong enough to break an arbitrary (potentially open) proof ending with that connective into pieces and then put them back together again. For conjunction, this is expressed by the following proof transformation:

.... D` A ∧B ≺ .... D` A ∧B ` A ∧E1 .... D` A ∧B ` B ∧E2 ` A ∧B ∧I

If we had forgotten the elimination rule ∧E2, then local completeness would fail because we would not have enough information to satisfy the premise of the ∧I introduction rule. As a result, the rules will still be sound but we would be unable to prove a basic tautology like A ∧ B ⊃ B ∧ A, which should hold by our intuitive interpretation of A ∧B.

We also have local soundness and completeness for the inference rules of logical implication, although they require a few properties about the system as a whole. For local soundness, we can reduce ⊃I immediately followed by ⊃E as follows:

` A x.... D` B ` A ⊃ B ⊃Ix .... E` A ` B ⊃E  .... E` A....
D {E/x}` B

where D {E/x} is the substitution of the proof E for any uses of the local axiom x in D. The substitution gives us a modified proof that no longer needs that particular local axiom x of ` A, since any time the x axiom was used we instead place a full copy of the E proof of ` A. For local completeness, we can expand an arbitrary proof D of ` A ⊃ B as follows:

.... D` A ⊃ B ≺ .... D` A ⊃ B ` A x ` B ⊃E ` A ⊃ B ⊃Ix

Notice that on the right hand side the additional axiom x introduced by the use of the ⊃Ix introduction rule is implicitly unused in the proof D.

The local soundness for the inference rules of logical disjunction follows from the techniques used to show soundness of both conjunction and implication: disjunction both uses a choice of two alternatives as well as a substitution for local axioms. By letting i stand for either 1 or 2, we have the following reduction for either case when ∨I1 is followed by ∨E or ∨I2 is followed by ∨E:

.... D` Ai ` A1 ∨ A2 ∨Ii ` A1 x1.... E1` C ` A2 x2.... E2` C ` C ∨Ex1,x2  .... D` Ai.... Ei {D/xi}` C

This reduction uses the same substitution operation as for the local soundness of implication, where the correct premise Ei is selected to match the possible choice of introduction rules. For local completeness, we can expand an arbitrary proof D of ` A ∨B as follows:

.... D` A ∨B ≺ .... D` A ∨B ` A x ` A ∨B ∨I1 ` B y ` A ∨B ∨I2 ` A ∨B ∨Ex,y

Note that this expansion may appear different from the ones that came before because the introduction rules ∨I1 and ∨I2 appear above the elimination rule ∨E instead of below by the typographic structure of the proof tree, but still the introductions logically occur after the elimination by the meaning of the proof tree.

Demonstrating local soundness and completeness for the inference rules of the nullary connectives for truth and falsehood may be deceptively basic. Since there is no >E rule, local soundness of the > inference rules is trivially true: there is no possible way to have a proof where >I is followed by >E because there is no >E rule, and so local soundness is vacuous. Likewise, the local soundness of the ⊥ inference rules is trivially true because there is no ⊥I rule, so soundness is again vacuous. However, we still have to demonstrate local completeness by transforming arbitrary proofs of ` > and ` ⊥ into ones that apply all possible introduction and elimination rules for the connectives. In the case of >, because the >I rule is always available, this transformation just throws away the original, unnecessary proof and replaces it with just >I:

.... D` > ≺ ` > >I

In the case of ⊥, because ⊥E only requires a proof of ` ⊥ as its premise, this transformation just adds on a final ⊥E inference:

.... D` ⊥ ≺ .... D` ⊥ ` ⊥ ⊥E

Note that both of these transformations are nullary versions of local completeness for logical conjunction and disjunction illustrated above. Therefore, we can be sure that the inference rules for > and ⊥ are sensible.

Finally, the soundness and completeness of the quantifiers rely on the additional side conditions on their inference rules restricting the allowable free proposition variables. For the local soundness of the inference rules for universal quantification, we can reduce ∀I immediately followed by ∀E as follows:

.... D (X /∈ FV (∗))` A ` ∀X.A ∀IX ` A {B/X} ∀E  .... D {B/X} ` A {B/X}

Note that in order to perform the reduction and get the same conclusion, we must substitute B for X in the entire proof D. The fact that X is not free in any of the open leaves in the proof D (which is a required condition of the premise of ∀IX) means that those leaves are left unchanged by the substitution, so that the overall fringe of the proof tree follows the same pattern. For the local completeness of the inference rules for universal quantification, we can expand an arbitrary proof D of ` ∀X.A as follows:

.... D (X /∈ FV (∗))` ∀X.A ≺ .... D (X /∈ FV (∗))` ∀X.A ` A ∀E ` ∀X.A ∀IX

Note that since there are countably many proposition variables, we can pick some X which does not appear in the leaves of D without loss of generality since the choice doesn't matter (because we can always rename the bound X in ∀X.A by α equivalence as necessary), which lets us satisfy the side condition imposed by the ∀IX rule.

The local soundness and completeness of the inference rules for existential quantification combine ideas previously seen in disjunction and universal quantification. We can reduce ∃I immediately followed by ∃E as follows:

.... D ` A {B/X} ` ∃X.A ∃I ` A x.... E (X /∈ FV (∗))` C (X /∈ FV (C)) ` C ∃EX,x  .... D ` A {B/X}.... E {B/X,D/x}` C

Note how this reduction involves two different kinds of substitution: substituting the proposition B for the proposition variable X and substituting the proof D for the local axiom x. The side condition that X is not free in C, nor in the leaves of E, is important to make sure that the conclusion and leaves remain the same after the substitutions. We can expand an arbitrary proof D of ` ∃X.A as follows:

.... D` ∃X.A ≺ .... D` ∃X.A ` A x ` ∃X.A ∃I ` ∃X.A ∃EX,x

which follows the general pattern of the disjunction completeness expansion, but notice that the conclusion ` ∃X.A and right sub-proof follow the extra side conditions about the free proposition variable X.

The λ-Calculus

The λ-calculus, first defined by Church in the 1930s, is a remarkably simple yet powerful model of computation. The original language of terms (denoted by M,N) is defined by only three parts: abstracting a program with respect to a parameter (i.e. a function term: λx.M), reference to a parameter (i.e. a variable term: x), and applying a program to an argument (i.e. a function application term: M N). Despite this simple list of features, the untyped λ-calculus is a complete model of computation equivalent to Turing machines. It is often used as a foundation for understanding the static and dynamic semantics of programming languages as well as a platform to experiment with new language features. In particular, functional programming languages are sometimes thought of as a notational convenience that desugars to an underlying core language based on the λ-calculus.

Dynamic semantics

The dynamic behavior of the λ-calculus is defined by three principles. The most basic principle is called the α law or α equivalence, and it asserts that the particular choice of names for bound variables does not matter; the defining characteristic for a variable is where it was introduced, enforcing a notion of static scope. We already saw the principle of α equivalence arise for logical quantifiers in Section 2.1, and the same idea helps understand the meaning of functions as λ-abstractions λx.M which bind the variable x in M. For instance, the identity function that immediately returns its argument unchanged may be written as either λx.x or λy.y, both of which are considered α equivalent, which is written λx.x =α λy.y.
As with the logical quantifiers, we will never be more discerning of λ-calculus terms than α equivalence: if M =α N then we will always treat M and N as the "same" term.

The other dynamic principles of the λ-calculus deserve a more explicit treatment because of how drastically they can alter terms. For this purpose, we will employ rules that explain how to rewrite one λ-calculus term into another. More specifically, a rewriting rule R, written M R N and pronounced "M rewrites (by R) to N," is a binary relation between terms. Rewriting rules can be combined by offering a choice between them, so that M RS N, pronounced "M rewrites (by R or S) to N," whenever M R N or M S N. We also denote the inverse rewriting rule by flipping the direction of the  relation symbol, so that N ≺R M exactly when M R N.

The second principle is called the β law or β reduction, and it provides the primary computational force of the λ-calculus. Given a λ-abstraction (i.e. a term of the form λx.M) that is applied to an argument, we may calculate the result by substituting the argument for every reference to the λ-abstraction's parameter: (λx.M) N β M {N/x} The term M {N/x} is notation for performing capture-avoiding substitution of the term N for the free occurrences of variable x in M, such that the static bindings of variables are preserved.3

The third principle is called the η law or η expansion, and it imbues functions with a form of extensionality. In essence, a λ-abstraction that does nothing but forward its parameter to another function is the same as that original function: M ≺η (λx.M x) (x /∈ FV (M)) Note that this rule is restricted so that M may not refer to the variable x introduced by the abstraction, denoted by the side condition on the function FV (M) that computes the set of free variables of M, again to preserve static binding.

3As before, more details about α equivalence and capture-avoiding substitution in the λ-calculus are given by Barendregt (1985) and Pierce (2002).

Even though the λ-calculus with just functions alone is sufficient for modeling all computable functions, it is often useful to enrich the language with other constructs. For instance, we may add pairs to the λ-calculus by giving a way to build a pair out of two other terms, (M,N), as well as projecting out the first and second components from a pair, π1(M) and π2(M). We may define the dynamic behavior of pairs in the λ-calculus similarly to the way we did for functions. Since pairs do not introduce any parameters, they are a bit simpler than functions. The main computational principle, by analogy called β reduction for pairs, extracts a component out of a pair when it is demanded: π1 (M,N) β M and π2 (M,N) β N. The extensionality principle, here called η expansion for pairs, expands a term M with the pair formed out of the first and second components of M: M ≺η (π1(M), π2(M))

Along with pairs, we can add a unit value to the λ-calculus, which is a nullary form of pair containing no elements, written (), that expresses a lack of any interesting information. On the one hand, since the unit value contains no elements, there are no projections out of it, and therefore it has no meaningful β reduction. On the other hand, the extensionality principle is quite strong, and the η expansion for the unit replaces any term M with the canonical unit value: M ≺η () This rule can be read as the nullary version of the η rule for pairs, where M did not contain any interesting information, and so it is irrelevant.
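As an executable gloss (a sketch of this edition in Haskell, not the dissertation's own notation), the β laws for functions and pairs can be phrased as a one-step rewrite at the root of a term. The substitution here is deliberately naive and is only adequate when the substituted argument is closed, sidestepping the capture subtleties discussed above.

    data Term
      = Var String
      | Lam String Term        -- λx.M
      | App Term Term          -- M N
      | Pair Term Term         -- (M, N)
      | Fst Term               -- π1(M)
      | Snd Term               -- π2(M)
      deriving Show

    -- Naive substitution m {n/x}; adequate here only when n is closed,
    -- so no free variable of n can be captured by a binder in m
    subst :: Term -> String -> Term -> Term
    subst (Var y) x n
      | y == x    = n
      | otherwise = Var y
    subst (Lam y m) x n
      | y == x    = Lam y m                    -- x is shadowed under λy
      | otherwise = Lam y (subst m x n)
    subst (App m1 m2) x n  = App (subst m1 x n) (subst m2 x n)
    subst (Pair m1 m2) x n = Pair (subst m1 x n) (subst m2 x n)
    subst (Fst m) x n      = Fst (subst m x n)
    subst (Snd m) x n      = Snd (subst m x n)

    -- One β step at the root of a term, if one applies
    beta :: Term -> Maybe Term
    beta (App (Lam x m) n) = Just (subst m x n)  -- (λx.M) N rewrites to M {N/x}
    beta (Fst (Pair m _))  = Just m              -- π1(M, N) rewrites to M
    beta (Snd (Pair _ n))  = Just n              -- π2(M, N) rewrites to N
    beta _                 = Nothing             -- no β redex at the root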
We can also add explicit choice to the λ-calculus by extending the language with (tagged, disjoint) unions, which are like boolean values that carry some extra information. First, we add the two ways to build a value of the union by tagging a term with our choice, either ι1 (M) or ι2 (M). Second, we add the method of using a tagged union by performing case analysis, case M of ι1 (x1) ⇒ N1 | ι2 (x2) ⇒ N2, that checks the discriminant M to pick which branch ι1 (x1) ⇒ N1 or ι2 (x2) ⇒ N2 to pursue. Since the term for case analysis introduces variables like function terms do, the dynamic behavior of tagged unions also relies on substitution. The main computational principle of β reduction for tagged unions checks which of the two tags were used to build the discriminant and then extracts the payload of the union by binding it to a variable within the term of the corresponding branch:

case ι1 (M) of ι1 (x1)⇒ N1 | ι2 (x2)⇒ N2 β N1 {M/x1}
case ι2 (M) of ι1 (x1)⇒ N1 | ι2 (x2)⇒ N2 β N2 {M/x2}

The extensionality principle of η expansion for tagged unions says that every tagged union value must be constructed by one of the two possible tagging methods by expanding a term M with one that is computed by using case analysis on M to determine which tag was chosen and then returning the same payload and tag:

M ≺η case M of ι1 (x1)⇒ ι1 (x1) | ι2 (x2)⇒ ι2 (x2)

As before, we can add the nullary form of the binary tagged unions which represents an impossible void value: since tagged unions provide a choice of two ways to build results, there is no way to build a void result. To go along with impossible results, we also have an empty case analysis for void terms, case M of, which will explicitly never produce any answer because a void term M cannot produce an answer. Like with units, there is no meaningful β reduction for void expressions because there is no void value for the empty case analysis to inspect. However, the extensionality principle is again strong, as it asserts that there is no value of the void type by explicitly discarding any potential result a void term M might return through an empty case analysis: M ≺η case M of This rule can be understood as the nullary version of the η rule for tagged unions, where there are no possible options for the program to proceed. Intuitively, there should be no way to encounter a void term during evaluation, since there are no ways to create void results, and so this η rule explicitly acknowledges that a void term M can only exist in a dead code branch and its results are therefore irrelevant.

Remark 2.3. A basic rewriting rule like R does not necessarily confer any general properties about the relation, so we systematically denote the enrichment of a rewriting relation with useful closure properties by changing the shape of the relation symbol . First off, we have general R reduction, denoted by M →R N and pronounced "M R-reduces to N," which is the compatible closure of R allowing for the R rule to be applied in any context within M. Syntactically, a context (denoted by C) is a λ-calculus term with a single hole (denoted by □), and we can plug a term M into a context C (written as the operation C[M]) by replacing the □ in C with M.
In terms of contexts, general R reduction is defined as the smallest relation →R that includes R and is closed under compatibility (comp) as follows:

M R N M →R N    M →R N C[M ]→R C[N ] comp

Unlike the capture-avoiding substitution operation M {N/x}, plugging a term M into a context C might capture free variables of the term, so that even if x is free in M, x might not be free in C[M]. As a consequence, α equivalence does not commute with context filling in the same way that it commutes with capture-avoiding substitution. For example, we might say that λx.□ =α λy.□, but (λx.□)[x] = λx.x ≠α λy.x = (λy.□)[x].

Next up, we have the R reduction theory (or R rewriting theory), denoted by M R N, which is the reflexive-transitive closure of →R allowing for zero or more repetitions of R reductions. The R reduction theory is defined as the smallest relation R that includes →R and is closed under reflexivity (refl) and transitivity (trans) as follows:

M →R N M R N    M R M refl    M R M′ M′ R N M R N trans

Note that the above definition of R is the same as taking the compatible-reflexive-transitive closure of R directly. For the most generality, we have the R equational theory, denoted by M =R N and pronounced as "M R-equals N," which is the symmetric-transitive closure of R that allows for reductions to be applied in both directions as many times as desired. The R equational theory is defined as the smallest relation =R that includes R and is closed under symmetry (symm) and transitivity (trans) as follows:

M R N M =R N    N =R M M =R N symm    M =R M′ M′ =R N M =R N trans

Note that the above definition of =R is the same as taking the compatible-reflexive-symmetric-transitive closure of R directly.

Finally, we have R operational reduction, denoted by M 7→R N, which gives us the R operational semantics, denoted by M 7→ R N, as the reflexive-transitive closure of 7→R. Both of these are restrictions on the above more general reduction relations: R operational reduction is a limited form of general R reduction and the R operational semantics is a limited form of the R reduction theory. The purpose of the operational semantics is to specify how programs are to be executed by specifying a clear order on when each reduction step of the program occurs; there should be enough possible reductions to reach a result, but not so many that there are gratuitously many choices for what to do at every step. This ordering for selecting the next reduction step can be achieved by restricting compatibility, which allowed reduction to occur in any context, to only allowing reduction to occur in a specially chosen subset of contexts called evaluation contexts, usually denoted by the variable E. Given a choice of evaluation contexts, R operational reduction and the R operational semantics are defined as the smallest relations 7→R and 7→ R closed under the following rules:

M R N E[M ] 7→R E[N ] eval    M 7→R N M 7→ R N    M 7→ R M refl    M 7→ R M′ M′ 7→ R N M 7→ R N trans

Since we have to make a choice for which contexts are evaluation contexts, there can be many possible operational semantics for a given language. As an example, we can define a call-by-name operational semantics 7→ β for our λ-calculus discussed so far by using the family of β laws as the basic rewriting rules and choosing the following evaluation contexts:

E ∈ EvalCxt ::= □ | π1(E) | π2(E) | E N | (case E of ) | (case E of ι1 (x)⇒ N1 | ι2 (y)⇒ N2)
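This call-by-name semantics can be rendered directly as a program (again a sketch of this edition, restricted to functions and pairs for brevity; the case constructs would descend into the discriminant just like the projections). Rather than materializing evaluation contexts E, the step function descends only into the positions the grammar of E permits, which is exactly the restriction of compatibility described above.

    -- Term and subst as in the earlier sketch (functions and pairs only)
    data Term = Var String | Lam String Term | App Term Term
              | Pair Term Term | Fst Term | Snd Term
      deriving Show

    subst :: Term -> String -> Term -> Term
    subst (Var y) x n    = if y == x then n else Var y
    subst (Lam y m) x n  = if y == x then Lam y m else Lam y (subst m x n)
    subst (App a b) x n  = App (subst a x n) (subst b x n)
    subst (Pair a b) x n = Pair (subst a x n) (subst b x n)
    subst (Fst m) x n    = Fst (subst m x n)
    subst (Snd m) x n    = Snd (subst m x n)

    -- One operational step E[M] to E[M'], descending only where the grammar
    -- of evaluation contexts E allows: the function part of an application
    -- and the body of a projection, but never under a λ or inside a pair
    step :: Term -> Maybe Term
    step (App (Lam x m) n) = Just (subst m x n)    -- β at the focused redex
    step (App m n)         = (`App` n) <$> step m  -- context E N
    step (Fst (Pair m _))  = Just m
    step (Fst m)           = Fst <$> step m        -- context π1(E)
    step (Snd (Pair _ n))  = Just n
    step (Snd m)           = Snd <$> step m        -- context π2(E)
    step _                 = Nothing               -- a value or a stuck term

    -- The operational semantics: the reflexive-transitive closure of step
    eval :: Term -> Term
    eval m = maybe m eval (step m)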
As with a basic rewriting rule R, we denote the inverse of the directed reduction relations →R, R, 7→R, and 7→ R by flipping the direction of the arrow, so that N ←R M if and only if M →R N and so on. Since the equational theory =R is symmetric, it is undirected, so it is its own inverse. End remark 2.3.

Static semantics

So far, we have only considered the dynamic meaning of the λ-calculus without any mention of its static properties. In particular, now that we have both functions and pairs, we may want to statically check and rule out programs that might "go wrong" during calculation. For instance, if we apply a pair to an argument, (x, y) z, then there is nothing we can do to reduce this program any further. Likewise, it is nonsensical to ask for the second component of a function, π2(λx.x). We may rule out such ill-behaved programs by using a type system which guarantees that such situations never occur by assigning a type to every term and ensuring that programs are used in accordance with their types. For instance, we may give a function type, A→ B, to λ-abstractions as follows:

x : A x.... M : B λx.M : A→ B →Ix

where λx.M : A→ B means that the function λx.M has type A→ B. The premise to this rule requires that M has type B assuming that all free occurrences of x in M have type A. Since the variable x is bound in the conclusion, it is closed off by the premise of the rule because the values that x can stand for in M have nothing to do with any other x that might occur elsewhere in a larger term. Having given a rule for introducing a term of function type, we can now restrict application to only occur for terms of the correct type:

M : A→ B N : A M N : B →E

This rule ensures that if we apply a term M to an argument, then M must have a function type. Likewise, we may give a product type, A×B, to the creation of pairs

M : A N : B (M,N) : A×B ×I

as well as limiting first and second projection to terms of a product type:

M : A×B π1(M) : A ×E1    M : A×B π2(M) : B ×E2

The unit type, 1, is a degenerate form of product types with a single canonical value () : 1 1I and no other typing rules. Tagged unions belong to sum types, A+B, which have two different rules for the creation of the two distinctly tagged values:

M : A ι1 (M) : A+B +I1    M : B ι2 (M) : A+B +I2

The case analysis term for sum types has the most complex rule, requiring three premises (one for the discriminant and two for the branches), two of which bind variables which appear free in their respective sub-terms just like in the rule for λ-abstractions:

M : A+B x1 : A x1.... N1 : C x2 : B x2.... N2 : C (case M of ι1 (x1)⇒ N1 | ι2 (x2)⇒ N2) : C +Ex1,x2

This rule says that a case analysis expression on a term M with the sum type A+B has a result of type C if the terms N1 and N2 in both branches have the type C, under the assumption that all free occurrences of x1 in N1 have type A and all free occurrences of x2 in N2 have type B. The void type, 0, is a degenerate form of sum types with no possible values and one case analysis term following the typing rule

M : 0 case M of : C 0E

which says that the result of an empty case analysis on a term M of type 0 can be said to have any type C because there will never be any result. With all these rules in place, nonsensical programs like π1(λx.x) are now ruled out, since they cannot be given a type. The static semantics (i.e. the typing rules) and the dynamic semantics (i.e. the reduction and expansion relationships) of this simply typed λ-calculus are summarized in Figure 2.3.
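These typing rules are syntax-directed enough to transcribe into a small checker. The sketch below (again this edition's illustration; it gives λ-abstractions a Church-style type annotation that the terms above do not carry, and omits sums and void for brevity) synthesizes a type under a context of hypotheses and rejects ill-typed terms like π2(λx.x).

    data Type = TUnit | TProd Type Type | TArr Type Type
      deriving (Eq, Show)

    data Term = Var String | Unit | Lam String Type Term | App Term Term
              | Pair Term Term | Fst Term | Snd Term
      deriving Show

    type Ctx = [(String, Type)]   -- the local typing hypotheses x : A

    -- infer g m synthesizes the type of m under hypotheses g, if any
    infer :: Ctx -> Term -> Maybe Type
    infer g (Var x)     = lookup x g                        -- use a hypothesis
    infer _ Unit        = Just TUnit                        -- rule 1I
    infer g (Lam x a m) = TArr a <$> infer ((x, a) : g) m   -- rule →I: new scope for x
    infer g (App m n)   = do                                -- rule →E
      TArr a b <- infer g m
      a'       <- infer g n
      if a == a' then Just b else Nothing
    infer g (Pair m n)  = TProd <$> infer g m <*> infer g n -- rule ×I
    infer g (Fst m)     = do TProd a _ <- infer g m; Just a -- rule ×E1
    infer g (Snd m)     = do TProd _ b <- infer g m; Just b -- rule ×E2

For example, infer [] (Snd (Lam "x" TUnit (Var "x"))) is Nothing, rejecting π2(λx.x), while infer [] (Lam "x" TUnit (Var "x")) is Just (TArr TUnit TUnit).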
Note that the η laws, if left unchecked, have the potential to cause unwanted relationships between terms. The different ways that η has the potential to cause problems can be very subtle (Klop & de Vrijer, 1989), but the issue is most clearly seen for units. In particular, η1 expansion for units says that any term can be replaced with the unit value (). But this apparently far-reaching law is clearly nonsensical for representing programs: if every possible program is just () then there's no point in evaluating anything because there is never an interesting answer! The other direction is not much better; η1 reduction says that the unit value () could just as well become anything else, leading to many different conflicting answers whenever we encounter a unit value. This conundrum is somewhat self-imposed, however: clearly the η1 law shouldn't apply to every term, but only to terms we expect will result in a unit value anyway. Therefore, the η laws are all restricted to apply only to terms of an appropriate type, so for example the η1 law only expands terms of type 1 with (). This creates an interesting split in the relationships between terms, where we have the β laws that do not depend on types, so that they still make sense for reasoning about untyped terms, in contrast with the η laws that do depend on types to make sense, so that they require typing information to ensure that they are correctly applied.

X, Y, Z ∈ TypeVariable ::= . . .
A,B,C ∈ Type ::= X | 1 | 0 | A×B | A+B | A→ B
x, y, z ∈ Variable ::= . . .
M,N ∈ Term ::= x | () | case M of | (M,N) | π1(M) | π2(M) | ι1 (M) | ι2 (M) | (case M of ι1 (x)⇒ N1 | ι2 (y)⇒ N2) | λx.M | M N
H, J ∈ Judgement ::= M : A

() : 1 1I    no 1E rule    no 0I rule    M : 0 case M of : C 0E
M : A N : B (M,N) : A×B ×I    M : A×B π1(M) : A ×E1    M : A×B π2(M) : B ×E2
M : A ι1 (M) : A+B +I1    M : B ι2 (M) : A+B +I2
M : A+B x : A x.... N1 : C y : B y.... N2 : C case M of ι1 (x)⇒ N1 | ι2 (y)⇒ N2 : C +Ex,y
x : A x.... M : B λx.M : A→ B →Ix    M : A→ B N : A M N : B →E

(β1) no rule    (η1) M : 1 ≺η ()
(β0) no rule    (η0) M : 0 ≺η case M of
(β×) πi(M1,M2) β Mi    (η×) M : A×B ≺η (π1(M), π2(M))
(β+) case ιi (M) of ι1 (x1)⇒ N1 | ι2 (x2)⇒ N2 β Ni {M/xi}    (η+) M : A+B ≺η case M of ι1 (x1)⇒ ι1 (x1) | ι2 (x2)⇒ ι2 (x2)
(β→) (λx.M) N β M {N/x}    (η→) M : A→B ≺η λx.M x (x /∈ FV (M))

FIGURE 2.3. The simply typed λ-calculus: with unit (1), void (0), product (×), sum (+), and function (→) types.

Remark 2.4. We should note that some care needs to be taken during a type derivation to make sure that the distinction between variables in different scopes is clear. For example, consider the following typing derivation of the function λx.λx.x:

x : A x λx.x : B → A →Ix λx.λx.x : A→ B → A →Ix

This typing derivation is not valid! In particular, note that the function λx.λx.x is α equivalent to λx.λy.y by renaming the second bound variable, which represents a binary function that returns its second argument. The problem is that by rebinding the same variable x within the same scope, it is easy to have confusion about which of the two arguments is meant when referring to x. This is why typing rules like →I for terms which bind variables introduce a new scope in their premise to prevent this confusion. In particular, the typing derivation for the sub-term λx.x is:

x : B x λx.x : B → B →Ix

In this derivation, the variable x is already closed off, because it is bound by the λ-abstraction in the conclusion.
Therefore, when we continue the derivation to type the outer λ-abstraction, the type of the bound reference of x is already fixed, and cannot be changed as in

x : B x λx.x : B → B →Ix λx.λx.x : A→ B → B →Ix

which is the correct typing derivation for this term. End remark 2.4.

Example 2.2. For an example of how to program in the λ-calculus, consider the following function which takes a nested pair, of type (A×B)×C, and swaps the inner first and second components, while discarding the outer component: λx. (π2(π1(x)), π1(π1(x))) We can check that this function is indeed well-typed, using the typing rules given in Figure 2.3, by constructing the typing derivation in Figure 2.4. Notice how the derivation bears a close structural resemblance to the proof of ` ((A ∧B) ∧ C) ⊃ (B ∧ A) given in Figure 2.2 of Example 2.1.

x : (A×B)× C x π1(x) : A×B ×E1 π2(π1(x)) : B ×E2 x : (A×B)× C x π1(x) : A×B ×E1 π1(π1(x)) : A ×E1 (π2(π1(x)), π1(π1(x))) : B × A ×I λx. (π2(π1(x)), π1(π1(x))) : ((A×B)× C)→ (B × A) →Ix

FIGURE 2.4. Typing derivation of the λ-calculus term λx. (π2(π1(x)), π1(π1(x))).

In addition, we can check that this function behaves as intended by applying it to a nested pair, ((M1,M2),M3), and evaluating it with the reductions given in Figure 2.3:

(λx. (π2(π1(x)), π1(π1(x)))) ((M1,M2),M3) →β→ (π2(π1((M1,M2),M3)), π1(π1((M1,M2),M3))) β× (π2(M1,M2), π1 (M1,M2)) β× (M2,M1)

which confirms that this is the function we wanted. End example 2.2.

Type abstraction

If we only stick to typed terms, then the language we have described so far is rather rigid and painful to use because every term must have a fixed specific type even if it doesn't matter. For example, the identity function λx.x, which just returns its given input, works uniformly for values of any type. However, it must be given a single type like Int → Int or String → String, meaning that the integer and string identity functions must be defined separately even though their definition is the same. Statically typed programming languages combat this useless redundancy with features called polymorphism or generics that correspond to universal types in the λ-calculus, which was independently discovered as Girard's (1971) system F and Reynolds's (1974) polymorphic λ-calculus. The main idea is to let generic terms abstract over type variables, so that we have the term ΛX.M similar to the λ-abstractions that represent functions, and to specialize generic terms to specific types, so that we have the term M A similar to function application. The computational β reduction for polymorphism also mimics functions by substituting the specialized type for the abstracted type variable: (ΛX.M) A β M {A/X} Likewise, the extensional η expansion for polymorphism says that a generic term that just immediately specializes another generic term M with its applied type is the same as M: M ≺η ΛX.M X These generic terms can be given a universal type of the form ∀X.A. Specialization of generic terms just involves plugging in the applied type for the variable X in the result, but the typing rule for abstraction is more tricky:

.... (X /∈ FV (∗)) M : A ΛX.M : ∀X.A ∀IX    M : ∀X.A M B : A {B/X} ∀E

The ∀I rule imposes a side condition on its premise, X /∈ FV (∗), which says that the type variable X cannot appear in the type of any free variable of M.
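In Haskell terms (an approximate analogue chosen for this edition, since Haskell infers the Λ abstractions and type applications that system F writes explicitly), the polymorphic identity about to be defined formally looks as follows:

    {-# LANGUAGE ExplicitForAll #-}

    -- The polymorphic identity, with the ∀ written explicitly; the compiler
    -- inserts the type abstraction Λ and the type applications silently
    identity :: forall a. a -> a
    identity = \x -> x

    -- Specialization (the ∀E rule) happens at each use site: here identity
    -- is used once at type Int and once at type String
    both :: (Int, String)
    both = (identity 3, identity "three")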
With universal types, we can finally give a single, polymorphic definition of the identity function once and for all, ΛX.λx.x : ∀X.X → X, which is typed as follows:

x : X x λx.x : X → X →Ix ΛX.λx.x : ∀X.X → X ∀IX

There is another complementary form of type abstraction with a very different purpose in programming languages. For the sake of supporting more modular programs, many typed languages allow for modules or other basic program units to hide some of their representation. That way, the implementor of the module may use details of its representation, but users of the module can only see the public interface and do not have access to these private details, since peeking into the private details of a module's implementation would break the abstraction and prevent the user code from linking with a different implementation. For example, we might have a module for integer sets with four components in its public interface: the empty set, a function for creating the singleton set of a given integer, a union function, and a membership function that decides if an integer is in the set. Now there are many different ways that a program could represent integer sets—arrays, linked lists, hash tables, balanced trees, higher-order functions, etc.—but the code which uses integer sets should be independent of the implementor's choice of representation so that it can plug in with several different implementations of the same public interface. This type of abstraction can be modeled by existential types that make a choice of type private to a small fragment of the overall program. For our example of integer sets, their interface is described by the type

∃X.X × (Int→ X)× (X → X → X)× (Int→ X → Bool)

where the ∃ abstracts over a private type denoted by the variable X, and the four components of the public interface are given by the four components of the product: the empty set of type X, the singleton function of type Int→ X, the union function of type X → X → X, and the membership function of type Int→ X → Bool.

How do we write programs with existential types? To be explicit about when we are abstracting over a private type A used within a term M, we can package them together as A@M where the term is tagged with its private type. We can then use a packaged term by employing a new form of case analysis, case M of X@y ⇒ N, which locally unpacks M and separates out its private type (bound to the type variable X) from the contents (bound to the variable y) for the purpose of evaluating the result of N. The computational β reduction for existential types unpacks a type-packaged term A@M that is under the scrutiny of a case analysis, substituting the concrete type A and the implementation M for the abstract type variable X and the reference x within their local scope: case A@M of X @ x⇒ N β N {A/X,M/x} The extensional η principle for existential types says that every value of an existential type must be a type-packaged value by expanding an existential term M into one that is computed by unpacking M to extract its private type and value, only to return a new package with the same type and value: M ≺η case M of X @ x⇒ X @ x This form of existential type abstraction for packages can be enforced with the following typing rules:

M : A {B/X} B @M : ∃X.A ∃I    M : ∃X.A x : A x.... (X /∈ FV (∗)) N : C (X /∈ FV (C)) (case M of X @ x⇒ N) : C ∃EX,x

To form a new package B@M : ∃X.A, we only need to check that the underlying term M does indeed implement a program of type A with the chosen type B substituted for X.
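Haskell's existential quantification gives a close rendering of this integer-set package (a sketch of this edition; the constructor and names are invented for illustration). The hidden representation type is sealed by the constructor, one implementation chooses lists, and pattern matching plays the role of the unpacking case analysis discussed next, with the compiler enforcing that the hidden type cannot escape.

    {-# LANGUAGE ExistentialQuantification #-}

    -- ∃X. X × (Int → X) × (X → X → X) × (Int → X → Bool) as a datatype:
    -- the representation type x is hidden by the SetImpl constructor
    data IntSet = forall x. SetImpl x (Int -> x) (x -> x -> x) (Int -> x -> Bool)

    -- One package, choosing x = [Int] (unsorted lists); clients cannot tell
    listSets :: IntSet
    listSets = SetImpl [] (\i -> [i]) (++) elem

    -- Unpacking by pattern matching plays the role of case M of X @ s ⇒ N;
    -- the result type (here Bool) must not mention the hidden x
    demo :: IntSet -> Bool
    demo (SetImpl emp sing uni mem) = mem 3 (uni (sing 3) emp)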
Unpacking a type abstraction is more complex, as we need to ensure that the hidden type information cannot "leak" outside its scope. Therefore, the generic type variable X that is brought into scope by the case analysis cannot appear in the types of any other free variables (besides the corresponding variable x) in its scope. Additionally, the generic type X bound by the unpacking case analysis cannot appear in the return type C, which is the other source of potential leak. The static and dynamic semantics of the universal (∀) and existential (∃) forms of type abstraction are summarized in Figure 2.5, which extends the simply typed λ-calculus from Figure 2.3 to be a full-fledged model of statically typed (functional) programming languages.

A,B,C ∈ Type ::= . . . | ∀X.A | ∃X.A
M,N ∈ Term ::= . . . | ΛX.M | M A | A@M | case M of X @ x⇒ N

.... X /∈ FV (∗) M : A ΛX.M : ∀X.A ∀IX    M : ∀X.A M B : A {B/X} ∀E
M : A {B/X} B @M : ∃X.A ∃I    M : ∃X.A x : A x.... X /∈ FV (∗) N : C X /∈ FV (C) case M of X @ x⇒ N : C ∃EX,x

(β∀) (ΛX.M) A β M {A/X}    (η∀) M : ∀X.A ≺η ΛX.M X (X /∈ FV (M))
(β∃) case A@M of X @ x⇒ N β N {A/X,M/x}    (η∃) M : ∃X.A ≺η case M of X @ x⇒ X @ x

FIGURE 2.5. The polymorphic λ-calculus (i.e. system F): extending the simply typed λ-calculus with universal (∀) and existential (∃) type abstraction.

Proofs as Programs

Amazingly, despite their different origins and presentations, both systems have a close, one-for-one correspondence to each other. Example 2.1 and Example 2.2 correspond to different ways of expressing the same idea. Both natural deduction and the λ-calculus end up revealing the same underlying ideas in different ways. The propositions of natural deduction are isomorphic to the types of the λ-calculus, where conjunctions are the same as pair types, disjunctions are the same as sum types, implications are the same as function types, logical truth and falsehood are the same as the unit and void types, and the two quantifiers are the same in both systems. Furthermore, the proofs of natural deduction are isomorphic to the (typed) terms of the λ-calculus. This structural similarity between the two systems gives us the slogan, "proofs as programs and propositions as types." From this point of view, natural deduction may be seen as the essence of the type system for the λ-calculus and the λ-calculus may be seen as a more concise term language for expressing proofs in natural deduction. For this reason, we may say that the λ-calculus is a natural deduction language.

The correspondence between these two systems is not just between their syntax and static structures, but also extends to the dynamic properties as well. Local soundness and completeness in natural deduction are exactly the same as the β and η laws of terms in the λ-calculus, respectively, for all the discussed types: functions, products, sums, unit, void, universal, and existential types. Therefore, it is no coincidence that the β and η rules for functions in the λ-calculus appeared as they originally did, or that conjunction and disjunction have their given introduction and elimination rules in NJ. Effectively, both the study of logic and the study of computability have led mathematicians to (re)discover different perspectives of the same essential phenomena (Wadler, 2015). Surprisingly, there is also a third entity in this correspondence: an algebraic structure known as Cartesian closed categories (Lambek & Scott, 1986).
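Before turning to the categorical leg of the correspondence, the slogan can be made concrete in one line of Haskell (an illustration of this edition): the NJ proof of ((A ∧ B) ∧ C) ⊃ (B ∧ A) from Figure 2.2, the λ-term of Example 2.2, and the program below are the same object read three ways.

    -- The proof of Figure 2.2 and the term of Example 2.2, as a program
    swapInner :: ((a, b), c) -> (b, a)
    swapInner x = (snd (fst x), fst (fst x))   -- i.e. λx.(π2(π1(x)), π1(π1(x)))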
In general, a category is made up of:
– some objects A, B and C ("points"),
– some morphisms between those objects ("arrows"), where a morphism f from A to B is written f : A→ B,
– a trivial morphism from every object to itself ("identity"), and
– the ability to chain together any two morphisms passing through the same object ("composition"): given f : A → B and g : B → C, then g ◦ f is a morphism from A to C,
along with some laws about identity and composition. And Cartesian closed categories in particular are also guaranteed to have some special objects: a terminal object 1, a product object A×B for any objects A and B, and an exponential object BA for any objects A and B. As it turns out, the terminal (1), product (A×B), and exponential (BA) objects correspond to unit (1), pair (A × B), and function (A → B) types in the λ-calculus and to truth (>), conjunction (A ∧ B), and implication (A ⊃ B) in natural deduction, respectively. Cartesian closed categories may be seen as a variable-free presentation of the λ-calculus, where λ-abstractions (which bind variables) are replaced by primitive functions. Furthermore, the categorical concept of the initial object (0) and sums of objects (A+B) correspond with the empty (0) and sum (A+B) types and with logical falsehood (⊥) and disjunction (A ∨B), respectively. Since the same idea has been stumbled upon three different times from three different angles, the connection between proofs and programs cannot be a simple coincidence.

A Critical Look at the λ-Calculus

The Curry-Howard isomorphism led to striking discoveries and developments that likely would not have arisen otherwise. The connection between logic and programming languages led to the development of mechanized proof assistants, notably the Coq system (Coquand, 1985), which are used in both the security and verification communities for validating the correctness of programs. The connection between category theory and programming languages suggested a new compilation technique for ML (Cousineau et al., 1987). However, let us now look at the λ-calculus with a more critical eye. There are some defining principles and computational phenomena that are important to programming languages, but are not addressed by the λ-calculus. For example, what about:

– Duality? The concept of duality is important in category theory where it comes for free as a consequence of the presentation. Since the morphisms in category theory have a direction, we can just "flip all the arrows" to find its dual without any effort or creativity on our part. This action gives us a straightforward method to find the dual of any category or diagram. For example, consider the diagram that describes products in categorical terms:

C A A×B B f g !(f,g) π1 π2

Here, for any two objects, A and B, and morphisms, f and g, there is the product A×B object with the projection morphisms π1 and π2 out of the product and a unique morphism into the product. The description of sums pops out for free by just turning that diagram around:

A A+B B C ι1 f ![f,g] ι2 g

Now, the two projections have become two injections, ι1 and ι2, into the sum object and we have a unique morphism out of the sum for any f and g. Duality also appears in logic, for example in the traditional De Morgan laws like ¬(A ∨B) = (¬A) ∧ (¬B). Predictably, the corresponding concept of a sum object (the dual of a product) in logic is disjunction (the dual of a conjunction).
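To see these two universal properties side by side in code before returning to NJ (a gloss of this edition using Haskell's Control.Arrow combinators, not notation from the text): the mediating morphism into a product is (&&&), the mediating morphism out of a sum is (|||), and flipping the arrows swaps one for the other.

    import Control.Arrow ((&&&), (|||))

    -- Into a product: from f : C → A and g : C → B build the mediating
    -- morphism C → A × B (written !(f,g) in the diagram above)
    intoProduct :: (c -> a) -> (c -> b) -> (c -> (a, b))
    intoProduct f g = f &&& g

    -- Out of a sum: from f : A → C and g : B → C build the mediating
    -- morphism A + B → C (written ![f,g] in the dual diagram)
    outOfSum :: (a -> c) -> (b -> c) -> (Either a b -> c)
    outOfSum f g = f ||| g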
If we look at the rules of NJ from Figure 2.1, the introduction rules for A ∨B bear a resemblance to the elimination rules for A∧B: one is just flipped upside- down from the other. However, the elimination rule for disjunction is quite different from the introduction for conjunction. This dissimilarity comes from the asymmetry in natural deduction. We may have many premises, but only a single conclusion. It seems like a more symmetrical system of logic would be easier to methodically determine duality just like in category theory. Likewise, this form of duality is not readily apparent in the λ-calculus. Since the λ-calculus is isomorphic to NJ, it shares the same biases and lack of symmetry. The emphasis of the language is entirely on the production of information: a λ-abstraction produces a function, a function application produces the result, etc. For this reason, the relationship between a pair, (M,N), and case analysis on tagged unions, caseM of ι1 (x) ⇒ N1|ι2 (y) ⇒ N2, is not entirely obvious. For this reason, we would like to study a language which expresses duality “for free,” and which corresponds to a more symmetrical system of logic. – Evaluation strategy? Reynolds (1998) observed that while functional or applicative languages may be based on the λ-calculus, the true λ-calculus implies a lazy (call-by-name) evaluation order, whereas many languages are evaluated by a strict (call-by-value) order that first reduces arguments before performing a function call. 41 To resolve this mismatch between the λ-calculus and strict programming languages, Plotkin (1975) defined a call-by-value variant of the λ-calculus along with a continuation-passing style (CPS) transformation that embeds the evaluation order into the program itself. Sabry & Felleisen (1993) give a complete set of equations for reasoning about the call-by-value λ-calculus based on Fischer’s (1993) call-by-value CPS transformation, and which corresponds to Moggi’s (1989) computational λ-calculus. The equations were later refined into a complete theory for call-by-value reduction by Sabry & Wadler (1997). More recently, there has been work on a theory for reasoning about call-by- need evaluation of the λ-calculus (Ariola et al., 1995; Ariola & Felleisen, 1997; Maraist et al., 1998), which is the strategy commonly employed by Haskell implementations, and the development of Levy’s (2001) call-by-push-value framework which includes both call-by-value and call-by-name evaluation but not call-by-need. Different evaluation strategies used by implementations of functional programming languages have been studied as different versions of the λ-calculus that embody the implementation, including various calculi for call-by-value (Plotkin, 1975; Moggi, 1989; Sabry & Felleisen, 1993; Sabry & Wadler, 1997) and call-by-need (Ariola et al., 1995; Ariola & Felleisen, 1997; Maraist et al., 1998) evaluation. What we would ultimately want is not just another calculus, but instead a framework that gives a clear justification of the evaluation strategies found in programming languages, and where the relationships between strategies can be naturally expressed. Can we have a logical foundation for programming languages that is naturally strict, in the same way that the λ-calculus is naturally lazy? And which readily accounts for programs that utilize more than one evaluation strategy in the same language? 
Can we express the duality between evaluation strategies (Filinski, 1989; Curien & Herbelin, 2000; Wadler, 2003) generically, between arbitrarily many pairs of strategies?

– Object-oriented programming? The object-oriented paradigm has become a prominent part of the mainstream programming landscape. Unfortunately, what is meant by an "object" in the object-oriented sense is fuzzy, since the exact details of "what is an object" depend on choices made by the particular programming language. One concept of objects that is universal across every programming language is dynamic dispatch, which is used to select the behavior of a method call based on the value or type of an object. Dynamic dispatch is emphasized by Kay (1993) in the form of message passing in the design of Smalltalk. Abadi & Cardelli (1996) give a theoretical formulation for the many features of object-oriented languages, wherein dynamic dispatch plays a central role. Can we give an account of the essence of objects, and in particular messages and dispatch, that is connected to logic and category theory in the same way as the λ-calculus? Even more, can this foundation for objects refer back to basic principles discovered independently in the field of logic?

– Control flow? Every programming language has some concept of control flow which can describe the order that instructions are executed, the flow of data dependencies between parts of a program, or the call-and-return protocol of functions. The λ-calculus serves as a wonderful formalization of pure functions. However, many languages include additional computational effects, like exceptions, that let programs manipulate control flow in ways not possible with pure functions, and so they lie outside of the expressive power of the λ-calculus (Felleisen, 1991). For example, Scheme (Kelsey et al., 1998) is a language based on the λ-calculus that nonetheless has operators like callcc that reify control flow as a first-class object, following a traditional approach for representing control flow by adding new primitives to the λ-calculus. Instead, we would rather understand the flow of control in a setting where it is naturally expressed as a consequence of the language, rather than added on as an afterthought. Surprisingly, certain programmatic manipulations of control flow, like Scheme's callcc, correspond to axioms of classical logic (Griffin, 1990; Ariola & Herbelin, 2003). Since these classical reasoning principles are a well-established part of logic, can we also have a corresponding language with a naturally classical representation of control as a first-class citizen?

With the aim of answering each of these questions, we will put the λ-calculus aside and look to another logical framework instead of natural deduction. Most surprisingly, we do not have to look very far, since Gentzen (1935a) introduced the sequent calculus alongside natural deduction as an alternative system of formal logic. Gentzen developed the sequent calculus in order to better understand the properties of natural deduction. Therefore, to answer these questions about programming, we will look for the computational interpretation of the sequent calculus and its corresponding programming language.

CHAPTER III

Sequent Calculus

Natural deduction is not an only child; it was born with a twin sibling called the sequent calculus.
One of Gentzen’s (1935a) ground-breaking insights with the sequent calculus is the use of its namesake sequents to organize the information we have about the various propositions in question. In its most general form, a sequent is a conditional conglomeration of propositions: A1, A2, . . . , An ` B1, B2, . . . , Bm pronounced “A1, A2, . . ., and An entail B1, B2, . . ., or Bm,” which states that assuming each of A1, A2, . . . , An are true then at least one of B1, B2, . . . , Bm must be true. The turnstyle (`) in the middle of the sequent separates the hypotheses on the left, which we collectively write as Γ, from the consequences on the right, which we collectively write as ∆. This separation between the left and right sides of the sequent gives the essential skeletal structure of the sequent calculus as a logic. As special cases, we can form several basic judgements about logical propositions using our above interpretation of the meaning of sequents by observing that an empty collection of hypotheses denotes “true” and an empty collection of consequences denotes “false.” A single consequence without hypotheses ` A means “A is true,”1 a single hypothesis without consequences A ` means “A is false,” and the empty sequent ` is a primitive contradiction “true entails false.” So already, the basic structure of the sequent gives us a language for speaking about truth, falsehood, and contradiction without knowing anything else about the logic at hand. 1Note how sequents gracefully extend the single judgement ` A of the NJ system of natural deduction, which only directly asserts the truth of propositions, so that statements of falsehood or contradiction must be represented indirectly through logical connectives like ` ⊥ (i.e. “false is true”) for contradiction and ` ¬A (i.e. “not A is true”) or ` A→ ⊥ (i.e. “A implies false is true”) for falsehood. A consequence of these indirect encodings is that simplified versions of NJ without a false connective ⊥ will have trouble speaking about contradictions, and likewise simplifications of NJ without negation ¬ will have trouble speaking about falsehoods. 45 Let’s now revisit the basic binary connectives—conjunction (A ∧B), disjunction (A ∨B), and implication (A ⊃ B)—by giving their meaning in terms of truth tables that describe the relationship between the truth of a compound proposition and the truth of its parts, as shown in Figure 3.1. Coupled with the interpretation of sequents, this interpretation of connectives gives a simple method of determining the validity of inference rules by checking if the conclusion does indeed follow from the premises. For example, we can validate the inference rules involving conjunction shown in Figure 3.2. Due to the interaction between entailment in the sequent (separating hypotheses from consequences) and the line of inference (separating premises from conclusions), we have two dimensions for orienting inference rules based on the location of their primary proposition (marked with a box in Figure 3.2). On the horizontal axis, rules where the primary proposition appears to the right or left of the turnstyle are called right and left rules, respectively. On the vertical axis, rules where the primary proposition appears below or above the line of inference are called introduction and elimination rules, respectively. This gives us four quadrants where the rules of inference for conjunction might live. – Right introduction: knowing that A is true and B is true is sufficient to conclude that A ∧B is true. 
– Right elimination: knowing that A ∧B is true is sufficient to conclude that A is true and likewise that B is true.
– Left introduction: knowing that A is false is sufficient to conclude that A∧B is false, and likewise when B is false.
– Left elimination: knowing that A ∧ B is false while both A and B are true is sufficient to deduce a contradiction, as this represents an impossible situation.

Similar inference rules with similar readings can be given for disjunction and implication under the same right/left and introduction/elimination orientations as shown in Figure 3.3 and Figure 3.4.

A     B     A ∧ B
False False False
False True  False
True  False False
True  True  True

A     B     A ∨ B
False False False
False True  True
True  False True
True  True  True

A     B     A ⊃ B
False False True
False True  True
True  False False
True  True  True

FIGURE 3.1. Truth tables for conjunction (∧), disjunction (∨), and implication (⊃).

Left Right Elimination A ∧B ` ` A ` B ` ` A ∧B ` A ` A ∧B ` B Introduction A ` A ∧B ` B ` A ∧B ` ` A ` B ` A ∧B
FIGURE 3.2. The orientation of deductions for conjunction (∧).

Left Right Elimination A ∨B ` A ` A ∨B ` B ` ` A ∨B A ` B ` ` Introduction A ` B ` A ∨B ` ` A ` A ∨B ` B ` A ∨B
FIGURE 3.3. The orientation of deductions for disjunction (∨).

Left Right Elimination A ⊃ B ` ` A A ⊃ B ` B ` ` A ⊃ B ` A ` B Introduction ` A B ` A ⊃ B ` A ` B ` A ⊃ B
FIGURE 3.4. The orientation of deductions for implication (⊃).

Notice how the extra judgemental structure provided by sequents allows for simpler versions of some of the particularly complex inference rules from natural deduction in Figure 2.1 that introduce localized assumptions to select premises. In contrast to the NJ inference rule ⊃I for right implication introduction, which proves A ⊃ B by introducing a local assumption that A is true (` A) in the premise which proves B is true (` B), the sequent-based right introduction rule in Figure 3.4 instead stores A as a hypothesis in the premise A ` B which asserts that A entails B, thereby reducing the implication connective to the implication built into the meaning of the turnstile. Likewise, in contrast to the NJ inference rule ∨E for right disjunction elimination which introduces local assumptions for both possibilities A and B into two different premises, the sequent-based right elimination rule in Figure 3.3 instead stores the possibilities as hypotheses in the premises A ` and B ` which assert that A and B are false.

With the dimensions of logical orientation illustrated in Figure 3.2, Figure 3.3, and Figure 3.4, we can identify one of the primary distinctions between natural deduction and the sequent calculus. Natural deduction is exclusively made up of right rules—including both right introduction and right elimination—and the sequent calculus is exclusively made up of introduction rules—including both right introduction and left introduction.2 Or in other words, natural deduction is concerned with deducing and using the truth of propositions, whereas the sequent calculus is concerned with introducing true and false applications of logical connectives. With this fundamental characterization of the sequent calculus in mind, we will delve into Gentzen's LK: the original sequent-based logic.

Gentzen's LK

Gentzen's LK, a simple logic based extensively on the use of sequents to trace local hypotheses and consequences throughout a proof, is given in Figure 3.5.
The sequents are built out of (ordered) lists of propositions Γ and ∆, and the inference rules let us build proof trees by stacking inferences on top of one another. We include all the same connectives in LK as we had in NJ: the nullary constants > and ⊥, the binary operators ∧, ∨, and ⊃, and quantifiers ∀ and ∃. Additionally, notice that negation is included as a full-fledged unary connective ¬A, whose logical inference rules are easy to define in terms of sequents, instead of encoding it with implication and falsehood as in NJ.

X, Y, Z ∈ PropVariable ::= . . .
A, B, C ∈ Proposition ::= X | > | ⊥ | A ∧ B | A ∨ B | ¬A | A ⊃ B | ∀X.A | ∃X.A
Γ ∈ Hypothesis ::= A1, . . . , An
∆ ∈ Consequence ::= A1, . . . , An
Judgement ::= Γ ` ∆

Core rules:
Ax: A ` A
Cut: from Γ ` A,∆ and Γ′, A ` ∆′, infer Γ′,Γ ` ∆′,∆

Logical rules:
>R: Γ ` >,∆ (there is no >L rule)
⊥L: Γ,⊥ ` ∆ (there is no ⊥R rule)
∧R: from Γ ` A,∆ and Γ ` B,∆, infer Γ ` A ∧ B,∆
∧L1: from Γ, A ` ∆, infer Γ, A ∧ B ` ∆
∧L2: from Γ, B ` ∆, infer Γ, A ∧ B ` ∆
∨R1: from Γ ` A,∆, infer Γ ` A ∨ B,∆
∨R2: from Γ ` B,∆, infer Γ ` A ∨ B,∆
∨L: from Γ, A ` ∆ and Γ, B ` ∆, infer Γ, A ∨ B ` ∆
¬R: from Γ, A ` ∆, infer Γ ` ¬A,∆
¬L: from Γ ` A,∆, infer Γ,¬A ` ∆
⊃R: from Γ, A ` B,∆, infer Γ ` A ⊃ B,∆
⊃L: from Γ ` A,∆ and Γ′, B ` ∆′, infer Γ′,Γ, A ⊃ B ` ∆′,∆
∀R: from Γ ` A,∆ where X /∈ FV (Γ ` ∆), infer Γ ` ∀X.A,∆
∀L: from Γ, A {B/X} ` ∆, infer Γ,∀X.A ` ∆
∃R: from Γ ` A {B/X},∆, infer Γ ` ∃X.A,∆
∃L: from Γ, A ` ∆ where X /∈ FV (Γ ` ∆), infer Γ,∃X.A ` ∆

Structural rules:
WR: from Γ ` ∆, infer Γ ` A,∆
WL: from Γ ` ∆, infer Γ, A ` ∆
CR: from Γ ` A, A,∆, infer Γ ` A,∆
CL: from Γ, A, A ` ∆, infer Γ, A ` ∆
XR: from Γ ` ∆, A, B,∆′, infer Γ ` ∆, B, A,∆′
XL: from Γ′, B, A,Γ ` ∆, infer Γ′, A, B,Γ ` ∆

FIGURE 3.5. The LK sequent calculus for second-order propositional logic: with truth (>), falsehood (⊥), conjunction (∧), disjunction (∨), negation (¬), implication (⊃), and both universal (∀) and existential (∃) propositional quantification.

The various inference rules of LK can be thought of in three groups that collectively work toward different objectives. The first group, containing just the axiom (Ax) and cut (Cut) rules, gives the core of LK. The Ax rule lets us draw consequences from hypotheses with the understanding that "A entails A" for any proposition A. The Cut rule lets us eliminate intermediate propositions from a proof. For example, the special case of the Cut rule where the hypotheses Γ and Γ′ and the consequences ∆ and ∆′ are all empty is:

Cut: from ` A and A `, infer `

In other words, if we know that a proposition A is both true (` A) and false (A `), then we can conclude that a contradiction has taken place (`). We can then use the intuitive reading of sequents to extend this reasoning to the general form of Cut, meaning that it is valid to allow additional hypotheses and alternate consequences in both premises when eliminating a proposition in this fashion, so long as they are all gathered together in the resulting conclusion. If Γ entails either A or ∆, and Γ′ together with A entails ∆′, then Γ′ and Γ together entail either ∆′ or ∆, by cases on which of A or ∆ is entailed by Γ: if A is a consequence of Γ, then ∆′ is a consequence of the combination of A and Γ′; otherwise ∆ must be a consequence of Γ.

Both Ax and Cut play an important part in the overall structure of LK proof trees. The Ax rule serves as the primitive leaves of the proof, signifying that there is nothing interesting to justify because we have just what is needed.
The Cut rule lets us use auxiliary proofs or "lemmas" without them appearing in the final conclusion, where on the one hand we show how to derive a proposition A as a consequence, and on the other hand we assume A as a hypothesis that may be used in another proof.

The second group of inference rules aims to characterize the logical connectives. These logical rules are generalizations of the introduction rules for the connectives from Figure 3.2, Figure 3.3, and Figure 3.4: the left rules are named with an L and the right rules are named with an R. Compared to the basic inference rules that came from an intuitive understanding of connectives as truth tables, each logical rule is generalized with additional hypotheses and alternative conclusions that are "along for the ride," similar to Cut. For example, the two left introduction rules for conjunction in Figure 3.2 are generalized to:

∧L1: from Γ, A ` ∆, infer Γ, A ∧ B ` ∆
∧L2: from Γ, B ` ∆, infer Γ, A ∧ B ` ∆

which say that if ∆ is a consequence of A and Γ, then ∆ is just as well a consequence of A ∧ B and Γ (and similarly for B). Since we also consider logical negation ¬A as a connective, it too is equipped with left and right introduction rules in Figure 3.5. These rules have the following special cases when Γ and ∆ are empty:

¬R: from A `, infer ` ¬A
¬L: from ` A, infer ¬A `

In other words, whenever A is false we can infer that ¬A is true, and whenever A is true we know that ¬A is false. Similarly, the logical rules of the nullary connectives > and ⊥ are easy to verify by the meaning of sequents. Clearly Γ entails either > or ∆ for any Γ and ∆, since > is always true, and likewise both Γ and ⊥ entail ∆ because ⊥ is never true.

The most subtle logical connectives in LK are the quantifiers ∀ and ∃. The special cases of the introduction rules for ∀X.A and ∃X.A when Γ and ∆ are empty are:

∀R: from ` A, infer ` ∀X.A
∀L: from A {B/X} `, infer ∀X.A `
∃R: from ` A {B/X}, infer ` ∃X.A
∃L: from A `, infer ∃X.A `

For universal quantification over the variable X in A, if we can prove that A is true without knowing anything about X then we can infer that ∀X.A is true, and if we can exhibit a counterexample for a specific B such that A with B for X is false then we know the general ∀X.A must be false. Existential quantification over the variable X in A is reversed, so that exhibiting an example for a specific B such that A with B for X is true means that ∃X.A must be true, whereas showing that A is false without knowing anything about X lets us infer that ∃X.A is false.

The extra subtlety of the quantifiers lies in ensuring that we "know nothing else about X." In natural deduction, this fact was expressed as a property of an entire proof sub-tree by checking all the leaves. In the sequent calculus, however, this extra constraint is more easily captured locally as a simple side condition because the "leaves" are all immediately known within the sequents. This side condition states that the variable X does not appear free anywhere else in the sequent, written as the premise X /∈ FV (Γ ` ∆) in both the ∀R and ∃L rules. Just as in NJ, this extra side condition really is necessary, since without it both quantifiers collapse into one, which is clearly not what we want. In LK, we should expect that a ∀ entails the corresponding ∃, for example ∀X.X ` ∃X.X, which is proved as follows:

Y ` Y (Ax)
∀X.X ` Y (∀L)
∀X.X ` ∃X.X (∃R)

But intuitively it shouldn't be that an ∃ always entails the corresponding ∀.
However, consider the following attempted proof of ∃X.X ` ∀X.X: X ` X Ax X /∈ FV ( ` X) ∃X.X ` X ∃L X /∈ FV (∃X.X ` ) ∃X.X ` ∀X.X ∀R The only reason that this proof is not valid is because the side conditions on X are not met: X /∈ FV (∃X.X ` ) is true but X /∈ FV ( ` X) does not hold. Therefore, the side conditions on the free type variables of sequents in the ∀R and ∃L rules are essential for keeping the intended distinct meanings of the quantifiers. The third group of inference rules aim to describe the structural properties of the sequents themselves that arise from their meaning. The weakening rules say that we can make any proof weaker by adding additional unused hypotheses (WL) or considering alternative unfulfilled consequences (WR) since the presence of irrelevant propositions doesn’t matter. The contraction rules say that duplicate hypotheses (CL) and duplicate consequences (CR) can just as well be merged into one since redundant repetitions don’t matter. And finally, the exchange rules say that hypotheses (XL) and consequences (XR) can be swapped since the order of propositions doesn’t matter. Remark 3.1. It may seem strange that the meaning of a sequent with multiple consequences is that only one consequence must be true instead of all consequences being true. In other words, the consequences of a sequent are disjunctive rather than conjunctive so that, for example, A ` B,C means “A entails B or C” instead of “A entails B and C.” One reason for this interpretation is that disjunctive consequences can be weakened but conjunctive consequences cannot. For example, if we already know that “A entails B or C” then we can deduce “A entails B or C or D” for any D because we already know that either B or C is a consequence of A, so the status of D is irrelevant. However, if we already know that “A entails B and C” then we don’t know much about “A entails B and C and D” in general, since D might not actually follow from A at all. A similar argument also explains why the hypotheses of 52 a sequent are conjunctive rather than disjunctive. Therefore, the meaning of sequents, where all hypotheses must entail one consequence, is essential for enabling weakening on both sides of entailment. End remark 3.1. Example 3.1. Through the exclusive use of introduction rules for treating logical connectives, LK enables a “bottom up” style of building proofs by starting with a final sequent as a goal that we would like to prove and building the rest of the proof up from there. When read in reverse, each logical rule identifies a connective in the goal below the line of inference and breaks it down into simpler sub-goals above the line. For example, let’s revisit Example 2.1 and consider how to build an LK proof that the proposition ((A ∧ B) ∧ C) ⊃ (B ∧ A) is true. As in NJ, we begin with the sequent ` ((A∧B)∧C) ⊃ (B∧A) as the goal and notice that the primary connective exposed in the only proposition available is implication, so we can apply the right implication rule: .... (A ∧B) ∧ C ` B ∧ A ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃R Next, we may break down the conjunction in the consequence B ∧ A with the right conjunction rule, splitting the proof into two parts: .... (A ∧B) ∧ C ` B .... (A ∧B) ∧ C ` A (A ∧B) ∧ C ` B ∧ A ∧R ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃R At this point, the consequences of both our goals are generic, lacking any specific connectives to work with, which is where the proof differs from the proof in Example 2.1. 
Instead of moving to build the proof top-down as in NJ, in LK we shift our attention to the left and begin breaking down the hypotheses. Since the hypothesis (A∧B)∧C contains a superfluous C, we use the first left conjunction rule in both branches of the proof to discard it: .... A ∧B ` B (A ∧B) ∧ C ` B ∧L1 .... A ∧B ` A (A ∧B) ∧ C ` A ∧L1 (A ∧B) ∧ C ` B ∧ A ∧R ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃R 53 Now we may apply another left conjunction rule to select the appropriate hypothesis needed for both sub-proofs: .... B ` B A ∧B ` B ∧L2 (A ∧B) ∧ C ` B ∧L1 .... A ` A A ∧B ` A ∧L1 (A ∧B) ∧ C ` A ∧L1 (A ∧B) ∧ C ` B ∧ A ∧R ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃R And finally, we can now close off both sub-proofs with the Ax rule, finishing the proof: B ` B Ax A ∧B ` B ∧L2 (A ∧B) ∧ C ` B ∧L1 A ` A Ax A ∧B ` A ∧L1 (A ∧B) ∧ C ` A ∧L1 (A ∧B) ∧ C ` B ∧ A ∧R ` ((A ∧B) ∧ C) ⊃ (B ∧ A) ⊃R End example 3.1. Remark 3.2. The traditional LK sequent calculus from Figure 3.5 presents the structural properties of sequents—exchange, weakening, and contraction—explicitly in the form of inference rules. However, there are alternate sequent calculi and variations on LK that forgo these structural rules by baking the properties deeper into the logic itself. The first change along this line is to treat the hypotheses and consequences of sequents as unordered collections of propositions, for example building sequents out of sets or multisets. This way, the exchange rules XL and XR don’t do anything at all, since the sequents in the premise and conclusion are considered identical. The second change is to rephrase the core axiom and cut rules in a way that bakes in weakening and contraction as follows: Γ, A ` A,∆ Ax Γ ` A,∆ Γ, A ` ∆ Γ ` ∆ Cut Contraction can be derived from these new Ax and Cut rules. CL is derived as: Γ, A,A ` ∆ Γ, A ` A,∆ Ax Γ, A ` ∆ Cut 54 and the derivation of CR is similar. Weakening, unfortunately, cannot be directly derived in the same manner as contraction, but instead it is admissible. That is to say, given any proof of the sequent Γ ` ∆, we can build similar proofs Γ, A ` ∆ and Γ ` A,∆ by pushing the unused A through the proof until it is finally discarded by the generalized Ax rule. In terms of provability—the question of which sequents can conclude a valid proof tree—the versions of LK with explicit and implicit structural rules are the same. In the implicit system, exchange is invisible, contraction is a consequence of axiom and cut, and all weakening is pushed to the leaves. Furthermore, the two different versions of the axiom and cut rules are interderivable with respect to their different logics. The explicit Ax rule in Figure 3.5 is a special case of the implicit one above, whereas the implicit Ax rule can be expanded into many weakenings followed by the explicit rule. Likewise, the explicit Cut rule can be derived from the implicit rule by weakening the two premises until they match, whereas the implicit Cut rule can be derived from the explicit rule by contracting the result of the conclusion to remove the duplication. Therefore, up to provability, the choice between these two different styles for handling the structural properties of sequents are a matter of taste. 
On the same subject, it's also sensible to consider an alternate version of left implication introduction that duplicates rather than splits hypotheses and consequences among the premises, in the style of our revised Cut above:

⊃L: from Γ ` A,∆ and Γ, B ` ∆, infer Γ, A ⊃ B ` ∆

In the presence of structural properties (either explicit or implicit), these two ⊃L rules are equivalent up to provability. However, if we want a more refined view of the structural properties, as in sub-structural logics like linear logic (Girard, 1987), then these differences become more acute and must be considered carefully. End remark 3.2.

Consistency and cut elimination

One of Gentzen's motivations for developing the LK sequent calculus was to study the consistency of natural deduction. A consistent logic does not prove a contradiction, so that no proposition is proven both true and false. More specifically, we can say that a sequent calculus is consistent whenever there is no proof of the empty sequent `. For a logic like LK, these two conditions are the same: from a contradiction, weakening gives us ` A and A ` for any A, and from any A that is proven both true and false, Cut gives us `. Consistency of logics like LK is important because without consistency provability is meaningless: it's not particularly interesting to exhibit a proof that some proposition A is true when we already know of a single proof that shows every proposition is true (and false)!

So in the interest of showing LK's consistency, how might we possibly begin to build a proof of the empty sequent from the bottom up? Let's consider which of LK's inference rules (from Figure 3.5) could possibly deduce `. It can't be any of the structural rules, because they all force at least one hypothesis or consequence in the conclusion below the line. Likewise, it can't be any of the logical rules: since they are introduction rules, they all include at least one proposition built from a connective on either side of the deduced sequent. It also can't be the axiom rule, which only deduces simple non-empty sequents of the form A ` A. Indeed, the only inference rule that might ever deduce an empty sequent—and therefore lead to inconsistency—is Cut, as shown previously.

This observation that only cuts can lead to contradictions is Gentzen's (1935b) great insight into logical consistency. If we want to know that a sequent calculus like LK is consistent, it's enough to ask if the Cut rule is important for provability. If Cut is not essential in any proof, so that any provable sequent can be deduced without the help of Cut, then ` is unprovable since it cannot be deduced without Cut. This application highlights the importance of Gentzen's (1935a) cut elimination theorem (originally called the Hauptsatz), and its phrasing in the sequent calculus, which says that every LK proof can be reduced to a cut-free one.

Theorem 3.1 (Cut elimination). For all LK proofs of Γ ` ∆, there exists an alternate LK proof of Γ ` ∆ that does not contain any use of the Cut rule.

The proof of cut elimination can be divided into two main parts: the logical steps and the structural steps. The logical steps of cut elimination consider the cases when we have a cut between two proof trees ending in the left and right rules for the same connective occurring in the same proposition, and show how to rewrite the proof into a new one that does not mention that particular connective.
The structural steps of cut elimination handle all the other cases where we do not have a left and right introduction for the same proposition facing one another in a cut. These steps involve rewriting the structure of the proof and propagating the rules until the relevant logical 56 steps can take over. The final ingredient is to ensure that this procedure for eliminating cuts always gives a definite result, and does not spin off into an infinite regress. Example 3.2. Notice how different inference rules of LK treat the division of extraneous hypotheses and consequences among multiple premises differently. On the one hand, rules like ∧R and ∨L duplicate the side propositions Γ and ∆ from the conclusion to both premises. On the other hand, rules like Cut and ⊃L merge different side propositions from the two premises into the common conclusion, creating an ordering between them during the merge. Why are these particular rules given in such different styles, and why is the particular merge order chosen? One way to understand the impact of these details is to look at the interaction between the logical and structural rules during cut elimination, so let’s examine a few exemplar steps of the cut elimination procedure. The first, and the most trivial, case is when we cut an axiom with an existing proof D of Γ ` A,∆ or E of Γ, A ` ∆. This particular maneuver doesn’t add anything interesting to the nature of the existing proof, and so correspondingly eliminating the cut should just give the same proof back unchanged, as we can see in both cases: D.... Γ ` A,∆ A ` A Ax Γ ` A,∆ Cut =⇒ D.... Γ ` A,∆ A ` A Ax E.... Γ, A ` ∆ Γ, A ` ∆ Cut =⇒ E.... Γ, A ` ∆ Notice here that cutting an axiom with both D and E does not change the sequent in either conclusion, which comes from the precise way that Cut merges the side propositions in the two premises. For D, the extra consequence A coming from the axiom A ` A replaces the cut A in exactly the right position, and likewise for E . If Cut put the propositions of its conclusion in any other order, then we would need to exchange the result of one or both of the above steps with XL and XR to put them back into the right order. Moving on to a logical step, consider what happens when compatible ∧R and ∧L1 introductions, with premises D1, D2, and E respectively, meet in a Cut: D1.... Γ ` A,∆ D2.... Γ ` B,∆ Γ ` A ∧B,∆ ∧R E.... Γ′, A ` ∆′ Γ′, A ∧B ` ∆′ ∧L1 Γ′,Γ ` ∆′,∆ Cut =⇒ D1.... Γ ` A,∆ E.... Γ′, A ` ∆′ Γ′,Γ ` ∆′,∆ Cut 57 Reducing this cut involves selecting the appropriate premise D1 of the ∧R introduction so that it can meet with the single premise of ∧L1. The number of cuts are not reduced by this step, but instead the primary proposition A∧B of the cut has been reduced to A, which (non-trivially) justifies why this step is making progress in the cut elimination procedure. Not every cut-elimination step winds up so neatly organized, unfortunately, and sometimes the result is necessarily out of order and must be corrected. For example, consider the following reduction step of a Cut between compatible ¬R and ¬L inferences with premises D and E respectively: D.... Γ, A ` ∆ Γ ` ¬A,∆ ¬R E.... Γ′ ` A,∆′ Γ′,¬A ` ∆′ ¬L Γ′,Γ ` ∆′,∆ Cut =⇒ E.... Γ′ ` A,∆′ D.... Γ, A ` ∆ Γ,Γ′ ` ∆,∆′ Cut Γ′,Γ ` ∆′,∆ XL,XR Here, the Cut we get from reducing the proposition ¬A to A results in a sequent that is out of order compared to the conclusion we started with. Thus, we need to re-order the sequent with some number of XL and XR exchanges to restore the original conclusion. 
The fact that reducing a negation introduction cut inverts the order of propositions comes from the inherent inversion of negation: there's no obvious way to prevent this scenario by modifying Cut. A similar re-ordering occurs with implication, where a Cut between compatible ⊃R and ⊃L inferences, with premises D, E1, and E2, can be reduced as follows:

D.... Γ, A ` B,∆ Γ ` A ⊃ B,∆ ⊃R    E1.... Γ′ ` A,∆′ E2.... Γ′′, B ` ∆′′ Γ′′,Γ′, A ⊃ B ` ∆′′,∆′ ⊃L    Γ′′,Γ′,Γ ` ∆′′,∆′,∆ Cut
=⇒
E1.... Γ′ ` A,∆′    D.... Γ, A ` B,∆ E2.... Γ′′, B ` ∆′′ Γ′′,Γ, A ` ∆′′,∆ Cut    Γ′′,Γ,Γ′ ` ∆′′,∆,∆′ Cut    Γ′′,Γ′,Γ ` ∆′′,∆′,∆ XL,XR

Here, we start with the side-propositions of E1 and E2 merged together with ⊃L, but after reducing the Cut, D cuts in between the two of them, so the final sequent must be re-ordered to match the original conclusion. The need to place D in the middle comes from the fact that its concluding sequent has A on the left and B on the right, so our only available cuts must correspondingly place E1 to the left and E2 to the right, no matter how they are nested.

Finally, we can see how the free variable side conditions on the ∀R and ∃L rules play a key role in cut elimination. For example, consider the following reduction step of a cut between compatible ∀R and ∀L inferences with D and E respectively:

D.... Γ ` A,∆ Γ ` ∀X.A,∆ ∀R    E.... Γ′, A {B/X} ` ∆′ Γ′,∀X.A ` ∆′ ∀L    Γ′,Γ ` ∆′,∆ Cut
=⇒
D{B/X}.... Γ ` A {B/X} ,∆    E.... Γ′, A {B/X} ` ∆′    Γ′,Γ ` ∆′,∆ Cut

Notice that in order to make a direct cut between D and E, we need to substitute B for X in D to make the two sides match up properly. The fact that X does not occur free in Γ ` ∆ means that after substitution, both Γ and ∆ remain unchanged in the conclusion of the proof. If instead X appeared free somewhere in Γ or ∆, then the logical cut elimination step for ∀ would change the conclusion, which ruins the result of the procedure. End example 3.2.

Remark 3.3. The side conditions on the ∀R and ∃L rules are not just a useful aid to cut elimination, but are crucial to the entire endeavor. More specifically, if we removed the side condition from these two inference rules, then LK would be inconsistent because we could directly derive a contradiction; and since cut elimination implies that contradictions cannot be derived, cut elimination itself would become impossible. One such contradiction is built in three parts, and is similar to the faulty NJ proof of false in Remark 2.2. First, we can prove that ∃X.X is true because there is some provably true proposition in LK, for example Y ⊃ Y or just >. Second, we can prove that ∀X.X is false because there is some provably false proposition in LK, for example (¬Y ) ∧ Y or just ⊥. Third, recall that without the side conditions on free propositional variables, we can derive a proof of ∃X.X ` ∀X.X, which is the glue that connects the first two parts together via cuts. In total, we would be able to derive the following contradiction in LK:

` > >R ` ∃X.X ∃R    X ` X Ax X /∈ FV (` X) ∃X.X ` X ∃L X /∈ FV (∃X.X `) ∃X.X ` ∀X.X ∀R    ` ∀X.X Cut    ⊥ ` ⊥L ∀X.X ` ∀L    ` Cut

which is only ruled out by the side conditions on ∀R and ∃L that prevent a proof of the sequent ∃X.X ` ∀X.X. In this particular proof, the side condition X /∈ FV (∃X.X `) is satisfied because X is bound in ∃X.X, so X is indeed not free in ∃X.X `, but the side condition X /∈ FV (` X) is clearly violated. The other possible proof, which switches the order of the ∃L and ∀R rules, similarly violates the side condition X /∈ FV (X `) forced by ∀R. End remark 3.3.
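To give a concrete flavor of the cut elimination procedure, here is a minimal Haskell sketch (our own encoding; the dissertation itself gives no code). It represents a skeletal, untyped fragment of LK proofs and implements only the axiom steps and the logical steps for conjunction described in Example 3.2, eliding sequents and the XL/XR bookkeeping discussed above:

```haskell
-- A skeletal fragment of LK proofs: just enough structure to express
-- a cut of an ∧R introduction against an ∧L1/∧L2 introduction.
data Proof
  = Ax                  -- A ` A
  | AndR Proof Proof    -- from Γ ` A,∆ and Γ ` B,∆ infer Γ ` A∧B,∆
  | AndL1 Proof         -- from Γ,A ` ∆ infer Γ,A∧B ` ∆
  | AndL2 Proof         -- from Γ,B ` ∆ infer Γ,A∧B ` ∆
  | Cut Proof Proof     -- from Γ ` A,∆ and Γ',A ` ∆' infer Γ',Γ ` ∆',∆
  deriving Show

-- One step of cut elimination, applied at the root of the proof.
step :: Proof -> Maybe Proof
step (Cut d Ax)                  = Just d            -- axiom on the right
step (Cut Ax e)                  = Just e            -- axiom on the left
step (Cut (AndR d1 _) (AndL1 e)) = Just (Cut d1 e)   -- ∧: select premise 1
step (Cut (AndR _ d2) (AndL2 e)) = Just (Cut d2 e)   -- ∧: select premise 2
step _                           = Nothing           -- a structural step is needed
```

For instance, step (Cut (AndR Ax Ax) (AndL1 Ax)) yields Just (Cut Ax Ax), and stepping again yields Just Ax: the cut on A ∧ B is replaced by a smaller cut on A, which then disappears against the axiom, mirroring the progress argument sketched above.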
Logical duality

Another application of sequent calculi is to study the dualities of logic through the deep symmetries of the system (Gentzen, 1935b). The turnstile of entailment (`) provides the pivot of duality separating left from right and true from false. Logical duality in the LK sequent calculus expresses a relationship between the connectives that follows De Morgan's laws about the way negation distributes over conjunction and disjunction:

¬(A ∧ B) a` (¬A) ∨ (¬B)
¬(A ∨ B) a` (¬A) ∧ (¬B)

where we interpret the equivalence relation A a` B as the mutual provability of A and B: that both A ` B and B ` A are provable.

Focusing on the opposite roles of the left and right sides of a sequent, we can immediately observe that the introduction rules of conjunction and disjunction from Figure 3.5 are mirror images of one another by flipping the sequents across their turnstile. Similarly, ∀ and ∃ are dual to one another, and negation is its own dual, with both ¬R and ¬L reflecting the same inference flipped about entailment. But what about implication? After examining Figure 3.5, there doesn't seem to be any logical connective that serves as implication's dual counterpart. Fortunately, the symmetric nature of sequents lets us discover the dual of implication by just syntactically flipping the ⊃R and ⊃L inferences, giving us the following inference rules for a new connective B − A:

−R: from Γ, A ` ∆ and Γ′ ` B,∆′, infer Γ,Γ′ ` B − A,∆,∆′
−L: from Γ, B ` A,∆, infer Γ, B − A ` ∆

But what does this new connective, the dual of implication, mean? By excluding all side hypotheses and consequences so that Γ, Γ′, ∆, ∆′ are all empty in the style of Figure 3.4, we can read off the basic truth and falsehood facts from the above rules. On the one hand, the −R rule says that B − A is true whenever B is true and A is false. On the other hand, the −L rule says that B − A must be false whenever B entails A. Therefore, the proposition B − A can be thought of as the subtraction of A from B, or equivalently the complement of A with respect to B, so that B − A can be read as "B but not A."

Remark 3.4. Another method for discovering implication's dual is by reducing these two rather complex connectives into simpler forms. Notice that, since LK is a classical logic, implication is equivalent to an encoding based on disjunction and negation, up to provability:

A ⊃ B a` (¬A) ∨ B

since A implies B is true if and only if either B is true or A is false. The proofs justifying this encoding in LK are:

A ` A Ax A, (¬A) ` ¬L    B ` B Ax    A, (¬A) ∨ B ` B ∨L    (¬A) ∨ B, A ` B XL    (¬A) ∨ B ` A ⊃ B ⊃R

A ` A Ax ` ¬A, A ¬R ` (¬A) ∨ B, A ∨R1 ` A, (¬A) ∨ B XR    B ` B Ax B ` (¬A) ∨ B ∨R2    A ⊃ B ` (¬A) ∨ B, (¬A) ∨ B ⊃L    A ⊃ B ` (¬A) ∨ B CR

We also have an encoding of subtraction in terms of conjunction and negation:

B − A a` B ∧ (¬A)

which is provable similarly to the encoding of implication. We can now use the above encodings to calculate the negation of implication with De Morgan's laws, using the fact that conjunction is provably commutative—A ∧ B a` B ∧ A for any A and B:

¬(A ⊃ B) a` ¬((¬A) ∨ B) a` (¬(¬A)) ∧ (¬B) a` (¬B) ∧ (¬(¬A)) a` (¬B) − (¬A)

The dual is then recovered from the fact that A⊥ a` (¬A)∗, where A∗ stands for A with all propositional variables X replaced with ¬X. Therefore, we can also derive the dual of implication by encoding it and its dual with conjunction, disjunction, and negation. End remark 3.4.

Duality of sequents:
(Γ ` ∆)⊥ ≜ ∆⊥ ` Γ⊥
(A1, . . . , An)⊥ ≜ A⊥n, . . . , A⊥1

Duality of propositions:
(X)⊥ ≜ X
(¬A)⊥ ≜ ¬(A⊥)
>⊥ ≜ ⊥
⊥⊥ ≜ >
(A ∧ B)⊥ ≜ (A⊥) ∨ (B⊥)
(A ∨ B)⊥ ≜ (A⊥) ∧ (B⊥)
(A ⊃ B)⊥ ≜ (B⊥) − (A⊥)
(B − A)⊥ ≜ (A⊥) ⊃ (B⊥)
(∀X.A)⊥ ≜ ∃X.(A⊥)
(∃X.A)⊥ ≜ ∀X.(A⊥)

FIGURE 3.6. Duality in the LK sequent calculus.
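Since the duality operation of Figure 3.6 is defined by structural induction over propositions, it transcribes directly into a program. Here is a small Haskell sketch (the datatype and its constructor names are our own; the dissertation defines no code) implementing A⊥ exactly as in the figure:

```haskell
-- Propositions of LK, including subtraction (B − A), the dual of implication.
data Prop
  = PVar String        -- X
  | Top                -- ⊤
  | Bot                -- ⊥
  | And Prop Prop      -- A ∧ B
  | Or  Prop Prop      -- A ∨ B
  | Not Prop           -- ¬A
  | Imp Prop Prop      -- A ⊃ B
  | Sub Prop Prop      -- B − A, written `Sub b a`
  | All String Prop    -- ∀X.A
  | Ex  String Prop    -- ∃X.A
  deriving (Eq, Show)

-- The duality operation A⊥ of Figure 3.6: a negation pushed all the way
-- inward by the De Morgan laws, until a propositional variable is reached.
dual :: Prop -> Prop
dual (PVar x)  = PVar x
dual Top       = Bot
dual Bot       = Top
dual (And a b) = Or  (dual a) (dual b)
dual (Or  a b) = And (dual a) (dual b)
dual (Not a)   = Not (dual a)
dual (Imp a b) = Sub (dual b) (dual a)   -- (A ⊃ B)⊥ = (B⊥) − (A⊥)
dual (Sub b a) = Imp (dual a) (dual b)   -- (B − A)⊥ = (A⊥) ⊃ (B⊥)
dual (All x a) = Ex  x (dual a)
dual (Ex  x a) = All x (dual a)
```

One property visible even in this small sketch is that duality is an involution: dual (dual a) == a for every proposition a, reflecting the mirror symmetry between the left and right rules that the duality theorem below makes precise.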
With the dual of implication at hand, we can properly express the duality of sequent calculus proofs—for every LK proof D of a sequent:

An, . . . , A2, A1 ` B1, B2, . . . , Bm

there is a dual proof D⊥ of the dual sequent:

B⊥m, . . . , B⊥2, B⊥1 ` A⊥1, A⊥2, . . . , A⊥n

The duality relation on judgements and propositions is given in Figure 3.6. Note that the duality operation A⊥ may be understood as taking the negation of the proposition, ¬A, and pushing the negation inward all the way using the De Morgan laws, until an unknown proposition variable X is reached (Gentzen, 1935b).3

3Note that Gentzen did not consider the dual counterpart to implication as a connective, as we do, but rather eliminated implication from the system by encoding it in terms of the disjunction and negation given above for the purposes of establishing duality.

Theorem 3.2 (Logical duality). For any LK proof D of the sequent Γ ` ∆, there exists a dual proof D⊥ of the dual sequent ∆⊥ ` Γ⊥.

Due to the natural syntactic symmetry of the LK sequent calculus, logical duality comes from an exchange between left and right: left rules mirror right rules, and hypotheses to the left of entailment mirror consequences to the right. Thus, establishing logical duality in the sequent calculus follows from a straightforward induction on the structure of proofs, working from the bottom conclusion up to the axioms.

Example 3.3. To illustrate how the left and right sides of proofs get swapped, consider the case when the bottom conclusion is inferred from a use of the ∧R rule:

D.... Γ ` A,∆    E.... Γ ` B,∆    Γ ` A ∧ B,∆ ∧R

Then by the inductive hypothesis, we get a proof D⊥ of (Γ ` A,∆)⊥ ≜ ∆⊥, A⊥ ` Γ⊥ and a proof E⊥ of (Γ ` B,∆)⊥ ≜ ∆⊥, B⊥ ` Γ⊥, from which we can deduce (Γ ` A ∧ B,∆)⊥ ≜ ∆⊥, (A⊥) ∨ (B⊥) ` Γ⊥ by ∨L:

D⊥.... ∆⊥, A⊥ ` Γ⊥    E⊥.... ∆⊥, B⊥ ` Γ⊥    ∆⊥, A⊥ ∨ B⊥ ` Γ⊥ ∨L

End example 3.3.

Remark 3.5. The duality of proofs in the LK sequent calculus means that if a proposition A is true, so that we have a proof of ` A, then its dual must be false, so that we have a proof of A⊥ `. Analogously, if a proposition A is false, then its dual must be true. For example, consider the following general proof that the contradictory proposition A ∧ (¬A) is false:

A ` A (Ax)
A ∧ (¬A) ` A (∧L1)
A ∧ (¬A), ¬A ` (¬L)
A ∧ (¬A), A ∧ (¬A) ` (∧L2)
A ∧ (¬A) ` (CL)

For free, duality gives us a general proof that the law of excluded middle, A ∨ (¬A), is true:

A ` A (Ax)
A ` A ∨ (¬A) (∨R1)
` ¬A, A ∨ (¬A) (¬R)
` A ∨ (¬A), A ∨ (¬A) (∨R2)
` A ∨ (¬A) (CR)

This is not a trivial property—the fact that the LK sequent calculus can prove the law of excluded middle means that it is a proof system for classical logic. In contrast, intuitionistic logic is missing duality since it accepts non-contradiction, ¬(A ∧ (¬A)), in general but rejects the universal truth of laws like excluded middle or double negation elimination ((¬(¬A)) ⊃ A), only allowing for specialized proofs depending on the particular proposition A in question. Intuitionistic logic also only validates three of the four aforementioned De Morgan laws, rejecting ¬(A ∧ B) ` (¬A) ∨ (¬B) in particular, showing another break of duality.
Gentzen’s (1935a) system NJ of natural deduction is naturally a proof system for intuitionistic logic, in contrast with the LK sequent calculus which is classical. However, notice that the LK proof of excluded middle made critical use of multiple consequences and contraction on the right of the sequent in order to apply both ∨R2 and ∨R1 to the same original consequence. Without the ability to manipulate sequents with multiple consequences, the proof that A ∨ (¬A) is true would not be possible. Indeed, such a restriction would break the symmetry of LK—as multiple hypotheses cannot be mirrored into multiple consequences—and destroy the duality that let us convert the law of non-contradiction into law of excluded middle. As it turns out, Gentzen (1935a) also introduced a sequent calculus called LJ as a restriction of LK where sequents could only ever contain one consequence, which is instead a sequent calculus system for intuitionistic logic of equal provability strength as NJ. Note that with this restriction, LJ effectively removes the right structural rules WR, CR, and XR since they involve sequents with more than one consequence. From the other perspective, generalizing natural deduction with multiple consequences turns it into a proof system for classical logic (Parigot, 1992; Ariola & Herbelin, 2003). Therefore, we can summarize that the difference between a single-consequence and multiple-consequence proof systems can mean the difference between intuitionistic and classical logic. End remark 3.5. 64 The Core Calculus Today, the Curry-Howard isomorphism (Curry et al., 1958; Howard, 1980; de Bruijn, 1968) is a far-reaching thesis that each logic corresponds to a foundational programming language: the propositions of logic can be seen as types of programs and the proofs of those propositions can be seen as programs themselves. The shining example of this recurring correspondence is between Gentzen’s (1935a) natural deduction and Church’s (1932) λ-calculus. However, the logics of natural deduction and the sequent calculus are rather different from one another. As previously discussed, one major point of distinction between the two styles of logic is that natural deduction is right-handed, favoring exclusively right rules for logical connectives, whereas the sequent calculus is ambidextrous, favoring introduction rules on both the left and right sides of entailment. That means that the sequent calculus does not correspond to the λ-calculus the same way that natural deduction does. So what might a programming language based on a sequent calculus like LK look like? From natural deduction’s right-handed nature, we get an expression-oriented language like the λ-calculus: all the phrases of the language work toward producing some result corresponding to the primary consequence on the right, and so they may all be (potentially) composed together. But the sequent calculus is ambidextrous, containing both left- and right-handed rules, and regularly deals with sequents like A,¬A ` that lack any particular consequence to speak of. Without a consequence, how can we say what type of result to expect from a program corresponding to the sequent A,¬A `, or that it even produces a result at all? 
More generally, notice that we can classify the rules of LK from Figure 3.5 by the three different kinds of sequents they can deduce: those with a primary consequence of interest like in the right rules, those with a primary hypothesis of interest like in the left rules, and those with no particular proposition of interest (including possibly the empty sequent) like in the cut rule. If we interpret LK as a programming language, it seems reasonable that each of these different kinds of sequents correspond to a different basic kind of phrase in the language, whose composition is guided by the forms of the inference rules. Before delving into the entirety of LK, let’s first consider a core language shown in Figure 3.7, Herbelin’s (2005) µµ˜-calculus, that corresponds to the core part of LK and lies in the heart of every sequent-based language we will explore. Notice that the language of types in this core lacks any logical connectives, so that the only types are uninterpreted variables X, Y , Z, etc. The µµ˜-calculus is a bare language for describing 65 A,B,C ∈ Type ::= X X, Y, Z ∈ TypeVariable ::= . . . c ∈ Command ::= 〈v||e〉 v ∈ Term ::= x | µα.c x, y, z ∈ Variable ::= . . . e ∈ CoTerm ::= α | µ˜x.c α, β, γ ∈ CoVariable ::= . . . Γ ∈ InputEnv ::= x1 : A1, . . . , xn : An ∆ ∈ OutputEnv ::= α1 : A1, . . . , αn : An Judgement ::= c : (Γ ` ∆) | (Γ ` v : A | ∆) | (Γ | e : A ` ∆) Core rules: x : A ` x : A | VR | α : A ` α : A VL c : (Γ ` α : A,∆) Γ ` µα.c : A | ∆ AR c : (Γ, x : A ` ∆) Γ | µ˜x.c : A ` ∆ AL Γ ` v : A | ∆ Γ′ | e : A ` ∆′ 〈v||e〉 : (Γ′,Γ ` ∆′,∆) Cut FIGURE 3.7. µµ˜: The core language of the sequent calculus. only input, output, and interactions: the types on the right side of a sequent describe the outputs of a program and the types on the left side of a sequent describe the inputs of a program. When the two opposite sides come together—when the opposed forces of input and output meet—we have an interaction that sparks computation. Note that the type system brings out an aspect of deduction that was implicit in the sequent calculus: the role of a distinguished active proposition that is currently under consideration. For example, in the ∧R rule from Figure 3.5, we are currently trying to prove the proposition A ∧B, so it is considered the active proposition of the sequent Γ ` A ∧B,∆. By putting attention on (at most one) active proposition, we get three classifications of sequents: active on the right, active on the left, or passive (without an active proposition on either side). These three forms of sequents likewise classify three different forms of µµ˜ expressions that might be part of a program: – An active sequent on the right (Γ ` v : A | ∆) describes a term v that sends information of type A as its output (that is, v is a producer of type A). – An active sequent on the left (Γ | e : A ` ∆) describes a co-term e that receives information of type A as its input (that is, e is a consumer of type A). 66 – A passive sequent (c : (Γ ` ∆)) describes a command c that is an executable program capable of running on its own without any distinguished input or output. In each case, the environments Γ and ∆ describe any additional inputs and outputs to an expression by specifying the type of free variables (x, . . . ) and free co-variables (α, . . . ) that expression might reference, respectively. The expressions of the µµ˜-calculus come from the axiom and cut rules of LK plus an additional pair of activation rules AR and AL. 
The Ax rule of LK is divided into two separate rules in µµ˜: the VR rule creates a term by just referring to a variable available from its environment, and similarly the VL rule creates a co-term by referring to a co-variable. The Cut rule connects a term and co-term that are waiting to send and receive information of the same type, so that the output of the term is forwarded to the co-term as input (and dually, the input of the co-term is drawn from the output of the term). Finally, the activation rules AR and AL pick a particular (co-)variable from the environment of a command to activate by creating an output or input abstraction, respectively. Intuitively, if the variable x stands for an unknown input in a command c, then the input abstraction µ˜x.c is a co-term that, when given a place to draw information, will bind that location to the input channel x while running c. Dually, if the co-variable α stands for an unknown output in a command c, then the output abstraction µα.c is a term that, when given a place to send information, will bind that location to the output channel α while running c. Having examined the static properties of the µµ˜-calculus—its syntax and types— we still need to consider the dynamic properties of µµ˜, to explain what it means to run a program. To say “what is computation in the sequent calculus?” we turn to cut elimination (previously mentioned in Section 3.1) which outlines a method of reducing commands as the main unit of computation.4 In other words, computation in µµ˜ is the behavior that results from cutting together a compatible producer and consumer in a command, so that they may meaningfully interact with one another. In the bare µµ˜-calculus with no logical connectives, we can only have three forms of commands: a cut between (co-)variables 〈x||α〉, a cut with an output abstraction 〈µα.c||e〉, and a cut with an input abstraction 〈v||µ˜x.c〉. In the first case, a command 〈x||α〉 represents a basic final state that can reduce no further, and even though its typing derivation 4Note, however, that the steps performed in µµ˜ transform more of the program at once which differs from the fine-grained steps of the original cut-elimination procedures used for LK. 67 contains a Cut, it is a trivial sort of cut that corresponds more closely to a passive version of LK’s Ax : x : A ` x : A | VR | α : A ` α : A VL 〈x||α〉 : (x : A ` α : A) Cut In the second two cases, the operational meaning of input and output abstractions are expressed via capture-avoiding substitution—much like the β law for functions in the λ-calculus—as illustrated by the following µ and µ˜ rewriting rules: (µ) 〈µα.c||e〉 µ c {e/α} (µ˜) 〈v||µ˜x.c〉 µ˜ c {v/x} The µ˜ reduction step substitutes the term v for the variable x introduced by an input abstraction, distributing it into the command c to the points where it is referenced. The µ reduction step is the mirror image, which substitutes a co-term e for a co-variable α introduced by an output abstraction. There is an extensional nature to input and output abstractions—analogous to the η law for functions in the λ-calculus—that observes the fact that trivial input and output abstractions can be eliminated by the following ηµ and ηµ˜ rewriting rules: (ηµ) µα. 〈v||α〉 ηµ v (α /∈ FV (v)) (ηµ˜) µ˜x. 〈x||e〉 ηµ˜ e (x /∈ FV (e)) In other words, the term that sends the output of v to α only to forward that information along as its own output is the same as v itself. 
Dually, the co-term that binds its input to x only to forward that information along to another co-term e can be written more simply as just e.

As per Remark 2.3, we can derive a reduction theory (µµ˜ηµηµ˜) and equational theory (=µµ˜ηµηµ˜) for the µµ˜-calculus as the compatible-reflexive-transitive and compatible-reflexive-symmetric-transitive closures (respectively) of the µ, µ˜, ηµ, and ηµ˜ rewriting rules. It is also very easy to give the µµ˜-calculus an operational semantics by just applying the µ and µ˜ rewriting rules directly to commands, so that the only evaluation context is the empty context, in contrast to the λ-calculus which requires deeply nested evaluation contexts. In other words, a single operational reduction c 7→µµ˜ c′ is given by applying one of the µ or µ˜ rules to the command c itself, and multiple steps of the µµ˜ operational semantics are given by the reflexive-transitive closure of the single-step relation 7→µµ˜. Note how the operational semantics only includes the µ and µ˜ rewriting rules, meaning that they are operational rules (Herbelin & Zimmermann, 2009). In contrast, the ηµ and ηµ˜ rewriting rules are not used to run a program in the operational semantics, so they are (merely) observational rules, meaning that the (co-)terms before and after ηµ and ηµ˜ reduction are observably the same in any program. These observational rules are never needed to run a program and get a result, because they are simulated by the operational rules whenever they come to the forefront. For example, we have the following (general) ηµ reduction which gets us to a final command:

〈µβ. 〈x||β〉||α〉 →ηµ 〈x||α〉

But notice that in this case a µ operational step gets us to exactly the same final command anyway.

The fundamental dilemma of computation

Unfortunately, the aforementioned dynamic semantics for µµ˜ is overly simplistic and extremely non-deterministic, to the point where programs may make completely divergent and unrelated computations. The non-determinism of the µµ˜-calculus corresponds to the fact that classical cut elimination in the LK sequent calculus is also non-deterministic. The phenomenon is embodied by the fundamental conflict between input and output abstractions, as shown by the following critical pair between the two dual µ and µ˜ reductions for performing substitution:

c1 {(µ˜x.c2)/α} ←µ 〈µα.c1||µ˜x.c2〉 →µ˜ c2 {(µα.c1)/x}

Both the term µα.c1 and co-term µ˜x.c2 are fighting for control in the above command, and either one may win. The non-deterministic outcome of this conflict is exemplified in the case where neither α nor x is referenced in their respective commands:

c1 ←µ 〈µ .c1||µ˜ .c2〉 →µ˜ c2

showing that programs may produce different results each time they are run, since the same starting point may step to two different and completely arbitrary commands. This form of divergent reduction paths is called a critical pair and has a serious impact on the dynamic semantics of the µµ˜-calculus. For the µµ˜ operational semantics, the result of a program is non-deterministic because it can end up in different final states depending on which rule is chosen; for example 〈x||α〉 ←[µµ˜ 〈µγ. 〈x||α〉||µ˜z. 〈y||β〉〉 7→µµ˜ 〈y||β〉. This fact implies that the µµ˜ηµηµ˜ reduction theory is not confluent, because different reductions can be applied such that the two diverging paths never converge back to the same result again. And finally, the µµ˜ηµηµ˜ equational theory is incoherent because all commands and (co-)terms are equated.
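The critical pair is easy to observe concretely. The following Haskell sketch (our own encoding; it uses naive rather than capture-avoiding substitution, which is adequate here because all the names in the example are distinct) implements the core µµ˜ syntax along with the unrestricted µ and µ˜ rules:

```haskell
-- The core µµ̃-calculus: commands, terms, and co-terms.
data Term    = Var String   | Mu  String Command  deriving Show  -- x | µα.c
data CoTerm  = CoVar String | MuT String Command  deriving Show  -- α | µ̃x.c
data Command = Cut Term CoTerm                    deriving Show  -- ⟨v||e⟩

-- c {e/α}: replace the co-variable α with e (naive substitution).
substCo :: String -> CoTerm -> Command -> Command
substCo a e (Cut v k) = Cut (inTerm v) (inCo k)
  where
    inTerm (Mu b c) | b /= a = Mu b (substCo a e c)  -- stop at a rebinding of α
    inTerm t                 = t
    inCo (CoVar b) | b == a  = e
    inCo (MuT x c)           = MuT x (substCo a e c)
    inCo k'                  = k'

-- c {v/x}: replace the variable x with v (naive substitution).
substTm :: String -> Term -> Command -> Command
substTm x v (Cut w k) = Cut (inTerm w) (inCo k)
  where
    inTerm (Var y) | y == x = v
    inTerm (Mu b c)         = Mu b (substTm x v c)
    inTerm t                = t
    inCo (MuT y c) | y /= x = MuT y (substTm x v c)  -- stop at a rebinding of x
    inCo k'                 = k'

-- The two unrestricted rules, each applied at the top of a command.
muStep, muTildeStep :: Command -> Maybe Command
muStep      (Cut (Mu a c) e)  = Just (substCo a e c)   -- (µ)
muStep      _                 = Nothing
muTildeStep (Cut v (MuT x c)) = Just (substTm x v c)   -- (µ̃)
muTildeStep _                 = Nothing

-- The critical pair from the text, ⟨µγ.⟨x||α⟩ || µ̃z.⟨y||β⟩⟩,
-- spelled with the names γ = "g", α = "a", β = "b":
pair :: Command
pair = Cut (Mu "g" (Cut (Var "x") (CoVar "a")))
           (MuT "z" (Cut (Var "y") (CoVar "b")))
-- muStep pair      == Just ⟨x||α⟩, but
-- muTildeStep pair == Just ⟨y||β⟩: two unrelated final states.
```

Running both rules on pair reproduces exactly the divergence shown above: the same command steps to two different final commands depending on which side wins.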
From the perspective of programming language semantics, this type of non-determinism can be undesirable since it makes it impossible to predict a single definitive result of a program: there may be multiple incompatible results depending on the choices made during execution. If we want to regain properties like determinism, confluence, or coherence, which are enjoyed by the λ-calculus, then some of these freedoms must be curtailed. In order to recover determinism for the sequent calculus, Curien & Herbelin (2000) observed that we only need to choose an evaluation strategy that deterministically picks the next step to take by giving priority to one reduction over the other:

Call-by-value consists in giving priority to the µ redexes, while call-by-name gives priority to the µ˜ redexes.

Prioritization between the two opposed reductions means that there must be some potential µ or µ˜ redexes that we could reduce but choose not to, thereby yielding priority to the other side of the command. From another viewpoint, choosing a priority between the two sides of a command is the same thing as choosing a restriction on the terms and co-terms that can be substituted by the µ and µ˜ rules. And reversing directions, choosing which terms and co-terms are substitutable by µ and µ˜ reductions also chooses the evaluation strategy.

Reflecting the above observation back to the calculus, we can restore determinacy to the operational semantics and confluence to the rewriting theory by making the substitution rules strategy-aware: µ˜ only substitutes values for variables and µ only substitutes co-values for co-variables. In other words, the decision of which values and co-values are substitutable is enough information to determine an evaluation strategy in the µµ˜-calculus.

V ∈ ValueV ::= x    E ∈ CoValueV ::= e
(µV) 〈µα.c||E〉 →µV c {E/α}    (ηµ) µα. 〈v||α〉 →ηµ v (α /∈ FV (v))
(µ˜V) 〈V ||µ˜x.c〉 →µ˜V c {V/x}    (ηµ˜) µ˜x. 〈x||e〉 →ηµ˜ e (x /∈ FV (e))

FIGURE 3.8. The call-by-value (V) rewriting rules for the core µµ˜V-calculus.

V ∈ ValueN ::= v    E ∈ CoValueN ::= α
(µN) 〈µα.c||E〉 →µN c {E/α}    (ηµ) µα. 〈v||α〉 →ηµ v (α /∈ FV (v))
(µ˜N) 〈V ||µ˜x.c〉 →µ˜N c {V/x}    (ηµ˜) µ˜x. 〈x||e〉 →ηµ˜ e (x /∈ FV (e))

FIGURE 3.9. The call-by-name (N) rewriting rules for the core µµ˜N-calculus.

To get call-by-value reduction, we can restrict the notion of value to exclude output abstractions and leave co-values unrestricted, thereby giving priority to the µ redexes as shown in Figure 3.8. Dually for call-by-name reduction, we can restrict the notion of co-value to exclude input abstractions and leave values unrestricted, thereby giving priority to the µ˜ redexes as shown in Figure 3.9. Notice that in any case, the observational ηµ and ηµ˜ reductions are not affected by the restrictions on (co-)values, because they do no substitution and are sound under any choice of evaluation strategy. These restrictions on substitution give us exactly Curien & Herbelin's (2000) notions of call-by-value and call-by-name, which restore determinacy, confluence, and coherence to the dynamic semantics of µµ˜. Excluding a (co-)term from the collection of (co-)values effectively prioritizes it by blocking opposing reductions, whereas including a (co-)term as a (co-)value diminishes its priority since it can be deleted or duplicated by substitution.

Structural rules and static scope

So far we have skirted around the issue of how the structural properties of the sequent calculus are represented in the µµ˜-calculus.
After all, they are an important part of Gentzen's LK sequent calculus, but the type system in Figure 3.7 does not express them. For instance, the co-term µ˜z. 〈x||α〉 should have the type x : X | µ˜z. 〈x||α〉 : Y ` α : X, but there's no way to derive that conclusion with the typing rules in Figure 3.7 alone. What's missing here is a way to infer weakening on the left, which is a symptom of the general lack of structural properties in the raw core typing rules. There are multiple options for restoring the classical structural properties to the core µµ˜ type system, and to be thorough we will compare two of the most commonly used methods. The common theme behind both methods is to equate the structural properties of sequents with the scoping properties of static variables and co-variables in expressions.

The first method of expressing the structural properties of sequents in µµ˜ is to add explicit structural rules that allow for a single (co-)variable to appear any number of times in an expression. The full collection of these structural scoping rules is shown in Figure 3.10, which corresponds one-for-one with the structural rules of Gentzen's LK sequent calculus over each form of µµ˜ expression. The weakening rules say that even if a free (co-)variable is in scope in an expression, it does not have to be referenced, as in the co-term µ˜z. 〈x||α〉:

x : X ` x : X | (VR)
| α : X ` α : X (VL)
〈x||α〉 : (x : X ` α : X) (Cut)
〈x||α〉 : (x : X, z : Y ` α : X) (WL)
x : X | µ˜z. 〈x||α〉 : Y ` α : X (AL)

The contraction rules say that a free (co-)variable can be referenced an additional time by replacing another (co-)variable, as in the command 〈µδ. 〈x||β〉||µ˜z. 〈x||α〉〉:

y : X ` y : X | (VR)
| β : X ` β : X (VL)
〈y||β〉 : (y : X ` β : X) (Cut)
〈y||β〉 : (y : X ` δ : Y, β : X) (WR)
y : X ` µδ. 〈y||β〉 : Y | β : X (AR)
x : X ` x : X | (VR)
| α : X ` α : X (VL)
〈x||α〉 : (x : X ` α : X) (Cut)
〈x||α〉 : (x : X, z : Y ` α : X) (WL)
x : X | µ˜z. 〈x||α〉 : Y ` α : X (AL)
〈µδ. 〈y||β〉||µ˜z. 〈x||α〉〉 : (x : X, y : X ` α : X, β : X) (Cut)
〈µδ. 〈x||β〉||µ˜z. 〈x||α〉〉 : (x : X ` α : X, β : X) (CL)

For commands:
WR: from c : (Γ ` ∆), infer c : (Γ ` α : A,∆)
WL: from c : (Γ ` ∆), infer c : (Γ, x : A ` ∆)
CR: from c : (Γ ` α : A, β : A,∆), infer c {α/β} : (Γ ` α : A,∆)
CL: from c : (Γ, y : A, x : A ` ∆), infer c {x/y} : (Γ, x : A ` ∆)
XR: from c : (Γ ` ∆, α : A, β : B,∆′), infer c : (Γ ` ∆, β : B, α : A,∆′)
XL: from c : (Γ′, y : B, x : A,Γ ` ∆), infer c : (Γ′, x : A, y : B,Γ ` ∆)

For terms:
WR: from Γ ` v : C | ∆, infer Γ ` v : C | α : A,∆
WL: from Γ ` v : C | ∆, infer Γ, x : A ` v : C | ∆
CR: from Γ ` v : C | α : A, β : A,∆, infer Γ ` v {α/β} : C | α : A,∆
CL: from Γ, y : A, x : A ` v : C | ∆, infer Γ, x : A ` v {x/y} : C | ∆
XR: from Γ ` v : C | ∆, α : A, β : B,∆′, infer Γ ` v : C | ∆, β : B, α : A,∆′
XL: from Γ′, y : B, x : A,Γ ` v : C | ∆, infer Γ′, x : A, y : B,Γ ` v : C | ∆

For co-terms:
WR: from Γ | e : C ` ∆, infer Γ | e : C ` α : A,∆
WL: from Γ | e : C ` ∆, infer Γ, x : A | e : C ` ∆
CR: from Γ | e : C ` α : A, β : A,∆, infer Γ | e {α/β} : C ` α : A,∆
CL: from Γ, y : A, x : A | e : C ` ∆, infer Γ, x : A | e {x/y} : C ` ∆
XR: from Γ | e : C ` ∆, α : A, β : B,∆′, infer Γ | e : C ` ∆, β : B, α : A,∆′
XL: from Γ′, y : B, x : A,Γ | e : C ` ∆, infer Γ′, x : A, y : B,Γ | e : C ` ∆

FIGURE 3.10. Scoping rules for (co-)variables in commands, terms, and co-terms.

Finally, the exchange rules say that the order of the (co-)variables in scope does not matter. Notice that none of these rules is syntactically visible in the expression itself. Unlike the axiom, activation, and cut rules, which only apply to expressions starting with a very specific form, the structural rules could potentially apply to expressions of any form, so they are not directed by syntax.
The scoping rules in Figure 3.10 can seem repetitive or even redundant: the same weakening, contraction, and exchange rules are repeated three times for commands, terms, and co-terms. Indeed, with this style of presenting the structural properties of sequents, it is common to limit the rules to a single form of expression like commands (Wadler, 2003; Munch-Maccagnoni, 2009). Unfortunately, however, the repetition for each kind of expression and sequent is necessary to ensure that the structural rules match our expectation of static scope in programming languages. For example, in anticipation of the imminent extension of µµ˜ with function types in Section 3.3, we might want to call a binary function of type X → X → Y with the same value for both arguments, as in the co-term x · x · β. To type this co-term, we need to contract x in the co-term itself, as in:

x′ : X ` x′ : X | (VR)
x : X ` x : X | (VR)
| β : Y ` β : Y (VL)
x : X | x · β : X → Y ` β : Y (→L)
x′ : X, x : X | x′ · x · β : X → X → Y ` β : Y (→L)
x : X | x · x · β : X → X → Y ` β : Y (CL)

which is not possible if we only allow contraction in commands. Furthermore, only including the structural rules for commands can mean that sensible observational reductions like ηµ and ηµ˜ no longer preserve the type of expressions. For example, the ηµ-expanded term µα. 〈x||α〉 can be assigned the type y : Y, x : X ` µα. 〈x||α〉 : X | β : Y using weakening and exchange on commands as follows:

x : X ` x : X | (VR)
| α : X ` α : X (VL)
〈x||α〉 : (x : X ` α : X) (Cut)
〈x||α〉 : (x : X ` β : Y, α : X) (WR)
〈x||α〉 : (x : X ` α : X, β : Y) (XR)
〈x||α〉 : (x : X, y : Y ` α : X, β : Y) (WL)
〈x||α〉 : (y : Y, x : X ` α : X, β : Y) (XL)
y : Y, x : X ` µα. 〈x||α〉 : X | β : Y (AR)

But there is no way to conclude y : Y, x : X ` x : X | β : Y without the structural rules for terms, even though it is a reduct of a term of that type: µα. 〈x||α〉 →ηµ x.

The second method of expressing the structural properties of sequents in µµ˜ is by treating the environments Γ and ∆ as unordered sets associating types to (co-)variables and generalizing the axiom and cut rules to implicitly accommodate several steps of weakening and contraction, respectively (Curien & Herbelin, 2000; Wadler, 2005; Munch-Maccagnoni, 2013). This extension of the core µµ˜ type system is shown in Figure 3.11 and corresponds to the variant of LK with implicit structural rules discussed in Remark 3.2. In this formulation, there is no explicit use of structural rules in a typing derivation, but instead the structural properties of sequents follow from the natural scoping rules for static (co-)variables in the µµ˜-calculus, analogous to the scoping rules for the λ-calculus.

VR: Γ, x : A ` x : A | ∆
VL: Γ | α : A ` α : A,∆
AR: from c : (Γ ` α : A,∆), infer Γ ` µα.c : A | ∆
AL: from c : (Γ, x : A ` ∆), infer Γ | µ˜x.c : A ` ∆
Cut: from Γ ` v : A | ∆ and Γ | e : A ` ∆, infer 〈v||e〉 : (Γ ` ∆)

FIGURE 3.11. Implicit (co-)variable scope in the core µµ˜ typing.

During type checking, an output abstraction Γ ` µα.c : A | ∆ (and dually an input abstraction Γ | µ˜x.c : A ` ∆) signals that the active type A may undergo an arbitrary number of structural rules depending on how α (dually x) is referenced in c. During execution, the behavior of structural rules is implicitly implemented by the substitution operation used by µ and µ˜ reduction, corresponding to the structural steps of a cut elimination procedure.
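To illustrate the implicit formulation, here is a Haskell sketch of a type checker that follows Figure 3.11 directly. The encoding is our own, and to keep the checker syntax-directed we annotate each cut with the type being communicated, an annotation that is not part of the µµ˜ syntax itself. The finite maps standing in for Γ and ∆ absorb weakening, contraction, and exchange exactly as described above:

```haskell
import qualified Data.Map as Map

type Ty = String                  -- core types are just type variables

data Term    = Var String   | Mu  String Command    -- x | µα.c
data CoTerm  = CoVar String | MuT String Command    -- α | µ̃x.c
data Command = Cut Term Ty CoTerm  -- ⟨v||e⟩, annotated with the cut type A

type InEnv  = Map.Map String Ty   -- Γ: the types of free variables
type OutEnv = Map.Map String Ty   -- ∆: the types of free co-variables

-- Γ ` v : A | ∆  (rules VR and AR of Figure 3.11)
checkTerm :: InEnv -> OutEnv -> Term -> Ty -> Bool
checkTerm g _ (Var x)   a = Map.lookup x g == Just a          -- VR
checkTerm g d (Mu al c) a = checkCmd g (Map.insert al a d) c  -- AR

-- Γ | e : A ` ∆  (rules VL and AL)
checkCoTerm :: InEnv -> OutEnv -> CoTerm -> Ty -> Bool
checkCoTerm _ d (CoVar al) a = Map.lookup al d == Just a          -- VL
checkCoTerm g d (MuT x c)  a = checkCmd (Map.insert x a g) d c    -- AL

-- c : (Γ ` ∆)  (rule Cut; the annotation supplies the cut type A)
checkCmd :: InEnv -> OutEnv -> Command -> Bool
checkCmd g d (Cut v a e) = checkTerm g d v a && checkCoTerm g d e a
```

For example, the co-term µ˜z. 〈x||α〉 from the beginning of this section checks against the sequent x : X | µ˜z. 〈x||α〉 : Y ` α : X with no explicit weakening step: checkCoTerm (Map.fromList [("x","X")]) (Map.fromList [("a","X")]) (MuT "z" (Cut (Var "x") "X" (CoVar "a"))) "Y" returns True, because the unused z is simply never looked up.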
As stated before for the logic of LK in Remark 3.2, the choice between the two formulations of the scoping properties of µµ˜ (co-)variables is somewhat arbitrary and a matter of taste. Since we are dealing with a calculus corresponding to classical logic, both treatments of structural properties are equivalent to each other in a sense—both formulations will admit type checking the same expressions, even in richer extensions of the core language. However, the two formulations have their own advantages. The implicit scoping presented in Figure 3.11 is concise and forgoes the redundancy of repeated rules, whereas the explicit scoping presented in Figure 3.10 easily allows for a more refined analysis of the structural properties and exploration of sub-structural calculi (Munch-Maccagnoni, 2009) corresponding to sub-structural logics that forbid certain uses of structural rules. The most important thing, though, is that something is done to express the scope of (co-)variables in the classical language µµ˜. For our purposes here, we will take the explicit formulation of scoping rules in Figure 3.10 as the canonical definition for classical µµ˜ in the remainder.

Remark 3.6. As it turns out, output abstractions in the µµ˜-calculus let programs manipulate their own control flow, similar to Scheme's (Kelsey et al., 1998) callcc control operator or Felleisen's (1992) C operator. Intuitively, a use of callcc or an abort can be read in terms of an output abstraction that duplicates or deletes its bound co-variable, respectively:

callcc(λα.v) ≜ µα. 〈v||α〉
abort c ≜ µδ.c (δ /∈ FV (c))

This phenomenon is a consequence of Griffin's (1990) observation that under the Curry-Howard correspondence, classical logic corresponds to control flow manipulation, along with the fact that the LK sequent calculus formalizes classical logic (see Remark 3.5). Under this interpretation, multiple consequences in the sequent calculus correspond to multiple available co-variables, which give the program multiple possible exit paths. The weakening and contraction rules on the right for these multiple consequences correspond to deleting or copying an exit path, respectively. Indeed, multiple consequences with right-handed structural rules may be seen as the logical essence of this "classical" form of control effects (so called for the connection to classical logic as well as callcc being the traditional control operator), since extending natural deduction with multiple consequences, as in Parigot's (1992) λµ-calculus, gives rise to a programming language with control effects equivalent to callcc (Ariola & Herbelin, 2003). End remark 3.6.
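The control-flow reading in Remark 3.6 can be experienced directly in an ordinary functional language. The following Haskell sketch (an analogy of ours, not part of the dissertation) uses the standard continuation monad, where callCC captures the current continuation just as µα. 〈v||α〉 names its output channel α; discarding the captured continuation corresponds to the right weakening used by abort, and invoking it more than once would correspond to right contraction:

```haskell
import Control.Monad (when)
import Control.Monad.Cont (Cont, callCC, runCont)

-- callCC names the current output channel, like µα.⟨v||α⟩ names α.
clamp :: Int -> Cont r Int
clamp n = callCC $ \exit -> do
  when (n < 0) (exit 0)   -- "abort": jump out along the captured channel
  return (n * 2)          -- normal return along the same channel

demo :: (Int, Int)
demo = (runCont (clamp 3) id, runCont (clamp (-7)) id)  -- (6, 0)
```

In the first component the captured continuation is used exactly once (an ordinary return), while in the second it is invoked early and the rest of the computation is discarded, just as an output abstraction may delete its bound co-variable.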
Furthermore, the quantifiers of LK are interpreted as a sequent calculus version of system F (Reynolds, 1983; Girard et al., 1989): universal quantification (∀) acts as an abstraction over types analogous to implication, and existential quantification (∃) is the mirror image of ∀. We refer to this combined language here as the "dual calculi" because, as we will soon see, the language is the basis for two different but highly related calculi that exhibit dual computational behavior to one another.

Since the right introduction rules for logical connectives are shared by both natural deduction and the sequent calculus, the dual calculi terms for creating results of product, sum, and function types have the same form as in the λ-calculus. Products are introduced by pairing, (v, v′), sums are introduced by injection, ι1 (v) and ι2 (v), and functions are introduced by λ-abstractions, λx.v. Additionally, the terms for creating results of universally quantified types are Λ-abstractions, ΛX.v, as in system F, and the results of existentially quantified types are "masked" terms, B @ v, that hide the type B in the underlying term v from being visible from the outside.

In contrast, the left introduction rules of the sequent calculus are distinct from the right elimination rules of natural deduction, so the difference between the λ-calculus and the dual calculi really appears when results are used. Instead of function application, the left implication introduction →L builds a co-term that represents a call stack. If v is a term that produces a result of type A, and e is a co-term that consumes a result of type B, then the call stack v · e is a co-term that works with a function value of type A → B by feeding it v as an argument and sending the returned result to e. For example, given that x1 : A1, x2 : A2, x3 : A3, and β : B, then the call stack x1 · [x2 · [x3 · β]] is expecting to consume a function of type A1 → (A2 → (A3 → B)):

x1 : A1 ⊢ x1 : A1 |  (VR)        x2 : A2 ⊢ x2 : A2 |  (VR)        x3 : A3 ⊢ x3 : A3 |  (VR)        | β : B ⊢ β : B  (VL)
x3 : A3 | x3 · β : A3 → B ⊢ β : B  (→L)
x3 : A3, x2 : A2 | x2 · x3 · β : A2 → A3 → B ⊢ β : B  (→L)
x3 : A3, x2 : A2, x1 : A1 | x1 · x2 · x3 · β : A1 → A2 → A3 → B ⊢ β : B  (→L)

(Like the common notational convention in the simply-typed λ-calculus that the function type constructor associates to the right, so that A1 → A2 → A3 → B = A1 → (A2 → (A3 → B)), we adopt a similar notational convention that the call stack constructor associates to the right, so that x1 · x2 · x3 · β = x1 · [x2 · [x3 · β]].)

The left introductions for the other type constructors follow a similar pattern, with each one building a co-term that expects to consume a value of that type. There are two left conjunction introductions corresponding to the two projections out of a product. If e1 is a co-term that consumes a value of type A, then ×L1 builds the co-term π1 [e1] that works with a value of type A × B by projecting out the first element of the product and sending it to e1 when needed (and similarly for the second projection π2 [e2] built by ×L2).

A, B, C ∈ Type ::= X | A×B | A+B | ¬A | A→ B | ∀X.A | ∃X.A
c ∈ Command ::= 〈v||e〉
v ∈ Term ::= x | µα.c | (v, v) | ι1 (v) | ι2 (v) | not(e) | λx.v | ΛX.v | B @ v
e ∈ CoTerm ::= α | µ˜x.c | π1 [e] | π2 [e] | [e, e] | not[v] | v · e | B @ e | Λ˜X.e
Γ ∈ InputEnv ::= x1 : A1, . . . , xn : An
∆ ∈ OutputEnv ::= α1 : A1, . . . , αn : An
Judgement ::= c : (Γ ⊢ ∆) | (Γ ⊢ v : A | ∆) | (Γ | e : A ⊢ ∆)

Logical rules:
Γ ⊢ v : A | ∆  and  Γ ⊢ v′ : B | ∆  ⇒  Γ ⊢ (v, v′) : A×B | ∆  (×R)
Γ | e : A ⊢ ∆  ⇒  Γ | π1 [e] : A×B ⊢ ∆  (×L1)        Γ | e : B ⊢ ∆  ⇒  Γ | π2 [e] : A×B ⊢ ∆  (×L2)
Γ ⊢ v : A | ∆  ⇒  Γ ⊢ ι1 (v) : A+B | ∆  (+R1)        Γ ⊢ v : B | ∆  ⇒  Γ ⊢ ι2 (v) : A+B | ∆  (+R2)
Γ | e : A ⊢ ∆  and  Γ | e′ : B ⊢ ∆  ⇒  Γ | [e, e′] : A+B ⊢ ∆  (+L)
Γ | e : A ⊢ ∆  ⇒  Γ ⊢ not(e) : ¬A | ∆  (¬R)        Γ ⊢ v : A | ∆  ⇒  Γ | not[v] : ¬A ⊢ ∆  (¬L)
Γ, x : A ⊢ v : B | ∆  ⇒  Γ ⊢ λx.v : A→ B | ∆  (→R)
Γ ⊢ v : A | ∆  and  Γ′ | e : B ⊢ ∆′  ⇒  Γ′, Γ | v · e : A→ B ⊢ ∆′, ∆  (→L)
Γ ⊢ v : A | ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ ⊢ ΛX.v : ∀X.A | ∆  (∀R)
Γ | e : A {B/X} ⊢ ∆  ⇒  Γ | B @ e : ∀X.A ⊢ ∆  (∀L)
Γ ⊢ v : A {B/X} | ∆  ⇒  Γ ⊢ B @ v : ∃X.A | ∆  (∃R)
Γ | e : A ⊢ ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ | Λ˜X.e : ∃X.A ⊢ ∆  (∃L)

FIGURE 3.12. The syntax and types for the dual calculi.

If e1 and e2 are co-terms that consume values of type A and B, respectively, then +L builds the co-term [e1, e2] that works with a value of type A+B by checking its constructor: an injection of the form ι1 (v1) has the value of v1 sent to e1 as needed, and likewise an injection of the form ι2 (v2) has the value of v2 sent to e2 as needed. The co-term for ∀L is similar to the call stacks of →L, so that if e is a co-term that consumes a value at the particular type A {B/X}, then B @ e works with a value of the general type ∀X.A by first specializing the polymorphic value and then passing it along to e. Perhaps the most unusual co-term comes from ∃L, but this is just the mirror image of the ∀R term. If e is a co-term that consumes a value of type A, containing a generic type variable X, then ∃L gives the abstracted co-term Λ˜X.e that works with a value of type ∃X.A by instantiating X with the value's hidden type before passing the underlying value to e.

The one type constructor that is not typically found in the λ-calculus, but commonly in a sequent calculus like LK or the dual calculi, is negation. The negation type ¬A represents an inversion between producers and consumers—terms and co-terms—during computation. Intuitively, negation expresses a form of continuations: a term of type ¬A is actually a consumer of A. The right negation introduction allows terms to contain consumers, so that if e is a co-term expecting as input a result of type A then ¬R builds the term not(e). Dually, the left negation introduction allows co-terms to contain producers, so that if v is a term expecting to output a result of type A then ¬L builds the co-term not[v]. When a negated term and co-term meet each other in a command, the inversion is undone so that their underlying components change places and continue the interaction.

The above intuition on the dynamic meaning of types in the dual calculi can be codified into rewriting rules. Recall from Section 3.2 that the semantics of the core µµ˜-calculus was split in two to restore determinacy and confluence: one corresponding to call-by-value and the other to call-by-name. Likewise, there are two semantics for the dual calculi, so that the same language bears two different calculi (hence the name).
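Before turning to the rewriting rules, it may help to see the entire syntax of Figure 3.12 transcribed as an ordinary datatype. The following is a sketch in Haskell under our own constructor names (nothing here is the dissertation's notation); it records only the grammar, not the typing rules.

    -- Types of Figure 3.12: X | A×B | A+B | ¬A | A→B | ∀X.A | ∃X.A
    data Type
      = TVar String | Prod Type Type | Sum Type Type
      | Neg Type    | Fun Type Type  | Forall String Type | Exists String Type

    -- Terms produce, co-terms consume, and a command is their interaction.
    data Term
      = Var String | Mu String Command         -- x, µα.c
      | Pair Term Term                         -- (v, v')
      | Inl Term | Inr Term                    -- ι1(v), ι2(v)
      | NotT CoTerm                            -- not(e)
      | Lam String Term                        -- λx.v
      | TLam String Term                       -- ΛX.v
      | Mask Type Term                         -- B @ v (existential masking)

    data CoTerm
      = CoVar String | MuTilde String Command  -- α, µ~x.c
      | Fst CoTerm | Snd CoTerm                -- π1[e], π2[e]
      | Case CoTerm CoTerm                     -- [e, e']
      | NotC Term                              -- not[v]
      | App Term CoTerm                        -- v · e (call stack)
      | Inst Type CoTerm                       -- B @ e (∀ instantiation)
      | TLamTilde String CoTerm                -- Λ~X.e

    data Command = Cut Term CoTerm             -- ⟨v||e⟩

Under this encoding, the call stack x1 · x2 · β from the earlier example is App (Var "x1") (App (Var "x2") (CoVar "beta")), with the right-nesting of App mirroring the right-associativity convention for call stacks.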
Since both semantics of the core µµ˜-calculus are already given in Figure 3.8 and Figure 3.9, we only need to suitably expand the notions of value and co-value to accommodate the new (co-)term introductions and explain the logical steps of cut elimination (referred to by the common name β) that occur when two opposed introduction forms of the same type meet in a command. The call-by-value β rules are given in Figure 3.13 and the call-by-name β rules are given in Figure 3.14, both of which extend the core semantics from Figure 3.8 and Figure 3.9, respectively. The β×, β+ and β¬ rules come from Wadler's (2003) dual calculus whereas the β→ rules are inspired by Curien & Munch-Maccagnoni's (2010) revision of the λµµ˜-calculus.

V ∈ ValueV ::= x | (V, V) | ι1 (V) | ι2 (V) | not(e) | λx.v | ΛX.v | A @ V
E ∈ CoValueV ::= e
(β×V) 〈(V1, V2)||πi [E]〉 →β×V 〈Vi||E〉
(β+V) 〈ιi (V)||[E1, E2]〉 →β+V 〈V ||Ei〉
(β¬V) 〈not(e)||not[v]〉 →β¬V 〈v||e〉
(β→V) 〈λx.v||V · E〉 →β→V 〈v {V/x}||E〉
(β∀V) 〈ΛX.v||B @ E〉 →β∀V 〈v {B/X}||E〉
(β∃V) 〈B @ V ||Λ˜X.e〉 →β∃V 〈V ||e {B/X}〉

FIGURE 3.13. The β laws for the call-by-value (V) half of the dual calculi.

V ∈ ValueN ::= v
E ∈ CoValueN ::= α | π1 [E] | π2 [E] | [E, E] | not[v] | v · E | B @ E | Λ˜X.e
(β×N) 〈(V1, V2)||πi [E]〉 →β×N 〈Vi||E〉
(β+N) 〈ιi (V)||[E1, E2]〉 →β+N 〈V ||Ei〉
(β¬N) 〈not(e)||not[v]〉 →β¬N 〈v||e〉
(β→N) 〈λx.v||V · E〉 →β→N 〈v {V/x}||E〉
(β∀N) 〈ΛX.v||B @ E〉 →β∀N 〈v {B/X}||E〉
(β∃N) 〈B @ V ||Λ˜X.e〉 →β∃N 〈V ||e {B/X}〉

FIGURE 3.14. The β laws for the call-by-name (N) half of the dual calculi.

The β laws extend the previous dynamic semantics of the core µµ˜-calculus to account for the additional programming constructs. As per Remark 2.3, we have a reduction theory (µV µ˜V βV), equational theory (=µV µ˜V βV), and an operational semantics (↦µV µ˜V βV) for the call-by-value dual calculus from the µV, µ˜V, ηµ, ηµ˜, and βV laws, as well as a reduction theory (µN µ˜N βN), equational theory (=µN µ˜N βN), and operational semantics (↦µN µ˜N βN) for the call-by-name dual calculus from the µN, µ˜N, ηµ, ηµ˜, and βN laws. As before, both the call-by-value and call-by-name operational semantics apply the rewriting rules directly to commands.

Notice that, like in the core µµ˜-calculus, the form of the operational β rules is the same in both semantics, so that the only difference is the definition of value and co-value referred to in those rules. The rule of thumb is that a β rule only applies when an introductory value and co-value interact in a command. For example, the call-by-value β×V rule will only project from a pair value to extract a component that is also a value. These restrictions are captured in the call-by-value definition of value that admits only "simple" terms and hereditarily excludes complex terms like µα.c (representing an arbitrarily complex computation before yielding a result on α) from the values of product and sum types, which matches the behavior of products and sums in strict functional languages like ML. However, there is no such restriction on co-terms in the call-by-value operational semantics, so any co-term counts as a co-value. Dually, the call-by-name β×N rule will only project out of a pair when it is needed by a projection co-value to send that component to the underlying co-value.
These restrictions are captured in the call-by-name definition of co-value that admits only "strict" co-terms and hereditarily excludes complex co-terms like µ˜x.c (representing an arbitrarily complex computation before demanding a result for x) from the co-values of product and sum types. However, there is no restriction on terms in the call-by-name operational semantics, so any term counts as a value.

Remark 3.7. It's worthwhile to mention that although the dual calculi are primarily seen as typed languages, their semantics do not use any type information to run commands. We can therefore execute untyped commands as well as typed ones, which of course creates the possibility of getting stuck at fatal type errors. Untyped commands also open up the possibility of running general recursive programs, which can be encoded in a similar manner as in the λ-calculus without any additional features of the language. For example, Curry's untyped fixed-point Y combinator in the λ-calculus:

Y ≜ λf.(λx.f (x x)) (λx.f (x x))

can be analogously defined in the dual calculi using functions as:

Y ≜ λf.µα. 〈λx.µβ. 〈f ||µγ. 〈x||x · γ〉 · β〉||(λx.µβ. 〈f ||µγ. 〈x||x · γ〉 · β〉) · α〉

The two share analogous behavior: in the λ-calculus Y f = f (Y f) and in the dual calculi 〈Y ||f · α〉 = 〈f ||µβ. 〈Y ||f · β〉 · α〉. Also analogous to the non-terminating untyped term Ω ≜ (λx.x x) (λx.x x) in the λ-calculus, the dual calculi both have non-terminating untyped commands, which can be written using functions or more simply with negation:

Ω ≜ 〈not(µ˜x. 〈x||not[x]〉)||not[µα. 〈not(α)||α〉]〉

For example, in the call-by-name operational semantics, we have the following infinite execution of Ω:

Ω ≜ 〈not(µ˜x. 〈x||not[x]〉)||not[µα. 〈not(α)||α〉]〉
↦β¬N 〈µα. 〈not(α)||α〉||µ˜x. 〈x||not[x]〉〉
↦µ˜N 〈µα. 〈not(α)||α〉||not[µα. 〈not(α)||α〉]〉
↦µN 〈not(not[µα. 〈not(α)||α〉])||not[µα. 〈not(α)||α〉]〉
↦β¬N 〈µα. 〈not(α)||α〉||not[µα. 〈not(α)||α〉]〉
↦µN . . .

Note that encoding general recursion in the untyped sequent calculus requires some logical connective, like negation or implication. The core µµ˜-calculus gives a more restrained language of substitution that does not express general recursion even in the untyped calculus, where general (and non-confluent) µ- and µ˜-reduction is still strongly normalizing (Polonovski, 2004)—that is, there are no infinite sequences of µµ˜-reductions. This fact is in contrast with the untyped λ-calculus, which can express general recursion because β-reduction is not strongly normalizing in the untyped calculus. End remark 3.7.

Focusing on computation

There is a problem lurking in the β-based operational semantics for the dual calculi. Consider how we would evaluate the projection π1((f 1), 2) in a call-by-value functional language like ML. First we would compute the application f 1 to construct the pair value, then we would compute the π1 projection of that pair and extract the value returned by f 1 as the result of the expression. However, if we represent this program as the following command in the call-by-value dual calculus (where α stands for the empty, or top-level, context which is implicit in the functional expression):

〈((µβ. 〈f ||1 · β〉), 2)||π1 [α]〉

we find that no operational rule matches this command, so we are stuck! This isn't just a problem with the call-by-value operational semantics. The command:

〈(1, 2)||π1 [µ˜x. 〈0||α〉]〉

which corresponds to the expression let x = π1(1, 2) in 0 in a functional language, is also stuck in the call-by-name operational semantics.
This is clearly an undesirable situation that breaks the connection between the λ-calculus and the dual calculi—we should not get stuck on such commands with unfinished computation in introduction forms—so something needs to be done to refocus the attention in a command to the next step of computation. As it stands now in the dual calculi, we either have too many programs with unexplained behavior, or too few behaviors for executing programs. Correspondingly, there are two general techniques to remedy prematurely stuck commands and restore the connection between the λ-calculus and the dual calculi:

(1) The static approach (Curien & Herbelin, 2000) removes the superfluous parts of the syntax that cause β reduction to get stuck but are not necessary for expressing all the same computations as the original language.

(2) The dynamic approach (Wadler, 2003) adds the necessary extra steps to the operational semantics that lift buried computations to the top of the command, so that they are exposed and may take over control of the computation.

Both of these techniques are an application of an idea called focusing (Andreoli, 1992; Laurent, 2002) from proof search at different points in a program's life—either at "run time" or at "compile time"—to make sure that the call-by-value and call-by-name semantics are complete without missing out on any essential capabilities of the language.

Static focusing

For the static method of focusing, consider which syntactic patterns could lead to β-stuck commands. In the call-by-value command above, 〈((µβ. 〈f ||1 · β〉), 2)||π1 [α]〉, the problem is that a pair with a non-value component (namely the first one) is interacting with a projection co-value. Because the pair does not have values for both components, the β×V operational step does not apply. Dually, the call-by-name command above, 〈(1, 2)||π1 [µ˜x. 〈0||α〉]〉, puts a pair value in interaction with a projection that has a non-co-value component. Because the projection does not contain a co-value, the β×N operational step does not apply. After examining all the βV rules, we see that the call-by-value βV operational semantics is only equipped to deal with certain introduction forms containing values (namely the pairing ×R, injection +R, and masking ∃R terms as well as calling →L co-terms). Similarly, the call-by-name βN operational semantics is only equipped to deal with certain introduction co-terms containing co-values (namely the projection ×L, matching +L, calling →L, and specializing ∀L co-terms). We can rule out the problematic commands via static focusing by limiting ourselves to a sub-syntax of the dual calculi. However, since each operational semantics (both call-by-value and call-by-name) has difficulty with different parts of the syntax, static focusing effectively splits the language in two: one sub-syntax for each evaluation strategy. For call-by-value, we must bake the notion of values into the syntax and restrict the ×R, +R, ∃R, and →L inference rules appropriately. Doing so gives us the LKQ sub-calculus (Curien & Herbelin, 2000) shown in Figure 3.15. Dually for call-by-name, we must bake the notion of co-values into the syntax and restrict the ×L, +L, →L, and ∀L inference rules appropriately, giving the LKT sub-calculus shown in Figure 3.16.
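Seen as plain grammar, static focusing is nothing more than a change of datatype. Here is a sketch of the LKQ restriction in Haskell (our own names, with the quantifiers omitted for brevity): pairs, injections, and call-stack arguments now demand an already-computed Value, so a β-stuck command like the one above cannot even be written.

    -- LKQ (Figure 3.15) bakes call-by-value values into the grammar: a
    -- general term is either a value or an output abstraction, and the
    -- construction forms may only contain values.
    data Term
      = TV Value               -- every value is a term (the FR rule)
      | Mu String Command      -- µα.c

    data Value
      = Var String
      | Pair Value Value       -- (V, V'): components must be values
      | Inl Value | Inr Value  -- ι1(V), ι2(V)
      | NotT CoTerm            -- not(e): thunks a co-term, e is unrestricted
      | Lam String Term        -- λx.v: bodies are general terms

    data CoTerm
      = CoVar String | MuTilde String Command
      | Fst CoTerm | Snd CoTerm | Case CoTerm CoTerm
      | NotC Term
      | App Value CoTerm       -- V · e: the argument must already be a value

    data Command = Cut Term CoTerm

The pair from the stuck command, ((µβ. 〈f ||1 · β〉), 2), is unrepresentable here because Pair requires both components to be Values, while µβ.c only forms a Term.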
The associated type systems separate the restricted notions of (co-)values from general (co-)terms through a new form of focused sequent with a stricter sense of active formula held in a stoup (Girard, 1991). LKQ introduces values in the focus of a stoup on the right (Γ ⊢ V : A ; ∆) and LKT introduces co-values in the focus of a stoup on the left (Γ ; E : A ⊢ ∆). Notice how the focus of the inference rules is forcibly maintained through type checking: working bottom-up, once a (co-)value is in focus in the stoup, our active attention cannot move to any other type in the sequent via activation since the AR and AL rules do not introduce (co-)values in focus. The new form of sequent calls for additional focusing structural rules FR (in LKQ) and FL (in LKT), which just say that every value is a term and every co-value is a co-term. However, the reverse of the focusing rules—which would say that every (co-)term is a (co-)value—is omitted in LKQ and LKT because it would collapse the distinction that the stoup has created.

A, B, C ∈ Type ::= X | A×B | A+B | ¬A | A→ B | ∀X.A | ∃X.A
v ∈ Term ::= V | µα.c
V ∈ Value ::= x | (V, V) | ι1 (V) | ι2 (V) | not(e) | λx.v | ΛX.v | A @ V
e ∈ CoTerm ::= α | µ˜x.c | π1 [e] | π2 [e] | [e, e] | not[v] | v · e | B @ e | Λ˜X.e
c ∈ Command ::= 〈v||e〉
Judgement ::= c : (Γ ⊢ ∆) | (Γ ⊢ v : A | ∆) | (Γ ⊢ V : A ; ∆) | (Γ | e : A ⊢ ∆)

Axiom: x : A ⊢ x : A ;  (Var)        | α : A ⊢ α : A  (CoVar)

Logical rules:
Γ ⊢ V : A ; ∆  and  Γ ⊢ V′ : B ; ∆  ⇒  Γ ⊢ (V, V′) : A×B ; ∆  (×R)
Γ | e : A ⊢ ∆  ⇒  Γ | π1[e] : A×B ⊢ ∆  (×L1)        Γ | e : B ⊢ ∆  ⇒  Γ | π2[e] : A×B ⊢ ∆  (×L2)
Γ ⊢ V : A ; ∆  ⇒  Γ ⊢ ι1(V) : A+B ; ∆  (+R1)        Γ ⊢ V : B ; ∆  ⇒  Γ ⊢ ι2(V) : A+B ; ∆  (+R2)
Γ | e : A ⊢ ∆  and  Γ | e′ : B ⊢ ∆  ⇒  Γ | [e, e′] : A+B ⊢ ∆  (+L)
Γ | e : A ⊢ ∆  ⇒  Γ ⊢ not(e) : ¬A ; ∆  (¬R)        Γ ⊢ v : A | ∆  ⇒  Γ | not[v] : ¬A ⊢ ∆  (¬L)
Γ, x : A ⊢ v : B | ∆  ⇒  Γ ⊢ λx.v : A→ B ; ∆  (→R)
Γ ⊢ V : A ; ∆  and  Γ′ | e : B ⊢ ∆′  ⇒  Γ, Γ′ | V · e : A→ B ⊢ ∆, ∆′  (→L)
Γ ⊢ v : A | ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ ⊢ ΛX.v : ∀X.A ; ∆  (∀R)
Γ | e : A {B/X} ⊢ ∆  ⇒  Γ | B @ e : ∀X.A ⊢ ∆  (∀L)
Γ ⊢ V : A {B/X} ; ∆  ⇒  Γ ⊢ B @ V : ∃X.A ; ∆  (∃R)
Γ | e : A ⊢ ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ | Λ˜X.e : ∃X.A ⊢ ∆  (∃L)

Focusing (structural) rules: Γ ⊢ V : A ; ∆  ⇒  Γ ⊢ V : A | ∆  (FR)

FIGURE 3.15. LKQ: The focused sub-syntax and types for the call-by-value dual calculus.

A, B, C ∈ Type ::= X | A×B | A+B | ¬A | A→ B | ∀X.A | ∃X.A
v ∈ Term ::= x | µα.c | (v, v) | ι1 (v) | ι2 (v) | not(e) | λx.v | ΛX.v | B @ v
e ∈ CoTerm ::= E | µ˜x.c
E ∈ CoValue ::= α | π1 [E] | π2 [E] | [E, E] | not[v] | v · E | B @ E | Λ˜X.e
c ∈ Command ::= 〈v||e〉
Judgement ::= c : (Γ ⊢ ∆) | (Γ ⊢ v : A | ∆) | (Γ | e : A ⊢ ∆) | (Γ ; E : A ⊢ ∆)

Axiom: x : A ⊢ x : A |  (Var)        ; α : A ⊢ α : A  (CoVar)

Logical rules:
Γ ⊢ v : A | ∆  and  Γ ⊢ v′ : B | ∆  ⇒  Γ ⊢ (v, v′) : A×B | ∆  (×R)
Γ ; E : A ⊢ ∆  ⇒  Γ ; π1[E] : A×B ⊢ ∆  (×L1)        Γ ; E : B ⊢ ∆  ⇒  Γ ; π2[E] : A×B ⊢ ∆  (×L2)
Γ ⊢ v : A | ∆  ⇒  Γ ⊢ ι1(v) : A+B | ∆  (+R1)        Γ ⊢ v : B | ∆  ⇒  Γ ⊢ ι2(v) : A+B | ∆  (+R2)
Γ ; E : A ⊢ ∆  and  Γ ; E′ : B ⊢ ∆  ⇒  Γ ; [E, E′] : A+B ⊢ ∆  (+L)
Γ | e : A ⊢ ∆  ⇒  Γ ⊢ not(e) : ¬A | ∆  (¬R)        Γ ⊢ v : A | ∆  ⇒  Γ ; not[v] : ¬A ⊢ ∆  (¬L)
Γ, x : A ⊢ v : B | ∆  ⇒  Γ ⊢ λx.v : A→ B | ∆  (→R)
Γ ⊢ v : A | ∆  and  Γ′ ; E : B ⊢ ∆′  ⇒  Γ, Γ′ ; v · E : A→ B ⊢ ∆, ∆′  (→L)
Γ ⊢ v : A | ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ ⊢ ΛX.v : ∀X.A | ∆  (∀R)
Γ ; E : A {B/X} ⊢ ∆  ⇒  Γ ; B @ E : ∀X.A ⊢ ∆  (∀L)
Γ ⊢ v : A {B/X} | ∆  ⇒  Γ ⊢ B @ v : ∃X.A | ∆  (∃R)
Γ | e : A ⊢ ∆  and  X ∉ FV(Γ ⊢ ∆)  ⇒  Γ ; Λ˜X.e : ∃X.A ⊢ ∆  (∃L)

Focusing (structural) rules: Γ ; E : A ⊢ ∆  ⇒  Γ | E : A ⊢ ∆  (FL)

FIGURE 3.16. LKT: The focused sub-syntax and types for the call-by-name dual calculus.
As it turns out (Curien & Munch-Maccagnoni, 2010), distinguishing (co-)values in type systems like LKQ and LKT corresponds with the technique of focusing in proof theory developed by Andreoli (1992), Girard (1993, 2001), and Laurent (2002). In proof search, focusing makes the searching algorithm more efficient by cutting down on the search space, whereas in calculi, focusing identifies a well-behaved sub-syntax for the operational semantics.

Dynamic focusing

For the dynamic method of focusing, consider which steps were missing from the operational semantics. Instead of ruling out troublesome corners of the syntax, we will add extra steps to kick-start stuck commands. Recall that in our stuck call-by-value command, 〈((µβ. 〈f ||1 · β〉), 2)||π1 [α]〉, the β×V operational step was stuck because a pair with a non-value component needs to interact with a projection. One thing we can do in this situation is lift the non-value component out of the pair and assign it a name via an input abstraction. Such a step reveals a hidden µV reduction and lets the computation continue to bring the application of f to the top:

〈((µβ. 〈f ||1 · β〉), 2)||π1 [α]〉 ↦? 〈µβ. 〈f ||1 · β〉||µ˜x. 〈(x, 2)||π1 [α]〉〉 ↦µV 〈f ||1 · µ˜x. 〈(x, 2)||π1 [α]〉〉

Now, assuming that the call to f returns the result 3, the computation can continue along to present 3 as the result to α, yielding the desired answer:

〈f ||1 · µ˜x. 〈(x, 2)||π1 [α]〉〉 ↦ 〈3||µ˜x. 〈(x, 2)||π1 [α]〉〉 ↦µ˜V 〈(3, 2)||π1 [α]〉 ↦β×V 〈3||α〉

That one extra lifting step was all that was needed to continue the computation and get to the final command. Likewise, the stuck call-by-name command 〈(1, 2)||π1 [µ˜x. 〈0||α〉]〉 has a non-co-value component in the projection, so we can similarly lift the component out of the projection and assign it a name via an output abstraction:

〈(1, 2)||π1 [µ˜x. 〈0||α〉]〉 ↦? 〈µβ. 〈(1, 2)||π1 [β]〉||µ˜x. 〈0||α〉〉 ↦µ˜N 〈0||α〉

Lifting non-(co-)value components out of introduction forms of (co-)terms seems to be the missing step in β-stuck commands. The full set of such lifting rules is given in Figure 3.17 for the call-by-value semantics and Figure 3.18 for the call-by-name semantics. These rules give the minimum required extra steps to reduce hidden computations nested deeply inside terms and co-terms in a way that matches the call-by-value and call-by-name semantics for the λ-calculus. However, the ς laws are the first operational rules on (co-)terms, rather than commands. As such, we must extend the context of our operational reductions to allow for ς when necessary. For the call-by-value and call-by-name operational semantics including ς, we have the following evaluation contexts (denoted by D to avoid confusion with co-values):

D ∈ EvalCxtV ::= □ | 〈□||e〉 | 〈V ||□〉
D ∈ EvalCxtN ::= □ | 〈v||□〉 | 〈□||E〉

Still, unlike evaluation contexts in the λ-calculus, these evaluation contexts are not arbitrarily nested, but only ever place attention on the entire command or its immediate (co-)term. For example, in call-by-value we have the following operational ς reductions on either side of a command, like:

〈ι1 (v)||e〉 ↦ς+V 〈µα. 〈v||µ˜y. 〈ι1 (y)||α〉〉||e〉 ↦µV 〈v||µ˜y. 〈ι1 (y)||e〉〉
〈V ||v · e〉 ↦ς→V 〈V ||µ˜x. 〈v||µ˜y. 〈x||y · e〉〉〉 ↦µ˜V 〈v||µ˜y. 〈V ||y · e〉〉

and in call-by-name we have only operational ς reductions on the co-term side, like:

〈v||π1 [e]〉 ↦ς×N 〈v||µ˜x. 〈µβ. 〈x||π1 [β]〉||e〉〉 ↦µ˜N 〈µβ. 〈v||π1 [β]〉||e〉
Furthermore, note that extending the semantics of the dual calculi with the ς rules preserves determinism of the operational semantics and confluence of the reduction theory, since there are no critical pairs between the ς rules and the µµ˜ηµηµ˜β rules in either the call-by-value or call-by-name calculus.

(ς×V) (v, v′) →ς×V µα. 〈v||µ˜y. 〈(y, v′)||α〉〉        (V, v) →ς×V µα. 〈v||µ˜y. 〈(V, y)||α〉〉
(ς+V) ιi (v) →ς+V µα. 〈v||µ˜y. 〈ιi (y)||α〉〉
(ς→V) v · e →ς→V µ˜x. 〈v||µ˜y. 〈x||y · e〉〉
(ς∃V) B @ v →ς∃V µα. 〈v||µ˜y. 〈B @ y||α〉〉
(where v ∉ ValueV and α, x, y are fresh)

FIGURE 3.17. The focusing ς laws for the call-by-value (V) half of the dual calculi.

(ς×N) πi [e] →ς×N µ˜x. 〈µβ. 〈x||πi [β]〉||e〉
(ς+N) [e, e′] →ς+N µ˜x. 〈µβ. 〈x||[β, e′]〉||e〉        [E, e] →ς+N µ˜x. 〈µβ. 〈x||[E, β]〉||e〉
(ς→N) v · e →ς→N µ˜x. 〈µβ. 〈x||v · β〉||e〉
(ς∀N) B @ e →ς∀N µ˜x. 〈µβ. 〈x||B @ β〉||e〉
(where e ∉ CoValueN and x, β are fresh)

FIGURE 3.18. The focusing ς laws for the call-by-name (N) half of the dual calculi.

(The proviso that x, y, α, and β are fresh means that they do not appear free anywhere in the (co-)term on the left-hand side of the reduction.)

For the µV µ˜V βV ςV call-by-value operational semantics, the net effect is that the final commands are always a value yielded to a co-variable or a simple co-value (that is, a co-variable or a left introduction co-term) applied to a variable as follows:

FinalCommandV ::= 〈V ||α〉 | 〈x||Es〉
V ∈ ValueV ::= x | (V, V′) | ι1 (V) | ι2 (V) | not(e) | λx.v | ΛX.v | B @ V
Es ∈ SimpleCoValueV ::= α | π1 [e] | π2 [e] | [e, e′] | not[v] | V · e | B @ e | Λ˜X.e

Dually for the µN µ˜N βN ςN call-by-name operational semantics, the final commands are always a simple value (a variable or an introduction term) yielded to a co-variable or a co-value applied to a variable as follows:

FinalCommandN ::= 〈Vs||α〉 | 〈x||E〉
Vs ∈ SimpleValueN ::= x | (v, v′) | ι1 (v) | ι2 (v) | not(e) | λx.v | ΛX.v | B @ v
E ∈ CoValueN ::= α | π1 [E] | π2 [E] | [E, E′] | not[v] | v · E | B @ E | Λ˜X.e

If we only take well-typed commands into consideration, then we get a standard type safety theorem which says that well-typed commands always reduce to a final command, and do not get stuck on any interacting (and potentially mismatched) introduction forms. The small-step version of type safety can be expressed as the progress and preservation properties (Wright & Felleisen, 1994).

Theorem 3.3 (Progress and preservation). For any command c : (Γ ⊢ ∆):
a) Progress: c is a call-by-value (respectively, call-by-name) final command or there is a command c′ such that c ↦µV µ˜V βV ςV c′ (respectively, c ↦µN µ˜N βN ςN c′), and
b) Preservation: if c ↦µV µ˜V βV ςV c′ or c ↦µN µ˜N βN ςN c′, then c′ : (Γ ⊢ ∆).

Proof. Progress follows by induction on the typing derivation of c : (Γ ⊢ ∆). The structural rules (for weakening, contraction, and exchange) follow immediately from the inductive hypothesis, and the Cut rule forms the base cases. For call-by-name, progress is assured because for every well-typed co-term Γ | e : A ⊢ ∆, either e is a co-value, an input abstraction, or e →ςN e′ for some e′. Therefore, if the cut is neither final nor reducible, then the co-term reduces. Similarly for call-by-value, every well-typed term Γ ⊢ v : A | ∆ is either a value, an output abstraction, or v →ςV v′ for some v′, and every well-typed co-term Γ | e : A ⊢ ∆ is either a simple co-value, an input abstraction, or e →ςV e′ for some e′.
Therefore, if the cut is neither final nor reducible, then either the term reduces or the term is a value and the co-term reduces.

Preservation follows by cases on all the possible rewriting rules, so that
– if c →µµ˜ηµηµ˜βς c′ then c : (Γ ⊢ ∆) implies c′ : (Γ ⊢ ∆),
– if v →µµ˜ηµηµ˜βς v′ then Γ ⊢ v : C | ∆ implies Γ ⊢ v′ : C | ∆, and
– if e →µµ˜ηµηµ˜βς e′ then Γ | e : C ⊢ ∆ implies Γ | e′ : C ⊢ ∆,
for both call-by-value and call-by-name, using the fact that for Γ ⊢ V : A | ∆ and Γ′ | E : A ⊢ ∆′:
– if c : (Γ′, x : A ⊢ ∆′) then c {V/x} : (Γ′, Γ ⊢ ∆′, ∆),
– if c : (Γ ⊢ α : A, ∆) then c {E/α} : (Γ′, Γ ⊢ ∆′, ∆),
– if Γ′, x : A ⊢ v : C | ∆′ then Γ′, Γ ⊢ v {V/x} : C | ∆′, ∆,
– if Γ ⊢ v : C | α : A, ∆ then Γ′, Γ ⊢ v {E/α} : C | ∆′, ∆,
– if Γ ⊢ v : C | ∆ and X ∉ FV(Γ ⊢ ∆) then Γ ⊢ v {B/X} : C {B/X} | ∆,
– if Γ′, x : A | e : C ⊢ ∆′ then Γ′, Γ | e {V/x} : C ⊢ ∆′, ∆,
– if Γ | e : C ⊢ α : A, ∆ then Γ′, Γ | e {E/α} : C ⊢ ∆′, ∆, and
– if Γ | e : C ⊢ ∆ and X ∉ FV(Γ ⊢ ∆) then Γ | e {B/X} : C {B/X} ⊢ ∆,
each of which follows by induction on the typing derivation of c, v, and e.

From progress and preservation, we can derive the following big-step statement of type safety.

Theorem 3.4 (Type safety). For any dual calculi command c : (Γ ⊢ ∆):
– if c ↦*µV µ˜V βV ςV c′ then c′ : (Γ ⊢ ∆), and c′ is irreducible (i.e. c′ ̸↦µV µ˜V βV ςV) if and only if c′ is a call-by-value final command, and
– if c ↦*µN µ˜N βN ςN c′, then c′ : (Γ ⊢ ∆), and c′ is irreducible (i.e. c′ ̸↦µN µ˜N βN ςN) if and only if c′ is a call-by-name final command.

Proof. By induction on the left-to-right reflexive-transitive structure of c ↦*µV µ˜V βV ςV c′ and c ↦*µN µ˜N βN ςN c′, using progress (Theorem 3.3 (a)) for the reflexive case and preservation (Theorem 3.3 (b)) for the transitive case.

Remark 3.8. The original λµµ˜-calculus used a different β rule for functions, namely:

(β→) 〈λx.v||v′ · e〉 →β→ 〈v′||µ˜x. 〈v||e〉〉  (x ∉ FV(e))

This β→ works the same for both call-by-name and call-by-value reduction; since the argument v′ is bound to x with an input abstraction, the rules of the core µµ˜-calculus take over to determine whether or not the argument is evaluated now (by a µV reduction, for example) or later (by a µ˜N reduction). Furthermore, this form of β→ reduction applies more often than the strategy-specific β→V and β→N, so we might ask if it avoids the need of focusing for functions altogether. Unfortunately, the general β→ rule still suffers a similar, if more subtle, fate as the strategy-specific β rules. For example, consider the command 〈f ||µβ. 〈1||α〉 · µ˜x. 〈0||α〉〉 which corresponds to the expression let z = f (abort 1) in 0 in a functional language containing the control operator abort that halts the current computation and yields its argument as the result. In call-by-value this expression should evaluate to 1, and in call-by-name it should evaluate to 0, but the β→ rule does not help us since there is a free variable f instead of a λ-abstraction. In this command, the ς rules are still necessary to get the final result, and unfortunately combining the general β→ rule with ς→ creates a mild form of non-determinism in the operational semantics since some β→ redexes are also ς→ redexes (though the associated reduction theories are still confluent). As it turns out, though, the combination of lifting and strategy-specific β→ reductions is more powerful than the generalized β→ rule. In call-by-value, the combination of ς→V, µ˜V, and β→V exactly simulates the λµµ˜-calculus β→ rule as follows:
〈λx.v||v′ · e〉 ↦ς→V 〈λx.v||µ˜y. 〈v′||µ˜x. 〈y||x · e〉〉〉 ↦µ˜V 〈v′||µ˜x. 〈λx.v||x · e〉〉 →β→V 〈v′||µ˜x. 〈v||e〉〉

In call-by-name, observe that the combination of λµµ˜'s β→ and µ˜N rules simulates the call-by-name-specific β→N even when the call stack is not a co-value,

〈λx.v||v′ · e〉 ↦β→ 〈v′||µ˜x. 〈v||e〉〉 ↦µ˜N 〈v {v′/x}||e〉

but together the µ˜N ηµ β→N ς→N rules perform the same reduction as follows:

〈λx.v||v′ · e〉 ↦ς→N 〈λx.v||µ˜y. 〈µα. 〈y||v′ · α〉||e〉〉 ↦µ˜N 〈µα. 〈λx.v||v′ · α〉||e〉 →β→N 〈µα. 〈v {v′/x}||α〉||e〉 →ηµ 〈v {v′/x}||e〉

So even though type safety (Theorem 3.4) cannot dispense with the ς→ rules by adopting the λµµ˜-calculus' original β→ rule, we can still rely on the combination of strategy-specific β→ ς→ rules from Figures 3.13 and 3.17 and Figures 3.14 and 3.18 to get all the same results with a deterministic operational semantics. End remark 3.8.

Static versus dynamic focusing

Now that we have two different methods for addressing β-stuck commands, one question still remains: what do the static and dynamic methods have to do with one another? As it turns out, they are compatible and complementary solutions to the same problem—two sides of the same coin—that apply the same essential idea at different times. First, one of the major features of static focusing in proof theories and type systems is that the apparent restriction on inference rules is no real restriction at all: every program (i.e. proof) in the original system has a corresponding program with the same type (i.e. specification) in the focused sub-system. We can state this claim more formally for LKQ and LKT by observing that the syntactic transformations in Figures 3.19 and 3.20 translate general dual calculi expressions into the LKQ and LKT sub-syntaxes, respectively, with the same type (by generalizing the proof of preservation in Theorem 3.3 (b)). These translations are defined in such a way that an expression that happens to already lie in the LKQ sub-syntax is not altered by Q-focusing translation, and likewise LKT expressions are not altered by T-focusing translation.

⟦〈v||e〉⟧Q ≜ 〈⟦v⟧Q||⟦e⟧Q〉
⟦x⟧Q ≜ x        ⟦µα.c⟧Q ≜ µα.⟦c⟧Q
⟦(v, v′)⟧Q ≜ µα. 〈⟦v⟧Q||µ˜x. 〈⟦(x, v′)⟧Q||α〉〉        ⟦(V, v)⟧Q ≜ µα. 〈⟦v⟧Q||µ˜x. 〈⟦(V, x)⟧Q||α〉〉
⟦(V, V′)⟧Q ≜ (⟦V⟧Q, ⟦V′⟧Q)
⟦ιi (v)⟧Q ≜ µα. 〈⟦v⟧Q||µ˜x. 〈⟦ιi (x)⟧Q||α〉〉        ⟦ιi (V)⟧Q ≜ ιi (⟦V⟧Q)
⟦not(e)⟧Q ≜ not(⟦e⟧Q)        ⟦λx.v′⟧Q ≜ λx.⟦v′⟧Q        ⟦ΛX.v′⟧Q ≜ ΛX.⟦v′⟧Q
⟦B @ v⟧Q ≜ µα. 〈⟦v⟧Q||µ˜x. 〈⟦B @ x⟧Q||α〉〉        ⟦B @ V⟧Q ≜ B @ ⟦V⟧Q
⟦α⟧Q ≜ α        ⟦µ˜x.c⟧Q ≜ µ˜x.⟦c⟧Q
⟦πi [e]⟧Q ≜ πi [⟦e⟧Q]        ⟦[e, e′]⟧Q ≜ [⟦e⟧Q, ⟦e′⟧Q]        ⟦not[v′]⟧Q ≜ not[⟦v′⟧Q]
⟦v · e⟧Q ≜ µ˜x. 〈⟦v⟧Q||µ˜y. 〈x||⟦y · e⟧Q〉〉        ⟦V · e⟧Q ≜ ⟦V⟧Q · ⟦e⟧Q
⟦B @ e⟧Q ≜ B @ ⟦e⟧Q        ⟦Λ˜X.e⟧Q ≜ Λ˜X.⟦e⟧Q
(where v ∉ ValueV in the µα-lifting cases)

FIGURE 3.19. The Q-focusing translation to the LKQ sub-syntax.

With the focusing translations and the ς reduction theory in hand, we can now observe that both the static and dynamic methods of focusing amount to the same thing. In particular, notice that the LKQ sub-syntax is just the ςV-normal forms from the original dual calculus and the Q-focusing translation performs call-by-value ςV-normalization, and similarly the T-focusing translation is just call-by-name ςN-normalization into the LKT sub-syntax of ςN-normal forms, which can be confirmed by induction on the syntax of (co-)terms and commands.

⟦〈v||e〉⟧T ≜ 〈⟦v⟧T||⟦e⟧T〉
⟦x⟧T ≜ x        ⟦µα.c⟧T ≜ µα.⟦c⟧T
⟦(v, v′)⟧T ≜ (⟦v⟧T, ⟦v′⟧T)        ⟦ιi (v)⟧T ≜ ιi (⟦v⟧T)        ⟦not(e)⟧T ≜ not(⟦e⟧T)
⟦λx.v⟧T ≜ λx.⟦v⟧T        ⟦ΛX.v⟧T ≜ ΛX.⟦v⟧T        ⟦B @ v⟧T ≜ B @ ⟦v⟧T
⟦α⟧T ≜ α        ⟦µ˜x.c⟧T ≜ µ˜x.⟦c⟧T
⟦πi [e]⟧T ≜ µ˜x. 〈µα. 〈x||⟦πi [α]⟧T〉||⟦e⟧T〉        ⟦πi [E]⟧T ≜ πi [⟦E⟧T]
⟦[e, e′]⟧T ≜ µ˜x. 〈µα. 〈x||⟦[α, e′]⟧T〉||⟦e⟧T〉        ⟦[E, e]⟧T ≜ µ˜x. 〈µα. 〈x||⟦[E, α]⟧T〉||⟦e⟧T〉
⟦[E, E′]⟧T ≜ [⟦E⟧T, ⟦E′⟧T]        ⟦not[v]⟧T ≜ not[⟦v⟧T]
⟦v · e⟧T ≜ µ˜x. 〈µα. 〈x||⟦v · α⟧T〉||⟦e⟧T〉        ⟦v · E⟧T ≜ ⟦v⟧T · ⟦E⟧T
⟦B @ e⟧T ≜ µ˜x. 〈µα. 〈x||⟦B @ α⟧T〉||⟦e⟧T〉        ⟦B @ E⟧T ≜ B @ ⟦E⟧T
⟦Λ˜X.e′⟧T ≜ Λ˜X.⟦e′⟧T
(where e ∉ CoValueN in the µ˜x-lifting cases)

FIGURE 3.20. The T-focusing translation to the LKT sub-syntax.

Theorem 3.5 (Focusing).
– Every LKQ command, term, and co-term is a ςV-normal form, and c ↠ςV ⟦c⟧Q, v ↠ςV ⟦v⟧Q, and e ↠ςV ⟦e⟧Q.
– Every LKT command, term, and co-term is a ςN-normal form, and c ↠ςN ⟦c⟧T, v ↠ςN ⟦v⟧T, and e ↠ςN ⟦e⟧T.

Proof. The fact that LKQ expressions are ςV-normal forms and LKT expressions are ςN-normal forms is apparent from the syntax of LKQ and LKT. Furthermore, the fact that c ↠ςV ⟦c⟧Q, c ↠ςN ⟦c⟧T, and so on follows by mutual induction on the syntax of commands and (co-)terms.

Therefore, the difference between the static and dynamic methods of focusing is not a matter of what but when: do we prefer to leave ς redexes to happen during execution, or would we rather reduce them all up front as a preprocessing pass?

Remark 3.9. By representing a calling context with an explicit syntactic object e, we have a direct representation of a tail-recursive interpreter (Ariola et al., 2009a), which can also be seen as a form of abstract machine. In particular, we may view the syntax of the dual calculi as a more abstract representation of a CEK-style machine (Felleisen & Friedman, 1986) or a Krivine-style machine (Krivine, 2007): the control (C) is represented by a term v, the continuation (K) is represented by a co-term e, and the environment (E) is implicit and instead implemented by the capture-avoiding substitution operation. Finally, the configuration state of the machine is represented by a command c. Interestingly, though, the treatment of focusing in these machines tends to be asymmetrical depending on the evaluation strategy: call-by-value abstract machines tend to rely on dynamic focusing during execution, whereas call-by-name abstract machines tend to maintain static focusing. For example, consider a variation on a Krivine machine with implicit substitution for call-by-name evaluation of λ-calculus terms:

〈v v′||E〉 ↦ 〈v||E[□ v′]〉
〈λx.v||E[□ v′]〉 ↦ 〈v {v′/x}||E〉

This machine uses two forms of evaluation context—the application of the computation in question to an argument, E[□ v′], and the empty context, □—for finding the next β-redex to perform. We can relate the states of this call-by-name machine to the call-by-name dual calculus by translating the evaluation contexts to co-terms. The empty context can be represented by just an arbitrary co-variable α, and the application to an argument is represented directly as a call stack co-term: E[□ v′] ≜ v′ · E. With this interpretation, the first rule of the machine states the relationship between function application in the λ-calculus and call stacks in the dual calculus, and the second rule is exactly the β→N operational step. Note that if we always start with a co-value in the machine state then the first rule only ever builds co-values in the LKT sub-syntax. For example, by evaluating a term v in the "empty context" as 〈v||α〉, the co-term in the machine will always be a chain of call stacks with some number of arguments like v1 · v2 · v3 · v4 · α. Therefore, this Krivine-style machine operates within the statically focused LKT sub-syntax.
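The Krivine-style machine of Remark 3.9 is small enough to run directly. The following Haskell sketch (with our own names, and a naive substitution that assumes distinct bound variables) makes the observation above concrete: the continuation is always either the top-level co-variable or a call stack, so the machine never leaves the LKT co-value sub-syntax.

    -- Call-by-name λ-terms and LKT-style co-values as continuations.
    data Term = Var String | Lam String Term | App Term Term

    data CoValue = Top                 -- the "empty context" co-variable α
                 | Arg Term CoValue    -- v · E, a call stack

    type Command = (Term, CoValue)     -- ⟨v||E⟩

    -- Naive substitution; fine for a sketch with distinct bound names.
    subst :: String -> Term -> Term -> Term
    subst x u (Var y)   = if x == y then u else Var y
    subst x u (Lam y b) = if x == y then Lam y b else Lam y (subst x u b)
    subst x u (App f a) = App (subst x u f) (subst x u a)

    step :: Command -> Maybe Command
    step (App v v', e)      = Just (v, Arg v' e)     -- ⟨v v'||E⟩ ↦ ⟨v||v'·E⟩
    step (Lam x b, Arg v e) = Just (subst x v b, e)  -- exactly the β→N step
    step _                  = Nothing                -- ⟨λx.b||α⟩ or ⟨x||E⟩: final

    run :: Command -> Command
    run c = maybe c run (step c)

For example, run (App (Lam "x" (Var "x")) (Lam "y" (Var "y")), Top) steps through 〈v v′||α〉 ↦ 〈v||v′ · α〉 ↦ 〈v′||α〉 and halts with the identity function facing the empty context.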
Now consider the following variation on a CEK machine with implicit substitution for call-by-value evaluation of λ-calculus terms:

〈v v′||E〉 ↦ 〈v||E[□ v′]〉
〈V ||E[□ v]〉 ↦ 〈v||E[V □]〉
〈V ||E[(λx.v) □]〉 ↦ 〈v {V/x}||E〉

Compared to the call-by-name machine above, this machine uses one additional form of evaluation context—the application of a function value to the computation in question, E[V □]—for finding the next β-redex to perform. We can extend the previous translation of evaluation contexts to co-terms so that an applied function value is represented indirectly with an input abstraction: E[V □] ≜ µ˜x. 〈V ||x · E〉. With this interpretation, the first rule of the machine relates function application and call stacks as before, the second rule of the machine is a combined ς→V µ˜V step,

〈V ||v · E〉 ↦ς→V 〈V ||µ˜x. 〈v||µ˜y. 〈x||y · E〉〉〉 ↦µ˜V 〈v||µ˜y. 〈V ||y · E〉〉

and the last rule is a combined µ˜V β→V step:

〈V ||µ˜y. 〈λx.v||y · E〉〉 ↦µ˜V 〈λx.v||V · E〉 ↦β→V 〈v {V/x}||E〉

Notice that this machine does not necessarily operate within the LKQ sub-syntax: the first rule might push a non-value computation onto a call stack. In this case, the ς→V rule is needed to refocus the machine during execution. Of course, we could avoid the need for ς→V reduction at run-time by changing our interpretation of application to pre-ς→V-normalize the call stack, as in E[□ v] ≜ µ˜x. 〈v||µ˜y. 〈x||y · E〉〉. However, this is just a matter of taste since the two timings of focusing amount to the same thing (Theorem 3.5). End remark 3.9.

Call-by-value is dual to call-by-name

We now turn to the duality for which the dual calculi are named. We saw how the symmetries of the sequent calculus present a logical duality that captures De Morgan duals in Section 3.1. This duality is carried over by the Curry-Howard isomorphism and presents itself as two dualities in programming languages: (1) a duality between the static semantics (types) of languages, and (2) a duality between the dynamic semantics (reductions) of languages. These dualities of programming languages were first observed by Filinski (1989) from the correspondence with duality in category theory, which was later expanded upon by Selinger (2001, 2003) in the style of natural deduction. Curien & Herbelin (2000) and Wadler (2003, 2005) brought this duality to the language of the sequent calculus, and showed how it is better reflected in the language as a duality of syntax corresponding to the inherent symmetries in the logic.

The static aspect of duality between types comes directly from the logical duality of the sequent calculus. Since duality spins a sequent around its turnstile, so that assumptions are exchanged with conclusions, we also have a corresponding swap in the programming language. The dual of a term v of type A is a co-term of the dual type and vice versa, so that the term and co-term components of a command are swapped. Likewise, the duality on types lines up directly with the De Morgan duality on logical propositions. For example, since the types for pairs (×) and sums (+) correspond to conjunction (∧) and disjunction (∨), we have the same relationship with the duality operation (−)⊥:

(A×B)⊥ ≜ (A⊥) + (B⊥)        (A+B)⊥ ≜ (A⊥) × (B⊥)

Also following the De Morgan duality, negation (¬) is self-dual. However, just like we found in Gentzen's LK sequent calculus in Section 3.1, the dual calculi presented in Figure 3.12 are missing the counterpart to functions.
By analogy, we complete the duality of components in the calculus by adding the dual of functions, also referred to as subtraction, which represents a transformation on co-terms as the counterpart to a transformation on terms. The typing rules for subtraction are the same as the logical rules for subtraction in LK, and the syntax is reversed from functions in the dual calculi:

Γ | e : A ⊢ ∆  and  Γ′ ⊢ v : B | ∆′  ⇒  Γ, Γ′ ⊢ e · v : B − A | ∆, ∆′  (−R)
Γ | e : B ⊢ α : A, ∆  ⇒  Γ | λ˜α.e : B − A ⊢ ∆  (−L)

Similarly, the β− and ς− operational rules for subtraction are the mirror image of the corresponding rules for functions. In call-by-value we have:

ValueV ::= . . . | e · V
(β−V) 〈E · V ||λ˜α.e〉 →β−V 〈V ||e {E/α}〉
(ς−V) e · v →ς−V µα. 〈v||µ˜y. 〈e · y||α〉〉  (v ∉ ValueV; α, y fresh)

and in call-by-name we have:

CoValueN ::= . . . | λ˜α.e
(β−N) 〈E · V ||λ˜α.e〉 →β−N 〈V ||e {E/α}〉
(ς−N) e · v →ς−N µα. 〈µβ. 〈β · v||α〉||e〉  (e ∉ CoValueN; α, β fresh)

With the dual counterpart to functions in place, the full duality relationship between the types and programs of the dual calculi is defined in Figure 3.21, where we assume an underlying involutive bijection x̄ and ᾱ between variables and co-variables. (By an involutive bijection, we mean that x̄ gives a co-variable and ᾱ gives a variable such that x̄ ≡ ȳ and ᾱ ≡ β̄ if and only if x ≡ y and α ≡ β, and also that the bijection is its own inverse, so the dual of x̄ is x and the dual of ᾱ is α.) First, notice that the duality operation is involutive on the nose: the dual of the dual is exactly the same as the original (Wadler, 2003).

Theorem 3.6 (Involutive duality). The duality operation ⊥ on environments, sequents, types, commands, terms, and co-terms is involutive, so that ⊥⊥ is the identity transformation.

Proof. By mutual induction on the definition of the duality operation ⊥.

Duality of sequents:
(c : (Γ ⊢ ∆))⊥ ≜ c⊥ : (∆⊥ ⊢ Γ⊥)
(Γ ⊢ v : A | ∆)⊥ ≜ ∆⊥ | v⊥ : A⊥ ⊢ Γ⊥
(Γ | e : A ⊢ ∆)⊥ ≜ ∆⊥ ⊢ e⊥ : A⊥ | Γ⊥
(x1 : A1, . . . , xn : An)⊥ ≜ x̄n : A⊥n, . . . , x̄1 : A⊥1
(α1 : A1, . . . , αn : An)⊥ ≜ ᾱn : A⊥n, . . . , ᾱ1 : A⊥1

Duality of types:
(X)⊥ ≜ X        (¬A)⊥ ≜ ¬(A⊥)
(A×B)⊥ ≜ (A⊥) + (B⊥)        (A+B)⊥ ≜ (A⊥) × (B⊥)
(A→ B)⊥ ≜ (B⊥) − (A⊥)        (B − A)⊥ ≜ (A⊥) → (B⊥)
(∀X.A)⊥ ≜ ∃X.(A⊥)        (∃X.A)⊥ ≜ ∀X.(A⊥)

Duality of programs:
〈v||e〉⊥ ≜ 〈e⊥||v⊥〉
(x)⊥ ≜ x̄        (α)⊥ ≜ ᾱ
(µα.c)⊥ ≜ µ˜ᾱ.c⊥        (µ˜x.c)⊥ ≜ µx̄.c⊥
(v1, v2)⊥ ≜ [v1⊥, v2⊥]        [e1, e2]⊥ ≜ (e1⊥, e2⊥)
ι1 (v)⊥ ≜ π1 [v⊥]        π1 [e]⊥ ≜ ι1 (e⊥)
ι2 (v)⊥ ≜ π2 [v⊥]        π2 [e]⊥ ≜ ι2 (e⊥)
not(e)⊥ ≜ not[e⊥]        not[v]⊥ ≜ not(v⊥)
(λx.v)⊥ ≜ λ˜x̄.(v⊥)        (λ˜α.e)⊥ ≜ λᾱ.(e⊥)
(e · v)⊥ ≜ e⊥ · v⊥        (v · e)⊥ ≜ v⊥ · e⊥
(ΛX.v)⊥ ≜ Λ˜X.(v⊥)        (Λ˜X.e)⊥ ≜ ΛX.(e⊥)
(B @ v)⊥ ≜ B⊥ @ (v⊥)        (B @ e)⊥ ≜ B⊥ @ (e⊥)

FIGURE 3.21. The duality relation between the dual calculi.

This relationship is not just a syntactic word game; it gives us a duality between the typing derivations of terms and co-terms (Curien & Herbelin, 2000; Wadler, 2003):

Theorem 3.7 (Static duality).
a) c : (Γ ⊢ ∆) is well-typed if and only if c⊥ : (∆⊥ ⊢ Γ⊥) is.
b) Γ ⊢ v : A | ∆ is well-typed if and only if ∆⊥ | v⊥ : A⊥ ⊢ Γ⊥ is.
c) Γ | e : A ⊢ ∆ is well-typed if and only if ∆⊥ ⊢ e⊥ : A⊥ | Γ⊥ is.
Furthermore, if a command, term, or co-term lies in the LKQ sub-syntax, its dual lies in the LKT sub-syntax and vice versa.

Proof. By induction on the typing derivation.

The dynamic aspect of duality takes form as a relationship between the two reduction systems for evaluating programs: call-by-value reduction is dual to call-by-name reduction.
That is, if we have a command c that behaves a certain way according to the call-by-value calculus, then the dual command c⊥ behaves in a correspondingly dual way according to the call-by-name calculus, and vice versa. The two dynamic semantics (operational, reduction, and equational) mirror each other exactly, rule for rule (Curien & Herbelin, 2000; Wadler, 2003).

Theorem 3.8 (Dynamic duality).
a) c →µV µ˜V βV c′ if and only if c⊥ →µN µ˜N βN c′⊥, and dually c →µN µ˜N βN c′ if and only if c⊥ →µV µ˜V βV c′⊥.
b) v →ηµ ςV v′ if and only if v⊥ →ηµ˜ ςN v′⊥, and dually v →ηµ ςN v′ if and only if v⊥ →ηµ˜ ςV v′⊥.
c) e →ηµ˜ ςV e′ if and only if e⊥ →ηµ ςN e′⊥, and dually e →ηµ˜ ςN e′ if and only if e⊥ →ηµ ςV e′⊥.

Proof. By cases on the respective rewriting rules, using the fact that substitution commutes with duality ((c {V/x})⊥ =α c⊥ {V⊥/x̄}, (c {E/α})⊥ =α c⊥ {E⊥/ᾱ}, (c {A/X})⊥ =α c⊥ {A⊥/X}, and similarly for (co-)terms), which is guaranteed by the fact that the duality operation is compositional and hygienic (Downen & Ariola, 2014a).

CHAPTER IV

POLARITY

Looking back to Gentzen's original LK from Figure 3.5, a careful eye might notice that there is a bit of an inconsistency among the logical rules. In particular, compare left implication introduction (⊃L) with right conjunction (∧R) and left disjunction (∨L) introduction, and notice how they treat their auxiliary propositions (hypotheses Γ and consequences ∆) very differently. In both the ∧R and ∨L rules, the auxiliary propositions are shared among both premises and the deduction: each sequent contains exactly the same extra hypotheses (Γ) and consequences (∆). However, the ⊃L rule does not follow this pattern. Instead, the two premises of the ⊃L rule contain different auxiliary propositions from one another, which are then combined together in the deduction: each sequent contains potentially different hypotheses and consequences. Why do the rules for implication appear so different from the rules for conjunction and disjunction? Is this merely a notational accident, or is there some significance to the way these side propositions are threaded through the proof tree? As it turns out, we can classify the logical connectives in a way that emphasizes this distinction, which through the Curry-Howard lens has a profound impact on our understanding of the computational nature of the sequent calculus.

Previously, in Chapter III, we found that the sequent calculus shows us the duality between evaluation strategies—namely the call-by-value and call-by-name strategies—via two distinct languages with the same syntax but different semantics. This distinction between the semantics of the dual call-by-value and call-by-name calculi becomes apparent when we consider the operational behavior of programs. For example, Wadler (2003) was able to encode functions in terms of the other connectives, but surprisingly different encodings are necessary for call-by-value and call-by-name. Even though they share a syntax, the two dual calculi truly describe different languages. Instead, we will soon find that an alternative interpretation of the sequent calculus lets us express the same duality of evaluation within the same language, so that a single program might employ both call-by-value and call-by-name during its execution.

Additive and Multiplicative LK

Recall the basic left introduction inference rules for conjunction in Figure 3.2. These rules state that if A is false then A ∧ B is false, and likewise if B is false then A ∧ B is false as well.
However, there is another presentation of conjunction that makes use of the internal structure of sequents. We originally decided in Chapter III to interpret a sequent as meaning that the truth of all hypotheses entails the truth of some consequence. So for example, the sequent A, B ⊢ C, D means that "A and B entails C or D." In other words, the commas to the left are pronounced as "and," and the commas to the right are pronounced "or." We might then formalize this interpretation by saying that the logical connective for conjunction actually corresponds to a comma on the left, so that the sequents A, B ⊢ and A ∧ B ⊢ are equally valid, as shown by the two inferences which reverse one another (from bottom-up to top-down):

A, B ⊢  ⇒  A ∧ B ⊢        A ∧ B ⊢  ⇒  A, B ⊢

Notice how the sequents A, B ⊢ and A ∧ B ⊢ are equivalent statements since both mean that "A and B entails false," which justifies that the above inference rules are valid. Likewise, we could equate the logical connective for disjunction with a comma on the right, so that the sequents ⊢ A, B and ⊢ A ∨ B are equally valid, as shown by the inferences:

⊢ A, B  ⇒  ⊢ A ∨ B        ⊢ A ∨ B  ⇒  ⊢ A, B

This gives an alternative to the right introduction rules for disjunction in contrast to the ones given in Figure 3.3. Notice how the above alternative rules for conjunction and disjunction are reversible: both the top-down and bottom-up inferences are valid. More generally, an inference of the form

H1    H2    . . .    Hn  ⇒  J

is reversible when there are derivations D1, D2, . . . , Dn deriving each of the premises H1, H2, . . . , Hn from the conclusion J, and irreversible otherwise. So we can say that the above alternative left introduction of conjunction and right introduction of disjunction are both reversible. These formulations of conjunction and disjunction contrast with the rules that were given in Figures 3.2 and 3.3. Clearly A ⊢ (i.e. "A is false") is a much stronger statement than A ∧ B ⊢ (i.e. "the conjunction of A and B is false"), so the left introduction rules given in Figure 3.2 are irreversible. Likewise, ⊢ A (i.e. "A is true") is a much stronger statement than ⊢ A ∨ B (i.e. "either A or B is true"), so the right introduction rules for disjunction given in Figure 3.3 are also irreversible. It seems that we have a substantive choice on how we might phrase conjunction and disjunction in the setting of the sequent calculus. Instead of just arbitrarily choosing one of them, we can consider all the possibilities at once in the same logic, as shown in Figure 4.1. In this combined logic, we have two separate logical connectives for conjunction and two connectives for disjunction. Additionally, there are two separate constants (i.e. nullary connectives) for truth and falsehood. Our original formulation of conjunction (∧) and disjunction (∨) in LK from Figure 3.5 is preserved as the & and ⊕ connectives, respectively, as well as truth (⊤) and falsehood (⊥), which go by the same name. The new alternatives for truth, falsehood, conjunction, and disjunction discussed above are denoted by the 1, 0, ⊗, and ⅋ connectives, respectively. Finally, the presentation of negation (¬) and implication (⊃) is unchanged. For now we delay further discussion of the quantifiers until Chapter VI. Now we can more formally analyze the reversibility of the logical rules for the different variations of the connectives.
A, B, C ∈ Proposition ::= X | 0 | 1 | A⊕B | A⊗B | ⊤ | ⊥ | A&B | A⅋B | A ⊃ B | ¬A
Γ ∈ Hypothesis ::= A1, . . . , An
∆ ∈ Consequence ::= A1, . . . , An
Judgement ::= Γ ⊢ ∆

Axiom and cut:
A ⊢ A  (Ax)        Γ ⊢ A, ∆  and  Γ′, A ⊢ ∆′  ⇒  Γ′, Γ ⊢ ∆′, ∆  (Cut)

Logical rules:
⊢ 1  (1R)        Γ ⊢ ∆  ⇒  Γ, 1 ⊢ ∆  (1L)
no 0R rule        Γ, 0 ⊢ ∆  (0L)
Γ ⊢ A, ∆  and  Γ′ ⊢ B, ∆′  ⇒  Γ, Γ′ ⊢ A⊗B, ∆, ∆′  (⊗R)        Γ, A, B ⊢ ∆  ⇒  Γ, A⊗B ⊢ ∆  (⊗L)
Γ ⊢ ⊤, ∆  (⊤R)        no ⊤L rule
Γ ⊢ ∆  ⇒  Γ ⊢ ⊥, ∆  (⊥R)        ⊥ ⊢  (⊥L)
Γ ⊢ A, ∆  and  Γ ⊢ B, ∆  ⇒  Γ ⊢ A&B, ∆  (&R)        Γ, A ⊢ ∆  ⇒  Γ, A&B ⊢ ∆  (&L1)        Γ, B ⊢ ∆  ⇒  Γ, A&B ⊢ ∆  (&L2)
Γ ⊢ A, ∆  ⇒  Γ ⊢ A⊕B, ∆  (⊕R1)        Γ ⊢ B, ∆  ⇒  Γ ⊢ A⊕B, ∆  (⊕R2)        Γ, A ⊢ ∆  and  Γ, B ⊢ ∆  ⇒  Γ, A⊕B ⊢ ∆  (⊕L)
Γ ⊢ A, B, ∆  ⇒  Γ ⊢ A⅋B, ∆  (⅋R)        Γ, A ⊢ ∆  and  Γ′, B ⊢ ∆′  ⇒  Γ, Γ′, A⅋B ⊢ ∆, ∆′  (⅋L)
Γ, A ⊢ ∆  ⇒  Γ ⊢ ¬A, ∆  (¬R)        Γ ⊢ A, ∆  ⇒  Γ, ¬A ⊢ ∆  (¬L)
Γ, A ⊢ B, ∆  ⇒  Γ ⊢ A ⊃ B, ∆  (⊃R)        Γ ⊢ A, ∆  and  Γ′, B ⊢ ∆′  ⇒  Γ, Γ′, A ⊃ B ⊢ ∆, ∆′  (⊃L)

Structural rules:
Γ ⊢ ∆  ⇒  Γ ⊢ A, ∆  (WR)        Γ ⊢ ∆  ⇒  Γ, A ⊢ ∆  (WL)
Γ ⊢ A, A, ∆  ⇒  Γ ⊢ A, ∆  (CR)        Γ, A, A ⊢ ∆  ⇒  Γ, A ⊢ ∆  (CL)
Γ ⊢ ∆, A, B, ∆′  ⇒  Γ ⊢ ∆, B, A, ∆′  (XR)        Γ, A, B, Γ′ ⊢ ∆  ⇒  Γ, B, A, Γ′ ⊢ ∆  (XL)

FIGURE 4.1. An additive and multiplicative LK sequent calculus: with two truths (1, ⊤), two falsehoods (0, ⊥), two conjunctions (⊗, &), two disjunctions (⊕, ⅋), one negation (¬), and one implication (⊃).

The left introductions for ⊗-conjunction and ⊕-disjunction are reversible because the sequent Γ, A, B ⊢ ∆ follows from Γ, A⊗B ⊢ ∆, and each of Γ, A ⊢ ∆ and Γ, B ⊢ ∆ follows from Γ, A⊕B ⊢ ∆: from A ⊢ A (Ax) and B ⊢ B (Ax) we get A, B ⊢ A⊗B (⊗R), and cutting this against Γ, A⊗B ⊢ ∆ yields Γ, A, B ⊢ ∆ (Cut); likewise, from A ⊢ A (Ax) we get A ⊢ A⊕B (⊕R1), and cutting against Γ, A⊕B ⊢ ∆ yields Γ, A ⊢ ∆ (Cut), and from B ⊢ B (Ax) we get B ⊢ A⊕B (⊕R2), and cutting against Γ, A⊕B ⊢ ∆ yields Γ, B ⊢ ∆ (Cut).

However, the right rules of ⊗-conjunction and ⊕-disjunction are irreversible because the premises are stronger than the conclusion. Clearly neither Γ ⊢ A, ∆ nor Γ ⊢ B, ∆ follows from the weaker sequent Γ ⊢ A⊕B, ∆, but also neither Γ ⊢ A, ∆ nor Γ′ ⊢ B, ∆′ follows from Γ, Γ′ ⊢ A⊗B, ∆, ∆′ because of the way that the side propositions Γ, Γ′ and ∆, ∆′ from the conclusion are split up between the two premises. In contrast, the right introduction rules for &-conjunction, ⅋-disjunction, and ⊃-implication are reversible because the premises are weak enough to be proved from the conclusions: from Γ ⊢ A&B, ∆ and A&B ⊢ A (by Ax and &L1) we get Γ ⊢ A, ∆ (Cut), and from Γ ⊢ A&B, ∆ and A&B ⊢ B (by Ax and &L2) we get Γ ⊢ B, ∆ (Cut); from Γ ⊢ A⅋B, ∆ and A⅋B ⊢ A, B (by Ax, Ax, and ⅋L) we get Γ ⊢ A, B, ∆ (Cut); and from Γ ⊢ A ⊃ B, ∆ and A, A ⊃ B ⊢ B (by Ax, Ax, and ⊃L) we get Γ, A ⊢ ∆, B (Cut), and hence Γ, A ⊢ B, ∆ (XR).

However, each of the &-conjunction, ⅋-disjunction, and ⊃-implication left introduction rules is irreversible for similar reasons as the right introduction rules for ⊗-conjunction and ⊕-disjunction. Clearly neither Γ, A ⊢ ∆ nor Γ, B ⊢ ∆ follows from the weaker sequent Γ, A&B ⊢ ∆. Furthermore, both the ⅋L and ⊃L rules share the same splitting problem that causes the irreversibility of ⊗R.

One consequence of reversibility is that any derivation whose conclusion matches the conclusion of a reversible rule might as well end with that reversible rule, because we can always extract out the premises to the rule and then reassemble the same conclusion. For example, suppose that we have a derivation D of the sequent Γ ⊢ A&B, ∆, where the proposition A&B appears on the right side. Then by the reversibility of the &R rule noted above, we have derivations from Γ ⊢ A&B, ∆ to Γ ⊢ A, ∆ and Γ ⊢ B, ∆, which we will denote by the names &R1⁻¹ and &R2⁻¹ respectively. These two reverse derivations let us expand D to get an extended derivation which ends with &R, as follows: running D followed by &R1⁻¹ derives Γ ⊢ A, ∆, running a second copy of D followed by &R2⁻¹ derives Γ ⊢ B, ∆, and recombining these two with &R reassembles the original conclusion, so that D ≺ &R(D ; &R1⁻¹, D ; &R2⁻¹).
Similarly, we can expand arbitrary derivations of sequents with A⅋B or A ⊃ B on the right side using the derivations ⅋R⁻¹ and ⊃R⁻¹ which reverse the ⅋R and ⊃R right introduction rules: a derivation D of Γ ⊢ A⅋B, ∆ expands to D followed by ⅋R⁻¹ (deriving Γ ⊢ A, B, ∆) recombined by ⅋R, and a derivation D of Γ ⊢ A ⊃ B, ∆ expands to D followed by ⊃R⁻¹ (deriving Γ, A ⊢ B, ∆) recombined by ⊃R. The same expansion also occurs when the proposition A⊕B or A⊗B appears on the left of the concluding sequent, by using the ⊕L1⁻¹, ⊕L2⁻¹, and ⊗L⁻¹ reverse derivations of the ⊕L and ⊗L left introduction rules: a derivation D of Γ, A⊕B ⊢ ∆ expands to the pair of derivations D ; ⊕L1⁻¹ (of Γ, A ⊢ ∆) and D ; ⊕L2⁻¹ (of Γ, B ⊢ ∆) recombined by ⊕L, and a derivation D of Γ, A⊗B ⊢ ∆ expands to D ; ⊗L⁻¹ (of Γ, A, B ⊢ ∆) followed by ⊗L. So in comparison with natural deduction, whereas the steps of cut elimination (Section 3.1) in the sequent calculus correspond with local soundness (Section 2.1), the above reversibility expansions correspond with local completeness.

With both variations of the connectives included in a single logic, we can compare and contrast them by the emergent properties of their logical rules. Notice how the auxiliary hypotheses Γ and consequences ∆ in the &R and ⊕L rules are shared among both premises as well as in the conclusion, so that Γ and ∆ are "copied" when the rules are read from the bottom up. Because the side propositions are copied bottom-up, we say that the &-conjunction and ⊕-disjunction are additive connectives. In contrast, in each of the ⊗R, ⅋L, and ⊃L rules the two premises contain different auxiliary hypotheses and consequences, which are "merged" when the rules are read from the top down. Because the side propositions are merged top-down, we say that the ⊗-conjunction, ⅋-disjunction, and ⊃-implication are multiplicative connectives. In the degenerate case of the nullary connectives, we can say that ⊤ and 0 are additive because the Γ and ∆ in the conclusion of their only introduction rule (⊤R and 0L) are "copied" among their zero premises, whereas 1 and ⊥ have rules (1R and ⊥L) that "merge" the hypotheses and consequences of their zero premises into the conclusion. Note that ¬-negation is neither additive nor multiplicative—or perhaps it could be considered both additive and multiplicative—since both its right and left introduction rules have exactly one premise.

Besides the additive-multiplicative distinction, there is another axis, which is perhaps more fundamental, upon which we can classify the connectives. Recall the previous discussion of reversibility of the inference rules that led us to consider ⊗-conjunction and ⅋-disjunction as alternatives to the &-conjunction and ⊕-disjunction that were inherited from Gentzen's LK. Both the ⊗-conjunction and ⊕-disjunction have reversible left introductions because the premises are weak enough to be proved from the conclusion. On the flip side, we saw that &-conjunction, ⅋-disjunction, and ⊃-implication have reversible right introductions for dual reasons. We can thus divide the logical connectives based on two polarities: connectives with reversible left introductions and irreversible right introductions are positive, and dually connectives with reversible right introductions and irreversible left introductions are negative. Based on our previous analysis, we can say that ⊗-conjunction and ⊕-disjunction are positive, whereas &-conjunction, ⅋-disjunction, and ⊃-implication are negative.
Note again that ¬-negation does not directly participate in this classification and is neutral with regard to polarity because both the left and right ¬ introductions are reversible, making it both—or neither, depending on our perspective—positive and negative at the same time. We can thus categorize all the binary connectives along the additive-multiplicative and positive-negative axes, as shown in Figure 4.2. These two classifications are enough to separate all of the connectives into different quadrants based on their properties, so that only ` and ⊃ share the same quadrant, showing that these are the two connectives that are most similar to one another.

                 Positive    Negative
Additive         ⊕           &
Multiplicative   ⊗           `, ⊃

FIGURE 4.2. The positive/negative and additive/multiplicative classification of binary connectives.

Pattern Matching and Extensionality

Let us now consider a language for the additive and multiplicative LK sequent calculus which is well suited for expressing the polarity of connectives within the form of its expressions. The language shown in Figure 4.3, which extends the core µµ˜-calculus from Figure 3.7, is based on Munch-Maccagnoni's (2009) system L family of calculi.¹ System L is visually rather different from the dual calculi we studied previously in Chapter III; its most obvious departure from the dual calculi is its pervasive use of pattern-matching as a core language construct. One way to understand the role of pattern-matching in programming and its connection to polarity is to look at Dummett's 1976 lectures (Dummett, 1991) on the justification of logical principles. In essence, Dummett suggested that there are effectively two ways of framing the meaning of logical laws, which reveals a certain bias in the logician: the verificationist and the pragmatist. In the eyes of a verificationist, it is the rules for proving a proposition (corresponding to the right introduction rules in either natural deduction or sequent calculus) that give meaning to a logical connective. These are the primitive rules for a connective that define its character. All the other rules of a connective (the elimination or left rules) must then be justified with respect to its right introductions. In other words, the meaning of a proposition can be derived from its canonical proofs (Prawitz, 1974) composed of right introduction rules, and the other rules are sound with respect to them. This is an alternative to the global property of cut elimination from Section 3.1 that is more similar to local soundness for natural deduction described in Section 2.1.

¹We consider here the two-sided variant of system L to make easier comparisons with the other languages for the sequent calculus.
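This verificationist bias is the familiar one from typed functional programming, where a data type is defined by its constructors and every use is justified by case analysis over them, while the pragmatist bias matches interface-like types that are defined by how they are observed. As a rough illustration only (the Haskell names here are our own choosing, not system L syntax), the two biases look like this:

```haskell
-- Verificationist (positive): the meaning of a type is given by its
-- constructors, the canonical ways of producing proofs; any consumer is
-- justified by inversion (case analysis) on those canonical forms.
data Sum a b = Inl a | Inr b

elimSum :: (a -> c) -> (b -> c) -> Sum a b -> c
elimSum f _ (Inl x) = f x
elimSum _ g (Inr y) = g y

-- Pragmatist (negative): the meaning of a type is given by its uses.
-- A stream is known only through the observations headS and tailS; a
-- producer is any code that can answer both observations on demand.
data Stream a = Stream { headS :: a, tailS :: Stream a }

ones :: Stream Int
ones = Stream { headS = 1, tailS = ones }  -- defined by co-cases, not built up
```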
A,B,C ∈ Type ::= X | 0 | 1 | A⊕B | A⊗B | ∼A | > | ⊥ | A&B | A`B | A→ B | ¬A
v ∈ Term ::= x | µα.c | () | ι1 (v) | ι2 (v) | (v, v) | ∼ (e) | µ() | µ([].c) | µ(π1 [α].c | π2 [β].c) | µ([α, β].c) | µ([x · β].c) | µ(¬ [x].c)
e ∈ CoTerm ::= α | µ˜x.c | µ˜[] | µ˜[().c] | µ˜[ι1 (x).c | ι2 (y).c] | µ˜[(x, y).c] | µ˜[∼ (α).c] | [] | π1 [e] | π2 [e] | [e, e] | v · e | ¬ [v]
c ∈ Command ::= 〈v||e〉

Logical rules:
` () : 1 | 1R      c : (Γ ` ∆) / Γ | µ˜[().c] : 1 ` ∆ 1L      no 0R rule      Γ | µ˜[] : 0 ` ∆ 0L
Γ ` v : A | ∆ / Γ ` ι1 (v) : A⊕B | ∆ ⊕R1      Γ ` v : B | ∆ / Γ ` ι2 (v) : A⊕B | ∆ ⊕R2
c : (Γ, x : A ` ∆)   c′ : (Γ, y : B ` ∆) / Γ | µ˜[ι1 (x).c | ι2 (y).c′] : A⊕B ` ∆ ⊕L
Γ ` v : A | ∆   Γ′ ` v′ : B | ∆′ / Γ,Γ′ ` (v, v′) : A⊗B | ∆,∆′ ⊗R      c : (Γ, x : A, y : B ` ∆) / Γ | µ˜[(x, y).c] : A⊗B ` ∆ ⊗L
Γ | e : A ` ∆ / Γ ` ∼ (e) : ∼A | ∆ ∼R      c : (Γ ` α : A,∆) / Γ | µ˜[∼ (α).c] : ∼A ` ∆ ∼L
Γ ` µ() : > | ∆ >R      no >L rule      c : (Γ ` ∆) / Γ ` µ([].c) : ⊥ | ∆ ⊥R      | [] : ⊥ ` ⊥L
c : (Γ ` α : A,∆)   c′ : (Γ ` β : B,∆) / Γ ` µ(π1 [α].c | π2 [β].c′) : A&B | ∆ &R
Γ | e : A ` ∆ / Γ | π1 [e] : A&B ` ∆ &L1      Γ | e : B ` ∆ / Γ | π2 [e] : A&B ` ∆ &L2
c : (Γ ` α : A, β : B,∆) / Γ ` µ([α, β].c) : A`B | ∆ `R      Γ | e : A ` ∆   Γ′ | e′ : B ` ∆′ / Γ,Γ′ | [e, e′] : A`B ` ∆,∆′ `L
c : (Γ, x : A ` β : B,∆) / Γ ` µ([x · β].c) : A→ B | ∆ →R      Γ ` v : A | ∆   Γ′ | e : B ` ∆′ / Γ,Γ′ | v · e : A→ B ` ∆,∆′ →L
c : (Γ, x : A ` ∆) / Γ ` µ(¬ [x].c) : ¬A | ∆ ¬R      Γ ` v : A | ∆ / Γ | ¬ [v] : ¬A ` ∆ ¬L

FIGURE 4.3. The syntax and typing rules for system L: with two unit types (1, >), two empty types (0, ⊥), (co-)products (&, ⊕), (co-)pairs (⊗, `), two negations (∼, ¬), and functions (→).

In the eyes of a pragmatist, it is the rules for using a proposition (corresponding to the elimination rules in natural deduction and the left rules in the sequent calculus) that give meaning to a logical connective. That is to say, the primitive concept is what can be done with a proposition. This stance is the polar opposite of the verificationist. For a pragmatist, canonical proofs are composed of elimination or left rules, and the other rules must be sound with respect to the way assumptions are used rather than the way facts are verified.

The key insight behind this connection is that the positive connectives follow a verificationist's point of view, whereas the negative connectives follow a pragmatist's point of view. In terms of system L, positive types focus on the patterns or shapes of terms (which create results) whereas negative types focus on the patterns or shapes of co-terms (which use results). Since the positive connectives correspond to a verificationist style of proof, the proofs (i.e. verifications) of a proposition fall within a fixed set of well-known canonical forms, whereas the uses (i.e. refutations) of a proposition are arbitrary. Therefore, in a program corresponding to a verificationist proof, the terms for producing output also must fall within a fixed set of forms, but the co-terms for consuming input are allowed to be arbitrary. In order to gain a foothold on the unrestricted nature of positive co-terms, we may describe them by inversion on the possible forms of their input. That is to say, positive co-terms may be defined by cases on the structure of all possible input they might receive. In other words, positive types follow the general pattern that terms are formed by construction, whereas co-terms are formed by case analysis on term constructors.

Compared to the positive connectives, the pragmatist approach to negative connectives may seem a bit unusual.
Rather than thinking about how to conclude true facts, the pragmatist takes the dual approach and focuses attention on how to make use of those facts. In this way, the methods of using an assumed proposition are limited to a fixed set of known canonical forms, whereas the conclusions of a proposition may be arbitrary. The programs that correspond with pragmatist proofs are likewise dual to verificationist proofs, so that the relative roles of producers and consumers are reversed. In a pragmatist program, the terms that produce output are allowed to have an arbitrary form. Instead, it is the co-terms for consuming input that must fall within a fixed set of known forms—the legal observations of a type. We may then define terms by inversion on the possible forms of their consumer, so that they are given by cases on the observation of their output. In other words, the general pattern for negative connectives is that the co-terms are formed by construction, whereas the terms are formed by the dual form of case analysis on co-term constructors.

For example, in the dual calculi, a value of the product type A×B was created by the pair term (v1, v2), which is used by a projection co-term of the form π1 [e] or π2 [e]. In system L, however, we have two different methods to conjoin two types. From the verificationist viewpoint, the positive A⊗B method of conjunction puts the focus on the construction of pairs representing the canonical proof of a conjunction of two parts, keeping terms of the form (v1, v2) : A⊗B that clearly contain both v1 : A and v2 : B sub-terms, as the single right introduction rule defining A⊗B:

Γ ` v : A | ∆   Γ′ ` v′ : B | ∆′ / Γ,Γ′ ` (v, v′) : A⊗B | ∆,∆′ ⊗R

To use a value of type A⊗B, a co-term only needs to justify its reaction to the canonical pair values, such as the case abstraction co-term µ˜[(x, y).c] : A⊗B that performs a pattern-matching case analysis to bind x : A to the first component and y : B to the second component of its given pair in the arbitrary command c:

c : (Γ, x : A, y : B ` ∆) / Γ | µ˜[(x, y).c] : A⊗B ` ∆ ⊗L

From the pragmatist viewpoint, the negative A&B method of conjunction puts the focus on the destruction of pairs, keeping co-terms of the form π1 [e] : A&B and π2 [e] : A&B that clearly mark the choice between the two canonical left introduction rules of products defining A&B:

Γ | e : A ` ∆ / Γ | π1 [e] : A&B ` ∆ &L1      Γ | e : B ` ∆ / Γ | π2 [e] : A&B ` ∆ &L2

To create a value of type A&B, a term only needs to justify its reaction to the two possible projection observations, such as the co-case abstraction term µ(π1 [α].c1 | π2 [β].c2) : A&B that performs pattern-matching case analysis on which projection is observing it, binding α : A to e1 : A in c1 in the case of a π1 [e1] projection and binding β : B to e2 : B in c2 in the case of a π2 [e2] projection:

c : (Γ ` α : A,∆)   c′ : (Γ ` β : B,∆) / Γ ` µ(π1 [α].c | π2 [β].c′) : A&B | ∆ &R

As another example, the dual calculi create values of the sum type A + B by the injection terms ι1 (v) and ι2 (v), which are used by a co-pair co-term of the form [e1, e2], and in system L each of these two constructions shows up separately in the two different methods to disjoin two types.
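Before turning to the two disjunctions, note that this split of one conjunction into two is visible even in an ordinary lazy functional language. The following Haskell sketch (type and field names are our own) contrasts a ⊗-style pair, built from two finished parts and taken apart all at once by pattern matching, with a &-style pair, which is nothing more than the ability to answer two projections:

```haskell
-- Positive conjunction, in the style of (x): a structure containing both
-- components, consumed by matching on the pair pattern.
data Tensor a b = Tensor a b

swapTensor :: Tensor a b -> Tensor b a
swapTensor (Tensor x y) = Tensor y x

-- Negative conjunction, in the style of &: an interface offering a choice
-- of two projections; each component is demanded separately.
data With a b = With { pi1 :: a, pi2 :: b }

swapWith :: With a b -> With b a
swapWith p = With { pi1 = pi2 p, pi2 = pi1 p }
```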
On the one hand, the A⊕B method of disjunction keeps the injection terms ι1 (v) : A⊕B and ι2 (v) : A⊕B that clearly mark the choice between the canonical right introduction rules defining A⊕B:

Γ ` v : A | ∆ / Γ ` ι1 (v) : A⊕B | ∆ ⊕R1      Γ ` v : B | ∆ / Γ ` ι2 (v) : A⊕B | ∆ ⊕R2

To use a value of type A⊕B, we only need to justify the reaction of a co-term to the canonical injection terms, such as the case abstraction co-term µ˜[ι1 (x).c1 | ι2 (y).c2] : A⊕B that checks which injection it receives, binding x : A to v1 : A in c1 in the case of ι1 (v1) and binding y : B to v2 : B in c2 in the case of ι2 (v2):

c : (Γ, x : A ` ∆)   c′ : (Γ, y : B ` ∆) / Γ | µ˜[ι1 (x).c | ι2 (y).c′] : A⊕B ` ∆ ⊕L

On the other hand, the A`B method of disjunction puts the focus on the destruction of sums, keeping co-terms of the form [e1, e2] that clearly contain both e1 : A and e2 : B sub-co-terms, as the single canonical left introduction rule defining A`B:

Γ | e : A ` ∆   Γ′ | e′ : B ` ∆′ / Γ,Γ′ | [e, e′] : A`B ` ∆,∆′ `L

To create a value of type A`B, we only need to justify the reaction of a term to the canonical co-pair observations, such as the co-case abstraction term µ([α, β].c) : A`B that binds α : A to the first component and β : B to the second component of its given co-pair in the arbitrary command c:

c : (Γ ` α : A, β : B,∆) / Γ ` µ([α, β].c) : A`B | ∆ `R

The rest of the connectives follow suit accordingly, where positive connectives construct terms according to certain patterns and have co-terms which match on those patterns by case analysis, and negative connectives construct co-terms according to certain patterns and have terms which match on those patterns by case analysis. The positive constants 1 and 0 are nullary versions of A⊗B and A⊕B, so they contain the nullary versions of pairs and co-products. The negative constants > and ⊥ are nullary versions of A&B and A`B, so they contain the nullary versions of products and co-pairs. Functions A→ B are another example of a multiplicative negative type like the negative disjunction A`B, and so they contain similar (co-)terms for the sake of uniformity. This means that the call stacks v · e : A→ B for functions are the same as in the dual calculi, but λ-abstractions have been replaced with the co-case abstraction terms µ([x · β].c) : A→ B which deconstruct a call stack to bind the argument to x : A and the return co-term to β : B in the command c. Note that this change in representation from λ-abstractions to call stack deconstructions does not change the expressiveness of functions, since each can represent the other as macro expansions:

µ([x · β].c) = λx.µβ.c      λx.v = µ([x · β].〈v||β〉) (β ∉ FV (v))

Finally, we have to accommodate negation, which could be considered both positive and negative as we previously saw in Section 4.1. Therefore, instead of breaking the pattern or choosing arbitrarily, we include two different negation connectives—a positive negation ∼A and a negative negation ¬A—to express the two possible orientations of construction and deconstruction by case analysis.

Remark 4.1. It is worthwhile to pause and ask why the pragmatist representation of logical connectives may appear to be backwards. For example, ` is a logical "or" whose interpretation appears to be an "and" combination of two things, whereas & is a logical "and" whose interpretation appears to be an "or" choice of two alternatives. The reason is that the pragmatist approach requires us to completely reverse the way we think about proving.
Under the verificationist approach, we focus on how to establish truth: to show that "A and B" is true, we need to show that both A and B are true; to show that "A or B" is true, it suffices to show that either A is true or B is true. Instead, the pragmatist approach asks us to focus on the ways to establish falsehood: to show that "A and B" is false, it suffices to show that either A is false or B is false; to show that "A or B" is false, we need to show that both A and B are false. Whereas the verificationist is primarily concerned with building a proof, the pragmatist is instead concerned with building a refutation. Therefore, the pragmatist interpretation of negative connectives intuitively has a negation baked in: "and" is represented by a choice and "or" is represented by a pair because they are about refutations rather than proofs. End remark 4.1.

Positive ηP rules:
(η0P) e : 0 ≺η0P µ˜[]
(η1P) e : 1 ≺η1P µ˜[().〈()||e〉]
(η⊕P) e : A⊕B ≺η⊕P µ˜[ι1 (x).〈ι1 (x)||e〉 | ι2 (y).〈ι2 (y)||e〉]
(η⊗P) e : A⊗B ≺η⊗P µ˜[(x, y).〈(x, y)||e〉]
(η∼P) e : ∼A ≺η∼P µ˜[∼ (α).〈∼ (α)||e〉]
   (x, y, α ∉ FV (e))

Negative ηP rules:
(η>P) v : > ≺η>P µ()
(η⊥P) v : ⊥ ≺η⊥P µ([].〈v||[]〉)
(η&P) v : A&B ≺η&P µ(π1 [α].〈v||π1 [α]〉 | π2 [β].〈v||π2 [β]〉)
(ηP`) v : A`B ≺ηP` µ([α, β].〈v||[α, β]〉)
(η→P) v : A→ B ≺η→P µ([x · β].〈v||x · β〉)
(η¬P) v : ¬A ≺η¬P µ(¬ [x].〈v||¬ [x]〉)
   (α, β, x ∉ FV (v))

FIGURE 4.4. The extensional η laws for system L: with two unit types (1, >), two empty types (0, ⊥), (co-)products (&, ⊕), (co-)pairs (⊗, `), two negations (∼, ¬), and functions (→).

The advantage of the system L style of syntax can be seen when we look at the program transformations corresponding to the reversibility expansions previously seen in Section 4.1, which are listed in Figure 4.4. In particular, these expansions correspond to the η laws from the λ-calculus, so we refer to them by the same naming convention. For example, the expansion of the right function introduction corresponds to the λ-calculus η law for functions (v : A→ B ≺η→ λx.v x), which in system L looks like:

(η→P) v : A→ B ≺η→P µ([x · β].〈v||x · β〉)

Here, the pattern-matching formulation of functional terms gives a more pleasant η law than the λ-based syntax from the dual calculi, which must introduce an extra output abstraction to express the η→P law of the sequent calculus as follows:

v : A→ B ≺ λx.µβ.〈v||x · β〉

As another example, the expansion of the right product introduction corresponds to the surjective η law for products (v : A×B ≺η× (π1(v), π2(v))), which in system L looks like:

(η&P) v : A&B ≺η&P µ(π1 [α].〈v||π1 [α]〉 | π2 [β].〈v||π2 [β]〉)

Again, the pattern-matching syntax for product terms makes for a cleaner presentation of the surjectivity of products in the sequent calculus, where the dual calculi representation of the η&P law introduces two output abstractions as follows:

v : A×B ≺ (µα.〈v||π1 [α]〉, µβ.〈v||π2 [β]〉)

We also have the positive reversibility expansions, which work on the left instead of the right, meaning that they expand co-terms instead of terms. For example, the left sum introduction expansion η⊕P is:

(η⊕P) e : A⊕B ≺η⊕P µ˜[ι1 (x).〈ι1 (x)||e〉 | ι2 (y).〈ι2 (y)||e〉]

The system L η law for sums looks very different from the one we saw in the λ-calculus (v : A+B ≺η+ case v of ι1 (x)⇒ ι1 (x) | ι2 (y)⇒ ι2 (y)).
In particular, the existence of co-terms as full-fledged syntactic entities, which were missing from the syntax of the λ-calculus, gives a better presentation of the positive η laws that reveals their connection with the negative η laws. In the λ-calculus, there doesn't seem to be much connection between the η laws for sums and products, but in system L, the syntax makes it apparent that they are the polar opposite forms of the same law: one acting on terms and the other on co-terms.

Polarizing the Fundamental Dilemma

System L is a great language for expressing the extensional η laws of types in a way that reveals their symmetry with one another. However, if we try to naïvely reconcile the polarized ηP laws with the core µµ˜ operational laws, we quickly run into trouble, since their strength is capable of re-introducing the fundamental dilemma of computation (see Section 3.2). On the one hand, the negative ηP laws are incompatible with the call-by-value µV µ˜V laws, since ηP can convert any term into a V value. For example, if we start with the usual problematic command 〈µ .c1||µ˜ .c2〉, an unfortunate η→P expansion can convert µ .c1, which is not a V value, into µ([x · β].〈µ .c1||x · β〉), which is a V value. This leads to the divergent reductions:

c1 ←µV 〈µ .c1||µ˜ .c2〉 ←η→P 〈µ([x · β].〈µ .c1||x · β〉)||µ˜ .c2〉 →µ˜V c2

Therefore, for the negative ηP laws to make sense, the (co-)terms of negative types cannot be interpreted by the call-by-value V strategy. On the other hand, the positive ηP laws are incompatible with the call-by-name µN µ˜N laws, since ηP can convert any co-term into a N co-value. For example, starting from the same problematic command, an unfortunate η⊗P expansion can convert µ˜ .c2, which is not a N co-value, into µ˜[(x, y).〈(x, y)||µ˜ .c2〉], which is a N co-value. This leads to the divergent reductions:

c2 ←µ˜N 〈µ .c1||µ˜ .c2〉 ←η⊗P 〈µ .c1||µ˜[(x, y).〈(x, y)||µ˜ .c2〉]〉 →µN c1

Therefore, for the positive ηP laws to make sense, the (co-)terms of positive types cannot be interpreted by the call-by-name N strategy. What this means is that, in the face of the polarized η laws, we cannot resolve the fundamental dilemma by just imposing a language-wide evaluation strategy once and for all as we did with the dual calculi in Chapter III, since half the ηP laws are incompatible with call-by-value evaluation and the other half are incompatible with call-by-name. Fortunately, the concept of reversibility gives us a different answer to the fundamental non-determinism of the classical sequent calculus that leverages the ηP laws instead of fighting against them, with an idea that can be traced back to Danos et al. (1997). The key insight is that in lieu of imposing a language-wide evaluation strategy, we can use the type of an interacting pair of (co-)terms in a command to figure out what evaluation strategy to use for the reduction of that particular command. So when we are faced with an ambiguous command like 〈µα.c1||µ˜x.c2〉, we can use the type of µα.c1 and µ˜x.c2 to tell us what the term and the co-term "really look like" (Graham-Lengrand, 2015).

For example, suppose the troublesome command is between a term and co-term of type A⊗B as in:

[D / c1 : (Γ ` α : A⊗B,∆) / Γ ` µα.c1 : A⊗B | ∆ AR]   [E / c2 : (Γ, x : A⊗B ` ∆) / Γ | µ˜x.c2 : A⊗B ` ∆ AL] / 〈µα.c1||µ˜x.c2〉 : (Γ ` ∆) Cut

Since we know that the left rule for ⊗ is reversible, we can achieve an equivalent co-term that ends with ⊗L:

[D / c1 : (Γ ` α : A⊗B,∆) / Γ ` µα.c1 : A⊗B | ∆ AR]
[E′ / c′2 : (Γ, x : A, y : B ` ∆) / Γ | µ˜[(x, y).c′2] : A⊗B ` ∆ ⊗L] / 〈µα.c1||µ˜[(x, y).c′2]〉 : (Γ ` ∆) Cut

Therefore, by employing reversibility of the typing rules, we discovered that there wasn't an issue after all, revealing the fact that in a sense the co-term was concealing its intent (Graham-Lengrand, 2015). On the other hand, if we have the command 〈V ||µ˜x.c〉, where V is a V value, then it is safe to substitute V for x since it must be a pair (V1, V2) (or a variable standing in for a pair).

This approach of using reversibility to restore confluence also extends to the negative connectives. However, because negative connectives are reversible in opposite ways to positive connectives, we get the opposite resolution to the dilemma. Suppose again that we are faced with the command 〈µα.c1||µ˜x.c2〉 with a similar typing derivation as before, except that now x and α have the type A→ B. We know that the right rule for → is reversible, so we can explicate the typing derivation as:

[D′ / c′1 : (Γ, y : A ` β : B,∆) / Γ ` µ([y · β].c′1) : A→ B | ∆ →R]   [E / c2 : (Γ, x : A→ B ` ∆) / Γ | µ˜x.c2 : A→ B ` ∆ AL] / 〈µ([y · β].c′1)||µ˜x.c2〉 : (Γ ` ∆) Cut

giving us the more explicit command 〈µ([y · β].c′1)||µ˜x.c2〉 that now spells out exactly which side should be prioritized. Therefore, for negative types, polarity in the type of a cut reveals the opposite intent, restoring determinism to the system by favoring the co-term over the term. Therefore, the polarity of the type of a cut can tell us which side requires priority, and restores determinism in an analogous manner as in Section 3.3.

Following this regime for solving the fundamental dilemma, the polarization hypothesis says that types can be used to determine the evaluation order in a program according to their polarity (Zeilberger, 2009; Munch-Maccagnoni, 2013). For positive types like A⊗B and A⊕B, reversibility on the left tells us to give priority to the term in case of ambiguity. Contrarily, the reversibility on the right for negative types like A&B, A`B, and A→ B tells us to give priority to the co-term. Thus, positive types suggest a call-by-value evaluation order and negative types suggest a call-by-name evaluation order.

To formally apply the polarized approach to evaluation strategy, we must bifurcate the syntax of the core µµ˜-calculus and separate the positive entities from the negative ones, as shown in Figure 4.5. This bifurcated syntax has all the same types and expressions as Figure 3.7, except that positive types and (co-)terms (denoted by A+, . . . , v+, and e+) are syntactically separate from negative types and (co-)terms (denoted by A−, . . . , v−, and e−). In order for the polarity of a type or (co-)term to be apparent from its syntax, we need to annotate type variables and (co-)variables with their intended polarity using either the positive superscript (X+, x+, α+) or the negative superscript (X−, x−, α−), which is an explicit part of their syntax (as opposed to the mere distinction between v+ and v−, etc.). When the polarity of a type, term, or co-term doesn't matter, we may just refer to it as A for either A+ or A−, v for either v+ or v−, and e for either e+ or e−. Note that commands are not distinguished by a polarity because, unlike (co-)terms, they are not part of a specific type. Instead, the single syntactic set Command contains two different kinds of commands—one between positive (co-)terms and one between negative ones—so that commands are only syntactically valid when the polarity of their (co-)terms agree.
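Operationally, the polarization hypothesis amounts to a dispatch on the type at the cut. The following hypothetical Haskell fragment (all names are our own, and everything except the dispatch itself is elided) records just that decision procedure:

```haskell
data Polarity = Positive | Negative

-- Which side of an ambiguous command <mu alpha. c1 || mu~ x. c2> goes first:
-- positive cuts run the term first (call-by-value style), while negative
-- cuts run the co-term first (call-by-name style).
data Priority = TermFirst | CoTermFirst

priorityAtCut :: Polarity -> Priority
priorityAtCut Positive = TermFirst
priorityAtCut Negative = CoTermFirst
```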
Also, note that only the core typing rules are bifurcated into positive and negative versions; the structural rules from Figure 3.10, which are also part of µµ˜P, remain the same. Now, to address polarity in the full system L language, we only need to extend the polarized core µµ˜P-calculus with the specific connectives and constructs, as shown in Figure 4.6, which extends the polarized core calculus from Figure 4.5. Note that there is one extra pair of connectives, ↓A− and ↑A+, introduced in Figure 4.6, known as Girard's (2001) polarity "shifts," that mark a switch between the positive and negative polarities. These shifts are important for making sure that the polar bifurcation of the language does not accidentally eliminate its essential expressive capabilities.

A,B,C ∈ Type ::= A+ | A−      A+, B+, C+ ∈ Type+ ::= X+      A−, B−, C− ∈ Type− ::= X−
c ∈ Command ::= 〈v+||e+〉 | 〈v−||e−〉
v ∈ Term ::= v+ | v−      v+ ∈ Term+ ::= x+ | µα+.c      v− ∈ Term− ::= x− | µα−.c
e ∈ CoTerm ::= e+ | e−      e+ ∈ CoTerm+ ::= α+ | µ˜x+.c      e− ∈ CoTerm− ::= α− | µ˜x−.c
Γ ∈ InputEnv ::= x1 : A1, . . . , xn : An      ∆ ∈ OutputEnv ::= α1 : A1, . . . , αn : An
Judgement ::= c : (Γ ` ∆) | (Γ ` v : A | ∆) | (Γ | e : A ` ∆)

Core rules:
x+ : A+ ` x+ : A+ | VR+      | α+ : A+ ` α+ : A+ VL+      x− : A− ` x− : A− | VR−      | α− : A− ` α− : A− VL−
c : (Γ ` α+ : A+,∆) / Γ ` µα+.c : A+ | ∆ AR+      c : (Γ, x+ : A+ ` ∆) / Γ | µ˜x+.c : A+ ` ∆ AL+
c : (Γ ` α− : A−,∆) / Γ ` µα−.c : A− | ∆ AR−      c : (Γ, x− : A− ` ∆) / Γ | µ˜x−.c : A− ` ∆ AL−
Γ ` v+ : A+ | ∆   Γ′ | e+ : A+ ` ∆′ / 〈v+||e+〉 : (Γ′,Γ ` ∆′,∆) Cut+      Γ ` v− : A− | ∆   Γ′ | e− : A− ` ∆′ / 〈v−||e−〉 : (Γ′,Γ ` ∆′,∆) Cut−

V ∈ ValueP ::= V+ | V−      V+ ∈ Value+ ::= x+      V− ∈ Value− ::= v−
E ∈ CoValueP ::= E+ | E−      E+ ∈ CoValue+ ::= e+      E− ∈ CoValue− ::= α−

(µP) 〈µα+.c||E+〉 →µP c{E+/α+}      (ηµ) µα+.〈v+||α+〉 →ηµ v+ (α+ ∉ FV (v+))
(µP) 〈µα−.c||E−〉 →µP c{E−/α−}      (ηµ) µα−.〈v−||α−〉 →ηµ v− (α− ∉ FV (v−))
(µ˜P) 〈V+||µ˜x+.c〉 →µ˜P c{V+/x+}      (ηµ˜) µ˜x+.〈x+||e+〉 →ηµ˜ e+ (x+ ∉ FV (e+))
(µ˜P) 〈V−||µ˜x−.c〉 →µ˜P c{V−/x−}      (ηµ˜) µ˜x−.〈x−||e−〉 →ηµ˜ e− (x− ∉ FV (e−))

FIGURE 4.5. The polarized core µµ˜P-calculus: its static and dynamic semantics.

A,B,C ∈ Type ::= A+ | A−
A+, B+, C+ ∈ Type+ ::= X+ | 0 | 1 | A+ ⊕B+ | A+ ⊗B+ | ∼A− | ↓A−
A−, B−, C− ∈ Type− ::= X− | > | ⊥ | A− &B− | A− `B− | A+ → B− | ¬A+ | ↑A+
c ∈ Command ::= 〈v+||e+〉 | 〈v−||e−〉      v ∈ Term ::= v+ | v−      e ∈ CoTerm ::= e+ | e−
v+ ∈ Term+ ::= x+ | µα+.c | () | ι1 (v+) | ι2 (v+) | (v+, v+) | ∼ (e−) | ↓(v−)
e+ ∈ CoTerm+ ::= α+ | µ˜x+.c | µ˜[] | µ˜[().c] | µ˜[ι1 (x+).c | ι2 (y+).c] | µ˜[(x+, y+).c] | µ˜[∼ (α−).c] | µ˜[↓(x−).c]
v− ∈ Term− ::= x− | µα−.c | µ() | µ([].c) | µ(π1 [α−].c | π2 [β−].c) | µ([α−, β−].c) | µ([x+ · α−].c) | µ(¬ [x+].c) | µ(↑[α+].c)
e− ∈ CoTerm− ::= α− | µ˜x−.c | [] | π1 [e−] | π2 [e−] | [e−, e−] | v+ · e− | ¬ [v+] | ↑[e+]

FIGURE 4.6. The syntax for polarized system L: with both positive connectives—disjunction (⊕), conjunction (⊗), negation (∼), and polarity shift (↓)—and negative connectives—conjunction (&), disjunction (`), negation (¬), functions (→), and polarity shift (↑).

For example, in the λ-calculus, the dual calculi, or a functional programming language, it is typical to store a function (which is a negative term) inside the structure of a pair or sum type (which is a positive term). However, this would be prevented by the distinction between the polarities of types.
Instead, we would like to allow for some mingling between positive and negative types and (co-)terms without confusing the two. The ↓ shift lets us embed negative types inside positive ones, so that for every negative type A− we have the positive type ↓A−. Going along with our story that positive values follow predetermined patterns, we have the structured term ↓(v−) : ↓A−, which contains a negative term, along with a case abstraction co-term, µ˜[↓(x−).c] : ↓A−, for unpacking the structure and pulling out the underlying term. The ↑ shift lets us embed positive types inside negative ones, so that for every positive type A+ we have the negative type ↑A+. The (co-)terms of the ↑ shift are symmetric to the ↓ ones, so that we have the co-case abstraction term µ(↑[α+].c) : ↑A+ which is waiting for a shifted co-term of the form ↑[e+] : ↑A+ containing a positive co-term.

The logical typing rules for polarized system L are shown in Figure 4.7, which are effectively the same rules from Figure 4.3 made aware of the distinction between positive and negative polarities. The only new rules are for the new shift connectives. More interestingly, the βP rules for system L are similar to rules for reducing case analysis in functional languages, as shown in Figure 4.8. For example, for the positive βP laws we have sum types that select which branch to take based on the constructor tag:

〈ι1 (V+)||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉 →β⊕P c1{V+/x+}

and pair types which decompose a pair into its constituent parts:

〈(V+, V′+)||µ˜[(x+, y+).c]〉 →β⊗P c{V+/x+, V′+/y+}

The negative βP laws follow the same notion of case analysis as the positive βP laws, except in the reverse direction. For example, terms of product types select the appropriate response based on the constructor tag of their observation:

〈µ(π1 [α−].c1 | π2 [β−].c2)||π1 [E−]〉 →β&P c1{E−/α−}

Positive logical rules:
no 0R rule      Γ | µ˜[] : 0 ` ∆ 0L      ` () : 1 | 1R      c : (Γ ` ∆) / Γ | µ˜[().c] : 1 ` ∆ 1L
Γ ` v+ : A+ | ∆ / Γ ` ι1 (v+) : A+ ⊕B+ | ∆ ⊕R1      Γ ` v+ : B+ | ∆ / Γ ` ι2 (v+) : A+ ⊕B+ | ∆ ⊕R2
c : (Γ, x+ : A+ ` ∆)   c′ : (Γ, y+ : B+ ` ∆) / Γ | µ˜[ι1 (x+).c | ι2 (y+).c′] : A+ ⊕B+ ` ∆ ⊕L
Γ ` v+ : A+ | ∆   Γ′ ` v′+ : B+ | ∆′ / Γ,Γ′ ` (v+, v′+) : A+ ⊗B+ | ∆,∆′ ⊗R      c : (Γ, x+ : A+, y+ : B+ ` ∆) / Γ | µ˜[(x+, y+).c] : A+ ⊗B+ ` ∆ ⊗L
Γ | e− : A− ` ∆ / Γ ` ∼ (e−) : ∼A− | ∆ ∼R      c : (Γ ` α− : A−,∆) / Γ | µ˜[∼ (α−).c] : ∼A− ` ∆ ∼L
Γ ` v− : A− | ∆ / Γ ` ↓(v−) : ↓A− | ∆ ´R      c : (Γ, x− : A− ` ∆) / Γ | µ˜[↓(x−).c] : ↓A− ` ∆ ´L

Negative logical rules:
Γ ` µ() : > | ∆ >R      no >L rule      c : (Γ ` ∆) / Γ ` µ([].c) : ⊥ | ∆ ⊥R      | [] : ⊥ ` ⊥L
c : (Γ ` α− : A−,∆)   c′ : (Γ ` β− : B−,∆) / Γ ` µ(π1 [α−].c | π2 [β−].c′) : A− &B− | ∆ &R
Γ | e− : A− ` ∆ / Γ | π1 [e−] : A− &B− ` ∆ &L1      Γ | e− : B− ` ∆ / Γ | π2 [e−] : A− &B− ` ∆ &L2
c : (Γ ` α− : A−, β− : B−,∆) / Γ ` µ([α−, β−].c) : A− `B− | ∆ `R      Γ | e− : A− ` ∆   Γ′ | e′− : B− ` ∆′ / Γ,Γ′ | [e−, e′−] : A− `B− ` ∆,∆′ `L
c : (Γ, x+ : A+ ` β− : B−,∆) / Γ ` µ([x+ · β−].c) : A+ → B− | ∆ →R      Γ ` v+ : A+ | ∆   Γ′ | e− : B− ` ∆′ / Γ,Γ′ | v+ · e− : A+ → B− ` ∆,∆′ →L
c : (Γ, x+ : A+ ` ∆) / Γ ` µ(¬ [x+].c) : ¬A+ | ∆ ¬R      Γ ` v+ : A+ | ∆ / Γ | ¬ [v+] : ¬A+ ` ∆ ¬L
c : (Γ ` α+ : A+,∆) / Γ ` µ(↑[α+].c) : ↑A+ | ∆ ˆR      Γ | e+ : A+ ` ∆ / Γ | ↑[e+] : ↑A+ ` ∆ ˆL

FIGURE 4.7. Logical typing rules for polarized system L: with both positive connectives (0, 1, ⊕, ⊗, ∼, ↓) and negative connectives (>, ⊥, &, `, →, ¬, ↑).
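Readers who know call-by-push-value may recognize the shifts: ↓ suspends a negative (computation-like) entity into a positive value, and ↑ wraps a positive value as a trivial computation that merely returns it. Haskell's uniform laziness blurs the operational distinction, so the following is only a loose sketch with names of our own choosing:

```haskell
-- Down-shift: a positive value packaging a suspended negative term.
-- Matching on the Down pattern exposes the suspension without running it.
newtype Down n = Down n

force :: Down n -> n
force (Down m) = m

-- Up-shift: a negative "returner" whose only observation is to ask for
-- the positive value it carries.
newtype Up p = Up { observe :: p }

ret :: p -> Up p
ret = Up
```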
V ∈ ValueP ::= V+ | v−      V+ ∈ Value+ ::= x+ | () | ι1 (V+) | ι2 (V+) | (V+, V+) | ∼ (E−) | ↓(v−)
E ∈ CoValueP ::= e+ | E−      E− ∈ CoValue− ::= α− | π1 [E−] | π2 [E−] | [E−, E−] | V+ · E− | ¬ [V+] | ↑[e+]

Positive βP rules:
(β0P) no β0P rule      (β1P) 〈()||µ˜[().c]〉 →β1P c
(β⊕P) 〈ιi (V+)||µ˜[ι1 (x+1).c1 | ι2 (x+2).c2]〉 →β⊕P ci{V+/x+i}
(β⊗P) 〈(V+, V′+)||µ˜[(x+, y+).c]〉 →β⊗P c{V+/x+, V′+/y+}
(β∼P) 〈∼ (E−)||µ˜[∼ (α−).c]〉 →β∼P c{E−/α−}
(β↓P) 〈↓(v−)||µ˜[↓(x−).c]〉 →β↓P c{v−/x−}

Negative βP rules:
(β>P) no β>P rule      (β⊥P) 〈µ([].c)||[]〉 →β⊥P c
(β&P) 〈µ(π1 [α−1].c1 | π2 [α−2].c2)||πi [E−]〉 →β&P ci{E−/α−i}
(βP`) 〈µ([α−, β−].c)||[E−, E′−]〉 →βP` c{E−/α−, E′−/β−}
(β→P) 〈µ([x+ · β−].c)||V+ · E−〉 →β→P c{V+/x+, E−/β−}
(β¬P) 〈µ(¬ [x+].c)||¬ [V+]〉 →β¬P c{V+/x+}
(β↑P) 〈µ(↑[α+].c)||↑[e+]〉 →β↑P c{e+/α+}

FIGURE 4.8. The operational β laws for polarized system L: with two unit types (1, >), two empty types (0, ⊥), (co-)products (&, ⊕), (co-)pairs (⊗, `), two negations (∼, ¬), functions (→), and two polarity shifts (↓, ↑).

and terms of co-pair types decompose their observation into the two independent messages:

〈µ([α−, β−].c)||[E−, E′−]〉 →βP` c{E−/α−, E′−/β−}

Focusing and Polarity

The βP-based operational rules for polarized system L explain how to reduce commands by performing pattern-matching. However, βP reduction alone is not enough, since it suffers the same essential deficiency as β reduction in the dual calculi (Section 3.3). For example, in the positive form of pattern-matching of type A+ ⊕B+, we could encounter the command

〈ι1 (µα+.c)||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉

which does not proceed by β⊕P because µα+.c is not a value. Similarly, in the negative form of pattern-matching of type A+ → B−, we could encounter the command

〈µ([x+ · β−].c)||(µα+.c1) · (µ˜y−.c2)〉

which does not proceed by β→P because µα+.c1 is not a value and µ˜y−.c2 is not a co-value. Unsurprisingly, the same technique of focusing applies, with the same two options we had before: we can remove the superfluous parts of the syntax of system L (like the above two commands) with the static approach to focusing, or we can add the extra steps necessary to kick-start the computation again with the dynamic approach to focusing. The major difference between focusing in system L versus focusing in the dual calculi is that, since polarized system L incorporates aspects of both the call-by-value and call-by-name halves of the dual calculi into a single language, the polarized focusing shares similarities with both the call-by-value and call-by-name focusing at once. In particular, the dual calculi had two different focused sub-syntaxes (LKQ and LKT) and two different sets of focusing ς rules (ςV and ςN) corresponding to its two different evaluation strategies. Instead, polarized system L has a single focused sub-syntax and a single set of focusing ς rules. First, let's consider the static approach with the focused sub-syntax of system L shown in Figure 4.9. On the positive side, the restrictions on the syntax of positive terms resemble LKQ.
v+ ∈ Term+ ::= V+ | µα+.c
V+ ∈ Value+ ::= x+ | () | ι1 (V+) | ι2 (V+) | (V+, V+) | ∼ (E−) | ↓(v−)
e+ ∈ CoTerm+ ::= α+ | µ˜x+.c | µ˜[] | µ˜[().c] | µ˜[ι1 (x+).c | ι2 (y+).c] | µ˜[(x+, y+).c] | µ˜[∼ (α−).c] | µ˜[↓(x−).c]
v− ∈ Term− ::= x− | µα−.c | µ() | µ([].c) | µ(π1 [α−].c | π2 [β−].c) | µ([α−, β−].c) | µ([x+ · α−].c) | µ(¬ [x+].c) | µ(↑[α+].c)
e− ∈ CoTerm− ::= E− | µ˜x−.c
E− ∈ CoValue− ::= α− | π1 [E−] | π2 [E−] | [E−, E−] | V+ · E− | ¬ [V+] | ↑[e+]
c ∈ Command ::= 〈v+||e+〉 | 〈v−||e−〉
Judgement ::= c : (Γ ` ∆) | (Γ ` v : A | ∆) | (Γ ` V+ : A+ ; ∆) | (Γ | e : A ` ∆) | (Γ ; E− : A− ` ∆)

Axiom:
x+ : A+ ` x+ : A+ ; Var+      | α+ : A+ ` α+ : A+ CoVar+
x− : A− ` x− : A− | Var−      ; α− : A− ` α− : A− CoVar−

Focusing (structural) rules:
Γ ` V+ : A+ ; ∆ / Γ ` V+ : A+ | ∆ FR      Γ ; E− : A− ` ∆ / Γ | E− : A− ` ∆ FL

FIGURE 4.9. Focused sub-syntax and core typing rules for polarized system L.

Every positive term is either a positive value or an output abstraction, where the positive values are defined hereditarily: a pair of two values is a value, an injection of a value is a value, and so on. That way, troublesome commands like 〈ι1 (µα+.c)||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉 become syntactically forbidden. The interesting types that contain negative types and break this mold are the values ∼ (E−) : ∼A−, which contain a negative co-value, and the values ↓(v−) : ↓A−, which contain a negative term. Also, as in LKQ, there are no restrictions placed on positive co-terms, in part because the co-terms of positive types are all abstractions, which are not easily restricted like the positively constructed terms are. On the negative side, the restrictions on the syntax of negative co-terms resemble LKT. Every negative co-term is either a negative co-value or an input abstraction, where the negative co-values are defined hereditarily: a co-pair of two co-values is a co-value, a projection of a co-value is a co-value, etc. So troublesome commands like 〈µ([x+ · β−].c)||(µα+.c1) · (µ˜y−.c2)〉 are also syntactically forbidden. As before, there are some interesting types that refer to positive types, like the co-values V+ · E− : A+ → B− and ¬ [V+] : ¬A+, which contain a positive value, and ↑[e+] : ↑A+, which contains a positive co-term. Also, as in LKT, there are no restrictions on the negative terms, which are all abstractions over negative co-values.

The focalized and polarized type system for system L introduces two new sequents using the stoup (;) based on the two restrictions on the syntax: Γ ` V+ : A+ ; ∆ for typing positive values in focus and Γ ; E− : A− ` ∆ for typing negative co-values in focus. The logical typing rules are given in Figure 4.10. The typing rules are essentially the same as the unfocused polarized ones from Figure 4.7, except that they now follow the syntactic restrictions on positive terms and negative co-terms from Figure 4.9. This has the net effect that, in a bottom-up reading of a typing derivation, once focus is gained via the FR or FL rules it is maintained. The only rules which are capable of losing focus are the ´R and ˆL rules, which transition from a positive value to a negative term and from a negative co-value to a positive co-term. This can be seen as a design philosophy justifying the choice of polarities in the connectives of polarized system L from Figure 4.6: focus should be maintained by every connective except the shifts.
Therefore, the function type A+ → B− (called the "primordial function type" by Zeilberger (2009)) must have a positive argument type and negative return type to maintain focus in the call stack V+ · E−, and the negation types ∼A− and ¬A+ must invert the polarity of the type to maintain focus in ∼ (E−) and ¬ [V+]. Anything else would place a negative term or a positive co-term inside of a construction, breaking the convention.

Positive focused logical rules:
no 0R rule      Γ | µ˜[] : 0 ` ∆ 0L      ` () : 1 ; 1R      c : (Γ ` ∆) / Γ | µ˜[().c] : 1 ` ∆ 1L
Γ ` V+ : A+ ; ∆ / Γ ` ι1 (V+) : A+ ⊕B+ ; ∆ ⊕R1      Γ ` V+ : B+ ; ∆ / Γ ` ι2 (V+) : A+ ⊕B+ ; ∆ ⊕R2
c : (Γ, x+ : A+ ` ∆)   c′ : (Γ, y+ : B+ ` ∆) / Γ | µ˜[ι1 (x+).c | ι2 (y+).c′] : A+ ⊕B+ ` ∆ ⊕L
Γ ` V+ : A+ ; ∆   Γ′ ` V′+ : B+ ; ∆′ / Γ,Γ′ ` (V+, V′+) : A+ ⊗B+ ; ∆,∆′ ⊗R      c : (Γ, x+ : A+, y+ : B+ ` ∆) / Γ | µ˜[(x+, y+).c] : A+ ⊗B+ ` ∆ ⊗L
Γ ; E− : A− ` ∆ / Γ ` ∼ (E−) : ∼A− ; ∆ ∼R      c : (Γ ` α− : A−,∆) / Γ | µ˜[∼ (α−).c] : ∼A− ` ∆ ∼L
Γ ` v− : A− | ∆ / Γ ` ↓(v−) : ↓A− ; ∆ ´R      c : (Γ, x− : A− ` ∆) / Γ | µ˜[↓(x−).c] : ↓A− ` ∆ ´L

Negative focused logical rules:
Γ ` µ() : > | ∆ >R      no >L rule      c : (Γ ` ∆) / Γ ` µ([].c) : ⊥ | ∆ ⊥R      ; [] : ⊥ ` ⊥L
c : (Γ ` α− : A−,∆)   c′ : (Γ ` β− : B−,∆) / Γ ` µ(π1 [α−].c | π2 [β−].c′) : A− &B− | ∆ &R
Γ ; E− : A− ` ∆ / Γ ; π1 [E−] : A− &B− ` ∆ &L1      Γ ; E− : B− ` ∆ / Γ ; π2 [E−] : A− &B− ` ∆ &L2
c : (Γ ` α− : A−, β− : B−,∆) / Γ ` µ([α−, β−].c) : A− `B− | ∆ `R      Γ ; E− : A− ` ∆   Γ′ ; E′− : B− ` ∆′ / Γ,Γ′ ; [E−, E′−] : A− `B− ` ∆,∆′ `L
c : (Γ, x+ : A+ ` β− : B−,∆) / Γ ` µ([x+ · β−].c) : A+ → B− | ∆ →R      Γ ` V+ : A+ ; ∆   Γ′ ; E− : B− ` ∆′ / Γ,Γ′ ; V+ · E− : A+ → B− ` ∆,∆′ →L
c : (Γ, x+ : A+ ` ∆) / Γ ` µ(¬ [x+].c) : ¬A+ | ∆ ¬R      Γ ` V+ : A+ ; ∆ / Γ ; ¬ [V+] : ¬A+ ` ∆ ¬L
c : (Γ ` α+ : A+,∆) / Γ ` µ(↑[α+].c) : ↑A+ | ∆ ˆR      Γ | e+ : A+ ` ∆ / Γ ; ↑[e+] : ↑A+ ` ∆ ˆL

FIGURE 4.10. Focused logical typing rules for polarized system L: with positive connectives (0, 1, ⊕, ⊗, ∼, ↓) and negative connectives (>, ⊥, &, `, →, ¬, ↑).

Next, let's consider the dynamic approach with the extra focusing rewrite rules shown in Figure 4.11. These extra reductions are just enough to prevent the troublesome commands from getting stuck. For example, the β⊕P-stuck command between (co-)terms of type A+ ⊕B+, 〈ι1 (µα+.c)||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉, can now proceed by a ς⊕P reduction on the immediate sub-term:

〈ι1 (µα+.c)||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉 →ς⊕P 〈µγ+.〈µα+.c||µ˜y+.〈ι1 (y+)||γ+〉〉||µ˜[ι1 (x+).c1 | ι2 (y+).c2]〉

Likewise, the β→P-stuck command between (co-)terms of type A+ → B−, 〈µ([x+ · β−].c)||(µα+.c1) · (µ˜y−.c2)〉, can also now proceed by a ς→P reduction on the immediate sub-co-term:

〈µ([x+ · β−].c)||(µα+.c1) · (µ˜y−.c2)〉 →ς→P 〈µ([x+ · β−].c)||µ˜x−.〈µα+.c1||µ˜y+.〈x−||y+ · (µ˜y−.c2)〉〉〉

The combination of βP and ςP reductions gives us enough tools for a well-behaved extension of the core µµ˜ operational semantics. Because ςP operates on (co-)terms instead of commands, we must extend the set of polarized evaluation contexts D to reduce (co-)terms when necessary as follows:

D ∈ EvalCxtP ::= □ | 〈□||e+〉 | 〈v−||□〉

This gives us the µPµ˜PβPςP operational semantics (7→µPµ˜PβPςP), which is strong enough to compute results of the following form:

FinalCommandP ::= 〈V+||α+〉 | 〈x+||Es+〉 | 〈x−||E−〉 | 〈Vs−||α−〉
Vs− ∈ SimpleValue− = {v− ∈ Term− | v− ≠α µα−.c}
Es+ ∈ SimpleCoValue+ = {e+ ∈ CoTerm+ | e+ ≠α µ˜x+.c}

When considering only well-typed commands, we get the standard safety theorem saying that well-typed commands always reduce to a final command shown above, similar to the dual calculi.

Positive ςP rules:
(ς0P) no ς0P rule      (ς1P) no ς1P rule
(ς⊕P) ιi (v+) →ς⊕P µα+.〈v+||µ˜y+.〈ιi (y+)||α+〉〉
(ς⊗P) (v+, v′+) →ς⊗P µα+.〈v+||µ˜y+.〈(y+, v′+)||α+〉〉      (ς⊗P) (V+, v+) →ς⊗P µα+.〈v+||µ˜y+.〈(V+, y+)||α+〉〉
(ς∼P) ∼ (e−) →ς∼P µα+.〈µβ−.〈∼ (β−)||α+〉||e−〉      (ς↓P) no ς↓P rule
   (v+ ∉ ValueP, e− ∉ CoValueP; α+, β−, y+ fresh)

Negative ςP rules:
(ς>P) no ς>P rule      (ς⊥P) no ς⊥P rule
(ς&P) πi [e−] →ς&P µ˜x−.〈µβ−.〈x−||πi [β−]〉||e−〉
(ςP`) [e−, e′−] →ςP` µ˜x−.〈µβ−.〈x−||[β−, e′−]〉||e−〉      (ςP`) [E−, e−] →ςP` µ˜x−.〈µβ−.〈x−||[E−, β−]〉||e−〉
(ς→P) v+ · e′− →ς→P µ˜x−.〈v+||µ˜y+.〈x−||y+ · e′−〉〉      (ς→P) V+ · e− →ς→P µ˜x−.〈µβ−.〈x−||V+ · β−〉||e−〉
(ς¬P) ¬ [v+] →ς¬P µ˜x−.〈v+||µ˜y+.〈x−||¬ [y+]〉〉      (ς↑P) no ς↑P rule
   (e− ∉ CoValueP, v+ ∉ ValueP; x−, y+, β− fresh)

FIGURE 4.11. The focusing ς laws for polarized system L: with two unit types (1, >), two empty types (0, ⊥), (co-)products (&, ⊕), (co-)pairs (⊗, `), two negations (∼, ¬), functions (→), and two polarity shifts (↓, ↑).

Theorem 4.1 (Progress and preservation). For any system L command c : (Γ ` ∆):
a) Progress: c is a polarized final command or there is a command c′ such that c 7→µPµ˜PβPςP c′, and
b) Preservation: if c 7→µPµ˜PβPςP c′, then c′ : (Γ ` ∆).

Proof. The proof is analogous to the proof of Theorem 3.3. Progress follows by induction on the typing derivation of c : (Γ ` ∆), which is assured because
– every v+ is either a value, an output abstraction, or a ςP redex,
– every e+ is either an input abstraction or in SimpleCoValue+,
– every v− is either an output abstraction or in SimpleValue−, and
– every e− is either a co-value, an input abstraction, or a ςP redex.
Therefore, if the cut is neither final nor reducible, then either its positive term or negative co-term ςP-reduces. Preservation follows by cases on all the possible rewriting rules using the substitution principle for typing derivations similar to Theorem 3.3, so that
– if c →µµ˜ηµηµ˜βς c′ then c : (Γ ` ∆) implies c′ : (Γ ` ∆),
– if v →µµ˜ηµηµ˜βς v′ then Γ ` v : A | ∆ implies Γ ` v′ : A | ∆, and
– if e →µµ˜ηµηµ˜βς e′ then Γ | e : A ` ∆ implies Γ | e′ : A ` ∆.

Also, much like the dual calculi, the two methods of focusing correspond to one another, applying the same essential transformations either during execution or as a pre-processing pass. More specifically, the focused sub-syntax of polarized system L contains exactly the ςP-normal forms of system L, and therefore every command, term, and co-term can be ςP-reduced into the focused sub-syntax.

Theorem 4.2 (Focusing). Every polarized system L command, term, and co-term is in the focused sub-syntax if and only if it is a ςP-normal form. Furthermore, for every polarized system L command c, term v, and co-term e, there is a focused command c′, term v′, and co-term e′ such that c ↠ςP c′, v ↠ςP v′, and e ↠ςP e′.

Proof. First, the fact that the commands and (co-)terms in the focused sub-syntax are in one-to-one correspondence with ςP-normal forms follows by induction on the syntax of polarized system L commands and (co-)terms. Second, observe that the ςP reduction theory is strongly normalizing because each reduction reduces the number of non-(co-)values within term and co-term constructions, which serves as a normalization measure. Therefore, every command and (co-)term has a unique ςP-normal form, which by the first point must lie within the focused sub-syntax of polarized system L.
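The work done by the ς rules should look familiar from compilers: a constructor applied to an unevaluated sub-term is flattened by naming that sub-term first, exactly as in let-insertion or A-normalization. The following self-contained Haskell sketch (a toy term language of our own, not the full syntax of system L) performs the analogous transformation for injections:

```haskell
-- A toy positive term language.
data Term
  = Var String
  | Inj Int Term           -- i1(v) or i2(v): a positive constructor
  | App String [Term]      -- some arbitrary, not-yet-evaluated computation
  | Let String Term Term   -- stands in for the mu/mu~ plumbing of sigma
  deriving Show

isValue :: Term -> Bool
isValue (Var _)   = True
isValue (Inj _ v) = isValue v
isValue _         = False

-- sigma-style flattening: a constructor around a non-value is rewritten to
-- evaluate and name that sub-term first. The Int threads fresh names.
focus :: Int -> Term -> (Int, Term)
focus n (Inj i v)
  | isValue v = (n, Inj i v)
  | otherwise =
      let y        = "y" ++ show n
          (n', v') = focus (n + 1) v
      in (n', Let y v' (Inj i (Var y)))
focus n (Let x a b) =
  let (n1, a') = focus n a
      (n2, b') = focus n1 b
  in (n2, Let x a' b')
focus n (App f as) =
  let step (m, acc) t = let (m', t') = focus m t in (m', acc ++ [t'])
      (n', as')       = foldl step (n, []) as
  in (n', App f as')
focus n t = (n, t)

-- For example, focus 0 (Inj 1 (App "f" [Var "x"])) yields
--   Let "y0" (App "f" [Var "x"]) (Inj 1 (Var "y0"))
-- mirroring the sigma(+) step on a command like <i1(mu a. c) || e>.
```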
Self-Duality

System L exhibits a logical duality similar to the dual calculi (see Section 3.3). However, whereas the dual calculi are really two separate calculi—one call-by-value and one call-by-name—that share a common syntax and types, polarized system L is self-dual. In other words, we can say that polarized system L internalizes the notion of duality inside of itself, so that it gives a single, complete, and self-contained language for discussing and using dual concepts. This is because polarization lets us incorporate both call-by-name and call-by-value constructs and evaluation. In the dual calculi, call-by-value programs are dualized into call-by-name ones, and vice versa, which lie in the two separate interpretations of the same syntax. But polarized system L contains both call-by-value and call-by-name fragments, so that there is no need for a separate calculus and a change of interpretation to accommodate the inversion of control flow caused by duality.

As with the dual calculi, the self-duality of polarized system L resembles the de Morgan laws, where truth is dual to falsehood and conjunction is dual to disjunction. However, polarity explicitly reveals another aspect of duality that was implicit in the dual calculus: duality also reverses the polarity of types and programs. So 0 is dual to >, 1 is dual to ⊥, ⊕ is dual to &, and ⊗ is dual to `. The polarity reversal corresponds to the fact that the dynamic semantics of call-by-value is dual to that of call-by-name. This also means that, whereas the single negation connective was self-dual in the dual calculi, the two polarities of negation (∼ and ¬) are dual to one another. Likewise, the two polarity shifts (↓ and ↑) are also dual connectives. The only lack of symmetry is with function types A+ → B−, which lack their dual counterpart, as was the case in both LK (Section 3.1) and the dual calculi (Section 3.3). As is now the standard procedure, this asymmetry is easily remedied by adding subtraction types B+ − A− as the dual counterpart to function types, as shown in Figure 4.12. Syntactically, this presentation of subtraction is the same as in the dual calculi, except that we use a case abstraction co-term µ˜[(α− · y+).c] to match the system L style of pattern-matching function terms instead of a λ abstracting a co-variable over a co-term.

v+ ∈ Term+ ::= . . . | e− · v+      e+ ∈ CoTerm+ ::= . . . | µ˜[(α− · y+).c]
V+ ∈ Value+ ::= . . . | E− · V+      A+, B+, C+ ∈ Type+ ::= . . . | B+ − A−

Γ′ | e− : A− ` ∆′   Γ ` v+ : B+ | ∆ / Γ,Γ′ ` e− · v+ : B+ − A− | ∆,∆′ −R      c : (Γ, y+ : B+ ` α− : A−,∆) / Γ | µ˜[(α− · y+).c] : B+ − A− ` ∆ −L
Γ′ ; E− : A− ` ∆′   Γ ` V+ : B+ ; ∆ / Γ,Γ′ ` E− · V+ : B+ − A− ; ∆,∆′ −R

(β−P) 〈E− · V+||µ˜[(α− · y+).c]〉 →β−P c{V+/y+, E−/α−}
(ς−P) e− · v+ →ς−P µα+.〈µβ−.〈β− · v+||α+〉||e−〉 (e− ∉ CoValue−; α+, β− ∉ FV (e− · v+))
(ς−P) E− · v+ →ς−P µα+.〈v+||µ˜y+.〈E− · y+||α+〉〉 (v+ ∉ Value+; α+, y+ ∉ FV (E− · v+))
(η−P) e+ : B+ − A− ≺η−P µ˜[(α− · y+).〈α− · y+||e+〉]

FIGURE 4.12. Extending polarized system L with subtraction (−), the dual of implication (→).

(X+)⊥ ≜ X−      (X−)⊥ ≜ X+
0⊥ ≜ >      >⊥ ≜ 0
1⊥ ≜ ⊥      ⊥⊥ ≜ 1
(A+ ⊕B+)⊥ ≜ (A+)⊥ & (B+)⊥      (A− &B−)⊥ ≜ (A−)⊥ ⊕ (B−)⊥
(A+ ⊗B+)⊥ ≜ (A+)⊥ ` (B+)⊥      (A− `B−)⊥ ≜ (A−)⊥ ⊗ (B−)⊥
(B+ − A−)⊥ ≜ (A−)⊥ → (B+)⊥      (A+ → B−)⊥ ≜ (B−)⊥ − (A+)⊥
(∼A−)⊥ ≜ ¬((A−)⊥)      (¬A+)⊥ ≜ ∼((A+)⊥)
(↓A−)⊥ ≜ ↑((A−)⊥)      (↑A+)⊥ ≜ ↓((A+)⊥)

FIGURE 4.13. The self-duality of system L types: with two unit types (1, >), two empty types (0, ⊥), (co-)products (⊕, &), (co-)pairs (⊗, `), (co-)functions (→, −), two negations (∼, ¬), and two polarity shifts (↓, ↑).
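The type-level half of this duality (Figure 4.13) is mechanical enough to transcribe directly. Here is a minimal Haskell rendering (the constructor names are our own), under which the involution property of the upcoming Theorem 4.3 becomes the testable equation dual (dual t) == t:

```haskell
-- Polarized system L types, following Figure 4.6 plus subtraction.
data Ty
  = PVar String | NVar String   -- positive and negative type variables
  | Zero | One                  -- positive units 0 and 1
  | Top | Bot                   -- negative units (top) and (bottom)
  | Plus Ty Ty | Tensor Ty Ty   -- positive (+) and (x)
  | With Ty Ty | Par Ty Ty      -- negative & and par
  | Fun Ty Ty | Sub Ty Ty       -- A+ -> B-  and  B+ - A-
  | NotP Ty | NotN Ty           -- positive (~) and negative (¬) negation
  | Down Ty | Up Ty             -- the polarity shifts
  deriving (Eq, Show)

-- The duality of Figure 4.13; note how it flips every polarity.
dual :: Ty -> Ty
dual (PVar x)     = NVar x
dual (NVar x)     = PVar x
dual Zero         = Top
dual Top          = Zero
dual One          = Bot
dual Bot          = One
dual (Plus a b)   = With (dual a) (dual b)
dual (With a b)   = Plus (dual a) (dual b)
dual (Tensor a b) = Par (dual a) (dual b)
dual (Par a b)    = Tensor (dual a) (dual b)
dual (Sub b a)    = Fun (dual a) (dual b)   -- (B+ - A-)'s dual is a function
dual (Fun a b)    = Sub (dual b) (dual a)   -- (A+ -> B-)'s dual is a subtraction
dual (NotP a)     = NotN (dual a)
dual (NotN a)     = NotP (dual a)
dual (Down a)     = Up (dual a)
dual (Up a)       = Down (dual a)
```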
Additionally, the fact that the function type A+ → B− mixes the two polarities is reflected in the subtraction type B+ − A−. With symmetry restored, we formally define the duality of polarized system L types in Figure 4.13 and programs in Figure 4.14. The self-duality of polarized system L exhibits the same pleasant properties as the duality of the dual calculi from Section 3.3: the duality relation is involutive, respects the static semantics (typing), and respects the dynamic semantics (reduction). The major departure from the dual calculi is that all the dynamic semantics and rewriting rules are contained within the same polarized language, instead of being split between two interpretations of the same syntax.

Theorem 4.3 (Involutive duality). The duality operation ⊥ on environments, sequents, types, commands, terms, and co-terms is involutive, so that ⊥⊥ is the identity transformation.

Proof. By induction on the definition of the duality operation ⊥, similar to the proof of Theorem 3.6.

Theorem 4.4 (Static duality).
a) c : (Γ ` ∆) is well-typed if and only if c⊥ : (∆⊥ ` Γ⊥) is.
b) Γ ` v : A | ∆ is well-typed if and only if ∆⊥ | v⊥ : A⊥ ` Γ⊥ is.

〈v+||e+〉⊥ ≜ 〈e+⊥||v+⊥〉      〈v−||e−〉⊥ ≜ 〈e−⊥||v−⊥〉
(x+)⊥ ≜ x−      (µα+.c)⊥ ≜ µ˜α−.c⊥
()⊥ ≜ []      ι1 (v+)⊥ ≜ π1 [v+⊥]      ι2 (v+)⊥ ≜ π2 [v+⊥]      (v+, v′+)⊥ ≜ [v+⊥, v′+⊥]
(e− · v+)⊥ ≜ e−⊥ · v+⊥      ∼ (e−)⊥ ≜ ¬ [e−⊥]      ↓(v−)⊥ ≜ ↑[v−⊥]
(α+)⊥ ≜ α−      (µ˜x+.c)⊥ ≜ µx−.c⊥
µ˜[]⊥ ≜ µ()      µ˜[().c]⊥ ≜ µ([].c⊥)
µ˜[ι1 (x+).c | ι2 (y+).c′]⊥ ≜ µ(π1 [x−].c⊥ | π2 [y−].c′⊥)      µ˜[(x+, y+).c]⊥ ≜ µ([x−, y−].c⊥)
µ˜[(α− · x+).c]⊥ ≜ µ([α+ · x−].c⊥)      µ˜[∼ (α−).c]⊥ ≜ µ(¬ [α+].c⊥)      µ˜[↓(x−).c]⊥ ≜ µ(↑[x+].c⊥)
(x−)⊥ ≜ x+      (µα−.c)⊥ ≜ µ˜α+.c⊥
µ()⊥ ≜ µ˜[]      µ([].c)⊥ ≜ µ˜[().c⊥]
µ(π1 [α−].c | π2 [β−].c′)⊥ ≜ µ˜[ι1 (α+).c⊥ | ι2 (β+).c′⊥]      µ([α−, β−].c)⊥ ≜ µ˜[(α+, β+).c⊥]
µ([x+ · α−].c)⊥ ≜ µ˜[(x− · α+).c⊥]      µ(¬ [x+].c)⊥ ≜ µ˜[∼ (x−).c⊥]      µ(↑[α+].c)⊥ ≜ µ˜[↓(α−).c⊥]
(α−)⊥ ≜ α+      (µ˜x−.c)⊥ ≜ µx+.c⊥
[]⊥ ≜ ()      π1 [e−]⊥ ≜ ι1 (e−⊥)      π2 [e−]⊥ ≜ ι2 (e−⊥)      [e−, e′−]⊥ ≜ (e−⊥, e′−⊥)
(v+ · e−)⊥ ≜ v+⊥ · e−⊥      ¬ [v+]⊥ ≜ ∼ (v+⊥)      ↑[e+]⊥ ≜ ↓(e+⊥)

FIGURE 4.14. The self-duality of system L programs: with two unit types (1, >), two empty types (0, ⊥), (co-)products (⊕, &), (co-)pairs (⊗, `), (co-)functions (→, −), two negations (∼, ¬), and two polarity shifts (↓, ↑).

c) Γ | e : A ` ∆ is well-typed if and only if ∆⊥ ` e⊥ : A⊥ | Γ⊥ is.
Furthermore, if a command, term, or co-term lies in the focused sub-syntax, then so does its dual.

Proof. By induction on the typing derivation, similar to the proof of Theorem 3.7.

Theorem 4.5 (Dynamic duality).
a) c →µPµ˜PβP c′ if and only if c⊥ →µPµ˜PβP c′⊥.
b) v →ηµςP v′ if and only if v⊥ →ηµ˜ςP v′⊥.
c) e →ηµ˜ςP e′ if and only if e⊥ →ηµςP e′⊥.

Proof. By cases on the respective rewriting rules using the fact that substitution commutes with duality, similar to the proof of Theorem 3.8.

CHAPTER V

Data and Co-Data

This chapter is new text based on the ideas and results from (Downen & Ariola, 2014c), of which I was the primary author; I developed the language and theory of data and co-data in the classical sequent calculus presented in this chapter. I would like to thank my advisor Zena M. Ariola for the assistance and feedback in writing that publication.
The ramifications of treating the sequent calculus as a programming language (Curien & Herbelin, 2000; Wadler, 2003; Zeilberger, 2008b; Munch-Maccagnoni, 2009) have elucidated issues that arise in programs, including the interplay between strict and lazy evaluation in programs and types. When interpreted as a computational framework, the sequent calculus reveals a diversity of connectives that is easy to overlook in the tradition of the λ-calculus. However, this diversity can become overwhelming. We now have several connectives for representing similar logical ideas: two connectives each for conjunction, disjunction, negation, and so on. Additionally, there are still some questions that have not been addressed. For instance, how do other evaluation strategies, like call-by-need (Ariola et al., 1995; Ariola & Felleisen, 1997; Maraist et al., 1998),¹ fit into the picture? If we follow the story of polarized logic, that the polarity determines evaluation order, then there is no room—by definition there are only two polarities, so we can only directly account for two evaluation strategies with this approach.

We now aim to tame the abundance of connectives found in the sequent calculus. Can we find a single pattern that encompasses every single connective we have discussed so far in the sequent calculus? That way, rather than cataloguing the many different connectives on a case-by-case basis, we can direct our attention to the commonalities underlying them all. As a tool for analysis, we summarize a broad family of types occurring in the sequent calculus, whose static and dynamic properties

¹Call-by-need can be thought of as a memoizing version of call-by-name where the arguments to function calls are evaluated on demand, like in call-by-name, but where the value of an argument is remembered so that it is computed only once, like in call-by-value.
all derive from a small core. As a tool for synthesis, we use the patterns underpinning the connectives as a mechanism facilitating the exploration of new connectives. Furthermore, we look for a more general classification of evaluation strategies, in an effort to capture an essence of strategies that goes beyond the duality between call-by-value and call-by-name evaluation. In order to account for other evaluation strategies like call-by-need, we need to step outside of the polarization hypothesis, which assumed that every evaluation strategy corresponds to one of the (two!) polarities. Instead, we look at a treatment of strategy based on its impact on substitution. The substitution-based characterization of evaluation strategies is general enough to describe call-by-need evaluation and also generalizes polarization as a mechanism for combining different evaluation strategies within a single program.

Our approach to understanding the dynamic behavior of the various connectives is the same as the traditional approach from the λ-calculus: the dynamic meaning of every connective is characterized by β and η laws. We will first investigate these principles as symmetric equations, rather than non-symmetric reductions, which lets us understand β and η laws that are valid for any evaluation strategy. Besides maintaining similarity with the simply typed λ-calculus, the equational theory avoids the conflict between extensionality and control that arises in rewriting theories for classical logic (David & Py, 2001). Then, we derive the (untyped) reduction theory and operational semantics for all the connectives, which include the operational β and focusing ς rewriting rules previously studied in Chapters III and IV, justified in terms of the fundamental β and η equations.

The Essence of Evaluation: Substitutability

As we have seen previously in Chapters III and IV, there are many different languages for the sequent calculus (Curien & Herbelin, 2000; Wadler, 2003; Herbelin, 2005; Munch-Maccagnoni, 2009; Munch-Maccagnoni & Scherer, 2015) that are all based on the same structural core µµ˜-calculus that was explored in Section 3.2. This core, as given in Figure 3.7, forms the basis of naming in the sequent calculus via variables and co-variables as well as input and output abstractions. Further still, the fundamental dilemma of computation in the classical sequent calculus lies wholly within this core. The root cause of non-determinism, non-confluence, and incoherence is a conflict between the input and output abstractions, where each one tries to take control over the future path of evaluation. Therefore, before we tackle evaluation of the language with (co-)data types, we will first focus on how to characterize the resolutions to the fundamental dilemma in the structural core of the sequent calculus.

Recall that the source of the conflict in the structural core of the sequent calculus comes from the two opposing rules for implementing substitution:

〈µα.c||e〉 →µ c{e/α}      〈v||µ˜x.c〉 →µ˜ c{v/x}

As stated, a command like 〈µ .c1||µ˜ .c2〉, where the (co-)variables are never used, is equal to both c1 and c2, so any two arbitrary commands may be considered equal. The language-based solution to this dilemma from Chapter III is to restrict one of the two rules to remove the conflict—the µ rule is restricted to co-values to implement a form of call-by-name evaluation, or the µ˜ rule is restricted to values to implement a form of call-by-value evaluation. However, in lieu of inventing various different languages with different evaluation strategies for mitigating the conflict, let's instead admit restrictions on both directions of substitution to values and co-values:

〈µα.c||E〉 →µS c{E/α}      〈V ||µ˜x.c〉 →µ˜S c{V/x}

while leaving the specifics of what constitutes a substitutable value V and a substitutable co-value E open-ended. That is to say, we make the sets of values (V ∈ ValueS) and co-values (E ∈ CoValueS) a parameter of the theory, in the same sense as Ronchi Della Rocca & Paolini's (2004) parametric λ-calculus, that may be filled in at a later time. A choice of a specific value set ValueS and co-value set CoValueS makes up a substitution strategy S = (ValueS, CoValueS). The full parametric equational theory µµ˜ for the structural core (Downen & Ariola, 2014c) is given in Figure 5.1, where we denote a particular instance for a chosen substitution strategy S as µµ˜S. Since the rules for extensionality of input and output abstractions did not cause any issue, we leave them alone. By leaving the choice of dual substitution restrictions open as parameters, the same parametric theory may describe the semantics of different evaluation strategies by instantiating the parameters in different ways. As per Remark 2.3, we can derive reduction and equational theories from the µSµ˜Sηµηµ˜ rewriting rules from Figure 5.1 as their compatible-reflexive-transitive and compatible-reflexive-symmetric-transitive closures, respectively.
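Before instantiating the parameters, it may help to see the shape of this parameterization in executable form. The following Haskell sketch (our own rendering of the structural core, with naive substitution that is not capture-avoiding) treats a substitution strategy exactly as Figure 5.1 does, namely as a pair of predicates, so that one step function covers every strategy:

```haskell
-- The structural core: commands cut a term against a co-term.
data Term    = Var String | Mu String Command        -- x  |  mu alpha. c
data CoTerm  = CoVar String | MuT String Command     -- alpha  |  mu~ x. c
data Command = Cut Term CoTerm                       -- < v || e >

-- A substitution strategy S = (Value_S, CoValue_S) as two predicates.
data Strategy = Strategy
  { isValue   :: Term   -> Bool
  , isCoValue :: CoTerm -> Bool }

-- One top-level step of the mu_S and mu~_S rules from Figure 5.1.
step :: Strategy -> Command -> Maybe Command
step s (Cut (Mu a c) e) | isCoValue s e = Just (substCo a e c)
step s (Cut v (MuT x c)) | isValue s v  = Just (substTm x v c)
step _ _ = Nothing

-- Call-by-value: variables are the only values; every co-term is a co-value.
callByValue :: Strategy
callByValue = Strategy isVar (const True)
  where isVar (Var _) = True
        isVar _       = False

-- Call-by-name: every term is a value; co-variables are the only co-values.
callByName :: Strategy
callByName = Strategy (const True) isCoVar
  where isCoVar (CoVar _) = True
        isCoVar _         = False

-- Naive substitution (ignores capture; adequate for closed demonstrations).
substCo :: String -> CoTerm -> Command -> Command
substCo a e (Cut v k) = Cut (goT v) (goC k)
  where goT (Var x)              = Var x
        goT (Mu b c) | b == a    = Mu b c
                     | otherwise = Mu b (substCo a e c)
        goC (CoVar b) | b == a    = e
                      | otherwise = CoVar b
        goC (MuT x c)            = MuT x (substCo a e c)

substTm :: String -> Term -> Command -> Command
substTm x v (Cut t k) = Cut (goT t) (goC k)
  where goT (Var y) | y == x    = v
                    | otherwise = Var y
        goT (Mu b c)            = Mu b (substTm x v c)
        goC (CoVar b)           = CoVar b
        goC (MuT y c) | y == x    = MuT y c
                      | otherwise = MuT y (substTm x v c)
```

On the ambiguous command 〈µ .c1||µ˜ .c2〉, step callByValue fires the µ rule while step callByName fires the µ˜ rule, which is exactly the conflict the parameterization is designed to arbitrate.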
So, given a particular substitution strategy S, the S instance of the parametric reduction and equational theories, denoted µµ˜S, is obtained by instantiating the set of values and co-values with S.

    (µS)  〈µα.c||E〉 →µS c{E/α}    (E ∈ CoValueS)
    (µ˜S)  〈V||µ˜x.c〉 →µ˜S c{V/x}    (V ∈ ValueS)
    (ηµ)  µα.〈v||α〉 →ηµ v    (α ∉ FV(v))
    (ηµ˜)  µ˜x.〈x||e〉 →ηµ˜ e    (x ∉ FV(e))

    FIGURE 5.1. A parametric theory, µµ˜S, for the core µµ˜-calculus.

The one constraint on the substitution strategy is that we always assume that variables are values, and co-variables are co-values, since our restriction on the µ and µ˜ axioms means that they can only ever stand in for unknown values and co-values. If we want to characterize an operational semantics as well, we also need to specify the evaluation contexts in which the standard reduction may occur. Therefore, we say that an evaluation strategy S (or just strategy for short) is a substitution strategy together with a set of evaluation contexts (D ∈ EvalCxtS) that yield a command when filled with a command, term, or co-term as appropriate, and which includes at least the following contexts:

– □ ∈ EvalCxtS,
– 〈□||E〉 ∈ EvalCxtS for all E ∈ CoValueS, and
– 〈V||□〉 ∈ EvalCxtS for all V ∈ ValueS.

So a choice of evaluation strategy S gives us the µS µ˜S operational semantics that is closed under EvalCxtS contexts.

The previous characterizations of call-by-value and call-by-name from Chapter III come out as particular instances of the parametric theory. For example, we can define the call-by-value strategy V, shown in Figure 5.2, by restricting the set of values to exclude output abstractions, leaving variables as the only values, and letting every co-term be a co-value. In effect, this decision restricts the µ˜ rule in the usual way for call-by-value while letting the µ rule be unrestricted. In addition, the V evaluation contexts only permit reduction at the top of a command or one of its immediate sub-(co-)terms, favoring the term side over the co-term side.

    V ∈ ValueV ::= x        E ∈ CoValueV ::= e
    D ∈ EvalCxtV ::= □ | 〈□||e〉 | 〈V||□〉

    V ∈ ValueN ::= v        E ∈ CoValueN ::= α
    D ∈ EvalCxtN ::= □ | 〈v||□〉 | 〈□||E〉

    FIGURE 5.2. Call-by-value (V) and call-by-name (N) strategies for the core µµ˜-calculus.

The call-by-name strategy N is defined in the dual way by letting every term be a value and restricting the set of co-values to exclude input abstractions, leaving co-variables as the only co-values. Again, this choice of values and co-values describes the call-by-name restriction on the µ rule while leaving the µ˜ rule unrestricted. The N evaluation contexts also only permit reduction at the top of commands or the immediate sub-(co-)terms, but instead favor the co-term side over the term side.

We can also explore other choices for the parameters that describe strategies besides just call-by-value and call-by-name. For instance, we can characterize a notion of call-by-need in terms of a "lazy call-by-value" strategy LV shown in Figure 5.3, which characterizes evaluation similar to a previous call-by-need theory for the sequent calculus (Ariola et al., 2011). The intuition for LV is similar to the call-by-need λ-calculus (Ariola et al., 1995): a non-value term bound to a variable represents a delayed computation that will only be evaluated when it is needed. Then, only once the term has been reduced to a value (in the sense of call-by-value), may it be substituted for the variable.
In this way, LV only performs V substitutions (which we can see from the fact that the LV (co-)values are a subset of V (co-)values), but in a lazy, pull-driven fashion that gives initial priority to the consumer as in N. Therefore, in the command 〈v1||µ˜x.〈v2||µ˜y.c〉〉, we temporarily ignore v1 and v2 and work inside c, since this command decomposes into the evaluation context 〈v1||µ˜x.〈v2||µ˜y.□〉〉 surrounding c. If it turns out that c evaluates to D[〈x||E〉], we are left in the state 〈v1||µ˜x.〈v2||µ˜y.D[〈x||E〉]〉〉, where E is a co-value that wants to know something about x, making µ˜x.〈v2||µ˜y.D[〈x||E〉]〉 into a co-value as well. Therefore, if v1 is a non-value output abstraction, it may take over via the µLV rule, and thus begin evaluation of the value of the demanded variable x.

    V ∈ ValueLV ::= x
    E ∈ CoValueLV ::= α | µ˜x.D[〈x||E〉]
    D ∈ EvalCxtLV ::= □ | 〈v||µ˜y.D〉 | 〈v||□〉 | 〈□||E〉

    FIGURE 5.3. "Lazy call-by-value" (LV) strategy for the core µµ˜-calculus.

Due to the symmetry of the sequent calculus, it is straightforward to generate the dual to the call-by-need strategy, which is the "lazy call-by-name" strategy LN shown in Figure 5.4. This strategy performs a subset of N substitutions (since LN (co-)values are a subset of N (co-)values), but still gives initial priority to the producer as in V.

    V ∈ ValueLN ::= x | µα.D[〈V||α〉]
    E ∈ CoValueLN ::= α
    D ∈ EvalCxtLN ::= □ | 〈µα.D||e〉 | 〈□||e〉 | 〈V||□〉

    FIGURE 5.4. "Lazy call-by-name" (LN) strategy for the core µµ˜-calculus.

For example, in the command 〈µα.〈µβ.c||e2〉||e1〉, we temporarily ignore e1 and e2 and work inside c, since this command decomposes into the LN evaluation context 〈µα.〈µβ.□||e2〉||e1〉 surrounding c. If it turns out that c evaluates to D[〈V||α〉], we are left in the state 〈µα.〈µβ.D[〈V||α〉]||e2〉||e1〉, where V is a value that wants to yield a result to α, making µα.〈µβ.D[〈V||α〉]||e2〉 a value as well. Therefore, if e1 is a non-co-value input abstraction, it may take over via the µ˜LN rule, and thus begin evaluation of the observation for the demanded co-variable α.

Remark 5.1. Note that, while our primary interest in strategies is to achieve a coherent, confluent theory of deterministic evaluation by avoiding the fundamental dilemma of classical computation, individual strategies are not required to do so. That is to say, it can be meaningful to talk about strategies that yield incoherent theories for the sequent calculus, if we are not interested in properties like confluence. For example, the simplest such strategy is the "unrestricted" strategy, U, for unconstrained and non-deterministic evaluation, which considers every term to be a value and every co-term to be a co-value, as shown in Figure 5.5.

    V ∈ ValueU ::= v        E ∈ CoValueU ::= e        D ∈ EvalCxtU ::= □

    FIGURE 5.5. Nondeterministic (U) strategy for the core µµ˜-calculus.

The µU µ˜U theory effectively ignores the concept of values and co-values, choosing to restrict neither the µ nor µ˜ rules for substitution, and thereby giving a theory corresponding to Barbanera & Berardi's (1994) symmetric λ-calculus for a classical logic that does not consider a restricted evaluation strategy. End remark 5.1.

Remark 5.2. Another way to think about substitution strategies, and the parameterized notions of values and co-values, is to consider the essential parts of an equational theory.
Typically, equational theories are expressed by a set of axioms (primitive equalities assumed to hold) along with some basic properties or rules for forming larger equations like compatibility, reflexivity, symmetry, and transitivity, previously discussed in Remark 2.3. In a language with an internal notion of variables, like the λ-calculus or the core µµ˜-calculus, we also generally expect the equational theory to be closed under substitution. That is to say, if two things are equal, then they should still be equal after substituting the same term for the same variable in both of them.

However, this principle does not always hold in full generality for programming languages. For example, the ML terms let y = x in 5 and 5 are equal: they will always behave the same in any context. However, if we substitute the term (print "hi"; 1) for x in both, we end up with let y = (print "hi"; 1) in 5 and 5, which are no longer equal, because one produces an observable side effect (printing the string "hi") and the other does not. Instead, ML supports a restricted substitution principle: if two terms are equal, then they are still equal when we substitute the same value (an integer, a pair of values, a function abstraction, . . . ) for the same variable in both of them. This restriction deftly avoids these kinds of counter-examples.

The exact same issue arises in the classical sequent calculus, since it also includes effects that allow manipulation of control flow. Therefore, we need to restrict the substitution principle in the sequent calculus to only allow substituting values for variables. Additionally, since we have a second form of substitution, we also have a restriction that only allows substituting co-values for co-variables. This leads us to substitution principles that say if two commands (or terms or co-terms) are equal, they must still be equal after substituting (co-)values for (co-)variables:

    c = c′    V ∈ ValueS
    ──────────────────── substS
    c{V/x} = c′{V/x}

    c = c′    E ∈ CoValueS
    ────────────────────── substS
    c{E/α} = c′{E/α}

and similarly for substitutions in terms and co-terms.

In lieu of the presentation in Figure 5.1, we may also define the dynamic semantics of the core µµ˜-calculus by axioms describing trivial statements about variable binding. The ηµ and ηµ˜ rules state that giving a name to something, and then using it immediately (without repetition) in the same place, is the same thing as doing nothing. Additionally, we may say that binding a (co-)variable to itself is the same thing as doing nothing:

    (µα) 〈µα.c||α〉 →µα c        (µ˜x) 〈x||µ˜x.c〉 →µ˜x c

These axioms can also be seen as the special cases of µS and µ˜S which are always sound for every strategy, since we always assume that (co-)variables are (co-)values. If we take the above substitution principles as primitive inference rules like reflexivity, etc. in our equational theory, we can derive µS and µ˜S from the µα and µ˜x axioms. The trick is to realize that a command like 〈V||µ˜x.c〉 is the image of 〈x||µ˜x.c〉 under substitution of V for x. That is to say that 〈V||µ˜x.c〉 is syntactically the same as 〈x||µ˜x.c〉{V/x}. Therefore, we can derive the µ˜S axiom from µ˜x and substS as follows:

    〈x||µ˜x.c〉 = c  (µ˜x)    V ∈ ValueS
    ────────────────────────────────── substS
    〈V||µ˜x.c〉 = c{V/x}

The derivation of µS from µα and substS is similar:

    〈µα.c||α〉 = c  (µα)    E ∈ CoValueS
    ─────────────────────────────────── substS
    〈µα.c||E〉 = c{E/α}
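As an aside, the failure of unrestricted substitution described above is easy to observe concretely. The following Haskell sketch transcribes the earlier ML counterexample, using a strict binding to mimic a call-by-value let and an error in place of the printing effect; it is an illustration only, not part of the formal development.

    {-# LANGUAGE BangPatterns #-}

    -- `let !y = x in 5` is equal to `5` whenever a *value* is substituted
    -- for x, but substituting a non-value breaks the equation: the strict
    -- binding forces it, making the effect (here, an error) observable.
    ok :: Int
    ok = let !_y = (1 :: Int) in 5         -- evaluates to 5

    broken :: Int
    broken = let !_y = error "boom" in 5   -- raises the error instead of 5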
Conversely, the substitution principles are derivable from the more powerful µ˜S and µS axioms. For example, we can derive the substS principle for values from µ˜S by recognizing that both sides of the equation can be deduced from a command like 〈V||µ˜x.c〉 with the µ˜S axiom, so that congruence allows us to lift the equality c = c′ under the bindings. The full derivation of the value substS principle is:

    c{V/x} = 〈V||µ˜x.c〉     (by µ˜S and symmetry, since V ∈ ValueS)
           = 〈V||µ˜x.c′〉    (by compatibility, from c = c′)
           = c′{V/x}        (by µ˜S and transitivity, since V ∈ ValueS)

and the substS principle for co-values may be derived similarly. Therefore, the µ˜S and µS rules may also be seen as a realization of two dual substitution principles of an equational theory in the form of axioms. And furthermore, by controlling substitution we control evaluation itself. End remark 5.2.

The Essence of Connectives: Data and Co-Data

When considering a variety of different polarized connectives (Zeilberger, 2008b, 2009; Curien & Munch-Maccagnoni, 2010; Munch-Maccagnoni, 2013), we find that they all fit into one of two dual patterns. Each polarized connective is either positive or negative: positive connectives (following the verificationist approach) describe how to construct terms, whereas negative connectives (following the pragmatist approach) describe how to construct co-terms. In each case, the other half is defined by inversion, or cases on the allowed patterns of construction. Thus, we use the verificationist approach to represent (algebraic) data types from functional languages, whose objects are produced by specific constructions and consumed by inversion on the possible constructions. Contrastingly, we use the pragmatist approach to represent the dual form of co-data types, whose observations, or messages, are described by specific constructions and whose objects respond by inversion on those possible observations.

To study types in the sequent calculus, we will mirror the way that modern programming languages let the user define new types. Functional languages allow for user-defined data types, which are declared by describing the constructors used to build objects of that type. Object-oriented languages allow for user-defined co-data types as interfaces, which are declared by describing the methods (observations) to which objects of that type respond. As we have seen, the sequent calculus unifies these two computational uses of types, letting us describe both user-defined data and co-data types as mirror images of one another. Thus, we aim to encompass all the previously considered connectives as user-defined (co-)data types.

As a starting point, we base the syntax for declaring new user-defined (co-)data types in the sequent calculus on data type declarations in functional languages. However, because the form of (co-)data types in the classical sequent calculus is more expressive than data types in functional languages, we need a syntax that is more general than the usual form of data type declaration from ML-based languages. Therefore, we will look at how the generalized syntax for GADTs in Haskell (Peyton Jones et al., 2006; Schrijvers et al., 2009) may be used for ordinary data type declarations.
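For comparison, here is a small, self-contained Haskell sketch (with hypothetical names, to avoid clashing with the declarations below) of the two computational roles just described: producing data by construction, and consuming it by inversion on the possible constructions.

    -- A sum is produced by one of its constructors...
    data Sum a b = Inl a | Inr b

    -- ...and consumed by inversion: one case per possible construction.
    match :: (a -> c) -> (b -> c) -> Sum a b -> c
    match f _ (Inl x) = f x
    match _ g (Inr y) = g y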
For example, the typical sum type Either and pair type Both may be declared as:

    data Either a b where
      Left : a → Either a b
      Right : b → Either a b

    data Both a b where
      Pair : a → b → Both a b

In the declaration for Either, we specify that there are two constructors: a Left constructor that takes a value of type a and builds a value of type Either a b, and similarly a Right constructor that takes a value of type b and builds a value of type Either a b. In the declaration for Both, we specify that there is one constructor, Pair, that takes a value of type a and a value of type b, and builds a value of type Both a b.

When declaring a new type in the sequent calculus, we will take the basic GADT form, but instead describe the constructors with a sequent judgment rather than a function type. For connectives following the verificationist approach, we have data type declarations that introduce new concrete terms and abstract co-terms. For instance, we can give a declaration of A ⊕ B as:

    data X ⊕ Y where
      ι1 : X ⊢ X ⊕ Y |
      ι2 : Y ⊢ X ⊕ Y |

where we replace the function arrow (→) with logical entailment (⊢), to emphasize that the function type is not inherently baked into the system. Additionally, we mark the distinguished output of each constructor as X ⊕ Y |, which denotes the type of the result produced as the output of the constructed term. This declaration extends the syntax of the language with two new concrete terms for the constructors, ι1(v) and ι2(v), and with one new abstract co-term for case analysis, µ˜[ι1(x).c1 | ι2(y).c2]. Note that these are exactly the system L terms and co-terms for the type A ⊕ B from Figure 4.3. Similarly, we can declare pair types A ⊗ B as:

    data X ⊗ Y where
      ( , ) : X, Y ⊢ X ⊗ Y |

where the multiple inputs to the constructor are given as a list of inputs on the left of the sequent, as opposed to the "curried" style used in the declaration of Both. Note that we make use of the mix-fix notation ( , ), as used in functional languages like Agda, for describing the constructor syntax, so that this declaration extends the syntax of the language with one new concrete term for the constructor, (v, v′), and one new abstract co-term for case analysis, µ˜[(x, y).c]. Again, these are exactly the same terms and co-terms for the type A ⊗ B in system L.

However, note that user-defined types in the sequent calculus are more general than in functional programming languages. For example, we can declare the positive form of negation as:

    data ∼X where
      ∼ : ⊢ ∼X | X

where we have an additional output besides the normal distinguished output of type ∼X, which is not expressible in functional programming languages. This declaration extends the syntax of the language with one new concrete term for the constructor, ∼(e), and one new abstract co-term for case analysis, µ˜[∼(α).c].

Besides data declarations, we also have co-data declarations that introduce abstract terms and concrete co-terms. We can think of a co-data declaration as an interface that describes the messages understood by an abstract value. By analogy to object-oriented programming, an interface (co-data type declaration) describes the fixed set of methods (co-structures) that an object (case abstraction) has to support (provide cases for), and the object value (case abstraction) defines the behavior that results from a method call (command).
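This object-oriented reading of co-data can be sketched in Haskell by modeling an interface as a record of observations. The names here are hypothetical, and the sketch ignores the first-class co-terms of the sequent calculus.

    -- An "interface" with two observations; an "object" is anything that
    -- defines a response to both of them.
    data With a b = With { proj1 :: a, proj2 :: b }

    -- Defining an object amounts to giving one case per observation.
    swapWith :: With a b -> With b a
    swapWith v = With { proj1 = proj2 v, proj2 = proj1 v }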
For example, we can declare product types A & B as:

    codata X & Y where
      π1 : | X & Y ⊢ X
      π2 : | X & Y ⊢ Y

where instead of a distinguished output, we have a distinguished input marked as | A & B for each co-constructor, which denotes the type of the input expected by the constructed co-term. This declaration extends the language with a new abstract term for case analysis, µ(π1[α].c1 | π2[β].c2), and two concrete co-terms, π1[e] and π2[e]. Note that these are exactly the terms and co-terms for the type A & B as described in system L.

Of note, we find that function types, which are usually baked into functional programming languages as non-definable types, are just another instance of user-defined co-data types in the sequent calculus. In particular, we can declare function types A → B as:

    codata X → Y where
      · : X | X → Y ⊢ Y

Following the pattern by rote, this declaration extends the language with a new abstract term, µ([x · α].c), where we put brackets around the call-stack pattern x · α for clarity, and a new concrete co-term, v · e. Even though this presentation of objects of the function type differs from the usual λ-based presentation, both presentations are mutually definable as syntactic sugar based on one another, as we saw in Chapter IV:

    λx.v ≜ µ([x · α].〈v||α〉)        µ([x · α].c) ≜ λx.µα.c

The rest of the basic connectives, including negation and the corresponding unit types for ⊕, ⊗, &, and ⅋, are declared as user-defined (co-)data types in Figure 5.6.

    data X ⊕ Y where                    codata X & Y where
      ι1 : X ⊢ X ⊕ Y |                    π1 : | X & Y ⊢ X
      ι2 : Y ⊢ X ⊕ Y |                    π2 : | X & Y ⊢ Y

    data X ⊗ Y where                    codata X ⅋ Y where
      ( , ) : X, Y ⊢ X ⊗ Y |              [ , ] : | X ⅋ Y ⊢ X, Y

    data 0 where                        codata ⊤ where

    data 1 where                        codata ⊥ where
      () : ⊢ 1 |                          [] : | ⊥ ⊢

    data X − Y where                    codata X → Y where
      · : X ⊢ X − Y | Y                   · : X | X → Y ⊢ Y

    data ∼X where                       codata ¬X where
      ∼ : ⊢ ∼X | X                        ¬ : X | ¬X ⊢

    FIGURE 5.6. Declarations of the basic data and co-data types.

Now that we have shown how each of the basic connectives can be described by a data or co-data declaration, our goal is to generalize the pattern to arbitrary, user-defined data and co-data types. First, we introduce the general untyped syntax for arbitrary data and co-data in Figure 5.7. (We maintain the same convention from Chapters III and IV for user-defined data and co-data types, whereby terms and co-terms are syntactically distinguished by the use of round parentheses for terms and square brackets for co-terms.)

    x, y, z ∈ Variable ::= . . .        α, β, γ ∈ CoVariable ::= . . .
    K ∈ Constructor ::= . . .           O ∈ Observer ::= . . .

    c ∈ Command ::= 〈v||e〉
    v ∈ Term ::= x | µα.c | K( #»e , #»v ) | µ( # »O[ #»x , #»α ].c)
    e ∈ CoTerm ::= α | µ˜x.c | µ˜[ # »K( #»α , #»x ).c] | O[ #»v , #»e ]

    FIGURE 5.7. Adding data and co-data to the core µµ˜ sequent calculus.

In addition to the expressions inherited from the core µµ˜-calculus, we now have two new forms of terms and co-terms. On the one hand, we have data structure terms K( #»e , #»v ) that build a concrete construction with the constructor K, and these may be analysed by a data case abstraction co-term µ˜[ # »K( #»α , #»x ).c], which defines several alternative responses to its given answer matching specific patterns. On the other hand, we have co-data structure co-terms O[ #»v , #»e ] that build a concrete observation with the observer O, and these may be analysed by a co-data case abstraction term µ( # »O[ #»x , #»α ].c), which defines several alternative responses to its given question matching specific patterns. Note that for both data and co-data case abstractions, we impose the additional syntactic side-condition that the listed constructors K, . . . of a data case abstraction are all distinct from one another, and likewise the listed observers O, . . . of a co-data case abstraction are all distinct.
Second, we give the type system accommodating the general form of declarations for a generic data type constructor F and co-data type constructor G in Figure 5.8. The type constructors in such declarations may connect a sequence of other types, which are represented by the sequence of type variables #»X. Furthermore, a data type may have several constructors, named K1 to Kn, and a co-data type may have several observers, which are co-constructors, named O1 to On. The form of these (co-)constructors (i.e. their arity and the type of terms and co-terms they are built from) is described by an arbitrary sequent in the declaration, with the (co-)data type being defined in the distinguished input or output position of the sequent. For each such data and co-data declaration, we have additional typing rules for the newly declared connectives, which are also shown in Figure 5.8. Because the meaning of a particular type constructor F or G depends on its declaration, we annotate the sequent with the global environment G that specifies the declarations for all the type constructors, so that G is used to determine the shape of their left and right logical rules.

While these generalized typing rules are involved, they are described in such a way that they exactly replicate the expected typing rules for existing (co-)data types. For instance, by instantiating the generalized typing rules to the basic (co-)data types from Figure 5.6, we recover exactly the same (unpolarized) logical rules from system L in Figure 4.3. Thus, the syntax and typing rules for user-defined (co-)data types subsume each basic connective.

Since we have extended the core µµ˜-calculus syntax with (co-)data structures and abstractions, we must also update the core strategies from Section 5.1 to account for the new values and co-values introduced by the declarations. We could define the (co-)values of each newly declared (co-)data type on a case-by-case basis. However, we can instead define the (co-)values of (co-)data types generically across all declarations, which, besides being more economical, prevents ad-hoc decisions. To do this, we define a strategy S once and for all over the untyped syntax given in Figure 5.7, which already accounts for all possible (co-)data type declarations. Also note that the notion of evaluation context does not change with the addition of (co-)data, so we only need to consider how the substitution strategy is impacted. Thus, a strategy can be given for all possible extensions of newly-declared (co-)data types by carving out a set of values and co-values from the untyped syntax of terms and co-terms.
    A, B, C ∈ Type ::= X | F( #»A )
    X, Y, Z ∈ TypeVariable ::= . . .        F, G ∈ Connective ::= . . .
    decl ∈ Declaration ::= data F( #»X ) where # »(K : #»A ⊢ F( #»X ) | #»B )
                         | codata G( #»X ) where # »(O : #»A | G( #»X ) ⊢ #»B )
    G ∈ GlobalEnv ::= # »decl
    Γ ∈ InputEnv ::= # »(x : A)        ∆ ∈ OutputEnv ::= # »(α : A)
    J, H ∈ Judgement ::= c : (Γ ⊢G ∆) | (Γ ⊢G v : A | ∆) | (Γ | e : A ⊢G ∆)

    Core rules:

      ─────────────────── VR        ─────────────────── VL
      x : A ⊢G x : A |               | α : A ⊢G α : A

      c : (Γ ⊢G α : A, ∆)            c : (Γ, x : A ⊢G ∆)
      ─────────────────── AR         ─────────────────── AL
      Γ ⊢G µα.c : A | ∆              Γ | µ˜x.c : A ⊢G ∆

      Γ ⊢G v : A | ∆    Γ′ | e : A ⊢G ∆′
      ────────────────────────────────── Cut
      〈v||e〉 : (Γ′, Γ ⊢G ∆′, ∆)

    Logical rules: given (data F( #»X ) where # »Ki : # »Aij ⊢ F( #»X ) | # »Bij ) ∈ G, we have the rules:

      # »(Γ′j | ej : Bij{ # »C/X} ⊢G ∆′j)    # »(Γj ⊢G vj : Aij{ # »C/X} | ∆j)
      ──────────────────────────────────────────────────────────── FRKi
      # »Γj , # »Γ′j ⊢G Ki( #»e , #»v ) : F( #»C ) | # »∆j , # »∆′j

      # »ci : (Γ, # »xi : Ai{ # »C/X} ⊢G # »αi : Bi{ # »C/X}, ∆)
      ───────────────────────────────────────────── FL
      Γ | µ˜[ # »Ki( #»αi , #»xi ).ci ] : F( #»C ) ⊢G ∆

    Given (codata G( #»X ) where # »Oi : # »Aij | G( #»X ) ⊢ # »Bij ) ∈ G, we have the rules:

      # »ci : (Γ, # »xi : Ai{ # »C/X} ⊢G # »αi : Bi{ # »C/X}, ∆)
      ───────────────────────────────────────────── GR
      Γ ⊢G µ( # »Oi[ #»xi , #»αi ].ci ) : G( #»C ) | ∆

      # »(Γj ⊢G vj : Aij{ # »C/X} | ∆j)    # »(Γ′j | ej : Bij{ # »C/X} ⊢G ∆′j)
      ──────────────────────────────────────────────────────────── GLOi
      # »Γj , # »Γ′j | Oi[ #»v , #»e ] : G( #»C ) ⊢G # »∆j , # »∆′j

    FIGURE 5.8. Types of declared (co-)data in the parametric µµ˜ sequent calculus.

Our call-by-value strategy V will mimic ML-like languages. Therefore, we can say that a data structure is a value of V when all of its sub-terms are values. For example, a pair (v1, v2) is a value when both v1 and v2 are values, and an injection, ι1(v) or ι2(v), is a value when v is a value. Additionally, all co-data case abstractions (i.e. objects) are considered values. This comes from the fact that a λ-abstraction, which we represent as a case abstraction, is a value in the call-by-value λ-calculus. As before, though, we continue to admit every single co-term as a co-value. Thus, we achieve the V strategy with arbitrary (co-)data types shown in Figure 5.9.

Our call-by-name strategy N will mimic call-by-name λ-calculi with data types, similar to Haskell. Therefore, we still admit every single term as a value. The co-values of N represent "strict" contexts from a call-by-name λ-calculus. For example, case analysis is always strict in these languages, therefore the case abstraction of a data type is a co-value. Additionally, an observation of a co-data type is a co-value when all its sub-(co-)terms are (co-)values. This follows the definition of co-values from the call-by-name half of the dual calculi from Section 3.3, as well as the hereditary nature of strict contexts for functions and products in a call-by-name λ-calculus. For example, the contexts:

    let x = □ 1 in 5        let x = π1 □ in 4

are not strict, because x is not required to compute the result, even though we are applying the hole □ to an argument or projecting out one of its components. However, the contexts:

    case □ 1 of ι1(x) ⇒ 5 | ι2(y) ⇒ 10        case π1 □ of ι1(x) ⇒ 5 | ι2(y) ⇒ 10

are both strict, because we need to compute the input plugged into □ to determine which branch to take. Thus, we achieve the N strategy with arbitrary (co-)data types shown in Figure 5.9.

Finally, our call-by-need strategy LV is the most complex, since it accounts for the memoization used to efficiently implement lazy evaluation for the Haskell language. Intuitively, the key to understanding call-by-need is to think about sharing, where the values and co-values of LV represent terms and co-terms that may be freely copied as many times as necessary.
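This sharing behavior is directly observable in GHC. In the following minimal sketch (a hypothetical example), the traced argument is demanded twice but evaluated only once; under call-by-name it would be evaluated twice, and under call-by-value it would be evaluated even if it were never demanded.

    import Debug.Trace (trace)

    -- Under call-by-need, the binding for x is a delayed computation that
    -- is memoized: it is evaluated at most once, no matter how many times
    -- x is demanded.
    main :: IO ()
    main =
      let x = trace "evaluated" (1 + 2 :: Int)
      in print (x + x)   -- emits "evaluated" (on stderr) once, then prints 6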
In LV, a structure can be copied if all of its sub-(co-)terms can be copied, following the usual treatment of sharing for data structures in implementations of Haskell. Additionally, a case abstraction can always be copied, following the treatment of λ-abstractions in implementations of Haskell. Thus, we achieve the LV strategy with arbitrary (co-)data types shown in Figure 5.10. The dual lazy call-by-name strategy LN is also shown in Figure 5.10, which is derived by exchanging the roles of terms and co-terms from LV.

    V ∈ ValueV ::= x | K( #»e , #»V ) | µ( # »O[ #»x , #»α ].c)
    E ∈ CoValueV ::= e

    V ∈ ValueN ::= v
    E ∈ CoValueN ::= α | O[ #»v , #»E ] | µ˜[ # »K( #»α , #»x ).c]

    FIGURE 5.9. Call-by-value (V) and call-by-name (N) substitution strategies extended with arbitrary (co-)data types.

    V ∈ ValueLV ::= x | K( #»E , #»V ) | µ( # »O[ #»x , #»α ].c)
    E ∈ CoValueLV ::= α | µ˜x.D[〈x||E〉] | O[ #»v , #»E ] | µ˜[ # »K( #»α , #»x ).c]

    V ∈ ValueLN ::= x | µα.D[〈V||α〉] | K( #»e , #»V ) | µ( # »O[ #»x , #»α ].c)
    E ∈ CoValueLN ::= α | O[ #»V , #»E ] | µ˜[ # »K( #»α , #»x ).c]

    FIGURE 5.10. "Lazy call-by-value" (LV) and "lazy call-by-name" (LN) substitution strategies extended with arbitrary (co-)data types.

Evaluating Data and Co-Data

Having resolved the fundamental dilemma of computation in the parametric µµ˜-calculus via a variety of strategies, and having extended the language with new syntactic forms for user-defined (co-)data types, we now need to explain how the constructs of (co-)data types behave. To that end, we introduce two different semantics for (co-)data in the parametric sequent calculus:

– a typed βη theory that is independent of the chosen strategy, and
– an untyped βς theory that depends on the chosen strategy.

Both of these theories have their own advantages and disadvantages. On the one hand, the βη theory gives a canonical definition of the dynamic semantics of (co-)data independently of any evaluation strategy, but relies on types to do so sensibly. On the other hand, the βς theory gives a mechanism for running programs without resorting to types and equational reasoning, but it depends on the chosen evaluation strategy and relates fewer programs than βη.

The typed βη theory of (co-)data

Since the evaluation strategy is handled by the equational theory of the core µµ˜-calculus, we should express the behavior of (co-)data type structures in some way that is valid for any choice of strategy, S. In other words, given a set of data and co-data type declarations G, we would like to describe the equational theory for the language extended with those types. As we saw in Chapter II, in the λ-calculus the dynamic meaning of types is expressed by β and η laws. The β laws characterize the main computational force of a type, whereas the η laws characterize a form of extensionality for a type. Therefore, to accomplish our goal in the sequent calculus, we will use an analogous form of β and η laws for defining the dynamic meaning of user-defined (co-)data types, and like in the λ-calculus, the η laws must be typed to be sensible. For example, we may extend the equational theory with the following β law for functions:

    (β→) 〈µ([x · α].c)||v · e〉 →β→ 〈v||µ˜x.〈µα.c||e〉〉

which matches on the structure of a function call and binds the sub-components to the appropriate (co-)variables. Notice that this rule applies for any function call, v · e, whether or not v or e are (co-)values, so β→ does not depend on any substitution strategy.
This works because we avoid performing substitution in the β→ axiom, and instead v and e are put in interaction with input and output abstractions. Since we have already informed the core structural theory about our chosen strategy, we know that the substitutions will be performed in the correct order. Therefore, if we are evaluating our program according to call-by-value, we would have to evaluate v first (via the µS rule if necessary) before substituting for x. Likewise, in call-by-name, we would have to evaluate e first (via the µ˜S rule if necessary) before substituting for α.

Next, we have the following η law for functions:

    (η→) z : A → B ≺η→ µ([x · α].〈z||x · α〉)

which says that an unknown function, z, is equivalent to a trivial case abstraction that matches a function call and forwards it along, unchanged, to z. Here, we use the variable z to stand in for an unknown value, since we are only allowed to substitute values for variables. Note that the more general but strategy-dependent presentation of the η law, which applies to an arbitrary value rather than just a variable, is derivable from the more restrictive η→ law above and the equational theory of substitution in the parametric µµ˜-calculus:

    V : A → B =ηµ  µγ.〈V||γ〉
              =ηµ˜ µγ.〈V||µ˜z.〈z||γ〉〉
              =η→ µγ.〈V||µ˜z.〈µ([x · α].〈z||x · α〉)||γ〉〉
              =µ˜S µγ.〈µ([x · α].〈V||x · α〉)||γ〉
              =ηµ  µ([x · α].〈V||x · α〉)

This has the nice side effect that neither the β→ nor η→ rules themselves explicitly mention values or co-values in any way: they are strategy independent. (It also has the pleasant effect that the side conditions on the free variables of V used to prevent static variable capture come automatically from capture-avoiding substitution in the equational theory.)

Remark 5.3. To make the comparison with previous characterizations of functions in the sequent calculus from Chapter III, we can be more formal about the relationship between λ-abstractions and co-case abstractions over call stacks. In particular, taking the round trip of the mutual syntactic sugar definitions presented in Section 5.2 results in equal (co-)terms:

    λx.v ≜ µ([x · α].〈v||α〉) ≜ λx.µα.〈v||α〉 =ηµ λx.v
    µ([x · α].c) ≜ λx.µα.c ≜ µ([x · α].〈µα.c||α〉) =µS µ([x · α].c)

where the application of µS is valid for any S, since co-variables are always co-values. We may also rephrase these β and η axioms for functions into the λ-based syntax:

    (βλ) 〈λx.v||v′ · e〉 →βλ 〈v′||µ˜x.〈v||e〉〉
    (ηλ) z ≺ηλ λx.µα.〈z||x · α〉

Note that these are mutually derivable from the β→ and η→ axioms according to the syntactic sugar definition for λ-abstractions, along with the µS and ηµ axioms. Thus, the two presentations of functions really are equivalent to one another: we can view a function as a λ-abstraction mapping an input to an output, or as an object that deconstructs an observation in the shape of a call-stack. End remark 5.3.

Remark 5.4. Even though we can derive a generalized version of the η→ axiom which applies to values, it is important to note that the η→ rule would not work if we replaced z with a general term v. The exact same problem occurs in the call-by-value λ-calculus, where we admit non-terminating terms. If we are allowed to η expand any term, then we have the equality:

    5 =β (λx.5) (λy.Ω y) =η (λx.5) Ω ≈ Ω

where Ω stands in for a term that loops forever. So if we allow η expansion of arbitrary terms in the call-by-value λ-calculus, then a value like 5 is the same thing as a program that loops forever.
The solution in the call-by-value λ-calculus is to limit the η rule to only apply to values. It should then be no surprise that the same limitation is necessary for the analogous η→ axiom in the classical sequent calculus, where we can always form the term µ_.c that never returns a result, just like an infinite loop. End remark 5.4.

Similarly, we can explain the behavior of the co-data type for products with an analogous set of β and η axioms. The β& axiom demonstrates how an object of A & B matches on the structure of projection, binding the consumer for its output to the appropriate co-variable:

    (β&) 〈µ(π1[α].c1 | π2[β].c2)||π1[e]〉 →β& 〈µα.c1||e〉
    (β&) 〈µ(π1[α].c1 | π2[β].c2)||π2[e]〉 →β& 〈µβ.c2||e〉

Again, this rule is safe for any projection π1[e] or π2[e] because the underlying co-term e is put in interaction with an output abstraction, so that the substitution is performed only in the correct situation. Likewise, the η& axiom states that an unknown product value z is equivalent to a redundant co-case analysis which forwards its output to z:

    (η&) z : A & B ≺η& µ(π1[α].〈z||π1[α]〉 | π2[β].〈z||π2[β]〉)

In other words, the variable z, which stands in for some object of A & B, must be equivalent to an object with the same response to the π1 and π2 projections. As before, we have the generalized, strategy-dependent version of η& as an equality:

    V : A & B = µ(π1[α].〈V||π1[α]〉 | π2[β].〈V||π2[β]〉)    (α, β ∉ FV(V))

which is derivable from η&, ηµ, ηµ˜, and µ˜S, meaning that the only thing that is observable about an object of A & B is its response to observations of the form π1[α] and π2[β].

The β and η laws for user-defined data types follow a similar, but mirrored, pattern. For example, the β rules for ⊕ are exactly dual to β&, and perform case analysis on the tag of the term without requiring that the sub-term be a value:

    (β⊕) 〈ι1(v)||µ˜[ι1(x).c1 | ι2(y).c2]〉 →β⊕ 〈v||µ˜x.c1〉
    (β⊕) 〈ι2(v)||µ˜[ι1(x).c1 | ι2(y).c2]〉 →β⊕ 〈v||µ˜y.c2〉

These rules work for any injected terms ι1(v) or ι2(v) because they put the sub-term v in interaction with an input abstraction, allowing the equational theory of the underlying structural core to take care of managing evaluation order. For example, while this rule is stronger than the one given for the call-by-value half of the dual sequent calculi from Section 3.3, it is still valid according to Wadler's (2003) call-by-value continuation-passing style (CPS) transformation. The η rule for ⊕ is also dual to η&, where we expand an unknown co-value γ into a case abstraction:

    (η⊕) γ : A ⊕ B ≺η⊕ µ˜[ι1(x).〈ι1(x)||γ〉 | ι2(y).〈ι2(y)||γ〉]

Thus, the only thing that matters for an unknown sum co-value γ is the way that it responds to an input of the form ι1(x) or ι2(y).

As a final example, consider the β axiom for pairs, which matches on the structure of the pair and binds the sub-terms to the appropriate variables:

    (β⊗) 〈(v, v′)||µ˜[(x, y).c]〉 →β⊗ 〈v||µ˜x.〈v′||µ˜y.c〉〉

The β⊗ rule follows the intuition that a destructuring binding on the structure of a known pair is the same thing as binding the sub-terms of the pair one at a time.
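As a rough functional analogue of this intuition (a sketch only; Haskell's pattern matching is not the sequent-calculus rule itself), matching on a known pair computes the same result as binding its components one at a time:

    -- Both definitions compute the same result: case analysis on a known
    -- pair amounts to two successive bindings of its components.
    viaCase, viaLets :: Int
    viaCase = case (1 + 1, 2 + 2) of (x, y) -> x * y
    viaLets = let x = 1 + 1 in let y = 2 + 2 in x * y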
Next, the η axiom for pairs states that a co-variable γ expecting a pair A ⊗ B as input is the same as the redundant case abstraction which breaks apart and re-assembles its input before forwarding it to γ:

    (η⊗) γ : A ⊗ B ≺η⊗ µ˜[(x, y).〈(x, y)||γ〉]

We now look to summarize all the β and η laws considered so far into their general form for user-defined (co-)data types. That way, we can take an arbitrary declaration for a user-defined (co-)data type and automatically generate the appropriate axioms to characterize the run-time behavior of its programs. In particular, given the declarations for a generic data type constructor F and co-data type constructor G in Figure 5.8, we show the corresponding β and η axioms in Figure 5.11.

    (βF) 〈Ki( #»e , #»v )||µ˜[· · · | Ki( #»α , #»x ).ci | · · ·]〉 →βF 〈µ #»α .〈 #»v ||µ˜ #»x .ci〉|| #»e 〉
    (βG) 〈µ(· · · | Oi[ #»x , #»α ].ci | · · ·)||Oi[ #»v , #»e ]〉 →βG 〈 #»v ||µ˜ #»x .〈µ #»α .ci|| #»e 〉〉
    (ηF) γ : F( #»C ) ≺ηF µ˜[ # »Ki( #»α , #»x ).〈Ki( #»α , #»x )||γ〉 ]
    (ηG) z : G( #»C ) ≺ηG µ( # »Oi[ #»x , #»α ].〈z||Oi[ #»x , #»α ]〉 )

    FIGURE 5.11. The βη laws for declared data and co-data types.

Note that these rules use syntactic sugar for writing a sequence of input and output bindings. That is, given a sequence of terms #»v = v1, . . . , vn and variables #»x = x1, . . . , xn, or a sequence of co-terms #»e = e1, . . . , en and co-variables #»α = α1, . . . , αn, the sequence bindings are defined as:

    〈 #»v ||µ˜ #»x .c〉 ≜ 〈v1||µ˜x1. . . . 〈vn||µ˜xn.c〉 . . . 〉
    〈µ #»α .c|| #»e 〉 ≜ 〈µα1. . . . 〈µαn.c||en〉 . . . ||e1〉

The type restriction on the η laws is necessary to prevent the associated equational theory from collapsing, similar to the situation in the λ-calculus as discussed in Section 2.2. For example, the nullary case of the η law for co-data gives us x =η µ() =η y, which is fine if both x : ⊤ and y : ⊤, but is troublesome if x and y stand for some other kind of object like functions or products.

The untyped βς theory of (co-)data

Next, we consider an alternative semantics for (co-)data in the sequent calculus, which is based on system L's strategy-dependent β laws from Figure 4.8 and ς laws from Figure 4.11 in Chapter IV. These rules can be generalized to arbitrary (co-)data structures as shown in Figure 5.12.

    (βS) 〈K( #»E , #»V )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉 →βS c{ # »E/α, # »V/x}
    (βS) 〈µ(· · · | O[ #»x , #»α ].c | · · ·)||O[ #»V , #»E ]〉 →βS c{ # »V/x, # »E/α}
    (ςS) K( #»E , e′, #»e , #»v ) →ςS µα.〈µβ.〈K( #»E , β, #»e , #»v )||α〉||e′〉
    (ςS) K( #»E , #»V , v′, #»v ) →ςS µα.〈v′||µ˜y.〈K( #»E , #»V , y, #»v )||α〉〉
    (ςS) O[ #»V , v′, #»v , #»e ] →ςS µ˜x.〈v′||µ˜y.〈x||O[ #»V , y, #»v , #»e ]〉〉
    (ςS) O[ #»V , #»E , e′, #»e ] →ςS µ˜x.〈µβ.〈x||O[ #»V , #»E , β, #»e ]〉||e′〉

    (where v′ ∉ ValueS, e′ ∉ CoValueS, and x, y, α, β are fresh)

    FIGURE 5.12. The parametric βSςS laws for arbitrary data and co-data.

The β and ς laws perform two separate and non-overlapping duties. The ς laws evaluate unevaluated data and co-data structures by lifting out an unevaluated (i.e. non-(co-)value) sub-expression and giving it a name, so that computation can proceed to determine its (co-)value. The β laws perform pattern-matching on fully-evaluated structures built from (co-)values by substituting the contained (co-)values for the corresponding (co-)variables in the matching pattern of a case abstraction. Note that the strategy-dependent β laws in Figure 5.12 are less general than the strategy-independent ones from Figure 5.11, which can pattern-match on any structure; this is so that they do not accidentally perform the same work of giving names to unevaluated components that would otherwise be done by a ς rule.
Also notice that these rewriting rules do not depend on types at all: they function over untyped syntax, letting us evaluate programs without resorting to information about static types. Besides just being meaningful for executing untyped programs, the βς semantics for (co-)data has another advantage over the βη semantics: the βς reduction theory is easily confluent.

Definition 5.1 (confluence). A reduction relation →R in the sequent calculus is (strongly) confluent if and only if all divergent reductions c1 ↞R c ↠R c2 join together as c1 ↠R c′ ↞R c2 for some c′, and similarly for (co-)terms. Furthermore, a reduction relation →R in the sequent calculus is locally (or weakly) confluent if and only if all divergent reductions c1 ←R c →R c2 join together as c1 ↠R c′ ↞R c2 for some c′, and similarly for (co-)terms.

A well-known consequence of confluence is that, for any (strongly) confluent →R, the equational theory =R is the same thing as convertibility, i.e. reduction to a common reduct (↠R ↞R). That means that in order to determine if two expressions are equal by a confluent theory, we only need to normalize both and compare their normal forms. Unfortunately, even putting issues involving types aside, the combination of the η law with the µµ˜ laws notoriously breaks confluence. For example, if we consider just functions, we have the following critical pair between η→S (which generalizes η→ to values) and µS:

    µ_.c ←η→S µ([x · β].〈µ_.c||x · β〉) →µS µ([x · β].c)

So confluence in the presence of η and µµ˜ is not so straightforward. By contrast, the βς theory of (co-)data is straightforwardly confluent when combined with the core µµ˜ theory.

Theorem 5.1 (Parametric confluence). The →µS µ˜S ηµ ηµ˜ βS ςS reduction relation is confluent for any substitution strategy S such that µS µ˜S is deterministic and the sets ValueS and CoValueS are both forward closed under →µS µ˜S ηµ ηµ˜ βS ςS.

Proof. By the decreasing diagrams (van Oostrom, 1994) method of confluence. As shorthand, let R = µS µ˜S ηµ ηµ˜ βS ςS. Our measure of decreasingness is based on increasing depth of the context in which reduction occurs, that is, the context used by compatibility to lift a basic R rewrite into →R. First, we define the depth of a reduction c1 →R c2, denoted by depth(c1 →R c2), as the height of the hole in the context C from its root, such that c1 = C[c′1], c2 = C[c′2], and c′1 rewrites to c′2 by a basic rule of R, and similarly for reduction on (co-)terms. This measure is well-founded (i.e. for any set of reductions, there is a minimal one with no others less than it) because the syntax of commands and (co-)terms is finitely deep. Second, we define the measure of strict decreasingness on reductions, written (c1 →R1 c′1) < (c2 →R2 c′2), as depth(c1 →R1 c′1) > depth(c2 →R2 c′2), and similarly for (co-)term reductions. The goal of the proof is then to show that for every rule R1 and R2 of R giving a divergent pair of reductions c1 ←R1 c →R2 c2 (and similarly for (co-)terms), the two ends join back together as

    c1 ↠R′1 →R″2 ↠R′ c′ ↞R′ ←R″1 ↞R′2 c2

where →R″i is zero or one R reduction of the same measure as c →Ri ci, each R′i reduction is less than c →Ri ci, and each R′ reduction is less than either c →R1 c1 or c →R2 c2. We now demonstrate the (strong) confluence of →R by showing that the local confluence diagrams of each diverging pair of →R reductions are all decreasing by the above measure. In the cases where the two diverging reductions are disjoint (i.e.
their depths are unordered, so the reductions occur in separate sub-expressions of the overall expression), they trivially join in one step via compatibility. Otherwise, the two diverging reductions are nested (i.e. their depths are ordered, so that one reduction occurs inside the other or directly on the same expression). In this case, we proceed by cases on the rewriting rule used for the outer-most reduction, so the possible nested diverging reductions join back together by decreasing diagrams as follows:

– 〈µα.c||E〉 →µS c{E/α} has four different possible nested reductions:

∗ If 〈µα.c||E〉 →R c′ at the top level, then c′ = c{E/α}, because the only other possibility for R is µ˜S, but µS µ˜S is deterministic by assumption.

∗ If 〈µα.c||E〉 →ηµ 〈v||E〉 because c = 〈v||α〉 and α ∉ FV(v), then c{E/α} = 〈v||E〉 as well, so the two divergent reductions trivially join.

∗ If 〈µα.c||E〉 →R 〈µα.c′||E〉 because c →R c′, then c{E/α} →R c′{E/α} ≺µS 〈µα.c′||E〉, because →R reduction is closed under substitution.

∗ If 〈µα.c||E〉 →R 〈µα.c||E′〉 because E →R E′, then E′ must be a co-value, since co-values are closed under reduction, and

    c{E/α} ↠R c{E′/α} ≺µS 〈µα.c||E′〉

which is decreasing because depth(c{E/α} ↠R c{E′/α}) > 0 = depth(〈µα.c||E〉 →µS c{E/α}).

– 〈V||µ˜x.c〉 →µ˜S c{V/x} is analogous to the previous case by duality.

– µα.〈v||α〉 →ηµ v has two different possible nested reductions:

∗ If µα.〈v||α〉 →µS µα.c{α/β} because v = µβ.c, then v =α µα.c{α/β}, so the two divergent reductions trivially join.

∗ If µα.〈v||α〉 →R µα.〈v′||α〉 because v →R v′, then v′ ≺ηµ µα.〈v′||α〉.

– µ˜x.〈x||e〉 →ηµ˜ e is analogous to the previous case by duality.

– 〈K( #»E , #»V )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉 →βS c{ # »E/α, # »V/x} has several possible nested reductions, inside the (co-)values #»E , #»V of the data structure or inside the commands . . . c . . . inside the case abstraction, all of which follow similarly to the latter two cases for µS and µ˜S. Otherwise, there are no other nested reductions.

– 〈µ(· · · | O[ #»x , #»α ].c | · · ·)||O[ #»V , #»E ]〉 →βS c{ # »V/x, # »E/α} is analogous to the previous case by duality.

– K( #»E , e′, #»e , #»v ) →ςS µα.〈µβ.〈K( #»E , β, #»e , #»v )||α〉||e′〉 has the following possible nested reductions:

∗ Any reduction inside #»E , #»e , or #»v trivially joins in one step because (co-)values are closed under reduction. Likewise, any reduction inside e′ which does not convert e′ into a co-value also joins in one step.

∗ If K( #»E , e′, #»e , #»v ) →R K( #»E , E′, #»e , #»v ) because e′ →R E′, then

    µα.〈µβ.〈K( #»E , β, #»e , #»v )||α〉||e′〉 →R µα.〈µβ.〈K( #»E , β, #»e , #»v )||α〉||E′〉
                                        →µS µα.〈K( #»E , E′, #»e , #»v )||α〉
                                        →ηµ K( #»E , E′, #»e , #»v )

which is decreasing because the first two reductions occur in non-empty contexts (i.e. their depth is greater than 0) and the final reduction occurs in the empty context, so its measure is the same as the ςS reduction.

– All three other ςS rules are similar to the previous case.

As special cases, each of the particular substitution strategies we considered in Section 5.1 (except for U) is confluent.

Corollary 5.1. The →µS µ˜S ηµ ηµ˜ βS ςS reduction relation is confluent for S = V, S = N, and S = LV.

Proof. Follows from Theorem 5.1, since each of V, N, and LV makes µS µ˜S deterministic, and their (co-)values are closed under reduction.

Extensionality and lifting

Now that we have two competing theories for the dynamic semantics of (co-)data, how do they compare? Do they agree, and give similar results for the same programs?
As it turns out, the restriction of the βς equational theory to typed commands and (co-)terms is derivable from the βη equational theory, with help from the µµ˜ core. For example, we have the specific ς rules specialized for the ⊕ connective declared in Figure 5.6:

    (ς⊕) ι1(v) = µα.〈v||µ˜x.〈ι1(x)||α〉〉
    (ς⊕) ι2(v) = µα.〈v||µ˜x.〈ι2(x)||α〉〉

These rules can be derived by η expansion followed by β reduction:

    ι1(v) : A ⊕ B =ηµ  µα.〈ι1(v)||α〉
                  =η⊕ µα.〈ι1(v)||µ˜[ι1(x).〈ι1(x)||α〉 | . . .]〉
                  =β⊕ µα.〈v||µ˜x.〈ι1(x)||α〉〉

Notice here that the steps of this derivation are captured exactly by our formulation of the β and η axioms: (1) the ability to η expand a co-variable, and (2) the ability to perform β reduction immediately to break apart a structure once the constructor is seen. We also have similar specialized ς rules for functions:

    (ς→) v · e = µ˜x.〈v||µ˜y.〈x||y · e〉〉
    (ς→) V · e = µ˜x.〈µα.〈x||V · α〉||e〉

which are again derivable by a similar procedure of η expansion and β reduction:

    V · e : A → B =ηµ˜ µ˜x.〈x||V · e〉
                  =η→ µ˜x.〈µ([y · α].〈x||y · α〉)||V · e〉
                  =β→ µ˜x.〈V||µ˜y.〈µα.〈x||y · α〉||e〉〉
                  =µ˜S µ˜x.〈µα.〈x||V · α〉||e〉

    v · e : A → B =ηµ˜ µ˜x.〈x||v · e〉
                  =η→ µ˜x.〈µ([y · α].〈x||y · α〉)||v · e〉
                  =β→ µ˜x.〈v||µ˜y.〈µα.〈x||y · α〉||e〉〉
                  =µ˜S µ˜x.〈v||µ˜y.〈x||µ˜x′.〈µα.〈x′||y · α〉||e〉〉〉
                  =ς→ µ˜x.〈v||µ˜y.〈x||y · e〉〉

These particular ς axioms for functions are interesting because they were left out of Wadler's (2003) sequent calculus; however, we now know they were implicitly present in the equational theory (Wadler, 2005) as a consequence of the β and η axioms. This same procedure works for all the definable (co-)data types, so that the βη axioms for the F and G (co-)data type constructors as declared in Figure 5.8 generate the derived ς axioms shown in Figure 5.12. These rules search for the left-most non-value or non-co-value found in a data or co-data structure, and give it a name with an input or output abstraction, which comes from the ordering of bindings implied by the β laws in Figure 5.11. For example, the instances of the derived lift axioms for pair types A ⊗ B, following the general pattern, are:

    (ς⊗) (v, v′) = µα.〈v||µ˜x.〈(x, v′)||α〉〉
    (ς⊗) (V, v′) = µα.〈v′||µ˜y.〈(V, y)||α〉〉

Remark 5.5. Notice that all of the strategies we have considered so far follow a particular pattern. More specifically, each of the V, N, and LV strategies fits the following focalizing criteria.

Definition 5.2 (Focalizing strategy). A strategy S is focalizing if and only if

– (co-)variables are (co-)values (as assumed to hold for all strategies),
– structures built from (co-)values are themselves (co-)values (i.e. K( #»E , #»V ) and O[ #»V , #»E ] are (co-)values), and
– case abstractions are (co-)values (i.e. µ˜[ # »K( #»α , #»x ).c] and µ( # »O[ #»x , #»α ].c) are (co-)values).

These criteria correspond to the impact of focalization on the typing rules for system L from Section 4.4, and further justify the connection between maintaining focus with the stoup in proof search and values and strictness in languages. Furthermore, it also happens that the non-(co-)values of each of these three strategies are closed under ς-reduction as well. In other words, the ς laws cannot create or destroy (co-)values, but instead only serve to identify and lift out sub-(co-)terms that are out of focus. Thus, these strategies are all focalizing, in that they follow a focalization procedure dynamically at run-time.
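The focalization criteria of Definition 5.2 can be read as a closure operation on the predicates of a core strategy. The following self-contained Haskell sketch re-states the earlier hypothetical syntax, extended with (co-)data structures and case abstractions, and closes a core strategy under the three criteria; the ς-expansion closure used in the generic method described below is omitted. As before, every name here is an illustrative assumption, not a definition from this chapter.

    -- The hypothetical core syntax, extended with structures K(e...,v...),
    -- observations O[v...,e...], and case abstractions.
    data Term
      = Var String
      | Mu String Command
      | Ctor String [CoTerm] [Term]     -- K(e..., v...)
      | CoCase [(String, Command)]      -- µ(O[x...,α...].c | ...)

    data CoTerm
      = CoVar String
      | MuTilde String Command
      | Obs String [Term] [CoTerm]      -- O[v..., e...]
      | Case [(String, Command)]        -- µ~[K(α...,x...).c | ...]

    data Command = Cut Term CoTerm

    data Strategy = Strategy
      { isValue   :: Term   -> Bool
      , isCoValue :: CoTerm -> Bool }

    -- Close a core strategy under the focalization criteria: (co-)variables
    -- and case abstractions are (co-)values, and structures are (co-)values
    -- exactly when all of their components are.
    focalize :: Strategy -> Strategy
    focalize core = Strategy val coval
      where
        val   v = isValue core v   || focValue v
        coval e = isCoValue core e || focCoValue e
        focValue (Var _)        = True
        focValue (Ctor _ es vs) = all coval es && all val vs
        focValue (CoCase _)     = True
        focValue _              = False
        focCoValue (CoVar _)     = True
        focCoValue (Obs _ vs es) = all val vs && all coval es
        focCoValue (Case _)      = True
        focCoValue _             = False

    -- The core V strategy: only variables are values, everything co-value.
    coreV :: Strategy
    coreV = Strategy isVar (const True)
      where isVar (Var _) = True
            isVar _       = False

Under this sketch, focalize coreV classifies exactly the values and co-values of the extended V strategy of Figure 5.9; the core co-value check for LV is context-sensitive and would take more machinery than shown here.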
Besides demonstrating the connection between focalization and evaluation, these criteria give us a general technique for developing strategies. In particular, we can take a core strategy, which covers only the structural core of the sequent calculus, and automatically extend it with data and co-data with a single, generic method. First, close the sets of values and co-values under the above three focalization criteria, so that K( #»E , #»V ), O[ #»V , #»E ], µ˜[ # »K( #»α , #»x ).c], and µ( # »O[ #»x , #»α ].c) are all (co-)values. Second, close the sets of values and co-values under ς expansion, so that if v →ς V and e →ς E then v and e are themselves (co-)values. This generic method lets us generate the previously known strategies for the parametric sequent calculus. For example, applying this method to the core V, N, and LV strategies from Figures 5.2 and 5.3 gives exactly the extended strategies in Figures 5.9 and 5.10. So the core strategy gives enough information to recover its corresponding focalizing strategy. Furthermore, we already assumed that strategies always consider (co-)variables to be (co-)values. Thus, in the world of focalizing strategies for the parametric sequent calculus, the only crucial decision is what to do with general input and output abstractions; everything else follows from focalization. End remark 5.5.

More generally, we can say that the βς equational theory of (co-)data is sound with respect to the βη equational theory, with help from the core µµ˜ theory of substitution.

Theorem 5.2 (Soundness of βς w.r.t. βη). For any substitution strategy S:

a) If c : (Γ ⊢G ∆), c′ : (Γ ⊢G ∆), and c =βSςS c′, then c =µS µ˜S ηµ ηµ˜ βG ηG c′.

b) If Γ ⊢G v : A | ∆, Γ ⊢G v′ : A | ∆, and v =βSςS v′, then v =µS µ˜S ηµ ηµ˜ βG ηG v′.

c) If Γ | e : A ⊢G ∆, Γ | e′ : A ⊢G ∆, and e =βSςS e′, then e =µS µ˜S ηµ ηµ˜ βG ηG e′.

Proof. Note that compatibility, reflexivity, symmetry, and transitivity of =βSςS implies the same in =µS µ˜S ηµ ηµ˜ βG ηG, so we only need to check that the βS ςS rewriting rules can be derived as =µS µ˜S ηµ ηµ˜ βG ηG equalities:

– βS restricted to a data type F( #»C ) is derived as:

    〈K( #»E , #»V )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉 =βF 〈µ #»α .〈 #»V ||µ˜ #»x .c〉|| #»E 〉
                                                =µS µ˜S c{ # »E/α, # »V/x}

– βS restricted to a co-data type G( #»C ) is derived analogously to the previous case.

– ςS restricted to a data type F( #»C ) is derived inductively on the structure of constructions from right-to-left as:

    K( #»E , #»V , v′, #»v ) : F( #»C )
      =ηµ µα.〈K( #»E , #»V , v′, #»v )||α〉
      =ηF µα.〈K( #»E , #»V , v′, #»v )||µ˜[· · · | K( #»β , #»x , y, #»z ).〈K( #»β , #»x , y, #»z )||α〉 | · · ·]〉
      =βF µα.〈µ #»β .〈 #»V ||µ˜ #»x .〈v′||µ˜y.〈 #»v ||µ˜ #»z .〈K( #»β , #»x , y, #»z )||α〉〉〉〉|| #»E 〉
      =µS µα.〈 #»V ||µ˜ #»x .〈v′||µ˜y.〈 #»v ||µ˜ #»z .〈K( #»E , #»x , y, #»z )||α〉〉〉〉
      =µ˜S µα.〈v′||µ˜y.〈 #»v ||µ˜ #»z .〈K( #»E , #»V , y, #»z )||α〉〉〉
      =ςF µα.〈v′||µ˜y.〈K( #»E , #»V , y, #»v )||α〉〉

    K( #»E , e′, #»e , #»v ) : F( #»C )
      =ηµ µα.〈K( #»E , e′, #»e , #»v )||α〉
      =ηF µα.〈K( #»E , e′, #»e , #»v )||µ˜[· · · | K( #»β , γ, #»δ , #»x ).〈K( #»β , γ, #»δ , #»x )||α〉 | · · ·]〉
      =βF µα.〈µ #»β .〈µγ.〈µ #»δ .〈 #»v ||µ˜ #»x .〈K( #»β , γ, #»δ , #»x )||α〉〉|| #»e 〉||e′〉|| #»E 〉
      =µS µα.〈µγ.〈µ #»δ .〈 #»v ||µ˜ #»x .〈K( #»E , γ, #»δ , #»x )||α〉〉|| #»e 〉||e′〉
      =ςF µα.〈µγ.〈K( #»E , γ, #»e , #»v )||α〉||e′〉
– ςS restricted to a co-data type G( #»C ) is derived analogously to the previous case.

Going the other way, the strategy-independent β law is sound with respect to the strategy-dependent βς rewriting theory, with help from the core µµ˜S theory of substitution, for any focalizing strategy (Definition 5.2).

Theorem 5.3 (Soundness of β w.r.t. βς). For any focalizing strategy S, if c =βG c′, then c =µS µ˜S βS ςS c′, and similarly for (co-)terms.

Proof. Note that compatibility, reflexivity, symmetry, and transitivity of =βG implies the same in =µS µ˜S βS ςS, so we only need to check that the βG rewriting rules can be derived as =µS µ˜S βS ςS equalities:

– βG for a data structure is derived as:

    〈K( #»e , #»v )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉
      =ςS µS 〈µ #»α .〈K( #»α , #»v )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉|| #»e 〉
      =ςS µS µ˜S 〈µ #»α .〈 #»v ||µ˜ #»x .〈K( #»α , #»x )||µ˜[· · · | K( #»α , #»x ).c | · · ·]〉〉|| #»e 〉
      =βS 〈µ #»α .〈 #»v ||µ˜ #»x .c〉|| #»e 〉

The first two steps follow by applying ςS reduction to name non-(co-)values and applying µS µ˜S to name (co-)values, and then substituting the case abstraction (which must be a co-value because S is focalizing) for the outer µ-abstraction generated by ςS. The last step follows because (co-)variables are (co-)values, since S is focalizing.

– βG for a co-data structure is derived analogously to the previous case.

So equationally speaking, in the presence of the core µµ˜ theory of substitution, typed versions of the βS ςS laws can be derived from the typed βG ηG laws, and untyped versions of the βG laws can be derived from the untyped βS ςS laws. However, the typed ηG law cannot be derived by βS ςS, so βG ηG equates more typed programs.

Combining Strategies in Connectives

The parametric µµ˜-calculus provides a general framework for describing all the basic connectives discussed in Section 5.2, giving a mechanism for extending the syntax and semantics of the sequent calculus to account for a wide variety of new structures. However, what about the connectives of polarized system L from Section 4.3, which involved both polarities? Can we include the shifts, negation, and the polarized function type into our notion of user-defined data and (co-)data types? Also, what about polarized logic's ability to utilize multiple evaluation strategies in a single program? Is there a way to instantiate the parametric equational theory with two strategies at the same time? Or even more than two strategies at once?

To answer all of these questions, let's look at how the parametric µµ˜-calculus described thus far compares to a polarized language like system L. In polarized system L, all types are classified by one of two polarities: positive or negative. The distinction between data and co-data determines the polarity of a type, and furthermore the type's polarity determines the evaluation order used for programs of that type. In polarized system L, data types are positive and describe call-by-value programs, whereas co-data types are negative and describe call-by-name programs. In the parametric µµ˜-calculus, we have stepped outside this regime, so that programs of data types and co-data types can be evaluated with the strategy of our own choosing. However, we can still allow for this choice of strategy while remaining compatible with polarized logic's type-based approach to evaluation strategy.
In particular, we can still have multiple classifications of types, as a generalization of polarized types, and use the type's classification to determine which strategy to use for programs of that type. In other words, even though we have decoupled the link between data versus co-data and evaluation order, we can still have the evaluation strategy depend on the type.

Separating types into different classifications is not a new idea, and shows up in several type systems in the form of kinds. Effectively, kinds classify types in the same way that types classify terms; i.e., kinds are types "one level up the chain." Therefore, we will look at extending the parametric µµ̃-calculus with multiple base kinds for classifying (co-)data types of different strategies. For example, if we are interested in both call-by-value (V) and call-by-name (N) evaluation, then we would have two different base kinds, called V and N, which classify the various types of call-by-value and call-by-name programs, respectively.⁴ This extension to the language of kinds involves understanding more about the kinds involved in the various connectives: we need to know the kinds of types expected as parameters to the connective, as well as the kind of type the connective builds. Thus, we need to be more explicit in our data and co-data declarations in order to specify the link with strategy.

For example, let's suppose we want a wholly call-by-value pair type, corresponding to the polarized version of the positive ⊗ connective. We can make this intent known by adding explicit kind annotations to the declaration of ⊗ from Figure 5.6:⁵

  data (X : V) ⊗ (Y : V) : V where
    (_,_) : X : V, Y : V ⊢ X ⊗ Y : V |

Here, we say that the types for both components of the pair belong to kind V, and the resulting pair type itself also belongs to kind V. Because we interpret the kind V as containing the types of programs which should be evaluated according to the V strategy, this declaration gives us the basic pair type in the call-by-value instance of the parametric equational theory. The main difference here is that we are being explicit about the fact that the types A, B, and A ⊗ B must be call-by-value, and cannot be interpreted by any other evaluation strategy, as opposed to the previous situation where the programs of a (co-)data type could be interpreted by any evaluation strategy of our choice.

The impact of these explicit kind annotations on typing is minor: the rules for typing terms and co-terms of type A ⊗ B are essentially the same as before. The main change is that we need to make sure that types are well-kinded. In particular, we have a new judgement X₁ : k₁, ..., Xₙ : kₙ ⊢_G A : k that says that A is a type of kind k with respect to the assigned kinds of type variables in the typing environment Θ = X₁ : k₁, ..., Xₙ : kₙ and the declarations in G.

⁴ Here we use the names V and N to mean both a strategy (a set of values, co-values, and evaluation contexts) and a kind (a "type of types"). Even though the two are different things, the clash in naming is meant to make obvious the connection between the kind and the strategy. Kinds and strategies are used in very different places, so the meaning of V and N can be distinguished from context.
Then A ⊗ B is a type of kind V under a typing environment Θ and a set of declarations G containing the data declaration of ⊗ when both A and B are as well:

  Θ ⊢_G A : V    Θ ⊢_G B : V
  ───────────────────────────
      Θ ⊢_G A ⊗ B : V

Additionally, we can also describe a wholly call-by-name product type, corresponding to the polarized version of the negative & connective. Making this intent known in the more general setting is done by adding N kind annotations to the declaration of & from Figure 5.6:

  codata (X : N) & (Y : N) : N where
    π₁ : | X & Y : N ⊢ X : N
    π₂ : | X & Y : N ⊢ Y : N

Here, we say that the types for both components of the product belong to the kind N, and the resulting product type itself also belongs to kind N. Thus, this declaration forces us to evaluate programs of this type in a way that matches the corresponding interpretation in polarized languages like system L. In general, we can annotate all the basic types of Figure 5.6 to force them into their polarized interpretations, giving the annotated declarations in Figure 5.13. Essentially, this process involves annotating all data types with the kind V and all co-data types with the kind N, following the assertion that data types describe call-by-value evaluation and co-data types describe call-by-name evaluation.

  data (X : V) ⊕ (Y : V) : V where      codata (X : N) & (Y : N) : N where
    ι₁ : X : V ⊢ X ⊕ Y : V |              π₁ : | X & Y : N ⊢ X : N
    ι₂ : Y : V ⊢ X ⊕ Y : V |              π₂ : | X & Y : N ⊢ Y : N

  data (X : V) ⊗ (Y : V) : V where      codata (X : N) ⅋ (Y : N) : N where
    (_,_) : X : V, Y : V ⊢ X ⊗ Y : V |    [_,_] : | X ⅋ Y : N ⊢ X : N, Y : N

  data 1 : V where () : ⊢ 1 : V |       codata ⊥ : N where [] : | ⊥ : N ⊢

  data 0 : V where                      codata ⊤ : N where

  FIGURE 5.13. Declarations of the basic single-strategy data and co-data types.

As before, the typing rules for terms and co-terms of type A & B do not change with the addition of kind annotations; we only have an additional rule for the well-kinded uses of the & connective:

  Θ ⊢_G A : N    Θ ⊢_G B : N
  ───────────────────────────
      Θ ⊢_G A & B : N

While annotating the kinds of types involved in (co-)data declarations is relatively straightforward for the single-polarity connectives, the exercise becomes more important when representing polarized connectives that involve both polarities. For example, the polarized function type made non-trivial use of both polarities in its definition, which can be captured by the following annotated co-data declaration:

  codata (X : V) → (Y : N) : N where
    _·_ : X : V | X → Y : N ⊢ Y : N

Intuitively, the source A of the function type must be positive, so it belongs to the kind V, and the target B of the function type must be negative, so it belongs to kind N. Furthermore, since polarized languages assume that all co-data types themselves are negative, the overall type A → B belongs to the kind N. This declaration gives us the primordial polarized function type of Zeilberger (2009), with the same impact on evaluation order.

  data ↓(X : N) : V where               codata ↑(X : V) : N where
    ↓ : X : N ⊢ ↓X : V |                  ↑ : | ↑X : N ⊢ X : V

  data ∼(X : N) : V where               codata ¬(X : V) : N where
    ∼ : ⊢ ∼X : V | X : N                  ¬ : X : V | ¬X : N ⊢

  data (X : V) − (Y : N) : V where      codata (X : V) → (Y : N) : N where
    _·_ : X : V ⊢ X − Y : V | Y : N       _·_ : X : V | X → Y : N ⊢ Y : N

  FIGURE 5.14. Declarations of basic mixed-strategy data and co-data types.

⁵ Adding explicit kinds to a data type declaration is not new; it is supported by GHC with the "kind signatures" extension. Rather, the new idea is to have the kind impact the meaning of a term by denoting its evaluation strategy.
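To make the role of these kind annotations concrete, the classification (though not the change in evaluation order itself, which GHC does not tie to kinds) can be mirrored in a modern functional language. The following is a minimal Haskell sketch, with all names hypothetical, using the DataKinds extension mentioned in footnote 5's neighborhood to promote the base kinds V and N and index object-language types by them:

  {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

  -- The two base kinds, promoted to Haskell's kind level.
  data Strategy = V | N

  -- Object-language types indexed by the base kind they inhabit,
  -- mirroring the kind annotations of Figures 5.13 and 5.14:
  --   data (X : V) ⊗ (Y : V) : V      codata (X : N) & (Y : N) : N
  --   codata (X : V) → (Y : N) : N    and the shifts ↓ and ↑.
  data Ty (s :: Strategy) where
    Tensor :: Ty 'V -> Ty 'V -> Ty 'V   -- call-by-value pair
    With   :: Ty 'N -> Ty 'N -> Ty 'N   -- call-by-name product
    Fun    :: Ty 'V -> Ty 'N -> Ty 'N   -- polarized function type
    Down   :: Ty 'N -> Ty 'V            -- shift ↓ taking N into V
    Up     :: Ty 'V -> Ty 'N            -- shift ↑ taking V into N

An ill-kinded type, such as Tensor applied to a With product, is rejected by GHC's checker for exactly the reason the rule for ⊗ demands both components have kind V.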
In the same way, we can give annotated (co-)data type declarations for the other mixed-polarity connectives, like the polarity shifts ↓A and ↑A and the involutive negations ¬A and ∼A, as shown in Figure 5.14. Thus, kind-annotated (co-)data type declarations give us a syntactic mechanism for summarizing all the simple polarized connectives that we have previously seen.

In general, the extension of (co-)data declarations to include multiple base kinds (R, S, T), along with the necessary kinding restrictions, is given in Figure 5.15.

  k ∈ Kind ::= S                    R, S, T ∈ BaseKind ::= ...
  A, B, C ∈ Type ::= X | F(A⃗)      X, Y, Z ∈ TypeVariable ::= ...
  F, G ∈ TypeCon ::= ...
  decl ∈ Declaration ::= data F(X : k⃗) : S where K⃗ : (A⃗ : T ⊢ F(X⃗) | B⃗ : R)
                       | codata G(X : k⃗) : S where O⃗ : (A⃗ : T | G(X⃗) ⊢ B⃗ : R)
  G ∈ GlobalEnv ::= decl⃗            Θ ∈ TypeEnv ::= X : k⃗
  Γ ∈ InputEnv ::= x : A⃗            Δ ∈ OutputEnv ::= α : A⃗
  J, H ∈ Judgement ::= (G ⊢ decl) | (Θ ⊢_G A : k)

  Declaration rules:

    (X⃗ : k ⊢_G A : T)⃗    (X⃗ : k ⊢_G B : R)⃗
    ────────────────────────────────────────────────────────── data
    G ⊢ data F(X⃗ : k) : S where K⃗ : (A⃗ : T ⊢ F(X⃗) | B⃗ : R)

    (X⃗ : k ⊢_G A : T)⃗    (X⃗ : k ⊢_G B : R)⃗
    ──────────────────────────────────────────────────────────── codata
    G ⊢ codata G(X⃗ : k) : S where O⃗ : (A⃗ : T | G(X⃗) ⊢ B⃗ : R)

  Kind rules:

    ────────────────── TV       (Θ ⊢_G C : k)⃗    (F(X⃗ : k) : S) ∈ G⁶
    Θ, X : k ⊢_G X : k          ──────────────────────────────────── FT
                                           Θ ⊢_G F(C⃗) : S

  FIGURE 5.15. Kinds of multi-strategy (co-)data declarations and types.

This extension means that we need to keep track of what kind each type variable has, since there are now multiple options, necessitating the introduction of type environments X₁ : k₁, ..., Xₙ : kₙ, denoted by Θ, which are analogous to the input (Γ) and output (Δ) environments at the level of types instead of programs. These type environments Θ are used for checking the kind of a type A, as in the first new form of judgement, Θ ⊢_G A : k, which checks that the type A has kind k under the assumption that type variables have the kinds listed in Θ, given the set of declarations G. Since all the specific types are generated by (co-)data declarations, there are only two inference rules for finding the kind of a type: reference to a type variable (TV) or an instance of a particular (co-)data type former F(X⃗ : k) : S from the global set of declarations G.

We annotate the types and type variables in (co-)data declarations to make the intent of the declaration explicit in the syntax. This explicit annotation makes it straightforward to check that declarations are well-formed. The second new form of judgement, G ⊢ decl, checks that the declaration decl is well-formed (meaning that it includes only well-kinded types) given a previously established set of declarations G.

To accommodate the generalization to multiple base kinds, we must also update the typing rules for programs of the parametric µµ̃-calculus, as shown in Figure 5.16. For the most part, the change from the single-kinded type system of Figure 5.8 is that we thread the type environment Θ through the rules, as demonstrated by the updated judgement forms c : (Γ ⊢_G^Θ Δ), Γ ⊢_G^Θ v : A | Δ, and Γ | e : A ⊢_G^Θ Δ. Note that the only substantial update in the typing rules is in the cut rule: Cut now takes an additional premise, Θ ⊢_G A : S, checking that the cut type is indeed a type of some base kind S. This extra premise is needed because, reading the rules bottom-up, Cut is the only inference rule that invents a new type out of thin air (see Section 3.1).

⁶ This is just shorthand for a (co-)data declaration of F(X⃗ : k) : S in G.
It is therefore prudent to check that this new type actually makes sense under the given type environment Θ and global declarations G. Other than this change to Cut, the other core inference rules (VR, VL, AR, AL) and the logical rules are essentially the same as from Figure 5.8, ignoring Θ. Having outlined the general pattern for mixed-strategy (co-)data types, we can use the declaration mechanism to come up with special-purpose types that might be used in a program. For example, we can represent the use of strictness in Haskell to create lazy data structures with strict fields, like a lazy pair where the first component is strict. We can signify this intent by declaring a different pair type that uses two 172 Judgement ::= c : ( Γ `ΘG ∆ ) | (Γ `ΘG v : A | ∆) | (Γ | e : A `ΘG ∆) Core rules: x : A `ΘG x : A | VR | α : A `ΘG α : A VL c : ( Γ `ΘG α : A,∆ ) Γ `ΘG µα.c : A | ∆ AR c : ( Γ, x : A `ΘG ∆ ) Γ | µ˜x.c : A `ΘG ∆ AL Γ `ΘG v : A | ∆ Θ `G A : S Γ′ | e : A `ΘG ∆′ 〈v||e〉 : ( Γ′,Γ `ΘG ∆′,∆ ) Cut Logical rules: Given data F( # »X : k) : Swhere # » Ki : ( # » Aij : Tijj ` F( #»X ) | # »Bij : Rijj )i ∈ G, we have the rules: # » Γ′j | e : Bij # »{C/X} `ΘG ∆′j j # » Γj | v : Aij # »{C/X} `ΘG ∆j j #»Γj j , #» Γ′j j `ΘG Ki( #»e , #»v ) : F( #» C ) | # »∆j j , # » ∆′j j FRKi # » ci : ( Γ, # » xi : Ai # »{C/X} `ΘG # » αi : Bi # »{C/X} ,∆ )i Γ | µ˜ [ # »Ki( #»αi , #»xi).ci i ] : F( #»C ) `ΘG ∆ FL Given codataG( # »X : k) : Swhere # » Oi : ( # » Aij : Tijj | G( #»X ) ` # »Bij : Rijj )i ∈ G, we have the rules: # » ci : ( Γ, # » xi : Ai # »{C/X} `ΘG # » αi : Bi # »{C/X} ,∆ )i Γ `ΘG µ ( # »Oi[ #»xi , #»αi ].ci i ) : G( #»C ) | ∆ GR # » Γj | v : Aij # »{C/X} `ΘG ∆j j # » Γ′j | e : Bij # »{C/X} `ΘG ∆′j j #»Γj j , #» Γ′j j | Oi[ #»v , #»e ] : G( #»C ) `ΘG # »∆j j , # » ∆′j j GLOi FIGURE 5.16. Types of multi-strategy (co-)data in the parametric µµ˜ sequent calculus. 173 different kinds, N and V : dataMixedPair(X : V , Y : N ) : N where MPair : X : V , Y : N ` MixedPair(X, Y ) : N | In this declaration, the fact that the type A belongs to kind V denotes that the first component should be evaluated with the call-by-value strategy V , whereas the second component and the pair as a whole should be evaluated with the call-by-name strategy N . We could better reflect such a data type in Haskell with strict fields, by accounting for memoization through the call-by-need strategy, by just replacing N with LV . Remark 5.6. Recall from Section 5.2 that although the η axioms for data and co-data types do not reference the chosen strategy, their expressive power is affected by the substitution principle, which is in turn affected by the choice of values and co-values. In light of this observation, if we were forced to pick only one strategy for all data types and one strategy for all co-data types, it would make sense to pick the strategies that would give us the strongest equational theories. Therefore, if we want to make the η axiom for a data type as strong as possible, we should choose the call-by-value V strategy, since by substitution every co-term of that data type is equivalent to a case abstraction on the structure of the type. Likewise, if we want to make the η axiom for a co-data type as strong as possible, we should choose the call-by-name N strategy, since by substitution every term of that co-data type is equivalent to a co-case abstraction on the co-structure of the type. In this sense, the decision use of polarities (i.e. 
the data/co-data divide) to determine evaluation strategy is the same as choosing strategies to get the strongest and most universal η principles for every (co-)data type. End remark 5.6.

Combining Strategies in Evaluation

Now that we are looking at programs with multiple different strategies running around, we need to be able to make sure that only terms and co-terms from the same strategy interact with one another. Otherwise, the same fundamental dilemma that we were trying to avoid could crop back up again. For example, suppose we have a program using both the call-by-value and call-by-name strategies, V and N, and face the usual problematic command c₀ = ⟨µ_.c₁ ‖ µ̃_.c₂⟩. If we interpret the term µ_.c₁ as call-by-name then it is a value of N, meaning it is a valid instance of µ̃_V substitution, and if we interpret the co-term µ̃_.c₂ as call-by-value, then it is a co-value of V, meaning it is a valid instance of µ_E substitution. This puts us back where we started, where c₁ =_{µ_E} c₀ =_{µ̃_V} c₂ due to the conflict in an N-V interaction. Thus, our goal is to be able to instantiate the parametric equational theory with a more complex composite strategy made up of several primitive strategies, and to use the kinds of types to make sure that the terms and co-terms agree on which strategy to use in a command. This way, we can understand how to write and run programs that interleave several different evaluation strategies, and be sure that we will still get the expected result in the end.

Recall from Chapter IV that, as a way out of the dilemma, Danos et al. (1997) show that we can use types to disambiguate the expected evaluation order in unclear commands. This procedure follows the assumption that η laws are universal (Graham-Lengrand, 2015): the η law of every (co-)data type applies to arbitrary (co-)terms of the type without restriction. However, that procedure is not directly applicable in the more general setting where the η laws are restricted to (co-)values, since we no longer assume that data types must follow a call-by-value order and co-data types must follow a call-by-name order. However, we still assume that each type, be it data or co-data, must belong to a kind specifying some evaluation order. Thus, we can still use a type-based approach for evaluation, albeit a more general one, by just checking the kind of the principal type of interaction in a command. In this sense, the typed µµ̃ and βη laws can already be generalized to multiple strategies S⃗, giving the typed µ_S⃗ µ̃_S⃗ η_µ η_µ̃ β_G η_G equational theory for multi-strategy (co-)data types G. Of note, we only need to perform the type-based strategy lookup during µ or µ̃ substitution:

  (µ_S⃗)  ⟨µα.c ‖ E⟩ →_{µ_S⃗} c{E/α}    (E : A, A : S, E ∈ CoValue_S)
  (µ̃_S⃗)  ⟨V ‖ µ̃x.c⟩ →_{µ̃_S⃗} c{V/x}    (V : A, A : S, V ∈ Value_S)

and otherwise restrict the rewriting rules as usual so that both sides have the same type. The type-restricted rules rely on the type associated with the (co-)terms in a command to decide on the appropriate strategy for determining values and co-values, thus fixing a priority between the opposing µ and µ̃ substitution rules. In other words, we can always use typing information to evaluate a multi-strategy program without falling back into the fundamental dilemma of classical computation.

As an example, consider an application of the typed β law for the data connective MixedPair as defined previously in Section 5.4.
Recall that the typed β laws do not make reference to the chosen strategy in any way; they are only responsible for breaking apart structures. This means that the β rules are completely unaffected by the use of composite strategies. For instance, we may simplify a program using MixedPair in the same way as the call-by-value ⊗:

  ⟨MPair(v, v′) ‖ µ̃[MPair(x : A, y : B).c]⟩
    →_{β_MixedPair} ⟨v ‖ µ̃x : A. ⟨v′ ‖ µ̃y : B.c⟩⟩
    →_{µ̃_N} ⟨v ‖ µ̃x : A. c{v′/y : B}⟩

Notice that, as before, the input abstractions take over for determining evaluation order even with multiple primitive strategies, only now the type of the command comes more directly into play. In this case, we are allowed to substitute v′ for y : B since v′ : B and B : N, which can be found in the implied typing derivation of the command, and Value_N includes every term. However, we must first evaluate v before substituting it for x : A. The implied typing derivation tells us that v : A and A : V, so v can only be substituted by the µ̃_V rule if it has the restrictive form of value given by Value_V. But the input abstraction for x : A likewise has the type A : V, so it is already a co-value of CoValue_V.

However, since we are only interested in determinism, a full typing discipline is overkill for the untyped βς theory of (co-)data. After all, neither the parametric core µ_S µ̃_S theory nor the β_S ς_S theory needed to use types to maintain determinism when instantiated with a single strategy S. Therefore, we use a type-agnostic kind system for making sure that all commands are well-kinded. By "type-agnostic," we mean that we are checking the property v :: S; that is, v is a term of some unknown type of kind S. The kind system for the structural core µµ̃-calculus is shown in Figure 5.17, and it unremarkably resembles the ordinary type system except "one level up." The whole point of the system is shown in the Cut rule, which only allows commands between a term and a co-term of the same kind, whereas (co-)variables have the kind assumed in the environment, and input and output abstractions are generic over the kind of variable they abstract. Furthermore, the additional kinding rules for generic declared (co-)data types are shown in Figure 5.18. The main property that distinguishes this from an ordinary type system is that we "forget" the types, effectively collapsing them down into a single universal type for each kind, similar to a generalized version of Zeilberger's (2009) "bi-typed" system, where we now allow for as many base kinds as desired. This kind system is a relaxation of the full typing regime, in that all well-typed commands and (co-)terms are well-kinded, by demoting the environments x₁ : A₁, ..., xₙ : Aₙ and α₁ : B₁, ..., αₘ : Bₘ to x₁ :: T₁, ..., xₙ :: Tₙ and α₁ :: R₁, ..., αₘ :: Rₘ, where A₁ : T₁, ..., Aₙ : Tₙ and B₁ : R₁, ..., Bₘ : Rₘ in the given typing environment.

  Γ ∈ InputEnv ::= x₁ :: S₁, ..., xₙ :: Sₙ    Δ ∈ OutputEnv ::= α₁ :: S₁, ..., αₙ :: Sₙ
  Judgement ::= c :: (Γ ⊢_G Δ) | (Γ ⊢_G v :: S | Δ) | (Γ | e :: S ⊢_G Δ)

  Core rules:

    ────────────────── VR       ────────────────── VL
    x :: S ⊢_G x :: S |         | α :: S ⊢_G α :: S

    c :: (Γ ⊢_G α :: S, Δ)      c :: (Γ, x :: S ⊢_G Δ)
    ────────────────────── AR   ────────────────────── AL
    Γ ⊢_G µα.c :: S | Δ         Γ | µ̃x.c :: S ⊢_G Δ

    Γ ⊢_G v :: S | Δ    Γ′ | e :: S ⊢_G Δ′
    ──────────────────────────────────────── Cut
        ⟨v ‖ e⟩ :: (Γ′, Γ ⊢_G Δ′, Δ)

  FIGURE 5.17. Type-agnostic kind system for the core µµ̃ sequent calculus.

Now that we have refined the untyped syntax into the well-kinded sub-syntax, we can build composite strategies that combine multiple primitive ones.
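As a concrete rendering of this type-agnostic kind system, here is a minimal Haskell sketch (all names hypothetical) of kind checking for the structural core: terms and co-terms carry no types at all, and the only real constraint is the Cut rule's demand that both sides of a command agree on a base kind. The composite strategies described next dispatch on exactly this kind information.

  import qualified Data.Map as Map

  data Kind = KV | KN deriving (Eq, Show)  -- base kinds V and N

  data Term    = Var String | Mu String Command          -- x | µα.c
  data CoTerm  = CoVar String | TildeMu String Command   -- α | µ̃x.c
  data Command = Cut Term CoTerm                         -- ⟨v ‖ e⟩

  type Env = Map.Map String Kind  -- assumptions x :: S and α :: S

  -- Γ ⊢ v :: S | Δ: variables look up their assumed kind (VR), and an
  -- output abstraction µα.c is generic over the kind it abstracts (AR).
  checkTerm :: Env -> Env -> Term -> Kind -> Bool
  checkTerm g _ (Var x) s  = Map.lookup x g == Just s
  checkTerm g d (Mu a c) s = checkCommand g (Map.insert a s d) c

  -- Γ | e :: S ⊢ Δ: dually for co-variables (VL) and µ̃x.c (AL).
  checkCoTerm :: Env -> Env -> CoTerm -> Kind -> Bool
  checkCoTerm _ d (CoVar a) s     = Map.lookup a d == Just s
  checkCoTerm g d (TildeMu x c) s = checkCommand (Map.insert x s g) d c

  -- Cut: a command is well-kinded when some base kind works for both sides.
  checkCommand :: Env -> Env -> Command -> Bool
  checkCommand g d (Cut v e) =
    any (\s -> checkTerm g d v s && checkCoTerm g d e s) [KV, KN]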
Essentially, a composite substitution strategy is one whose values and co-values are further subdivided into different base kinds. This way, each value in a composite strategy belongs to exactly one kind of term, corresponding to the particular "primitive" strategy that it comes from. Furthermore, to get a full composite evaluation strategy, we also need to compose the evaluation contexts that come from each "primitive" strategy, to get a single set of evaluation contexts that intermingles them all. In general, we can form the composite strategy S⃗ = S₁, ..., Sₙ as shown in Figure 5.19.

  Given data F(X⃗ : k) : S where Kᵢ : (A⃗ᵢⱼ : Tᵢⱼ ⊢ F(X⃗) | B⃗ᵢⱼ : Rᵢⱼ)ᵢ ∈ G, we have:

    (Γ′ⱼ | eⱼ :: Rᵢⱼ ⊢_G Δ′ⱼ)ⱼ    (Γⱼ ⊢_G vⱼ :: Tᵢⱼ | Δⱼ)ⱼ
    ──────────────────────────────────────────────────────── FR_Kᵢ
    Γ⃗, Γ′⃗ ⊢_G Kᵢ(e⃗, v⃗) :: S | Δ⃗, Δ′⃗

    (cᵢ :: (Γ, x⃗ᵢ :: T⃗ᵢ ⊢_G α⃗ᵢ :: R⃗ᵢ, Δ))ᵢ
    ─────────────────────────────────────── FL
    Γ | µ̃[Kᵢ(α⃗ᵢ, x⃗ᵢ).cᵢ | ...] :: S ⊢_G Δ

  Given codata G(X⃗ : k) : S where Oᵢ : (A⃗ᵢⱼ : Tᵢⱼ | G(X⃗) ⊢ B⃗ᵢⱼ : Rᵢⱼ)ᵢ ∈ G, we have:

    (cᵢ :: (Γ, x⃗ᵢ :: T⃗ᵢ ⊢_G α⃗ᵢ :: R⃗ᵢ, Δ))ᵢ
    ─────────────────────────────────────── GR
    Γ ⊢_G µ(Oᵢ[x⃗ᵢ, α⃗ᵢ].cᵢ | ...) :: S | Δ

    (Γⱼ ⊢_G vⱼ :: Tᵢⱼ | Δⱼ)ⱼ    (Γ′ⱼ | eⱼ :: Rᵢⱼ ⊢_G Δ′ⱼ)ⱼ
    ──────────────────────────────────────────────────────── GL_Oᵢ
    Γ⃗, Γ′⃗ | Oᵢ[v⃗, e⃗] :: S ⊢_G Δ⃗, Δ′⃗

  FIGURE 5.18. Type-agnostic kind system for multi-kinded (co-)data.

  V ∈ Value_S⃗ ::= V_Sᵢ :: Sᵢ      V_Sᵢ ∈ Value_Sᵢ ::= ...
  E ∈ CoValue_S⃗ ::= E_Sᵢ :: Sᵢ    E_Sᵢ ∈ CoValue_Sᵢ ::= ...
  D ∈ EvalCxt_S⃗ ::= □ | Dᵢ[D]     Dᵢ ∈ EvalCxt_Sᵢ ::= ...

  FIGURE 5.19. Composite S⃗ strategy.

As discussed previously in Remark 5.5, each of the substitution strategies we have considered so far follows a predictable pattern, so we will first just focus on the core of the strategy without (co-)data. For example, combining call-by-value and call-by-name into a single composite strategy is the most straightforward, and is essentially just a disjoint union of the V and N strategies, as shown in Figure 5.20, where the "disjointness" is enforced by the kinding restriction on (co-)values.

  V ∈ Value_P ::= V_V :: V | V_N :: N    E ∈ CoValue_P ::= E_V :: V | E_N :: N
  V_V ∈ Value_V ::= x                    E_V ∈ CoValue_V ::= e
  V_N ∈ Value_N ::= v                    E_N ∈ CoValue_N ::= α
  D ∈ EvalCxt_P ::= □ | ⟨□ ‖ e :: V⟩ | ⟨V :: V ‖ □⟩ | ⟨v :: N ‖ □⟩ | ⟨□ ‖ E :: N⟩

  FIGURE 5.20. Composite core polarized strategy P = V, N.

Note that this combination exactly captures the polarized evaluation strategy P for system L in Section 4.3. Combining call-by-need with its dual is a little more involved, since both the LV and LN substitution strategies form "closures" over evaluation contexts that can include delayed (co-)terms that have not yet been evaluated, but whose results should be shared. Thus, to combine these two strategies, we rely on the merged evaluation contexts of the composite strategy, as shown in Figure 5.21.

  V ∈ Value_{LV,LN} ::= V_LV :: LV | V_LN :: LN
  V_LV ∈ Value_LV ::= x
  V_LN ∈ Value_LN ::= x | µα.D[⟨V_LN ‖ α⟩]
  E ∈ CoValue_{LV,LN} ::= E_LV :: LV | E_LN :: LN
  E_LV ∈ CoValue_LV ::= α | µ̃x.D[⟨x ‖ E_LV⟩]
  E_LN ∈ CoValue_LN ::= α
  D ∈ EvalCxt_{LV,LN} ::= □ | ⟨v :: LV ‖ µ̃x.D⟩ | ⟨µα.D ‖ e :: LN⟩
                        | ⟨v :: LV ‖ □⟩ | ⟨□ ‖ E :: LV⟩ | ⟨□ ‖ e :: LN⟩ | ⟨V :: LN ‖ □⟩

  FIGURE 5.21. Composite core LV and LN strategy.
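Under the same assumptions as the kind-checking sketch above, the composite polarized strategy P = V, N of Figure 5.20 amounts to letting the kind decide whether a given (co-)term counts as a (co-)value. A rough Haskell rendering (ignoring structures, which Remark 5.5 adds generically):

  -- Whether a term is a value depends on the kind of its type:
  -- call-by-value admits only variables (plus structures, omitted
  -- here), while call-by-name counts every term as a value.
  isValueP :: Kind -> Term -> Bool
  isValueP KV (Var _) = True
  isValueP KV _       = False
  isValueP KN _       = True

  -- Dually for co-values: every call-by-value co-term is a co-value,
  -- but call-by-name admits only co-variables (plus structures).
  isCoValueP :: Kind -> CoTerm -> Bool
  isCoValueP KV _         = True
  isCoValueP KN (CoVar _) = True
  isCoValueP KN _         = False

In the problematic command ⟨µ_.c₁ ‖ µ̃_.c₂⟩, the kind of the cut now breaks the tie: at kind V the µ̃-abstraction is a co-value so only the µ rule fires, while at kind N the µ-abstraction is a value so only the µ̃ rule fires, and the critical pair never arises.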
Additionally, all four primitive strategies can be combined into a single composite strategy by taking the disjoint union of the previous two combinations in the expanded composite syntax, so that the V, N, LV, LN strategy is defined by the following sets of (co-)values and evaluation contexts:

  Value_{V,N,LV,LN} ≜ Value_P ∪ Value_{LV,LN}
  CoValue_{V,N,LV,LN} ≜ CoValue_P ∪ CoValue_{LV,LN}
  EvalCxt_{V,N,LV,LN} ≜ EvalCxt_P ∪ EvalCxt_{LV,LN}

Finally, we add (co-)data to all of these composite strategies using the method described in Remark 5.5: we extend every Value_S with all well-kinded co-case abstractions and terms of the form K(E⃗, V⃗) in Term_S, extend every CoValue_S with all well-kinded case abstractions and co-terms of the form O[V⃗, E⃗] in CoTerm_S, and close every Value_S and CoValue_S under ς expansion.

The main goal in tracking the strategy in the kinds is to continue to avoid the fundamental dilemma of classical computation when mixing strategies. Well-kindedness ensures that we cannot have a command between a term and a co-term following different primitive strategies, so that the kind restriction is enough to determine a consistent strategy for every substitution and avoid the fundamental dilemma. For example, it is enough for composite strategies like P or LV, LN, since it lets us determine the appropriate strategy to use for every substitution, which prevents re-introducing the critical pair between µ and µ̃. Furthermore, well-kindedness is preserved by the untyped reduction theory, so that we only need to begin with a well-kinded command or (co-)term to ensure that every step stays well-kinded.

Theorem 5.4 (Kind preservation). For all strategies S⃗ = S₁, ..., Sₙ:

a) If c :: (Γ ⊢_G Δ) and c →_{µ_S⃗ µ̃_S⃗ η_µ η_µ̃ β_S⃗ ς_S⃗} c′, then c′ :: (Γ ⊢_G Δ).

b) If Γ ⊢_G v :: Sᵢ | Δ and v →_{µ_S⃗ µ̃_S⃗ η_µ η_µ̃ β_S⃗ ς_S⃗} v′, then Γ ⊢_G v′ :: Sᵢ | Δ.

c) If Γ | e :: Sᵢ ⊢_G Δ and e →_{µ_S⃗ µ̃_S⃗ η_µ η_µ̃ β_S⃗ ς_S⃗} e′, then Γ | e′ :: Sᵢ ⊢_G Δ.

Proof. By (mutual) induction on the kinding derivations c :: (Γ ⊢_G Δ), Γ ⊢_G v :: Sᵢ | Δ, and Γ | e :: Sᵢ ⊢_G Δ. The cases of the compatible closure of the base rewriting rules follow directly from the inductive hypothesis, and the base cases for the rewriting rules follow from the fact that well-kindedness is preserved under substitution, i.e., that for any Γ′ ⊢_G V :: Sᵢ | Δ′ and Γ′ | E :: Sᵢ ⊢_G Δ′,

1. c :: (Γ, x :: Sᵢ ⊢_G Δ) implies c{V/x} :: (Γ, Γ′ ⊢_G Δ, Δ′), and c :: (Γ ⊢_G α :: Sᵢ, Δ) implies c{E/α} :: (Γ, Γ′ ⊢_G Δ, Δ′),

2. Γ, x :: Sᵢ ⊢_G v :: Sⱼ | Δ implies Γ, Γ′ ⊢_G v{V/x} :: Sⱼ | Δ, Δ′, and Γ ⊢_G v :: Sⱼ | α :: Sᵢ, Δ implies Γ, Γ′ ⊢_G v{E/α} :: Sⱼ | Δ, Δ′, and

3. Γ, x :: Sᵢ | e :: Sⱼ ⊢_G Δ implies Γ, Γ′ | e{V/x} :: Sⱼ ⊢_G Δ, Δ′, and Γ | e :: Sⱼ ⊢_G α :: Sᵢ, Δ implies Γ, Γ′ | e{E/α} :: Sⱼ ⊢_G Δ, Δ′,

each of which follows by induction on the kinding derivations for c, v, and e.

This means that we can safely compute the result of any untyped command or (co-)term so long as it is well-kinded to begin with. Returning to the MixedPair example, if we begin with the well-kinded command ⟨MPair(V, V′) ‖ µ̃[MPair(x, y).c]⟩, then we know that V :: V and V′ :: N, so V cannot be an output abstraction but V′ can be, due to the kinded definition of Value_P. This gives us the reduction

  ⟨MPair(V, V′) ‖ µ̃[MPair(x, y).c]⟩ →_{β_P} c{V/x, V′/y}

which induces the combined substitution of the V-value V and the N-value V′, resulting in the command c{V/x, V′/y} of the same kind that we started with.
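As the text suggests, the MixedPair declaration is close in spirit to Haskell's strict fields. A rough sketch of the analogy (hypothetical names; Haskell's lazy fields follow call-by-need LV rather than the call-by-name N used above, as already remarked):

  -- A lazy pair whose first component is strict: the bang annotation
  -- plays the role of the V kind on the first field.
  data MixedPair a b = MPair !a b

  example :: MixedPair Int Int
  example = MPair (1 + 2) (error "never demanded")

  -- Evaluating the structure forces 1 + 2 to a value first, mirroring
  -- how V :: V must already be a value before β_P substitutes it,
  -- while the second component is passed along unevaluated; the error
  -- is raised only if that field is ever demanded.
  firstOf :: MixedPair a b -> a
  firstOf (MPair x _) = x

Here firstOf example yields 3 without ever touching the erroring second component, just as c{V/x, V′/y} may never demand V′.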
Duality of Connectives and Evaluation Having laid out a general system for both data and co-data and with the possibility of intermingling multiple evaluation strategies, we now rephrase the duality of the sequent calculus. In particular, given any instance of the parametric µµ˜-calculus, we are able to automatically generate its dual instance, such that the two are isomorphic to one another by the involutive duality operation. Additionally, particular application of the duality-generating operation recapitulates the previous results of duality in the sequent calculus, giving a single setting for summarizing the study of computational duality. Effectively, duality applies in both the static world of types as well as the dynamic world of programs. In types, duality expresses the opposing purpose of assumption and conclusion on the two sides of a sequent. In programs, duality expresses the opposing purpose of production and consumption on the two sides of a command. Thus, the entailment (`) of a sequent and the dividing line of a command provide the fundamental pole about which opposing entities turn in their dance of duality. The main difference from before is that we now have many sources for names that must be dualized. Types and programs in the parametric sequent calculus contain a variety of names—free variables and co-variables, constructors and observers, and connectives for data and co-data types. These names are arbitrary identifiers which ultimately do not impact the meaning of types or programs. However, to examine duality we must relate pairs of these arbitrary names. Therefore, we build our duality on a given relationship between dual names, written as an overline. Recall that in both the dual calculi (Chapter III) and system L (Chapter IV), duality swaps variables with co-variables, and vice versa. Formally, this is represented by an assumed bijection, x and α, between the two dual variable sets. But in the parametric µµ˜-calculus, (co-)variables aren’t the only names we must think about; we also have to do something about the names of connectives (F) as well as the names of constructors and observers (K and 181 O). Therefore, we also assume a bijection between constructors and observers, K and O, as well as a bijection between connective names, F. Additionally, for multi-kinded programs we need a bijection S between the names for base kinds. As shorthand, we may use the dual identifier relation ∼ which identifies the chosen duals to the various bijections, so that x ∼ α means x = α and α = x and so on for the other namespaces. With the bijections between names at hand, we first consider the duality of types as shown in Figure 5.22. As before, the static aspect of this duality is exactly the usual form of logical duality of the sequent calculus, where the input environment, Γ, is swapped with the output environment, ∆, in a sequent. Duality of the environments is defined pointwise, so for every variable x : A we associate a dual co-variable denoted x : A⊥, and likewise every co-variable α : A is associated with a dual variable denoted α : A⊥. Duality of the kinding environments from Figure 5.15 for multi-kinded programs is similar, except that instead of types we have base kinds, and the dual of S is S. In sequents, terms swap places with co-terms and vice versa. For example, the dual of a closed term, `G v : A | or `G v :: S | , is a closed co-term, | v⊥ : A⊥ `G⊥ or | v⊥ :: S⊥ `G⊥ . 
Going the other way, a type derivation of a closed co-term, | e : A `G or | e :: S `G , is dualized as a type derivation of a closed term, `G⊥ e⊥ : A⊥ | or `G⊥ e⊥ :: S⊥ | . Commands, which sit outside of the sequent, stay in place and instead describe the dynamic aspect of dualization inside a program. Each data type declaration is dual to a co-data type declaration, and vice versa. On the one hand, the constructors, Ki, of data type declaration become the observers of the dual co-data type declaration, denoted Ki. On the other hand, the observers, Oi, of a co-data type declaration become the constructors of the dual data type declaration, denoted O⊥i . Furthermore, the sequents describing each constructor are also reversed by the duality operation on sequents, similar to the action of duality on typing judgements. For example, Figure 5.6 shows several dual data and co-data declarations side-by-side, so that set of declarations is self-dual, under the following dualization relationship for names: ⊕ ∼ & ι1 ∼ pi1 ι2 ∼ pi2 ⊗ ∼ ` ( , ) ∼ [ , ] 1 ∼ ⊥ () ∼ [] 0 ∼ > − ∼→ · ∼ · 182 Duality of environments: ( # »X : k)⊥ , # » X : k⊥ ( # »decl)⊥ , # » decl⊥ ( # »x : A)⊥ , # » x : A⊥ ( # »x :: S )⊥ , # »x :: S ( # »α : A)⊥ , # » α : A⊥ ( # »α :: S )⊥ , # »α1 :: S1 Duality of sequents:( c : ( Γ `ΘG ∆ ))⊥ , c⊥ : ( ∆⊥ `Θ⊥G⊥ Γ⊥ ) ( c :: ( Γ `G ∆ ))⊥ , c⊥ :: ( ∆⊥ `G⊥ Γ⊥ ) ( Γ `ΘG v : A | ∆ )⊥ , ∆⊥ | v⊥ : A⊥ `Θ⊥G⊥ Γ⊥ ( Γ `G v :: S | ∆ )⊥ , ∆⊥ | v⊥ :: S `G⊥ Γ⊥( Γ | e : A `ΘG ∆ )⊥ , ∆⊥ `Θ⊥G⊥ e⊥ : A⊥ | Γ⊥ ( Γ | e :: S `G ∆ )⊥ , ∆⊥ `Θ⊥G⊥ e⊥ :: S | Γ⊥ Duality of declarations:  data F( # »X : k) : Swhere K1 : # » A1 : T1 ` F( #»X ) | # »B1 : R1 . . . Kn : # » An : Tn ` F( #»X ) | # »Bn : Rn  ⊥ , codata F( # » X : k⊥) : Swhere K1 : # » B⊥1 : R1 | F( #» X ) ` # »A⊥1 : T1 . . . K⊥n : # » B⊥n : Rn | F( #» X ) ` # »A⊥n : Tn codataG( # » X : k⊥) : Swhere O1 : # » A1 : T1 | G( #»X ) ` # »B1 : R1 . . . On : # » An : Tn | G( #»X ) ` # »Bn : Rn  ⊥ , dataG( # » X : k⊥) : Swhere O1 : # » B⊥1 : R1 ` G( #» X ) | # »A⊥1 : T1 . . . On : # » B⊥n : Rn ` G( #» X ) | # »A⊥n : Tn Duality of types and kinds: S⊥ , S X⊥ , X F( #»A)⊥ , F( # »A⊥) G( #»A)⊥ , G( # »A⊥) FIGURE 5.22. The duality of types of the parametric µµ˜-calculus. 183 Duality of the core calculus: 〈v||e〉⊥ , 〈 v⊥ ∣∣∣∣∣∣e⊥〉 x⊥ , x (µα.c)⊥ , µ˜α.c⊥ α⊥ , α [µ˜x.c]⊥ , µx.c⊥ Duality of data and co-data: K( #»e , #»v )⊥ , K[ # » e⊥ , # » v⊥ ] µ˜[K( #»α , #»x ).c | . . .]⊥ , µ ( K[ #»α , #»x ].c⊥ | · · · ) O[ #»v , #»e ]⊥ , O( # » v⊥ , # » e⊥) µ(O[ #»x , #»α [.c | . . .)⊥ , µ˜ [ O( #»x , #»α ).c⊥ | · · · ] FIGURE 5.23. The duality of programs of the parametric µµ˜-calculus. ∼ ∼ ¬ ∼ ∼ ¬ The duality between types is defined inductively on the structure of the types, such that all data connectives F are replaced with their dual co-data connectives F, as described above, and vice versa. Next, we move on to consider the effect of duality on programs as shown in Figure 5.23. In the core of the µµ˜-calculus, every command 〈v||e〉 is dual to another command representing the flipped version of itself 〈 v⊥ ∣∣∣∣∣∣e⊥〉, variables are dual to co-variables, and input abstractions and output abstractions are dual to one another. On the constructive side of data and co-data, every data structure K( #»e , #»v ) dual is a co-data observation K[ # » e⊥ , # » v⊥ ] and every co-data observation O[ #»v , #»e ] is dual to a data structure O( # » v⊥ , # » e⊥). 
On the destructive side of data and co-data, every case analysis on a data structure is dual to a co-data object, and every co-data object is dual to a case analysis on a data structure.

Example 5.1. Let's consider how to swap the results of a product:

  swap_x ≜ µ(π₁[α].⟨x ‖ π₂[α]⟩ | π₂[β].⟨x ‖ π₁[β]⟩)
  swap_{x,γ} ≜ ⟨swap_x ‖ γ⟩ ≜ ⟨µ(π₁[α].⟨x ‖ π₂[α]⟩ | π₂[β].⟨x ‖ π₁[β]⟩) ‖ γ⟩

Given that x stands for a value of B & A, swap_x is a term of A & B such that whenever we ask for the π₁ of swap_x we get the π₂ of x, and whenever we ask for the π₂ of swap_x, we get the π₁ of x. The command swap_{x,γ} then represents a program that sends the request γ : A & B to the swapped product swap_x.

The duality of the sequent calculus lets us turn this program around, so that we are calculating with data instead of co-data. First, we need to specify how names are treated in order to generate the dualized program. For the connectives and (co-)constructors, we use the naming convention relating products (&) and sums (⊕)

  ⊕ ∼ &    ι₁ ∼ π₁    ι₂ ∼ π₂

along with the following bijection between the variables and co-variables involved:

  x ∼ α′    x′ ∼ α    y′ ∼ β    z′ ∼ γ

What we get out from duality is then a program that swaps an injection:

  swap_x⊥ ≜ µ̃[ι₁(x′).⟨ι₂(x′) ‖ α′⟩ | ι₂(y′).⟨ι₁(y′) ‖ α′⟩]
  swap_{x,γ}⊥ ≜ ⟨z′ ‖ swap_x⊥⟩ ≜ ⟨z′ ‖ µ̃[ι₁(x′).⟨ι₂(x′) ‖ α′⟩ | ι₂(y′).⟨ι₁(y′) ‖ α′⟩]⟩

In particular, z′ stands for a value of type A⊥ ⊕ B⊥, and α′ stands for a co-value of type B⊥ ⊕ A⊥. The co-term swap_x⊥ consumes an input of type A⊥ ⊕ B⊥ and swaps the injection tag, turning ι₁(x′) into ι₂(x′) or turning ι₂(y′) into ι₁(y′), in order to pass a value of type B⊥ ⊕ A⊥ along to α′. The whole command swap_{x,γ}⊥ then feeds z′ into the consumer swap_x⊥. Notice how, even though the roles of input and output have been exchanged by the duality operation, so that requests become results, the overall structure of the dual program follows the same pattern as before. End example 5.1.

The final piece of the puzzle is to determine the effect of duality on the strategy parameter(s) of the parametric µµ̃-calculus. Fortunately, this duality is straightforward, since the strategy S is just a set of terms and co-terms (the (co-)values of the substitution strategy component of S) and contexts (the evaluation contexts of S). Thus, this final duality is achieved by applying the defined duality operation pointwise. Given a substitution strategy S whose values are given by the set Value_S and whose co-values are given by the set CoValue_S, we can automatically generate the dual substitution strategy S⊥ by swapping values with co-values, so that the values, Value_{S⊥}, and co-values, CoValue_{S⊥}, of S⊥ are defined as:

  Value_{S⊥} ≜ {E⊥ | E ∈ CoValue_S}
  CoValue_{S⊥} ≜ {V⊥ | V ∈ Value_S}

Additionally, for a full evaluation strategy S, we can automatically generate the dual evaluation strategy by dualizing the substitution strategy component of S as well as its evaluation contexts EvalCxt_S as follows:

  EvalCxt_{S⊥} ≜ {D⊥ | D ∈ EvalCxt_S}

where the duality operation is generalized to contexts in the obvious way by taking □⊥ = □. For example, dualizing the call-by-value strategy V generates the call-by-name strategy N and vice versa, and similarly for the call-by-need strategy LV and its dual:

  V⊥ = N    N⊥ = V    LV⊥ = LN    LN⊥ = LV

Also, the unrestricted strategy U is self-dual, so that U⊥ = U. With all the dualities in place, we can now verify that the duality operation satisfies the properties we would expect; as a warm-up, a small functional sketch of the core duality operation is given below.
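Here is that sketch, reusing the core syntax from the earlier kind-checking example and identifying each (co-)variable with its dual (so the bijection x ∼ α is just the identity on names); it is a minimal, hypothetical rendering of Figure 5.23 restricted to the structural core:

  -- Duality flips the two sides of a command and swaps µ with µ̃.
  dualCommand :: Command -> Command
  dualCommand (Cut v e) = Cut (dualCoTerm e) (dualTerm v)

  dualTerm :: Term -> CoTerm
  dualTerm (Var x)  = CoVar x               -- x⊥ = x̄, identified here
  dualTerm (Mu a c) = TildeMu a (dualCommand c)

  dualCoTerm :: CoTerm -> Term
  dualCoTerm (CoVar a)     = Var a
  dualCoTerm (TildeMu x c) = Mu x (dualCommand c)

Involution is visible directly in this sketch: dualTerm and dualCoTerm undo one another, so dualizing twice is the identity, which is the core case of the first property verified next.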
Firstly, the duality operation is involutive at all levels, so that the double-dual is an identity operation for any chosen bijection between dual namespaces. Theorem 5.5 (Involutive duality). The ⊥ operation on environments, sequents, declarations, types, commands, and (co-)terms is involutive, so that ⊥⊥ is the identity transformation. Proof. By (mutual) induction on the definition of the duality operation ⊥, where each case follows immediately by the inductive hypothesis. Secondly, the duality operation respects the static semantics of the parametric µµ˜-calculus, so that typing of commands and (co-)terms is preserved. Theorem 5.6 (Static duality). If the typing judgement J (from Figures 5.8, 5.15, 5.16, 5.17 and 5.18) is derivable then J⊥ is. 186 Proof. By induction on the derivation of J , where in each case we must show that for some conclusion J ′, premises H1, . . . , Hn, and inference rule I, the derivation of H1 . . . Hn J ′ I and the inductive hypothesized derivations of H⊥1 , . . . , H⊥n implies the derivation of H⊥1 . . . H ⊥ n J ′⊥ I⊥ where I⊥ is the dual inference rule to I, which we define as follows for both the type and kind system for programs VR⊥ , VL VL⊥ , VR AR⊥ , AL AL⊥ , AR Cut⊥ , Cut FR⊥K , FLK GL⊥O , GRO FL⊥ , FR GR⊥ , GL WR⊥ ,WL WL⊥ ,WR CR⊥ , CL CL⊥ , CR XR⊥ , XL XL⊥ , XR and the kind system for types data⊥ , codata codata⊥ , data TV⊥ , TV FT⊥ , FT The cases for the left and right rules of (co-)data (FR and FL) follow from the inductive hypotheses and the fact that substitution of types commutes with duality (A⊥ { B⊥/X } =α A {B/X}⊥), which is guaranteed because the duality operation is compositional and hygienic (Downen & Ariola, 2014a). The rest of the cases follow immediately from the inductive hypotheses. Thirdly, the duality operation respects the dynamic aspect of the parametric µµ˜- calculus, so that it preserves the rewriting rules between commands and (co-)terms. Theorem 5.7 (Equational duality). For any (possibly composite) strategy S and set of declarations G, a) if c RGS c ′ then c⊥  RG⊥S⊥ c′⊥, b) if v RGS v ′ then v⊥  RG⊥S⊥ v′⊥, and 187 c) if e RGS e ′ then e⊥  RG⊥S⊥ e′⊥, whenever RGS = µS µ˜Sηµηµ˜, RGS = βGηG, or RGS = βSςS . Proof. By cases on each possible rewriting rules, using the more specific fact that a) if c R c′ then c⊥ R⊥ c′⊥, b) if v R v′ then v⊥ R⊥ v′⊥, and c) if e R e′ then e⊥ R⊥ e′⊥, where the dual of each rewriting rule R is defined as follows: µS⊥ , µ˜S⊥ µ˜⊥S , µS⊥ ηµ⊥ , ηµ˜ ηµ˜⊥ , ηµ βG ⊥ , βG⊥ ηG⊥ , ηG⊥ βS ⊥ , βS⊥ ςS⊥ , ςS⊥ Each case follows by the definition of the rewriting rules, the definition of the duality operation on strategies S and declarations G, and the fact that substitution commutes with the duality operation (that c⊥ { V ⊥/x } =α (c {V/x})⊥, c⊥ { E⊥/α } =α (c {E/α})⊥, and similarly for (co-)terms) which is guaranteed by the fact that the duality operation is compositional and hygienic (Downen & Ariola, 2014a). Remark 5.7. Note that the duality operation discussed here does not just compare two existing languages, as in previous work on computational duality (Curien & Herbelin, 2000; Wadler, 2003), but it actively generates the dual language to any instance of the parametric sequent calculus. Thus, we can use this operation to create the dual to any strategy of our choice. For example, applying the duality operation to the call-by-need strategy LV from Figures 5.3 and 5.10 generates the dual to call-by-need evaluation from Figures 5.4 and 5.10. 
Intuitively, the dual of call-by-need delays computation of consumers and prioritizes producers. We then switch attention to a consumer only when we have a value to return to it. However, we do not copy complex consumers, the way control operators in Scheme-like languages copy arbitrary call-stacks. Rather, we memoize such call-stacks, so that control operations cannot duplicate extra work inside of a continuation. And in fact, this is essentially how the "lazy call-by-name" evaluation strategy was developed by Ariola et al. (2011). The parametric µµ̃-calculus generalizes the procedure to any starting evaluation strategy. End remark 5.7.

A (De-)Construction of the Dual Calculi

We have now seen a general language of the sequent calculus for studying a wide variety of types. Each type is characterized by two actions: building up a structure by construction, and analyzing the shape of a structure by deconstruction. The types are primarily categorized by the way they orient these actions along the producer-consumer protocol: data types produce via construction and consume via deconstruction, whereas co-data types produce via deconstruction and consume via construction.

This viewpoint aligns neatly with system L from Chapter IV. In fact, polarized system L corresponds exactly to the P instance of the parametric µµ̃-calculus with the (co-)data type declarations from Figures 5.13 and 5.14. It also aligns with the treatment of functions and polymorphism in the dual calculi: implication and the universal quantifier are both co-data types (in both call-by-value and call-by-name) that have constructed call-stacks and deconstructive λ- and Λ-abstractions, whereas the existential quantifier is a data type with constructed packages and deconstructive Λ̃-abstractions. However, the rest of the types in the dual calculi do not seem to follow this pattern: both the terms and co-terms of every type appear to be constructed, with no deconstructive pattern-matching to be found.

As it turns out, however, even the dual calculi's construction-oriented sequent calculus still follows the construction-deconstruction discipline, albeit indirectly. More formally, the simply-typed sub-language of the dual calculi (i.e., without the quantifiers) and the appropriate instances of the parametric µµ̃-calculus are in equational correspondence (Sabry & Felleisen, 1992) with one another. This means that every command and (co-)term of the dual calculi can be translated to the µµ̃-calculus, and vice versa, such that the two translations are inverses of each other up to the equational theory, and the equations of each calculus are preserved by translation. In other words, the dual calculi can be seen as syntactic sugar, expanded by macro-expansion, for a particular use-case of the µµ̃-calculus.

Since the dual calculi really stand for a pair of two separate but dual sequent calculi (one for call-by-value and one for call-by-name), we need two translations into two different instances of the parametric µµ̃-calculus. Because we have several representations of conjunction, disjunction, and negation as (co-)data types in the µµ̃-calculus, as shown in Figure 5.6, our task requires us to determine which particular types correspond to the dual calculi's characterization in both call-by-value and call-by-name. Furthermore, since we aim to achieve an equational correspondence, our choice of (co-)data types must respect both the computational (β rules) and extensional (η rules) aspects of the types found in the dual sequent calculi.
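The essential difference between the two styles of representation can be previewed in ordinary functional code. In the following minimal Haskell sketch (hypothetical names), a data pair has a primitive constructor with projections derived by pattern matching, which is exactly the shape of the encoding π₁[e] ≈ µ̃[(x, _).⟨x ‖ e⟩] used below, whereas a co-data product has primitive projections with pairing derived by answering each observation, as in (v₁, v₂) ≈ µ(π₁[α].⟨v₁ ‖ α⟩ | π₂[β].⟨v₂ ‖ β⟩):

  -- Data-style pair: construction is primitive, deconstruction derived.
  fstData :: (a, b) -> a
  fstData p = case p of (x, _) -> x

  -- Co-data-style product: observations are primitive, and pairing is
  -- derived by saying how the object responds to each projection.
  data With a b = With { proj1 :: a, proj2 :: b }

  pairCodata :: a -> b -> With a b
  pairCodata x y = With { proj1 = x, proj2 = y }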
First, let’s focus on the call-by-value half of the dual calculi. To represent call-by- value conjunction, we will use the A⊗B data type. On the one hand, the terms for conjunction, (v1, v2), translate directly to a constructed pair of A⊗B. On the other hand, the co-terms for conjunction, pi1[e] and pi2[e], need to be expressed as the basic deconstructions on an input of type A⊗B which extract one component of a pair: pi1[e] ≈ µ˜[(x, ).〈x||e〉] pi2[e] ≈ µ˜[( , y).〈y||e〉] The representation of call-by-value disjunction is similar, for which we use the A⊕B data type. On the one hand, the terms for disjunction, ι1(v) and ι2(v), translate directly to the constructed values of the sum type A ⊕ B. On the other hand, the co-terms for disjunction, [e1, e2], need to be expressed as the basic deconstruction on an input of type A⊕B which checks which of the two constructors was used: [e1, e2] ≈ µ˜[ι1(x).〈x||e1〉 | ι2(y).〈y||e2〉] Finally, we represent the call-by-value negation with the function-like co-data type ¬A. On the one hand, the terms for negation, not(e), need to be expressed as the basic deconstruction on an output of type ¬A: not(e) ≈ µ(¬ [x].〈x||e〉) On the other hand, the co-terms for negation, not[v], translate directly to the constructed co-values of the type ¬A. Intuitively, the role of negation in the call- by-value half of the dual calculi is to represent functions from the call-by-value λ- calculus, as used by Wadler’s (2003) call-by-value encoding. Thus, we choose the form of negation that most resembles functions: the values of ¬A are function-like abstractions that accept an input but do not return a result. Having seen how to embed the call-by-value half of the dual calculi into the µµ˜⊕,⊗,¬,→V -calculus, we also need to translate back. As before, the constructed terms of A⊗B and A⊕B, as well as the constructed co-terms of ¬A, translate directly. The only interesting part of the translation is in encoding deconstruction as the constructive 190 forms. Translating the deconstructive terms of ¬A is straightforward, and only requires us to place a generic input abstraction inside of the negation constructor: µ(¬ [x].c) ≈ not(µ˜x.c) Likewise, translating the deconstructive co-terms of A ⊕ B requires us to form a co-term pair of two generic input abstractions: µ˜[ι1(x).c1 | ι2(y).c2] ≈ [µ˜x.c1, µ˜y.c2] Translating a deconstructive co-term of A⊗B is the most involved, since it requires us to copy its input in order to extract both the first and second components one at a time. This can be achieved by naming its input with an input abstraction, and using both pi1 and pi2 on it: µ˜[(x, y).c] ≈ µ˜z. 〈z||pi1[µ˜x. 〈z||pi2[µ˜y.c]〉]〉 The full translation between call-by-value half of the dual calculi and the µµ˜⊕,⊗,¬,→V instance of the parametric sequent calculus is shown in Figure 5.24. Second, let’s consider the call-by-name half of the dual calculi. Contrary to the call- by-value case, we will choose the opposite representations of conjunction, disjunction and negation from the (co-)data types listed in Figure 5.6: conjunction is A & B, disjunction is A`B, and negation is ∼A. Likewise, the translations follow an opposite story as before: the co-terms of A&B and A`B and terms of ∼A translate directly, whereas the terms of A&B and A`B and co-terms of ∼A require more work. 
The disjunctive and conjunctive terms for the call-by-name calculus are translated as: ι1(v) ≈ µ([α, ].〈α||v〉) ι2(v) ≈ µ([ , β].〈β||v〉) (v1, v2) ≈ µ(pi1[α].〈v1||α〉 | pi2[β].〈v2||β〉) and the negative co-terms for the call-by-name calculus are translated as: not[v] ≈ µ˜[∼ (α).〈v||α〉] 191 〈v||e〉?v , 〈v?v ||e?v〉 x?v , x (µα.c)?v , µα.c?v ιi(v)?v , ιi(v?v) (v1, v2)?v , (v1?v, v2?v) not(e)?v , µ(¬ [x].〈x||e?v〉) (λx.v)?v , µ([x · β].〈v?v ||β〉) α?v , α [µ˜x.c]?v , µ˜x.c?v pii[e]?v , µ˜[(x1, x2).〈xi||e?v〉] [e1, e2]?v , µ˜[ι1(x).〈x||e1?v〉 | ι2(y).〈y||e2?v〉] not[v]?v , ¬ [v?v ] [v · e]?v , v?v · e?v 〈v||e〉v? , 〈vv? ||ev?〉 xv? , x (µα.c)v? , µα.cv? ιi(v)v? , ιi(vv?) (v1, v2)v? , (v1v?, v2v?) µ(¬ [x].c)v? , not(µ˜x.cv?) µ([x · β].c)v? , λx.µβ.cv? αv? , α [µ˜x.c]v? , µ˜x.cv? µ˜[ι1(x).c1 | ι2(y).c2]v? , [µ˜x.c1v?, µ˜y.c2v?] µ˜[(x, y).c]v? , µ˜z. 〈z||pi1[µ˜x. 〈z||pi2[µ˜y.cv?]〉]〉 ¬ [v]v? , not[vv? ] [v · e]v? , vv? · ev? FIGURE 5.24. Translation between the call-by-value half of the simply-typed dual calculi and µµ˜⊕,⊗,¬,→V . 192 Going the other way, the deconstructive co-term of ∼A is translated as a negated output abstraction: µ˜[∼ (α).c] ≈ not[µα.c] and the deconstructive term of A&B is translated as a pair of output abstractions: µ(pi1[α].c1 | pi2[β].c2) ≈ (µα.c1, µβ.c2) As before with call-by-value conjunction, translating terms of call-by-name disjunction is more involved in the dual way, requiring us to copy the output in order to extract both components one at a time. This can be achieved by naming its output with an output abstraction and using both ι1 and ι2 on it: µ([α, β].c) ≈ µγ. 〈ι1(µα. 〈ι2(µβ.c)||γ〉)||γ〉 The full translation between the call-by-name half of the dual calculi and the µµ˜&,`,∼,→N instance of the parametric sequent calculus is shown in Figure 5.25. With the full translations to and from the dual calculi and instances of the parametric µµ˜-calculus, we have a correspondence between the dual calculi respecting their equational theories. In particular, the βη theory of (co-)data in the parametric µµ˜-calculus corresponds to an appropriate βης theory for the dual calculi. To that point, we need to extend the dual calculi with η laws as well as with additional values in call-by-value and additional co-values in call-by-name as shown in Figure 5.26, which is based on the semantics for the dual calculi by Wadler (2005). The extra values in V ′ extend those in V to say that the result of projecting out of a value is itself a value, which makes intuitive sense by the meaning of call-by-value. The extra co-values in N ′ extend those in N to say that forcing a tagged injection also forces its payload, which may not be so obvious intuitively, but is still semantically sound by the interpretation of sum types in the call-by-name dual calculus. With this extension to the dual calculi, we get an equational correspondence. Theorem 5.8. – The call-by-value half of the simply-typed dual calculi is in equational correspondence with the µµ˜⊕,⊗,¬,→V -calculus. – The call-by-name half of the simply-typed dual calculi is in equational correspondence with the µµ˜&,`,∼,→N -calculus. 193 〈v||e〉?n , 〈v?n||e?n〉 x?n , x (µα.c)?n , µα.c?n ιi(v)?n , µ([α1, α2].〈v||αi〉) (v1, v2)?n , µ(pi1[α].〈v1||α〉 | pi2[β].〈v2||β〉) not(e)?n , ∼ (e?n) (λx.v)?n , µ([x · β].〈v?v ||β〉) α?n , α [µ˜x.c]?n , µ˜x.c?n pii[e]?n , pii[e?n] [e1, e2]?n , [e1?n, e2?n] not[v]?n , µ˜[∼ (α).〈v||α〉] [v · e]?n , v?n · e?n 〈v||e〉n? , 〈vn? ||en? 〉 xn? , x (µα.c)n? , µα.cn? µ(pi1[α].c1 | pi2[β].c2)n? , (µα.c1n? , µβ.c2n? ) µ([α, β].c)n? , µγ. 
〈ι1(µα. 〈ι2(µβ.cn? )||γ〉)||γ〉 ∼ (e)n? , not(en? ) µ([x · β].c)n? , λx.µβ.cn? αn? , α [µ˜x.c]n? , µ˜x.cn? pii[e]n? , pii[en? ] [e1, e2]n? , [e1n? , e2n? ] µ˜[∼ (α).c]n? , not[µ˜x.cn? ] [v · e]n? , vn? · en? FIGURE 5.25. Translation between the call-by-name half of the simply-typed dual calculi and µµ˜&,`,∼,→N . Call-by-value extended values (V ′): V ∈ ValueV ′ ::= . . . | µα. 〈V ||pi1 [α]〉 | µα. 〈V ||pi2 [α]〉 Call-by-name extended co-values (N ′): E ∈ CoValueN ′ ::= . . . | µ˜x. 〈ι1 (x)||E〉 | µ˜x. 〈ι2 (x)||E〉 η laws for both call-by-value (S = V ′) and call-by-name (S = N ′): (η×S ) V : A×B ≺η×S (µα. 〈V ||pi1 [α]〉, µβ. 〈V ||pi2 [β]〉) (α, β /∈ FV (V )) (η⊕S ) E : A⊕B ≺η⊕S [µ˜x. 〈ι1 (x)||E〉, µ˜y. 〈ι2 (y)||E〉] (x, y /∈ FV (E)) (η¬S ) V : ¬A ≺η¬S not(µ˜x. 〈V ||not[x]〉) (x /∈ FV (V )) (η→S ) V : A→ B ≺η→S λx.µβ. 〈V ||x · β〉 (x, β /∈ FV (V )) FIGURE 5.26. The η laws for the dual calculi and extended (co-)values (V ′,N ′). 194 Proof. a) To demonstrate the call-by-value equational correspondence, we must prove the following conditions (1) The translations ( )?v and ( ) v ? are inverses up to the respective equational theories of the two calculi: cv??v = c in the µµ˜ ⊕,⊗,¬,→ V -calculus and c?vv? = c in the call-by-value dual calculus, and similarly for (co-)terms. (2) The two equational theories are sound under translation with respect to each other: c = c′ in the µµ˜⊕,⊗,¬,→V -calculus implies cv? = c′ v ? in the call-by- value dual calculus and c = c′ in the call-by-value dual calculus implies c?v = c′ ? v in the µµ˜ ⊕,⊗,¬,→ V -calculus, and similarly for (co-)terms. The inversion of the translation follows by induction on the syntax of both languages. In each direction, the round-trip translation of the core µµ˜ sublanguage (commands, (co-)variables, and µ- and µ˜-abstractions), as well as the round-trip translation of injections, pairs, negation co-terms, and call stacks, follows directly by the inductive hypothesis. The other cases for the round-trip translation of the µµ˜⊕,⊗,¬,→V -calculus are: µ(¬ [x].c)v??v =IH µ(¬ [x].〈x||µ˜x.c〉) =µ˜V µ(¬ [x].c) µ([x · β].c)v??v =IH µ([x · β].〈µβ.c||β〉) =µV µ([x · β].c) µ˜[ι1 (x).c1 | ι2 (y).c2]v??v =IH µ˜[ι1 (x).〈x||µ˜x.c1〉 | ι2 (y).〈y||µ˜y.c2〉] =µ˜V µ˜[ι1 (x).c1 | ι2 (y).c2] µ˜[(x, y).c]v? ? v =IH µ˜z. 〈z||µ˜[(x, ).〈x||µ˜x. 〈z||µ˜[( , y).〈y||µ˜y.c〉]〉〉]〉 =µ˜V µ˜z. 〈z||µ˜[(x, ).〈z||µ˜[( , y).c]〉]〉 =η⊗V µ˜[(x, y).〈(x, y)||µ˜z. 〈z||µ˜[(x, ).〈z||µ˜[( , y).c]〉]〉〉] =µ˜V µ˜[(x, y).〈(x, y)||µ˜[(x, ).〈(x, y)||µ˜[( , y).c]〉]〉] =β⊗S µ˜[(x, y).c] where the most interesting case is for the round-trip of a case abstraction on a pair, which requires the βη laws for ⊗ to simplify. The other cases for the round-trip translation of the call-by-value dual calculus are: not(e)?v v ? =IH not(µ˜x. 〈x||e〉) =ηµ˜ not(e) 195 λx.v?v v ? =IH λx.µβ. 〈v||β〉 =ηµ λx.v pi1 [e]?v v ? =IH µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y. 〈x||e〉]〉]〉 =µV µ˜z. 〈z||pi1 [µ˜x. 〈µβ. 〈z||pi2 [β]〉||µ˜y. 〈x||e〉〉]〉 =µ˜V′ µ˜z. 〈z||pi1 [µ˜x. 〈x||e〉]〉 =ηµ˜ pi1 [e] pi2 [e]?v v ? =IH µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y. 〈y||e〉]〉]〉 =µV µ˜z. 〈µα. 〈z||pi1 [α]〉||µ˜x. 〈z||pi2 [µ˜y. 〈y||e〉]〉〉 =µ˜V′ µ˜z. 〈z||pi2 [µ˜y. 〈y||e〉]〉 =ηµ˜ pi2 [e] [e1, e2]?v v ? =IH [µ˜x. 〈x||e1〉, µ˜y. 〈y||e2〉] =ηµ˜ [e1, e2] where the most interesting cases are the pi1 and pi2 projections which requires the extended notion of values in V ′ to simplify. 
The soundness of equations follows by cases on the possible rewrite rules of the respective equational theories, which may make use of the facts that substitution commutes with translation (since both translations are compositional and hygienic (Downen & Ariola, 2014a)) and V (co-)values translate to V (co-)values in both directions. The cases for the core µV , µ˜V , ηµ, and ηµ˜ rules are immediate since they are the same in both calculi. The one tricky issue in relating the core µµ˜ calculus is the extended notion of V ′ value, which does not translate to a value in µµ˜V . Thankfully, these extra terms are still semantically substitutable within the µµ˜⊕,⊗,¬,→V equational theory. In particular, we have the following derived equality for µ˜V ′ within µµ˜⊕,⊗,¬,→V by induction on the values of V ′. The case for a first projection value is 〈µα. 〈V ||pi1 [α]〉||µ˜z.c〉?v = 〈µα. 〈V ?v ||µ˜[(x, y).〈x||α〉]〉||µ˜x.c?v〉 =µV 〈V ?v ||µ˜[(x, y).〈z||µ˜z.c?v〉]〉 =µ˜V 〈V ?v ||µ˜[(x, y).c?v {x/z}]〉 =ηµ 〈V ?v ||µ˜[(x, y).c?v {µα. 〈x||α〉/z}]〉 =β⊗V 〈V ? v ||µ˜[(x, y).c?v {µα. 〈(x, y)||µ˜[(x, y).〈x||α〉]〉/z}]〉 =β⊗V 〈V ? v ||µ˜[(x, y).〈(x, y)||µ˜z′.c?v {µα. 〈z′||µ˜[(x, y).〈x||α〉]〉/z}〉]〉 =η⊗V 〈V ? v ||µ˜z′.c?v {µα. 〈z′||µ˜[(x, y).〈x||α〉]〉/z}〉 196 =IH c?v {µα. 〈V ?v ||µ˜[(x, y).〈x||α〉]〉/z} = c?v {(µα. 〈V ||pi1 [α]〉)?v/z} = (c {µα. 〈V ||pi1 [α]〉/z})?v and the case for a second projection value is similar. What remains is to check the soundness of the rewrite rules for each connective. The ς rules are the same in both calculi, so they translate directly. Going from µµ˜⊕,⊗,¬,→V to the call-by-dual calculus, we have: (β⊕) 〈ιi (v)||µ˜[ι1 (x1).c1 | ι2 (x2).c2]〉v? = 〈ιi (vv?)||[µ˜x1.c1v?, µ˜x2.c2v?]〉 =ς⊕V µ˜V 〈v v ? ||µ˜z. 〈ιi (z)||[µ˜x1.c1v?, µ˜x2.c2v?]〉〉 =β⊕V 〈v v ? ||µ˜z. 〈z||µ˜xi.civ?〉〉 =µ˜V 〈vv? ||µ˜xi.civ?〉 = 〈v||µ˜xi.ci〉v? (β⊗) 〈(v1, v2)||µ˜[(x, y).c]〉v? = 〈(v1v?, v2v?)||µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y.cv?]〉]〉〉 =ς×V µV 〈v1 v ?||µ˜x. 〈(x, v2v?)||µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y.cv?]〉]〉〉〉 =ς×V µV 〈v1 v ?||µ˜x. 〈v2v?||µ˜y. 〈(x, y)||µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y.cv?]〉]〉〉〉〉 =µ˜V 〈v1v?||µ˜x. 〈v2v?||µ˜y. 〈(x, y)||pi1 [µ˜x. 〈(x, y)||pi2 [µ˜y.cv?]〉]〉〉〉 =β×V 〈v1 v ?||µ˜x. 〈v2v?||µ˜y. 〈x||µ˜x. 〈y||µ˜y.cv?〉〉〉〉 =µ˜V 〈v1v?||µ˜x. 〈v2v?||µ˜y.cv?〉〉 = 〈v1||µ˜x. 〈v2||µ˜y.c〉〉v? (β¬) 〈µ(¬ [x].c)||¬ [v]〉v? = 〈not(µ˜x.cv?)||not[vv? ]〉 =β¬V 〈vv? ||µ˜x.cv?〉 = 〈v||µ˜x.c〉 v ? (β→) 〈µ([x · β].c)||v · e〉v? = 〈λx.µβ.cv?||vv? · ev?〉 =ς→V µ˜V 〈vv? ||µ˜x. 〈λx.µβ.cv?||x · ev?〉〉 =β→V 〈vv? ||µ˜x. 〈µβ.cv?||ev?〉〉 = 〈v||µ˜x. 〈µβ.c||e〉〉 v ? (η⊕) µ˜[ι1 (x).〈ι1 (x)||α〉 | ι2 (y).〈ι2 (y)||α〉]v? = [µ˜x. 〈ι1 (x)||α〉, µ˜y. 〈ι2 (y)||α〉] =η+V α (η⊗) µ˜[(x, y).〈(x, y)||α〉]v? = µ˜z. 〈z||pi1 [µ˜x. 〈z||pi2 [µ˜y. 〈(x, y)||α〉]〉]〉 =η×V β×V′ µ˜z. 〈µβ1. 〈z||pi1 [β1]〉||µ˜x. 〈z||pi2 [µ˜y. 〈(x, y)||α〉]〉〉 =µ˜V′ µ˜z. 〈z||pi2 [µ˜y. 〈(µβ1. 〈z||pi1 [β1]〉, y)||α〉]〉 =η×V β×V′ µ˜z. 〈µβ2. 〈z||pi2 [β2]〉||µ˜y. 〈(µβ1. 〈z||pi1 [β1]〉, y)||α〉〉 197 =µ˜V′ µ˜z. 〈(µβ1. 〈z||pi1 [β1]〉, µβ2. 〈z||pi2 [β2]〉)||α〉 =η×V µ˜z. 〈z||α〉 =ηµ˜ α (η¬) µ(¬ [x].〈z||¬ [x]〉)v? = not(µ˜x. 〈z||not[x]〉) =η¬V z (η→) µ([x · β].〈z||x · β〉)v? = λx.µβ. 〈z||x · β〉 =η→V z Going from the call-by-value dual calculus to µµ˜⊕,⊗,¬,→V , we have: (β+V ) 〈ιi (V )||[e1, e2]〉?v = 〈ιi (V ?v )||µ˜[ι1 (x).〈x||e1?v〉 | ι2 (x).〈x||e2?v〉]〉 =β⊕V 〈V ? v ||ei?v〉 = 〈V ||ei〉?v (β×V ) 〈(V1, V2)||pii [e]〉?v = 〈(V1?v, V2?v)||µ˜[(x1, x2).〈xi||e?v〉]〉 =β⊗V 〈Vi ? v||e?v〉 = 〈Vi||e〉?v (β¬V ) 〈not(e)||not(v)〉?v = 〈µ(¬ [x].〈x||e?v〉)||¬ [v?v ]〉 =β¬V 〈v?v ||e?v〉 = 〈v||e〉 ? v (β→V ) 〈λx.v||V · e〉?v = 〈µ([x · β].〈v?v ||β〉)||V ?v · e?v〉 =β→µV 〈V ?v ||µ˜x. 
〈v?v||e?v〉〉 =µ˜V′ 〈v?v {V?v/x}||e?v〉 = 〈v {V/x}||e〉?v

(η+V) [µ˜x. 〈ι1(x)||e〉, µ˜y. 〈ι2(y)||e〉]?v = µ˜[ι1(x).〈x||µ˜x. 〈ι1(x)||e?v〉〉 | ι2(y).〈y||µ˜y. 〈ι2(y)||e?v〉〉]
  =µ˜V µ˜[ι1(x).〈ι1(x)||e?v〉 | ι2(y).〈ι2(y)||e?v〉]
  =η⊕V e?v

(η×V) (µα. 〈V||pi1[α]〉, µβ. 〈V||pi2[β]〉)?v = (µα. 〈V?v||µ˜[(x, _).〈x||α〉]〉, µβ. 〈V?v||µ˜[(_, y).〈y||β〉]〉)
  =ηµ µγ. 〈(µα. 〈V?v||µ˜[(x, _).〈x||α〉]〉, µβ. 〈V?v||µ˜[(_, y).〈y||β〉]〉)||γ〉
  =µ˜V′ µγ. 〈V?v||µ˜z. 〈(µα. 〈z||µ˜[(x, _).〈x||α〉]〉, µβ. 〈z||µ˜[(_, y).〈y||β〉]〉)||γ〉〉
  =η⊗V µγ. 〈V?v||µ˜[(x, y).〈(x, y)||µ˜z. 〈(µα. 〈z||µ˜[(x, _).〈x||α〉]〉, µβ. 〈z||µ˜[(_, y).〈y||β〉]〉)||γ〉〉]〉
  =µ˜V µγ. 〈V?v||µ˜[(x, y).〈(µα. 〈(x, y)||µ˜[(x, _).〈x||α〉]〉, µβ. 〈(x, y)||µ˜[(_, y).〈y||β〉]〉)||γ〉]〉
  =β⊗V µγ. 〈V?v||µ˜[(x, y).〈(µα. 〈x||α〉, µβ. 〈y||β〉)||γ〉]〉
  =ηµ µγ. 〈V?v||µ˜[(x, y).〈(x, y)||γ〉]〉
  =η⊗ µγ. 〈V?v||γ〉
  =ηµ V?v

(η¬V) not(µ˜x. 〈V||not[x]〉)?v = µ(¬[x].〈x||µ˜x. 〈V?v||¬[x]〉〉)
  =µ˜V µ(¬[x].〈V?v||¬[x]〉)
  =η¬V′ V?v

(η→V) (λx.µβ. 〈V||x · β〉)?v = µ([x · β].〈µβ. 〈V?v||x · β〉||β〉)
  =µV µ([x · β].〈V?v||x · β〉)
  =η→V′ V?v

b) This follows from part (a) by duality. More specifically, translation commutes with duality in the two calculi:

(c?v)⊥ = (c⊥)?n    (c?n)⊥ = (c⊥)?v    (cv?)⊥ = (c⊥)n?    (cn?)⊥ = (c⊥)v?

and similarly for (co-)terms, which follows directly from the definitions of the duality and translation operations by (mutual) induction on the syntax of commands and (co-)terms. Therefore, the fact that the translations are inverses comes from part (a) by applying Theorems 3.6 and 5.5, so

(cn?)?n =Theorem 5.5 ((cn?)?n)⊥⊥ = (((c⊥)v?)?v)⊥ = (c⊥)⊥ =Theorem 5.5 c
(c?n)n? =Theorem 3.6 ((c?n)n?)⊥⊥ = (((c⊥)?v)v?)⊥ = (c⊥)⊥ =Theorem 3.6 c

and similarly for (co-)terms. Furthermore, N is dual to V and &,`,∼,→ is dual to ⊕,⊗,¬,→, so if we have c = c′ in the µµ˜&,`,∼,→N-calculus then c⊥ = c′⊥ in µµ˜⊕,⊗,¬,→V by Theorem 5.7, (c⊥)v? = (c′⊥)v? in the call-by-value dual calculus by part (a), and thus

cn? = (cn?)⊥⊥ = ((c⊥)v?)⊥ = ((c′⊥)v?)⊥ = (c′n?)⊥⊥ = c′n?

in the call-by-name dual calculus by Theorems 3.8 and 3.6 and the above, and similarly for (co-)terms. Going the other way, if we have c = c′ in the call-by-name dual calculus then we have (c⊥)?v = (c′⊥)?v in µµ˜⊕,⊗,¬,→V by Theorem 3.8 and part (a), so

c?n = (c?n)⊥⊥ = ((c⊥)?v)⊥ = ((c′⊥)?v)⊥ = (c′?n)⊥⊥ = c′?n

in µµ˜&,`,∼,→N by Theorems 5.5 and 5.7 and the above, and similarly for (co-)terms.

It follows that the idea of distinguishing data and co-data provides a unifying framework for studying the computational meaning of types in the sequent calculus. The distinction is baked into polarized languages, like system L as previously seen in Chapter IV. But even for the dual calculi, in which there is no apparent division between data types and co-data types, the difference between the two is instead buried inside the dual call-by-value and call-by-name interpretations of the types. In the following Chapter VI, we will move beyond just the simple types considered here (variations of products, sums, functions, and so on) to also incorporate more advanced type features into the data and co-data framework. In particular, Chapter VI will show how polymorphism in the form of type abstraction, previously seen in Chapters II and III, can be rephrased in terms of the data and co-data framework explored here.
This extension will serve as a platform for studying the duality between induction and co-induction as two modes of structural recursion. It improves the treatment of co-induction as the equal-and-opposite partner to induction, and also clarifies the murky issues of "well-foundedness" surrounding co-induction. In particular, the tendency to view co-inductive objects as "necessarily lazy" comes from the fact that they are co-data objects. The delicate balance of evaluation order that is required to combine inductive and co-inductive objects falls out automatically by modeling them as data and co-data, which already implies the correct computational meaning.

CHAPTER VI

Induction and Co-Induction

This chapter is a revised version of (Downen et al., 2015), adapted to fit the context of this dissertation. I was the primary author of that publication and developed the language and theory of structural recursion in the classical sequent calculus presented in this chapter. I would like to thank my co-authors Philip Johnson-Freyd and Zena M. Ariola for their assistance and feedback in writing that publication.

Martin-Löf's type theory (Martin-Löf, 1998, 1975; Martin-Löf, 1982) taught us that inductive definitions and reasoning are pervasive throughout proof theory, mathematics, and computer science. Inductive data types are used in programming languages like ML and Haskell to represent structures, and in proof assistants and dependently typed languages like Coq and Agda to reason about finite structures of arbitrary size. Mendler (1988) showed us how to talk about recursive types and formalize inductive reasoning over arbitrary data structures. However, the foundation for the opposite of induction, co-induction, has not fared so well. Co-induction is a major concept in programming, representing endless processes, but it is often neglected, misunderstood, or mistreated. As articulated by McBride (Singh et al., 2011):

    We are obsessed with foundations partly because we are aware of a number of significant foundational problems that we've got to get right before we can do anything realistic. The thing I would think of . . . in particular in that respect is co-induction and reasoning about co-recursive processes. That's currently, in all major implementations of type theory, a disaster. And if we're going to talk about real systems, we've got to actually have something sensible to say about that.

The introduction of co-patterns for co-induction (Abel et al., 2013) is a major step forward in rectifying this situation. Abel et al. emphasize that there is a dual view to inductive data types, in which the values of types are defined by how they are used instead of how they are built, a perspective on co-data types first spurred on by Hagino (1987, 1989). Co-inductive co-data types are exciting because they may solve the existing problems with representing infinite objects in proof assistants like Coq (Abel & Pientka, 2013). Our goal here is to improve the understanding and treatment of co-induction, and to integrate both induction and co-induction into a cohesive whole for representing well-founded recursive programs. Our main tools for accomplishing this goal are the pervasive and overt duality and symmetry that run through classical logic and the sequent calculus.
By developing a representation of well-founded induction in a language for the classical sequent calculus, we get an equal and opposite version of well-founded co-induction "for free." Thus, the challenges that arise from using the classical sequent calculus as a foundation for induction are just as well the challenges of co-induction, as the two are inherently developed simultaneously. Later, in Chapter IX, we will translate the developments of induction and co-induction in the classical sequent calculus to a λ-calculus-based language for effect-free programs, to better relate to the current practice of type theory and functional programming. As the λ-based style lacks the symmetries present in the sequent calculus, some of the constructs for recursion are lost in translation. Unsurprisingly, the cost of an asymmetrical viewpoint is blindness to the complete picture revealed by duality.

Our philosophy is to emphasize the disentanglement of the recursion in types from the recursion in programs, to attain a language rich in both data and co-data while highlighting their dual symmetries. On the one hand, the Coq viewpoint is that all recursive types—both inductive and co-inductive—are represented as data types (positive types in polarized logic (Munch-Maccagnoni, 2009)), where induction allows for infinitely deep destruction and co-induction allows for infinitely deep construction. On the other hand, the co-pattern approach (Abel et al., 2013; Abel & Pientka, 2013), which is inspired by Hagino's (1987) treatment of co-induction via finite observations, represents inductive types as data and co-inductive types as co-data. In contrast, we take a view that separates the recursive definition of types from the types used for specifying recursive processing loops. Thereby, the types for representing the structure of a recursive process are given first-class status, defined on their own independently of any other programming construct. This makes the types more compositional, so that they may be combined freely in more ways, as they are not confined to certain restrictions about how they relate to data vs. co-data or induction vs. co-induction. More traditional views on the distinction between inductive and co-inductive programs come from different modes of use for the same building blocks, emerging from particular compositions of several (co-)data types.

We will base our calculus for recursion on the parametric µµ˜-calculus with data and co-data from Chapter V, which corresponds to a classical logic, so it inherently contains control effects (Griffin, 1990) that allow programs to abstract over their own control flow—intuitionistic logic and effect-free functional programs are later considered as a special case in Chapter IX. As we saw, the fundamental dilemma of classical computation (Section 3.2) means that the intended evaluation strategy for a program becomes an essential part of understanding its meaning: even terminating programs give different results for different strategies. For example, the functional program length (Cons (error "boom") Nil) returns 1 under call-by-name (lazy) evaluation, but goes "boom" with an error under call-by-value (strict) evaluation. Therefore, a calculus that talks about the behavior of programs needs to consider the impact of the evaluation strategy. We therefore leverage the parametric nature of the µµ˜-calculus to disentangle this choice from the calculus itself, boiling down the distinction to a substitution strategy.
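As an aside, the evaluation-strategy sensitivity of this example can be checked directly in a lazy functional language. The following is a minimal Haskell sketch, assuming nothing beyond the chapter's running Nat and List definitions (len is a hypothetical stand-in for length, renamed to avoid clashing with Haskell's built-in function):

    data Nat = Z | S Nat deriving (Show)
    data List a = Nil | Cons a (List a)

    -- len never inspects the elements, only the spine of the list
    len :: List a -> Nat
    len Nil         = Z
    len (Cons _ xs) = S (len xs)

    -- Under lazy evaluation this prints "S Z"; a call-by-value language
    -- would instead raise the "boom" error while building the argument.
    main :: IO ()
    main = print (len (Cons (error "boom") Nil))

A strict language raises the error before len is ever called, which is exactly the discrepancy that the parametric µµ˜-calculus abstracts over with its choice of substitution strategy.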
Note that, unlike many accounts of co-induction, we do not rely on a particular choice of evaluation strategy—like some sort of lazy evaluation which delays computing results until they are needed—but instead the apt use of data and co-data forces the correct interpretation of infinite objects. We therefore get a family of calculi, parameterized by the strategy, for reasoning about the behavior of programs ultimately executed with some evaluation strategy. The issue of strong normalization is then framed uniformly over this family of calculi by specifying some basic requirements of the chosen substitution strategy, which are inspired by focusing in logic.

The bedrock on which we build our structures for recursion is the connection between logic and programming languages, and the cornerstone of the design is the duality permeating these programming concepts. Induction and co-induction are clearly dual, and the duality of their opposition shines through in the symmetric setting of the sequent calculus. Here, classicality is not just a feature, but an essential completion of the duality needed to fully express the connections between recursion and co-recursion. We consider several different types for representing recursion in programs based on the mathematical principles of primitive and noetherian recursion, which are reflected as pairs of dual data and co-data types. As we will find, these two recursive principles have different strengths as programming features: primitive recursion allows us to depend on the statically-known sizes of constructions at run-time à la GADTs and to simulate seemingly infinite constructed objects, like the potentially infinite lists in Coq or Haskell, whereas noetherian recursion admits type-erasure. In essence, we demonstrate how this parametric sequent calculus can be used as a core calculus and compilation target for establishing well-foundedness of recursive programs, via the computational interpretation of common principles of mathematical induction. This chapter covers the following topics:

– A presentation of some basic functional programs, including co-patterns (Abel et al., 2013), in a sequent-based syntax to illustrate how the sequent calculus gives a language for programming with structures and duality (Section 6.1).

– A language for the higher-order sequent calculus in which all types, including functions and polymorphism, are treated as user-defined data and co-data types (Section 6.2).

– Two forms of well-founded recursion in types—based on primitive and noetherian recursion—along with specific data and co-data types for performing well-founded recursion in programs (Section 6.3).

– An extension of the language of the sequent calculus with recursion, where the reduction theory is strongly normalizing for well-typed programs and supports erasure of computationally irrelevant types at run-time (Section 6.4).

Programming with Structures and Duality

Pattern-matching is an integral part of functional programming languages, and is a great boon to their elegance. However, the traditional language of pattern-matching can be lacking in areas, especially when we consider dual concepts that arise in all programs. For example, when defining a function by patterns, we can match on the structure of the input—the argument given to the function—but not its output—the observation being made about its result.
In contrast, calculi inspired by the sequent calculus that we've seen in Chapters III, IV, and V feature a more symmetric language which both highlights and restores this missing duality. Indeed, in a setting with such ingrained symmetry, maintaining dualities is natural. We now consider how concepts from functional programming translate to a sequent-based language, and how programs can leverage duality by writing basic recursive functional programs in this symmetric setting.

Example 6.1. One of the most basic functional programs is the function that calculates the length of a list. We can write this length function in a Haskell- or Agda-like language by pattern-matching over the structure of the given List a to produce a Nat:

    data Nat where
      Z : Nat
      S : Nat → Nat

    data List a where
      Nil : List a
      Cons : a → List a → List a

    length : List a → Nat
    length Nil = Z
    length (Cons x xs) = let y = length xs in S y

This definition of length describes its result for every possible call. Similarly, we can define length in the parametric µµ˜-calculus from Chapter V in much the same way. (Recall that, following the notation of Chapter III, the symbols µ and µ˜ used here are not related to recursion, as they sometimes are in other languages, but rather are binders for variables and co-variables.) First, we introduce the types in question by data declarations in the sequent calculus:

    data Nat where
      Z : ` Nat |
      S : Nat ` Nat |

    data List(X) where
      Nil : ` List(X) |
      Cons : X, List(X) ` List(X) |

While these declarations give the same information as before, the differences are largely stylistic. Instead of describing the constructors in terms of a pre-defined function type, the shapes of the constructors are described via sequents, replacing function arrows with entailment (`) and commas for separating multiple inputs. Furthermore, the type of the main output produced by each constructor is highlighted to the right of the sequent between entailment and a vertical bar, as in ` Nat | or ` List(X) |, and all other types describe the parameters that must be given to the constructor to produce this output. Thus, we can construct a list as either Nil or Cons(x, xs), much like in functional languages. Next, we define length by specifying its behavior for every possible call:

    length : List(X) → Nat
    〈length||Nil · α〉 = 〈Z||α〉
    〈length||Cons(x, xs) · α〉 = 〈length||xs · µ˜y. 〈S(y)||α〉〉

The main difference is that we consider more than just the argument to length. Instead, we are describing the action of length with its entire context by showing the behavior of a command connecting it together with a consumer. For example, in the command 〈Z||α〉, Z is a term producing zero and α is a co-term—specifically a co-variable—that consumes that number. Besides co-variables, we have other co-terms that consume information. The call-stack Nil · α consumes a function by supplying it with Nil as its argument and consuming its returned result with α. The input abstraction µ˜y. 〈S(y)||α〉 names its input y before running the command 〈S(y)||α〉, similarly to the context let y = □ in S(y) from the functional program.

In functional programs, it is common to avoid explicitly naming the result of a recursive call, especially in such a short program. Instead, we would more likely define length as:

    length : List a → Nat
    length Nil = Z
    length (Cons x xs) = S (length xs)

We can mimic this definition in the sequent calculus as:

    length : List(X) → Nat
    〈length||Nil · α〉 = 〈Z||α〉
    〈length||Cons(x, xs) · α〉 = 〈S(µβ. 〈length||xs · β〉)||α〉
Note that to represent the function call length xs inside the successor constructor S, we need to make use of the output abstraction µβ. 〈length||xs · β〉 that names its output channel β before running the command 〈length||xs · β〉, which calls length with xs as the argument and β as the return point. As we saw in Section 5.6, output abstractions are exactly dual to input abstractions, and defining length in µµ˜ requires us to name the recursive result as either an input or an output.

Just as functions can be represented as first-class values through λ-abstractions in functional languages, their sequent calculus counterparts can be represented as first-class values in terms of case abstractions in the µµ˜-calculus. Using a recursively-defined case abstraction with deep pattern-matching, we can represent length in the µµ˜-calculus from Section 5.2:

    length = µ(Nil · α.〈Z||α〉 | Cons(x, xs) · α.〈length||xs · µ˜y. 〈S(y)||α〉〉)

Furthermore, the deep pattern-matching can be mechanically translated to shallow case analysis on (co-)data structures:

    length = µ(xs · α. 〈xs||µ˜[Nil.〈Z||α〉 | Cons(x, xs′).〈length||xs′ · µ˜y. 〈S(y)||α〉〉]〉)

This case abstraction describes exactly the same specification as the definition for length according to the reduction theory of the parametric µµ˜-calculus: when run with the call-stack Nil · α, the command reduces to 〈Z||α〉, and when run with the call-stack Cons(x, xs) · α, the command reduces to 〈length||xs · µ˜y. 〈S(y)||α〉〉. However, here we will favor presenting the example programs in the style of specifying the behavior of commands using deep pattern-matching, as this gives a higher-level and more abstract reading of programs, with the understanding that they can be mechanically compiled down to (recursive) case abstractions with shallow pattern-matching as above. End example 6.1.

We have seen how to write a recursive function by pattern-matching on the first argument, x, in a call-stack x · α. However, why should we be limited to only matching on the structure of the argument x? If the observations on the returned result must also follow a particular structure, why can't we match on α as well? Indeed, in a symmetric language, there is no such distinction. For example, the function call-stack itself can be viewed as a structure, so that a curried chain of function applications f x y z is represented by the pattern x · y · z · α, which reveals the nested structure down the output side of function application, rather than the input side. Thus, the sequent calculus reveals a dual way of thinking about information in programs phrased as co-data, as we saw in Chapter V, in which observations follow predictable patterns, and values respond to those observations by matching on their structure. In such a symmetric setting, it is only natural to match on any structure appearing in either inputs or outputs.
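Before moving on, it may help to see input and output abstractions through a functional lens. The following rough Haskell sketch, with a hypothetical continuation parameter k standing in for the co-variable α, mimics the command-style definition of length; it is only an analogy, since Haskell consumers are not first-class structures that we can pattern-match on:

    data Nat = Z | S Nat
    data List a = Nil | Cons a (List a)

    -- lenK xs k corresponds to the command 〈length||xs · k〉: the second
    -- argument plays the role of the co-term consuming the result.
    lenK :: List a -> (Nat -> r) -> r
    lenK Nil k         = k Z                      -- 〈Z||α〉
    lenK (Cons _ xs) k = lenK xs (\y -> k (S y))  -- 〈length||xs · µ˜y.〈S(y)||α〉〉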
Example 6.2. We can consider this view on co-data to understand programs with "infinite" objects. For example, infinite streams may be defined by the primitive projections out of streams:

    codata Stream(X) where
      Head : | Stream(X) ` X
      Tail : | Stream(X) ` Stream(X)

Contrary to data types, the type of the main input consumed by co-data constructors is highlighted to the left of the sequent, in between a vertical bar and entailment, as in | Stream(X) `. The rest of the types describe the parameters that must be given to the constructor in order to properly consume this main input. For Streams, the observation Head[α] requests the head value of a stream, which should be given to α, and Tail[β] asks for the tail of the stream, which should be given to β. (Keeping the convention from Chapter III, we use square brackets as grouping delimiters in observations, like the head projection Head[α] out of a stream, as opposed to round parentheses used as grouping delimiters in results, like the successor number S(y). This helps to disambiguate between results (terms) and observations (co-terms) in a way that is syntactically apparent independently of their context.) We can now define a function countUp—which turns an x of type Nat into the infinite stream x, S(x), S(S(x)), . . .—by pattern-matching on the structure of observations on functions and streams:

    countUp : Nat → Stream(Nat)
    〈countUp||x · Head[α]〉 = 〈x||α〉
    〈countUp||x · Tail[β]〉 = 〈countUp||S(x) · β〉

If we compare countUp with length in this style, we can see that there is no fundamental distinction between them: they are both defined by cases on their possible observations. The only point of difference is that length happens to match on the structure of its argument in its call-stack, whereas countUp matches on the return co-data structure in its call-stack.

Abel et al. (2013) have carried this intuition back into the functional paradigm. For example, we can still describe streams by their Head and Tail projections, and define countUp through co-patterns:

    codata Stream a where
      Head : Stream a → a
      Tail : Stream a → Stream a

    countUp : Nat → Stream Nat
    (countUp x).Head = x
    (countUp x).Tail = countUp (S x)

This definition gives the functional program corresponding to the sequent version of countUp. So we can see that co-patterns arise naturally, in Curry-Howard isomorphism style, from the computational interpretation of Gentzen's (1935a) sequent calculus. End example 6.2.

Example 6.3. Since a symmetric language is not biased against pattern-matching on inputs or outputs, and indeed the two are treated identically, there is nothing special about matching against both inputs and outputs simultaneously. For example, we can model infinite streams with possibly missing elements as SkipStream(X) = Stream(Maybe(X)), where Maybe(X) corresponds to the Haskell data type of the same name, defined as:

    data Maybe(X) where
      Nothing : ` Maybe(X) |
      Just : X ` Maybe(X) |

with constructors Nothing and Just(x) for x of type X. Then we can define the empty skip stream, which gives Nothing at every position, and the countDown function that transforms S^n(Z) into the stream S^n(Z), S^(n−1)(Z), . . . , Z, Nothing, . . . :

    empty : SkipStream(Nat)
    〈empty||Head[α]〉 = 〈Nothing||α〉
    〈empty||Tail[β]〉 = 〈empty||β〉

    countDown : Nat → SkipStream(Nat)
    〈countDown||x · Head[α]〉 = 〈Just(x)||α〉
    〈countDown||Z · Tail[β]〉 = 〈empty||β〉
    〈countDown||S(x) · Tail[β]〉 = 〈countDown||x · β〉

End example 6.3.
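Although mainstream Haskell lacks co-patterns, a record of projections gives a serviceable approximation of the Stream co-data type for experimentation. In the following sketch, the field names hd and tl are invented stand-ins for the Head and Tail observations:

    data Nat = Z | S Nat

    -- One field per observation: hd plays the role of Head, tl of Tail.
    data Stream a = Stream { hd :: a, tl :: Stream a }

    countUp :: Nat -> Stream Nat
    countUp x = Stream { hd = x, tl = countUp (S x) }

    -- Take the first n observations, e.g. observe (S (S Z)) (countUp Z).
    observe :: Nat -> Stream a -> [a]
    observe Z     _ = []
    observe (S n) s = hd s : observe n (tl s)

Laziness keeps countUp productive: only the observations actually demanded, such as those made by observe, are ever computed.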
Example 6.4. As opposed to the co-data approach to describing infinite objects, there is a more widely used approach in lazy functional languages like Haskell and proof assistants like Coq that still favors framing information as data. For example, an infinite list of zeroes is expressed in this functional style by an endless sequence of Cons:

    zeroes : List Nat
    zeroes = Cons Z zeroes

We could emulate this definition in sequent style as the expansion of zeroes when observed by any α:

    zeroes : List(Nat)
    〈zeroes||α〉 = 〈Cons(Z, zeroes)||α〉

Likewise, we can describe the concatenation of two possibly infinite lists in the same way, by pattern-matching on the call:

    cat : List(X) → List(X) → List(X)
    〈cat||Nil · ys · α〉 = 〈ys||α〉
    〈cat||Cons(x, xs) · ys · α〉 = 〈Cons(x, µβ. 〈cat||xs · ys · β〉)||α〉

The intention is that, so long as we do not evaluate the sub-components of Cons eagerly, α receives a result even if xs is an infinitely long list like zeroes. End example 6.4.

In each of these examples, we were only concerned with writing recursive programs, but have not shown that they always terminate. Termination is especially important for proof assistants and dependently typed languages, which rely on the absence of infinite loops for their logical consistency. If we consider the programs in Examples 6.1 and 6.2, then termination appears fairly straightforward by structural recursion somewhere in a function call: each recursive invocation of length has a structurally smaller list for the argument, and each recursive invocation of countUp and countDown has a smaller stream projection out of its returned result. However, formulating this argument in general turns out to be more complicated. Even worse, the "infinite data structures" in Example 6.4 do not have as clear a concept of "termination": zeroes and concatenation could go on forever, if they are not given a bound to stop. To tackle these issues, we will phrase principles of well-founded recursion in the parametric µµ˜-calculus, so that we arrive at a core calculus capable of expressing complex termination arguments (parametrically in the chosen evaluation strategy) inside the calculus itself (see Section 6.4).

Polymorphism and Higher Kinds

Before we can talk about statically-guaranteed termination arguments in types, we must first be able to quantify over types. That is to say, we need to extend the parametric µµ˜-calculus with type quantifiers like ∀ and ∃ that we had seen previously in natural deduction (Chapter II) and the sequent calculus (Chapter III). We could just add special connectives with their own separate rules for the quantifiers to the calculus. However, let's instead look at how we can enrich the existing mechanisms of data and co-data to incorporate both ∀- and ∃-style quantifiers as just more declared (co-)data types like products, sums, and functions. As it turns out, starting from the multi-kinded parametric sequent calculus from Sections 5.4 and 5.5, we are almost already there.

First of all, we will extend the syntax of terms and co-terms to let (co-)data structures contain types in addition to sub-expressions, as shown in Figure 6.1. This change means that the patterns in case abstractions can now bind type variables in addition to ordinary (co-)variables, so that (co-)terms can abstract over types as well as other (co-)terms, like in the polymorphic λ-calculus (Section 2.2) or the polymorphic sequent calculus (Section 3.3). In addition, we will also allow types to abstract over types by extending the language

X, Y, Z ∈ TypeVariable ::= . . .
R, S, T ∈ BaseKind ::= . . .
F, G ∈ Connective ::= . . .
k, l ∈ Kind ::= S | k → l
A, B, C ∈ Type ::= X | F( #»A) | λX : k.B | A B
x, y, z ∈ Variable ::= . . .
α, β, γ ∈ CoVariable ::= . . .
K ∈ Constructor ::= . . . O ∈ Observer ::= . . . c ∈ Command ::= 〈v||e〉 v ∈ Term ::= x | µα.c | K #»A ( #»e , #»v ) | µ ( O # » X:k [ #»x , #»α ].c | . . . ) e ∈ CoTerm ::= α | µ˜x.c | µ˜ [ K # » X:k( #»α , #»x ).c | . . . ] | O #»A [ #»v , #»e ] FIGURE 6.1. The syntax of types and programs in the higher-order µµ˜-calculus. of kinds (denoted by the metavariables k, l) to include arrow kinds k → l in addition to base kinds S, which gives us type functions also shown in Figure 6.1. The type-level language of functions uses the notation of the λ-calculus, so that a type function with the parameter X : k is introduced as the λ-abstraction λX : k.B and a type function is applied as A B. Intuitively, the motivation for adding type functions to the language is to let (co-)data declarations abstract over them, giving us higher-order (co-)data types. In particular, the addition of type abstraction in both programs and types lets us extend the multi-kinded (co-)data declaration mechanism and kind system as shown in Figure 6.2. The main addition is that now the constructors in a data declaration of F( #»X ) and the observers in a co-data declaration of G( #»X )can introduce hidden quantified type variables #»Y that do not appear in the externally visible interface #»X of the connective. For example, for some fixed kind S, we can give declarations for the universal (∀) and existential (∃) quantification over a type of kind k as follows: codata ∀k(X : k → S) : Swhere @ : ( | ∀k(X) `Y :k X Y : S ) data ∃k(X : k → S) : Swhere @ : ( X Y : S `Y :k ∃k(X) | ) These declarations extend the same notion of quantifiers in the dual calculi to higher kinds k, where we use the shorthand ∀Y :k.A for ∀k(λY :k.A) and ∃Y :k.A for 3As before, this is shorthand for a (co-)data declaration of F( # »X : k) : S in G. 212 decl ∈ Declaration ::= data F( # »X : k) : Swhere # » K : ( # » A : T ` # »Y :l F( #»X ) | # »B : R ) | codataG( # »X : k) : Swhere # » O : ( # » A : T | G( #»X ) ` # »Y :l # »B : R ) G ∈ GlobalEnv ::= # »decl Θ ∈ TypeEnv ::= # »X : k Γ ∈ InputEnv ::= # »x : A ∆ ∈ OutputEnv ::= # »α : A J,H ∈ Judgement ::= ( Γ `ΘG ∆ ) seq | (G ` decl) | (Θ `G A : k) Declaration rules: # » # » X : k, # »Y : l `G A : T # » # » X : k, # »Y : l `G B : R # »( ` # » X:k, # »Y :l G ) seq G ` data F( # »X : k) : Swhere # » K : ( # » A : T ` # »Y :l F( #»X ) | # »B : R ) data # » # » X : k, # »Y : l `G A : T # » # » X : k, # »Y : l `G B : R # »( ` # » X:k, # »Y :l G ) seq G ` codataG( # »X : k) : Swhere # » O : ( # » A : T | F( #»X ) ` # »Y :l # »B : R ) codata Kind rules: Θ, X : k `G X : k TV # »Θ `G C : k (F( # »X : k) : S)3 ∈ G Θ `G F( #»C ) : S FT Θ, X : k `G A : l Θ `G λX : k.A : k → l →I 2 Θ `G A : k → l Θ `G B : k Θ `G A B : l →E 2 Well-formed sequent rules: ( ` ) seq G ` decl ( `G ) seq( `G,decl ) seq ( `ΘG ) seq( `Θ,X:kG ) seq Θ `G A : S ( Γ `ΘG ∆ ) seq( Γ, x : A `ΘG ∆ ) seq Θ `G A : S ( Γ `ΘG ∆ ) seq( Γ `ΘG α : A,∆ ) seq FIGURE 6.2. The kind system for the higher-order parametric µµ˜ sequent calculus. 213 ∃k(λY :k.A). A term of type ∀Y :k.A is introduced as the case abstraction µ(Y :k @ α.c) that is consumed buy the observation B @ e. Dually, a term of type ∃Y : k.A is introduced by the construction B @ v that is consumed by the case abstraction µ˜[Y :k @ x.c]. Note that the kind system in Figure 6.2 also includes an entirely new kind of judgement ( Γ `G ∆ ) seq that says a general sequent Γ `ΘG ∆ is well-formed. 
This judgement is now necessary because of the addition of type functions, which are a new kind of type that does not actually classify any term or co-term. In other words, supposing that a free variable x has type λX:S.X would be nonsensical. Therefore, we rule any such possibility by the rules of ( Γ `G ∆ ) seq , which enforce that for every x : A in Γ and α : A in ∆, A must belong to some base kind S and not some other kind like k → l. This is the same reason that the declarations for (co-)data types can only declare connectives of the form F( # »X : k) : S for some base kind S, and similarly the sequents that give the types of constructors and observers are well-formed whenever the declaration is well-formed according to the data and codata rules. Since we have added new forms of terms and co-terms which package up and abstract over types, we also need to update the typing rules to accomodate these new forms in the higher-order parametric µµ˜-calculus, as shown in Figure 6.3. Note that the judgements and core typing rules are exactly the same as the core typing rules for the multi-kinded type system from Figure 5.16 plus the addition of the type conversion rules TCR and TCL. These conversion rules say that any β = equivalent types (in the sense of the typed βη equational theory of the λ-calculus from Chapter II Section 2.2 and denoted by the judgement Θ `G A =βη B : S with the rules given in Figure 6.4) contain exactly the same terms and co-terms. The only other update is in the left and right introduction rules for particular (co-)data types, which now account for the possibility that constructions and observations might include types which are referenced in the components of the pattern. For (co-)data structures, this means that there is a choice of hidden types # » C ′i which must be substituted for the quantified type variables # »Yi : li in the sub-(co-)terms of the structure. For (co-)data case abstractions, we need to extend the local type environment Θ with the abstracted type variables, just as we must extend the local input and output environments with the abstracted (co-)variables. For example, the specific instances of the general typing rules for the two families of quantifiers ∀k and 214 Judgement ::= c : ( Γ `ΘG ∆ ) | (Γ `ΘG v : A | ∆) | (Γ | e : A `ΘG ∆) Type conversion rules: Γ `ΘG v : A | ∆ Θ `G A =βη B : S Γ `ΘG v : B | ∆ TCR Γ | e : A `ΘG ∆ Θ `G A =βη B : S Γ | e : B `ΘG ∆ TCL Logical rules: Given data F( # »X : k) : Swhere # » Ki : ( # » Aij : Tijj ` # » Yi:li F( #»X ) | # »Bij : Rijj )i ∈ G, we have the rules: θ = { # » C/X } # » Θ `G C ′iθ : liθ θ′ = { # » C ′i/Yi } θ # » Γ′j | e : Bijθ′ `ΘG ∆′j j # » Γj | v : Aijθ′ `ΘG ∆j j # »Γj j , # » Γ′j j `ΘG K # » C′ i ( #»e , #»v ) : F( #» C ) | # »∆jj , # » ∆′j j FRKi θ = { # » C/X } # » ci : ( Γ, # »xi : Aiθ `Θ, # » Yi:liθ G # » αi : Biθ,∆ )i Γ | µ˜ [ # » K # » Yi:li i ( #»αi , #»xi).ci i ] : F( #»C ) `ΘG ∆ FL Given codataG( # »X : k) : Swhere # » Oi : ( # » Aij : Tijj | G( #»X ) ` # » Yi:li # »Bij : Rijj )i ∈ G, we have the rules: θ = { # » C/X } # » ci : ( Γ, # »xi : Aiθ `Θ, # » Yi:liθ G # » αi : Biθ,∆ )i Γ `ΘG µ ( # » O # » Yi:li i [ #»xi , #»αi ].ci i ) : G( #»C ) | ∆ GR θ = { # » C/X } # » Θ `G C ′i : li θ′ = { # » C ′i/Yi } θ # » Γj | v : Aijθ′ `ΘG ∆j j # » Γ′j | e : Bijθ′ `ΘG ∆′j j #»Γj j , #» Γ′j j | O # » C′i i [ #»v , #»e ] : G( #» C ) `ΘG # »∆j j , # » ∆′j j GLOi FIGURE 6.3. Types of higher-order (co-)data in the parametric µµ˜ sequent calculus. 
215 Θ, X : k `G A : l Θ `G B : k Θ `G (λX:k.A) B =βη A {B/X} : l β Θ `G A : k → l Θ `G λX:k.A X =βη A : k → l η Θ `G A : k Θ `G A =βη A : k refl Θ `G B =βη A : k Θ `G A =βη B : k symm Θ `G A =βη B : k Θ `G B =βη C : k Θ `G A =βη C : k trans Θ, X : k `G X =βη X : k TV Θ `G F( #»C ) : S # »Θ `G C =βη C ′ : k Θ `G F( # »C ′) : S (F( # »X : k) : S) ∈ G Θ `G F( #»C ) =βη F( # »C ′) : S FT Θ, X : k `G A =βη A′ : l Θ `G λX:k.A =βη λX:k.A′ : k → l →I 2 Θ `G A =βη A′ : k → l Θ `G B =βη B′ : k Θ `G A B =βη A′ B′ : l →E 2 FIGURE 6.4. βη conversion of higher-order types. 216 (βF) 〈 K #» C i ( #»e , #»v ) ∣∣∣∣∣∣µ˜[· · · | K # »Y :li ( #»α , #»x ).ci | · · ·]〉 βF 〈µ #»α . 〈 #»v ∣∣∣∣∣∣µ˜ #»x .ci { # »C/Y }〉∣∣∣∣∣∣ #»e 〉 (βG) 〈 µ ( · · · | O # » Y :l i [ #»x , #»α ].ci | · · · )∣∣∣∣∣∣O #»Ci [ #»v , #»e ]〉 βG 〈 #»v ∣∣∣∣∣∣µ˜ #»x . 〈µ #»α .ci { # »C/Y }∣∣∣∣∣∣ #»e 〉〉 (ηF) γ : F( #»C ) ≺ηF µ˜ [ # » K # » Y :l i ( #»α , #»x ). 〈 K # » Y :l i ( #»α , #»x ) ∣∣∣∣∣∣γ〉i] (ηG) z : G( #»C ) ≺ηG µ ( # » O # » Y :l i [ #»x , #»α ]. 〈 z ∣∣∣∣∣∣O # »Y :li [ #»x , #»α ]〉i ) FIGURE 6.5. The βη laws for higher-order data and co-data types. ∃k above are: c : ( Γ `Θ,X:kG α : A X,∆ ) Γ `ΘG µ(X : k @ α.c) : ∀k(A) | ∆ ∀Rk Θ `G B : k Γ | e : A B `ΘG ∆ Γ | B @ e : ∀k(A) `ΘG ∆ ∀Lk Θ `G B : k Γ `ΘG v : A B | ∆ Γ `ΘG B @ v : ∃k(A) | ∆ ∃Rk c : ( Γ, x : A X `Θ,X:kG ∆ ) Γ | µ˜[X : k @ x.c] : ∃k(A) `ΘG ∆ ∃Lk Other than this addition, the rules are the same as before in Section 5.4. Thus concluding the static semantics of the higher-order parametric µµ˜-calculus, we must also consider how the extension affects the dynamic semantics. The short answer is: not much. In general, the types contained in structures must be substituted for the type variables bound by patterns during pattern-matching, but this does not significantly alter the behavior of a program. More specifically, the core µS µ˜Sηµηµ˜ theory of substitution does not change at all, since the form of input and output abstractions remain the same, the typed βη theory of (co-)data accounts for the presence of types in programs as shown in Figure 6.5, where the connectives F and G are declared in G as in Figure 6.3, and likewise the untyped βς theory of (co-)data is extended as shown in Figure 6.6. We must also extend the inference rules from Figure 5.18 for checking that expressions are well-kinded so that we know which substitution strategy to use when mixing several within a program, as shown in Figure 6.7, by just ignoring the additional type annotations on (co-)data structures. Likewise, the definitions of particular substitution strategies, like V , N , LV , and LN , are only changed by annotating structures and patterns with types and type variables, and otherwise exactly the same as their definitions in Chapter V. 217 (βS) 〈 K #» C ( #»E, #»V ) ∣∣∣∣∣∣µ˜[· · · | K # »Y :l( #»α , #»x ).c | · · ·]〉 βS c { # »C/Y , # »E/α, # »V/x} (βS) 〈 µ ( · · · | O # » Y :l( #»x , #»α ).c | · · · )∣∣∣∣∣∣O #»C ( #»E, #»V )〉 βS c { # »C/Y , # »V/x, # »E/α} (ςS) K #» C ( #»E, e′, #»e , #»v ) ςS µα. 〈 µβ. 〈 K #» C ( #»E, β, #»e , #»v ) ∣∣∣∣∣∣α〉∣∣∣∣∣∣e′〉 (ςS) K #» C ( #»E, #»V , v′, #»v ) ςS µα. 〈 v′ ∣∣∣∣∣∣µ˜y. 〈K #»C ( #»E, #»V , y, #»v )∣∣∣∣∣∣α〉〉 (ςS) O #» C ( #»V , v′, #»v , #»e ) ςS µ˜x. 〈 v′ ∣∣∣∣∣∣µ˜y. 〈x∣∣∣∣∣∣O #»C ( #»V , y, #»v , #»e )〉〉 (ςS) O #» C ( #»V , #»E, e′, #»e ) ςS µ˜x. 〈 µβ. 〈 x ∣∣∣∣∣∣O #»C ( #»V , #»E, β, #»e )〉∣∣∣∣∣∣e′〉  v′ /∈ ValueS e′ /∈ CoValueS x,y, α, β fresh FIGURE 6.6. The parametric βSςS laws for arbitrary higher-order data and co-data. 
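For intuition, the ∀k and ∃k declarations of this section line up with familiar encodings in typed functional languages. The following Haskell sketch uses the standard rank-2 and GADT encodings; the names Exists, Forall, Pack, instantiate, and unpack are illustrative, not part of the formal development:

    {-# LANGUAGE GADTs, RankNTypes #-}

    -- ∃-style data: the constructor hides the witness type y, so the case
    -- abstraction that consumes it must work for every possible y.
    data Exists f where
      Pack :: f y -> Exists f

    -- ∀-style co-data: the observer supplies the type, so the producer
    -- must be ready to answer at any instantiation.
    newtype Forall f = Forall { instantiate :: forall y. f y }

    unpack :: Exists f -> (forall y. f y -> r) -> r
    unpack (Pack x) k = k x

The ∃-style constructor hides its witness type from its consumer, while the ∀-style producer must answer at whatever type its observer picks, matching the data/co-data reading of the two quantifiers.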
Given data F( # »X : k) : S where # »Ki : ( # »Aij : Tij j ` # »Y :li F( #»X ) | # »Bij : Rij j ) i ∈ G, we have:

# »Γ′j | e :: Rij `G ∆′j j    # »Γj | v :: Tij `G ∆j j
#»Γj j, #»Γ′j j `G K #»C i ( #»e , #»v ) :: S | # »∆j j, # »∆′j j   (FRKi)

# »ci :: ( Γ, # »xi :: Tij `G # »αi :: Rij, ∆ ) i
Γ | µ˜ [ # »K # »Y :li i ( #»αi , #»xi ).ci i ] :: S `G ∆   (FL)

Given codata G( # »X : k) : S where # »Oi : ( # »Aij : Tij j | G( #»X ) ` # »Y :li # »Bij : Rij j ) i ∈ G, we have:

# »ci :: ( Γ, # »xi :: Tij `G # »αi :: Rij, ∆ ) i
Γ `G µ ( # »O # »Y :li i [ #»xi , #»αi ].ci i ) :: S | ∆   (GR)

# »Γj | v :: Tij `G ∆j j    # »Γ′j | e :: Rij `G ∆′j j
#»Γj j, #»Γ′j j | O #»C i [ #»v , #»e ] :: S `G # »∆j j, # »∆′j j   (GLOi)

FIGURE 6.7. Type-agnostic kind system for higher-order multi-kinded (co-)data.

Well-Founded Recursion Principles

There is one fundamental difficulty in ensuring termination for programs written in a sequent calculus style: even incredibly simple programs perform their structural recursion from within some larger overall structure. For example, consider the humble length function from Example 6.1. The decreasing component in the definition of length is clearly the list argument, which gets smaller with each call. However, in the sequent calculus, the actual recursive invocation of length involves the entire call-stack. This is because the recursive call to length does not return to its original caller, but to some place new. When written in a functional style, this information is implicit, since the recursive call to length is not a tail-call, but rather S(length xs). When written in a sequent style, this extra information becomes an explicit part of the function call structure, necessary to remember to increment the output of the function before ultimately returning. This means that we must carry around enough memory to store our ever increasing result amidst our ever decreasing recursion.

Establishing termination for the sequent calculus therefore requires a more finely controlled language for specifying "what's getting smaller" in a recursive program, pointing out where the decreasing measure is hidden within recursive invocations. For this purpose, we adopt a type-based approach to termination checking (Abel, 2006). Besides allowing us to abstract over termination-ensuring measures, we can also specify which parts of a complex type are used as part of the termination argument. As a consequence of handling simplistic functions like length, we will find that, for free, the calculus ends up as a robust language for describing more advanced recursion over structures, including lexicographic and mutual recursion over both data and co-data structures simultaneously.

In considering the type-based approach to termination in the sequent calculus, we identify two different styles for the type-level measure indices. The first is an exacting notion of index with a predictable structure matching the natural numbers, which we use to perform primitive recursion. This style of indexing gives us tight control over the size of structures and depends on the specific structure of the index in the style of GADTs, allowing us to define types like the fixed-size vectors of values from dependently typed languages as well as a direct encoding of "infinite" structures as found in lazy functional languages. The second is a looser notion that only tracks the upper bound of indices, which we use to perform noetherian recursion.
This style of indexing is more in tune with typical structurally recursive programs like length, and also supports full run-time erasure of bounded indices while still maintaining termination of the index-erased programs.

Primitive Recursion

We begin with the seemingly more basic of the two recursion schemes: primitive recursion on a single natural number index. These natural number indices are used in types in two different ways. First, the indices act as an explicit measure in recursively defined (co-)data types, tracking the recursive sub-components of their structures in the types themselves. Second, the indices are abstracted over by the primitive recursion principle, allowing us to generalize over arbitrary indices and write looping programs. For simplicity, we will limit ourselves to a single arbitrary base kind S in the discussion to follow, although using multiple different ones is still admissible.

Let's consider some examples of using natural number indices for the purpose of defining (co-)data types with recursive structures. We extend the higher-order (co-)data type declaration mechanism from Section 6.2 with the ability to define new (co-)data types by primitive recursion over an index, giving a mechanism for describing recursive (co-)data types with statically tracked measures. Essentially, the constructors are given in two groups—the constructors for the zero case and the constructors for the successor case—and may only contain recursive sub-components at the (strictly) previous index. For example, we may describe vectors of exactly N values of type A, Vec(N,A), as in dependently typed languages:

    data Vec(i : Ix, X : S) : S by primitive recursion on i where
      where i = 0
        Nil : ` Vec(0, X) |
      where i = j + 1
        Cons : X : S, Vec(j, X) : S ` Vec(j + 1, X) |

where Ix is the kind of type-level natural number indices. Nil builds an empty vector of type Vec(0, A), and Cons(v, v′) extends the vector v′ : Vec(N,A) with another element v : A, giving us a vector with one more element of type Vec(N + 1, A). These terms are typed by the right rules for Vec:

Γ `ΘG Nil : Vec(0, A) | ∆   (VecRNil)

Γ `ΘG v : A | ∆    Γ′ `ΘG v′ : Vec(M, A) | ∆′
Γ′, Γ `ΘG Cons(v, v′) : Vec(M + 1, A) | ∆′, ∆   (VecRCons)

Other than these restrictions on the instantiations of i : Ix for vectors constructed by Nil and Cons, the typing rules for terms of Vec(N,A) follow the normal pattern for declared data types. (We can still have a vector with an abstract index if we don't yet know what shape it has, as with the variable x or abstraction µα.c of type Vec(i, A).)

Destructing a vector diverges more from the usual pattern of non-recursive data types. Since the constructors of vector values are put in two separate groups, we have two separate case abstractions to consider, depending on whether the vector is empty or not. On the one hand, to destruct an empty vector, we only have to handle the case for Nil, as given by the co-term µ˜[Nil.c]. On the other, destructing a non-empty vector requires us to handle the Cons case, as given by the co-term µ˜[Cons(x, xs).c]. These co-terms are typed by the two left rules for Vec—one for each of its zero and successor instances:

c : (Γ `ΘG ∆)
Γ | µ˜[Nil.c] : Vec(0, A) `ΘG ∆   (VecL0)

c : (Γ, x : A, xs : Vec(M, A) `ΘG ∆)
Γ | µ˜[Cons(x, xs).c] : Vec(M + 1, A) `ΘG ∆   (VecL+1)
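For comparison, a rough approximation of this indexed declaration can be written as a GADT in GHC Haskell, under the assumption of the DataKinds and GADTs extensions; the promoted Ix kind and the names below are illustrative, not a fixed library interface:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures #-}

    -- Type-level natural number indices, playing the role of the kind Ix.
    data Ix = Zero | Succ Ix

    -- VNil only builds index Zero; VCons strictly increments the index,
    -- just as the two constructor groups of Vec(i, X) demand.
    data Vec (i :: Ix) a where
      VNil  :: Vec 'Zero a
      VCons :: a -> Vec j a -> Vec ('Succ j) a

    -- Destruction is also index-directed: a head projection only exists
    -- at successor indices, so the Nil case is statically impossible.
    vhead :: Vec ('Succ j) a -> a
    vhead (VCons x _) = x

The index-directed destruction mirrors the two left rules above: which case abstraction applies is determined by whether the index is zero or a successor.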
As a similar example, we can define a less statically constrained list type by primitive recursion. The IxList indexed data type is just like Vec, except that the Nil constructor is available at both the zero and successor cases:

    data IxList(i : Ix, X : S) by primitive recursion on i where
      where i = 0
        Nil : ` IxList(0, X) |
      where i = j + 1
        Nil : ` IxList(j + 1, X) |
        Cons : X : S, IxList(j, X) : S ` IxList(j + 1, X) |

Now, destructing a non-zero IxList(N + 1, A) requires both cases, as given in the co-term µ˜[Nil.c | Cons(x, xs).c′]. IxList has three right rules for building terms: for Nil at both 0 and M + 1, and for Cons:

Γ `ΘG Nil : IxList(0, A) | ∆   (IxListRNil0)

Γ `ΘG Nil : IxList(M + 1, A) | ∆   (IxListRNil+1)

Γ `ΘG v : A | ∆    Γ′ `ΘG v′ : IxList(M, A) | ∆′
Γ′, Γ `ΘG Cons(v, v′) : IxList(M + 1, A) | ∆′, ∆   (IxListRCons)

It also has two left rules: one for case abstractions handling the constructors of the 0 case and another for the M + 1 case:

c : (Γ `ΘG ∆)
Γ | µ˜[Nil.c] : IxList(0, A) `ΘG ∆   (IxListL0)

c0 : (Γ `ΘG ∆)    c1 : (Γ, x : A, xs : IxList(M, A) `ΘG ∆)
Γ | µ˜[Nil.c0 | Cons(x, xs).c1] : IxList(M + 1, A) `ΘG ∆   (IxListL+1)

To write looping programs over these indexed recursive types, we use a recursion scheme which abstracts over the index occurring anywhere within an arbitrary type. As the types themselves are defined by primitive recursion over a natural number, the recursive structure of programs will also follow the same pattern. The trick then is to embody the primitive induction principle for proving a proposition P over natural numbers:

P[0] ∧ (∀j : N. P[j] → P[j + 1]) → (∀i : N. P[i])

and likewise the refutation of such a statement, as given by any specific counter-example—an index n : N together with a refutation of P[n] refutes ∀i : N. P[i]—into logical rules of the sequent calculus. (Here, a refutation of a proposition P is not negation as a logical connective, but rather the dual to a proof that P is true: a demonstration that P is false.) Recall from the reading of sequents in Chapter III that proofs come to the right of entailment (` A means "A is true"), whereas refutations come to the left (A ` means "A is false"). Because we will have several recursion principles, we denote this particular one as ∀ quantification over Ix, ∀Ix, so that the primitive recursive proposition ∀i : N. P[i] on natural numbers corresponds to the type ∀i : Ix. A, which is shorthand for ∀Ix(λi : Ix. A), with the following inference rules:

` A 0    A j `j:Ix A (j + 1)
` ∀Ix(A)

` M : Ix    A M `
∀Ix(A) `

We use this translation of primitive induction into logical rules as the basis for our primitive recursive co-data type. The refutation of primitive recursion is given as a specific counter-example, so the co-term is a specific construction, whereas a proof by primitive recursion is a process given by cases, so the term performs case analysis over its observations. The canonical counter-example is described by the co-data type declaration for ∀Ix:

    codata ∀Ix(X : Ix → S) : S where
      @ : ( | ∀Ix(X) `j:Ix X j : S )

Notice that this is exactly the same co-data definition of ∀ quantification from Section 6.2, except that the generic kind k has been specialized to Ix. Therefore, the general mechanism for co-data automatically generates the same left rule for constructing the counter-example, and a right rule for extracting the parts of this construction.
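The computational content of this induction principle is an ordinary recursor. As a sketch in Haskell terms, assuming the GADT encoding from before along with a hypothetical run-time witness SIx of the type-level index, the zero case plus a successor step suffice to produce a result at any index:

    {-# LANGUAGE DataKinds, GADTs, KindSignatures, RankNTypes #-}

    data Ix = Zero | Succ Ix

    -- Run-time witness of a type-level index (a "singleton").
    data SIx (i :: Ix) where
      SZero :: SIx 'Zero
      SSucc :: SIx j -> SIx ('Succ j)

    -- Primitive recursion in the style of the induction principle: the
    -- result at j is piped into the step for j + 1, covering every i.
    primRec :: f 'Zero -> (forall j. f j -> f ('Succ j)) -> SIx i -> f i
    primRec z _ SZero     = z
    primRec z s (SSucc n) = s (primRec z s n)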
However, to give a recursive process for ∀Ix, we need an additional right rule that gives us access to the recursive argument by performing case analysis on the particular index. This scheme for primitive recursion is expressed by the term µ(0:Ix @ α.c0 | j + 1:Ix @x α.c1), which performs case analysis on type-level indices at run-time, and which can access the recursive result through the extra variable x in the successor pattern j + 1:Ix @x α. This term has the typing rule:

c0 : (Γ `ΘG α : A 0, ∆)    c1 : (Γ, x : A j `Θ,j:IxG α : A (j + 1), ∆)
Γ `ΘG µ(0:Ix @ α.c0 | j + 1:Ix @x α.c1) : ∀Ix(A) | ∆   (∀IxRrec)

Note that this extension of the ∀Ix connective is allowed by the pragmatist view of co-data types: the observations of a co-data type are fixed up front, but the terms can be "whatever works" with respect to those observations. Terms of type ∀i:Ix.A describe a process which is able to produce A {N/i}, for any index N, by stepwise producing A {0/i}, A {1/i}, . . . , A {N/i} and piping the previous output to the recursive input x of the next step, thus "inflating" the index in the result arbitrarily high. In essence, this follows the interface of an infinitary & (an additive conjunction) of the form A {0/i} & A {1/i} & A {2/i} & . . . . The index of the particular step being handled is part of the observer pattern, so that the recursive case abstraction knows which branch to take. In contrast, co-terms of type ∀i:Ix.A hide the particular index at which they can consume an input, thereby forcing their input to work for any index.

By just applying duality in the sequent calculus and flipping everything about the turnstiles, we get the opposite notion of primitive recursion as a data type. In particular, we get the data declaration describing a dual type, named ∃Ix:

    data ∃Ix(X : Ix → S) where
      @ : ( X j : S `j:Ix ∃Ix(X) | )

Again, note that this data declaration is just the Ix instance of the general ∃ quantifier from Section 6.2. The general mechanism for data automatically generates the right rule for constructing an index-witnessed example case, and a left rule for extracting the index and value from this structure. Further, as before, we need an additional left rule for performing self-referential recursion when consuming such a construction:

c0 : (Γ, x : A 0 `ΘG ∆)    c1 : (Γ, x : A (j + 1) `Θ,j:IxG α : A j, ∆)
Γ | µ˜[0:Ix @ x.c0 | j + 1:Ix @α x.c1] : ∃Ix(A) `ΘG ∆   (∃IxLrec)

This extension of the ∃Ix connective is allowed by the verificationist view of data types: the constructions of a data type are fixed up front, but the co-terms can be "whatever works" with respect to those constructions. Dual to before, the recursive output sink can be accessed through the extra co-variable α in the pattern j + 1:Ix @α x. The terms of type ∃i:Ix.A hide the particular index at which they produce an output. In contrast, it is now the co-terms of type ∃i:Ix.A which describe a process that is able to consume A {N/i} for any choice of N in steps, by consuming A {N/i}, . . . , A {0/i} and piping the previous input to the recursive output α of the next step, thus "deflating" the index in the input down to 0. In essence, this follows the interface of an infinitary ⊕ (an additive disjunction) of the form A {0/i} ⊕ A {1/i} ⊕ A {2/i} ⊕ . . . .

Noetherian Recursion

We now consider the more complex of the two recursion schemes: noetherian recursion over well-ordered indices.
As opposed to ensuring a decreasing measure by matching on the specific structure of the index, we will instead quantify over arbitrary indices that are less than the current one. In other words, the details of what these indices look like are not important. Instead, they are used as arbitrary upper bounds in an ever decreasing chain, which stops when we run out of possible indices below our current one, as guaranteed by the well-foundedness of their ordering. Intuitively, we may jump by leaps and bounds down the chain, until we run out of places to move. Qualitatively, this different approach to recursion measures allows us to abstract parametrically over the index, and generalize so strongly over the difference in the steps that the particular chosen index is unknown. Thus, because a process receiving a bounded index has so little knowledge of what it looks like, the index cannot influence its action, thereby allowing us to totally erase bounded indices during run-time.

Now let's see how to define some types by noetherian recursion on an ordered index. Unlike primitive recursion, we do not need to consider the possible cases for the chosen index. Instead, we quantify over any index which is less than the given one. For example, recall the recursive definition of the Nat data type from Example 6.1. We can be more explicit about tracking the recursive sub-structure of the constructors by indexing Nat with some ordered type, and ensuring that each recursive instance of Nat has a smaller index, so that we may define natural numbers by noetherian recursion over ordered indices from a new kind called Ord:

    data Nat(i : Ord) by noetherian recursion on i where
      Z : ` Nat(i) |
      S : Nat(j) `j<i Nat(i) |

⊤⊤-closure (Pitts, 2000). The basic idea is to capture safety as a binary predicate (‚) on two opposite entities: answers and questions. We can pose the two dual problems: "which questions are safe to ask about these answers?" and "which answers are safe to give to these questions?" This style of formulation matches perfectly with the language of the sequent calculus. Terms are answers, co-terms are questions, and commands are the action of asking a particular question about a particular answer. The ‚ predicate represents a collection of commands that are safe to run. The safety properties of types are then modeled by a collection of answers (i.e., terms) and questions (i.e., co-terms) where every possible question-answer combination is safe (i.e., a command in ‚). The orthogonality approach gives a heavily test-based view of language properties, where we use test suites of canonical observations to carve out a space of valid programs that pass the test for each of those observations, or alternatively a specification of obviously correct results to carve out a space of valid use-cases. The magic of this approach is that we quickly reach a fixed point: after flipping back and forth between questions and answers with orthogonality twice, we learn everything we possibly can. This chapter covers the following topics:

– A general introduction to the idea of orthogonality in an abstract setting of spaces and poles (Section 7.1), which explores the connection between orthogonality and negation in intuitionistic logic.
– A representation of types based on orthogonality, oriented around either a positive or negative bias (Section 7.2), along with a generic presentation of the closure under expansion property, which is pervasive in semantic models of programming languages and appropriate for many different applications.

– A binary model of the parametric µµ˜-calculus with higher-order and recursive types that interprets sequents as statements about program behavior (Section 7.3), which is parameterized by a choice of (co-)data types, evaluation strategies, and safety condition.

– A proof of the fundamental adequacy lemmas (Section 7.4): the existence of a syntactic derivation of a sequent implies the truth of the semantic interpretation of that sequent.

– Several applications of the model, using adequacy to prove language-wide facts about the parametric µµ˜-calculus (Section 7.5), including: logical consistency, type safety, strong normalization, and soundness of the extensional βη theory with respect to the operational βς semantics.

Poles, Spaces, and Orthogonality

We're going to look at a semantic model for understanding computation in the sequent calculus in terms of orthogonality. The model hinges on a representation of the commands of the sequent calculus that we deem to be valid execution states for our purposes. In other words, we isolate some form of commands that can run. We represent such a set of runnable commands abstractly as a computational pole, which is any set equipped with a computation relation describing how its elements run. This way, the model is extensible and does not pin down the precise nature of commands ahead of time.

Definition 7.1 (Computational poles). A computational pole P (or just pole for short) is any set equipped with a computation relation between elements of P.

In addition, the terms and co-terms of the sequent calculus are represented as an interaction space with a positive and a negative side oriented around some pole, which likewise abstracts over their precise form.

Definition 7.2 (Interaction spaces). Given any computational pole P, a P-interaction space A (or just P-space for short) is a pair of sets (A+, A−) equipped with a cut operation 〈 || 〉 : A+ → A− → P (i.e., for all v ∈ A+ and e ∈ A−, 〈v||e〉 ∈ P). We call P the pole of A, A+ the positive side of A, A− the negative side of A, and use the shorthand v ∈ A to denote v ∈ A+ and e ∈ A to denote e ∈ A−.

Note that, while spaces and poles are quite abstract, we can always substitute the more concrete syntactic notions of the language of the sequent calculus for better intuition. For example, consider the (single-kinded) parametric µµ˜-calculus from Chapter V. The set of untyped commands from Figure 5.7, Command, is a perfectly fine computational pole, since the untyped operational reduction relation 7→µSµ˜SβSςS serves as a computation relation on commands. Likewise, the sets of untyped terms and co-terms from Figure 5.7, (Term, CoTerm), form a perfectly fine Command-interaction space, since we have the syntactic cut 〈v||e〉 formation of commands. This follows from the intuition that commands are the primary computational entities of the sequent calculus, whereas terms and co-terms provide a space for possible interactions (via cuts) that lead to computations. We could just as well limit our attention to closed programs (those without any free variables), as does Munch-Maccagnoni (2009), by considering the set of closed, untyped commands as a pole along with the sets of closed, untyped (co-)terms as an interaction space.
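To fix intuitions, Definitions 7.1 and 7.2 transcribe almost directly into a programming language. The following Haskell sketch is only illustrative; the record fields (member, step, posSide, and so on) are invented names, not part of the formal development:

    -- A pole: a set of commands (here, a membership predicate) together
    -- with a computation relation saying which commands step to which.
    data Pole cmd = Pole
      { member :: cmd -> Bool
      , step   :: cmd -> cmd -> Bool
      }

    -- A P-interaction space: positive and negative sides with a cut
    -- operation forming a command from one element of each side.
    data Space cmd v e = Space
      { posSide :: [v]
      , negSide :: [e]
      , cut     :: v -> e -> cmd
      }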
Furthermore, if we are instead interested in strong normalization, then we can start with the sets of all strongly normalizing, untyped (co-)terms as an all-encompassing interaction space. The appropriate space for modeling programs thus depends on the sort of outcome we are looking to achieve.

Since interaction spaces are just a pair of sets, we can compare when one interaction space is contained inside another by considering the two pointwise: all the positive and negative elements of the contained space must also be positive and negative elements of the other space, respectively.

Definition 7.3 (Containment). Given two P-spaces A = (A+, A−) and B = (B+, B−), we say that A is inside B (written A ⊑ B) if and only if A+ ⊆ B+ and A− ⊆ B−. Equivalently, we say that B contains A (written B ⊒ A) if and only if B+ ⊇ A+ and B− ⊇ A−.

Containment lets us specify when one interaction space is made up of parts of another. For example, the set of terms and co-terms of type A → B is inside the set of untyped terms and co-terms, since every typed (co-)term is also an untyped (co-)term, but not vice versa. This relationship is important for setting up a large, encompassing space as an area of interest, wherein lie many smaller sub-spaces of interest.

We are now ready to tackle the most fundamental operation on interaction spaces: orthogonality. Intuitively, orthogonality lets us pare down a large interaction space, which may include some undesired interactions, by selecting only the parts which pass some chosen criteria. We begin with a "plausibly well-behaved" but overly-permissive computational pole P and P-interaction space A, which include every interaction and computational behavior we might be interested in observing, but which may also allow for undesired interactions and behaviors. From there, we select a sub-pole Q of P that serves as a safety condition, including only the desired computational behavior we are interested in, along with a sub-P-space C contained in A, which serves as a specification laying out a set of criteria for evaluating the safety of elements in A. Together, Q and C can be seen as a test suite for performing quality control and determining which elements of A are acceptable: each positive element of A (intuitively, an untested program) must pass the Q test when paired with every negative element of C (intuitively, the vetted use-cases), and dually each negative element of A (intuitively, an untested use-case) must pass the Q test when paired with every positive element of C (intuitively, the vetted programs).¹

Definition 7.4 (Orthogonality). Let P be a pole, Q ⊆ P be a sub-pole of P, A = (A+, A−) be a P-space, and C = (C+, C−) ⊑ A be a P-space inside A. The positive Q-orthogonal of C− inside A+, written C−^{⊥Q A+}, consists of the positive elements of A that form a Q element when cut with every negative element of C, and is defined as:

C−^{⊥Q A+} ≜ {v ∈ A+ | ∀e ∈ C−, 〈v||e〉 ∈ Q}

¹Traditionally, these operations are referred to as either C^⊥ or C^⊤, but here we use the generalized notation C^{⊥Q A}, which lets us vary both the safety condition Q as well as the encompassing space A of all potential programs in consideration.
Dually, the negative Q-orthogonal of C+ inside A−, written C+^{⊥Q A−}, consists of all negative elements of A that form a Q element when cut with every positive element of C, and is defined as:

C+^{⊥Q A−} ≜ {e ∈ A− | ∀v ∈ C+, 〈v||e〉 ∈ Q}

Taken together, the Q-orthogonal complement of C inside A, written C^{⊥Q A}, is the Q-space given by both the positive and negative Q-orthogonals of C inside A:

(C+, C−)^{⊥Q (A+,A−)} ≜ (C−^{⊥Q A+}, C+^{⊥Q A−})

Example 7.1. For example, suppose we are trying to reason about the execution of well-typed programs; in other words, we want to model type safety of the operational semantics. For an all-encompassing interaction space, we can consider all untyped (co-)terms U = (Term, CoTerm), which is centered around the pole Command containing all untyped commands. We would then need to design a pole that is a subset of all untyped commands, ⫫ ⊆ Command, representing type safety: it should contain all valid states of type-safe execution that eventually lead to an acceptable result, and exclude stuck states that are caused by type errors. For example, ⫫ would not include commands like 〈True||1 · []〉, 〈µ(x · α.c)||µ̃[(x, y).c]〉, 〈ι₁(1)||µ̃[(x, y).c]〉, and 〈µ(x · α.c)||π₁[β]〉, since they are all stuck on irrecoverable miscommunications like missing case analysis or data/co-data mismatches. Instead, ⫫ would include valid states where we may not have enough information to take the next step, but execution could potentially continue if we learn more. These are states stuck on a free variable, like 〈f||1 · []〉 or 〈z||µ̃[(x, y).c]〉, or on a free co-variable, like 〈True||α〉 or 〈µ(x · α.c)||β〉, and correspond to the "final commands" from Chapters III and IV. To complete the type-safety pole ⫫, we should also ensure that a command that eventually reaches a valid state in some number of steps is also valid. That is, if c′ is in ⫫ and c ↦ c′, then c is also in ⫫. This is commonly referred to as "closure under expansion" and is found in similar models of program evaluation.

Now we can consider what the orthogonality operations mean for this choice of the safety pole ⫫, beginning with every (co-)term in the all-encompassing Command-space U. For instance, the negative orthogonal {()}^{⊥⫫ U−} selects every co-term that runs safely with the term (). This would include co-terms like µ̃[().c] and µ̃_.c for commands c that are in ⫫, because they both reduce to the safe state c in one step. However, {()}^{⊥⫫ U−} would not include co-terms like 1 · [] or µ̃[ι₁(x).c | ι₂(y).c′], since the commands 〈()||1 · []〉 and 〈()||µ̃[ι₁(x).c | ι₂(y).c′]〉 are stuck on an irrecoverable type error, which is excluded from ⫫. As another example, {}^{⊥⫫ U−} would instead select every co-term, since the condition ∀v ∈ {}, 〈v||e〉 ∈ ⫫ is vacuously true for any e. Note that this fact about {}^{⊥⫫ U−} (and dually {}^{⊥⫫ U+}) holds regardless of the definition of ⫫, so that the ⫫-orthogonal complement of the empty space (∅, ∅) inside U always gives back all of U, for any ⫫ and U. End example 7.1.

While we often have a particular purpose in mind (like the above example of type-safe execution), we can temporarily ignore the particular details and leave the safety pole ⫫ abstract for the time being. As we will see, the nature of orthogonality itself already gives us some interesting structure independent of our choices, without knowing anything about the particularities of terms and co-terms.
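Under the same finite-set simplification used above (all names are ours), the two orthogonality operations of Definition 7.4 can be computed by brute force: keep exactly the elements of the encompassing side that pass the Q test against every element of the given side.

    -- The positive Q-orthogonal of cNeg inside aPos (Definition 7.4).
    orthPos :: (c -> Bool)       -- membership test for the sub-pole Q
            -> (v -> e -> c)     -- the cut operation <v||e>
            -> [v]               -- A+, the encompassing positive side
            -> [e]               -- C-, the negative elements to test against
            -> [v]
    orthPos inQ cut aPos cNeg =
      [ v | v <- aPos, all (\e -> inQ (cut v e)) cNeg ]

    -- Dually, the negative Q-orthogonal of cPos inside aNeg.
    orthNeg :: (c -> Bool) -> (v -> e -> c) -> [e] -> [v] -> [e]
    orthNeg inQ cut aNeg cPos =
      [ e | e <- aNeg, all (\v -> inQ (cut v e)) cPos ]

Note that passing an empty cNeg to orthPos returns all of aPos, which is exactly the vacuously true case observed at the end of Example 7.1.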
Orthogonality and intuitionistic negation

As an operation on interaction spaces, orthogonality has some inherently negating behavior: it selects a collection of positive elements (terms) with respect to a collection of negative elements (co-terms), and vice versa. We will see that this simple intuition reveals a fundamental connection between the orthogonality of interaction spaces and the negation connective in intuitionistic logic. As it turns out, basic properties of intuitionistic negation, from both a logical and a computational perspective, are shared with the orthogonality operation. Furthermore, classical but non-intuitionistic properties of negation are invalid for orthogonality.

Recall from Chapter II that in the intuitionistic logic of natural deduction, negation can be encoded in terms of implication and falsehood: ¬A = A → ⊥. This encoding of negation is summarized by two rules derived from the rules for ⊃ and ⊥: the introduction rule ¬I_x, which discharges an assumption A (labeled x) used to derive ⊥ in order to conclude ¬A, and the elimination rule ¬E, which combines proofs of ¬A and A to yield ⊥.

Using these derived rules for negation, we can give some schematic proofs involving negation and implication that hold in intuitionistic logic: the contrapositive of an implication, (A ⊃ B) ⊃ ((¬B) ⊃ (¬A)); double negation introduction, A ⊃ (¬¬A); and triple negation elimination, (¬¬¬A) ⊃ (¬A). Furthermore, each of these proofs can also be written as a corresponding term in the simply-typed λ-calculus as follows:

Contra : (A → B) → (¬B → ¬A)
Contra = λf:A→B. λk:¬B. λx:A. k (f x)

DNI : A → ¬¬A
DNI = λx:A. λk:¬A. k x

TNE : ¬¬¬A → ¬A
TNE = λh:¬¬¬A. λx:A. h (λk:¬A. k x)

Remark 7.1. The three terms Contra, DNI, and TNE have an important status for pure functional programming in languages like Haskell. In particular, they give us a definition of the continuation monad over the return type ⊥, Cont A = ¬¬A. Double negation introduction, DNI, is the return (a.k.a. unit) function. Triple negation elimination, TNE, is the join function from Cont (Cont A) → Cont A with a more general type. And Contra is the contravariant mapping function for the underlying ¬ functor. We can get the Functor mapping function fmap by Contra-mapping a function twice: fmap f = Contra (Contra f). End remark 7.1.

As it turns out, these three properties of contrapositive mapping, double negation introduction, and triple negation elimination correspond to similar properties of orthogonality. In particular, the orthogonal complement of an interaction space takes on the role of negation, and the containment relation takes on the role of implication. With this correspondence in mind, we get the following three well-known intuitionistic orthogonality properties:

Property 7.1 (Intuitionistic orthogonality). For any two poles Q ⊆ P and P-spaces A, B, and C:
a) contrapositive: A ⊑ B implies B^{⊥QC} ⊑ A^{⊥QC},
b) double orthogonal introduction: A ⊑ C implies A ⊑ A^{⊥QC ⊥QC}, and
c) triple orthogonal elimination: A ⊑ C implies A^{⊥QC ⊥QC ⊥QC} = A^{⊥QC}.

Proof. a) Suppose that v ∈ B^{⊥QC}, so that by the definition of orthogonality we know that v ∈ C and 〈v||e〉 ∈ Q for all e ∈ B. But since A is contained in B, it follows that 〈v||e〉 ∈ Q for all e ∈ A, meaning that v ∈ A^{⊥QC} as well. Dually, e ∈ B^{⊥QC} implies that e ∈ A^{⊥QC} by the definition of orthogonality and the fact that A is contained in B. Therefore, B^{⊥QC} ⊑ A^{⊥QC} follows from A ⊑ B.
b) Suppose that v ∈ A and e ∈ A^{⊥QC}. Since A ⊑ C it must be that v ∈ C, and by the definition of orthogonality, it must also be that 〈v||e〉 ∈ Q. But this also means that v ∈ A^{⊥QC ⊥QC} by the definition of orthogonality as well. Dually, given any e ∈ A, we also have that e ∈ C and 〈v||e〉 ∈ Q for all v ∈ A^{⊥QC}, meaning that e ∈ A^{⊥QC ⊥QC} as well. Therefore, A ⊑ A^{⊥QC ⊥QC} follows from A ⊑ C.

c) First, we get the fact that A^{⊥QC} ⊑ A^{⊥QC ⊥QC ⊥QC} as an immediate consequence of double orthogonal introduction (Property 7.1 (b)), because A^{⊥QC} ⊑ C by definition of orthogonality. Second, we get A ⊑ A^{⊥QC ⊥QC} from double orthogonal introduction (Property 7.1 (b)) again, from which A^{⊥QC ⊥QC ⊥QC} ⊑ A^{⊥QC} follows by contrapositive (Property 7.1 (a)). Therefore, A^{⊥QC ⊥QC ⊥QC} = A^{⊥QC} follows from A ⊑ C.

It is important to point out that, when demonstrating the above three properties, we never needed to know anything specific about the makeup of the computational poles Q and P or the interaction spaces A, B, or C. No matter what choices we make, we get to use these intuitionistic reasoning principles when working with orthogonality. These are well-known properties of orthogonality (also noted by Munch-Maccagnoni (2009), for example).

Example 7.2. Recall from Remark 3.5 that one difference between negation in intuitionistic logic versus classical logic is that double negation elimination, i.e. (¬¬A) → A, is not assumed to hold generically for any A in the intuitionistic setting. To see why "double orthogonal elimination", i.e. A^{⊥QC ⊥QC} ⊑ A, does not hold in general, let's return to our example of type-safe execution from Example 7.1. For the moment, let's assume the call-by-value evaluation strategy V, so that every co-term is a co-value and thus 〈µα.c||e〉 ↦µV c{e/α} for any e. Recall that the orthogonal of the empty interaction space, (∅, ∅)^{⊥⫫U}, is U. Now, suppose that the command c is in the type-safe pole ⫫. Notice that 〈µ_.c||e〉 ↦ c, so that for an arbitrary co-term e, the command 〈µ_.c||e〉 reduces in one step to a command in ⫫. This means that the term µ_.c must be in the double orthogonal of the empty space, µ_.c ∈ (∅, ∅)^{⊥⫫U ⊥⫫U}. But this also means that we've run into a situation where the double orthogonal of an interaction space (namely the empty one) includes elements that weren't originally there. Therefore, in general we can't say that the double orthogonal gives back the same space that we started with.

Since taking the double orthogonal of an interaction space can introduce new elements, we can view it as a closure operation. Furthermore, since taking the orthogonal thrice gives the same thing as just once (Property 7.1 (c)), flipping back and forth more than twice in this way is redundant: A^{⊥QC ⊥QC ⊥QC} = A^{⊥QC} and A^{⊥QC ⊥QC ⊥QC ⊥QC} = A^{⊥QC ⊥QC}, so only A, A^{⊥QC}, and A^{⊥QC ⊥QC} are interesting. In this regard, A^{⊥QC ⊥QC} can be seen as the completion of A with respect to the possible candidates in C and the criteria imposed by the pole Q. End example 7.2.

By adding more connectives into the mix, like conjunction (∧) and disjunction (∨) from Chapter II, we get additional properties of intuitionistic negation. In particular, we have the de Morgan law (used as the backbone of logical duality in Chapter III) that allows us to distribute negation over a disjunction in both directions: (¬(A ∨ B)) ↔ ((¬A) ∧ (¬B)).
This law is provable with the rules of NJ natural deduction as two implications, (¬(A ∨ B)) ⊃ ((¬A) ∧ (¬B)) and ((¬A) ∧ (¬B)) ⊃ (¬(A ∨ B)), using the introduction and elimination rules for ∨, ∧, and ¬. We can also write down the terms corresponding to these proofs in the simply-typed λ-calculus from Section 2.2, expressing the above de Morgan law as two functions:

PairNeg : ((¬A) × (¬B)) → (¬(A + B))
PairNeg = λk. λx. case x of ι₁(y) ⇒ π₁(k) y | ι₂(z) ⇒ π₂(k) z

NegSum : (¬(A + B)) → ((¬A) × (¬B))
NegSum = λk. ((λx. k (ι₁(x))), (λy. k (ι₂(y))))

There is another de Morgan law used for logical duality in Section 3.1 for distributing a negation over a conjunction in both directions: (¬(A ∧ B)) ↔ ((¬A) ∨ (¬B)). However, in an intuitionistic setting, this law does not hold both ways. In particular, we can only assume that the right-to-left direction of this law holds in general: (¬(A ∧ B)) ← ((¬A) ∨ (¬B)). This implication is provable in intuitionistic natural deduction by cases on the disjunction (¬A) ∨ (¬B), projecting the corresponding component of A ∧ B to derive a contradiction in either case. And we also have a simply-typed λ-calculus function that corresponds to this one direction of the law:

SumNeg : ((¬A) + (¬B)) → (¬(A × B))
SumNeg = λk. λx. case k of ι₁(q) ⇒ q (π₁(x)) | ι₂(r) ⇒ r (π₂(x))

We are unable to write the inverse function, NegPair : (¬(A × B)) → ((¬A) + (¬B)), since we don't know up front which of ¬A or ¬B to return in general.

Just like before, these three de Morgan laws correspond to similar properties of orthogonality. The following union and intersection operations on interaction spaces take on the roles of disjunction and conjunction, and they enjoy similar introduction and elimination properties as in the natural deduction logic of NJ.

Definition 7.5 (Union and intersection). Given two P-spaces A = (A+, A−) and B = (B+, B−), the union of A and B, written A ⊔ B, and the intersection of A and B, written A ⊓ B, are defined as:

(A+, A−) ⊔ (B+, B−) ≜ (A+ ∪ B+, A− ∪ B−)
(A+, A−) ⊓ (B+, B−) ≜ (A+ ∩ B+, A− ∩ B−)

Property 7.2 (Union/intersection introduction/elimination). For any P-spaces A, B, and C,
a) A ⊑ A ⊔ B and B ⊑ A ⊔ B,
b) A ⊔ B ⊑ C if and only if A ⊑ C and B ⊑ C,
c) A ⊓ B ⊑ A and A ⊓ B ⊑ B, and
d) C ⊑ A ⊓ B if and only if C ⊑ A and C ⊑ B.

Proof. Each property follows from the definition of ⊔ and ⊓ in terms of the underlying set union and intersection operations.

Furthermore, when coupled with orthogonality, union and intersection give the following intuitionistic de Morgan orthogonality properties.

Property 7.3 (Spacial de Morgan laws). For any poles Q ⊆ P and P-spaces A, B, and C,
a) (A ⊔ B)^{⊥QC} = A^{⊥QC} ⊓ B^{⊥QC}, and
b) (A ⊓ B)^{⊥QC} ⊒ A^{⊥QC} ⊔ B^{⊥QC}.

Proof. a) First, we show that (A ⊔ B)^{⊥QC} ⊑ A^{⊥QC} ⊓ B^{⊥QC}. Suppose that v ∈ (A ⊔ B)^{⊥QC}, so that by definition v ∈ C and 〈v||e〉 ∈ Q for all e ∈ A ⊔ B. By the definition of ⊔, it follows that 〈v||e〉 ∈ Q for all e ∈ A and 〈v||e〉 ∈ Q for all e ∈ B separately, so we have both v ∈ A^{⊥QC} and v ∈ B^{⊥QC}. Thus, v ∈ A^{⊥QC} ⊓ B^{⊥QC}. Dually, every e ∈ (A ⊔ B)^{⊥QC} also leads to e ∈ A^{⊥QC} ⊓ B^{⊥QC} for similar reasons. Second, we show that (A ⊔ B)^{⊥QC} ⊒ A^{⊥QC} ⊓ B^{⊥QC}. Suppose that v ∈ A^{⊥QC} ⊓ B^{⊥QC}. By the definition of ⊓, it follows that both v ∈ A^{⊥QC} and v ∈ B^{⊥QC}, meaning that v ∈ C, 〈v||e〉 ∈ Q for all e ∈ A, and 〈v||e〉 ∈ Q for all e ∈ B. Thus, 〈v||e〉 ∈ Q for all e ∈ A ⊔ B, so v ∈ (A ⊔ B)^{⊥QC}.
Dually, every e ∈ A^{⊥QC} ⊓ B^{⊥QC} also leads to e ∈ (A ⊔ B)^{⊥QC} for similar reasons.

b) Suppose that v ∈ A^{⊥QC} ⊔ B^{⊥QC}, so that by the definition of ⊔ and orthogonality we know v ∈ C and either 〈v||e〉 ∈ Q for all e ∈ A or 〈v||e〉 ∈ Q for all e ∈ B. For any e ∈ A ⊓ B, it follows that e ∈ A and e ∈ B as well, so it must be that 〈v||e〉 ∈ Q. Thus, v ∈ (A ⊓ B)^{⊥QC}. Dually, every e ∈ A^{⊥QC} ⊔ B^{⊥QC} also leads to e ∈ (A ⊓ B)^{⊥QC} for similar reasons.

Again, take notice that the de Morgan properties of orthogonality don't depend on what particular elements inhabit the computational poles or interaction spaces. They are general laws that come out from the definition of orthogonality and the other basic operations and relations on interaction spaces.

Example 7.3. To see why the de Morgan Property 7.3 (b) does not go both ways like Property 7.3 (a), let's return again to type-safe execution from Example 7.1. Suppose we begin with two sets of terms, A+ = {True, ()} and B+ = {False, ()}, so that their intersection is A+ ∩ B+ = {()}. The negative ⫫-orthogonal of this intersection in U is (A+ ∩ B+)^{⊥⫫ U−} = {()}^{⊥⫫ U−} = {e | 〈()||e〉 ∈ ⫫}. By the definition of ⫫ from Example 7.1, given a command c in ⫫ we have that the co-term µ̃[().c] runs safely with () because 〈()||µ̃[().c]〉 ↦ c, so µ̃[().c] is in (A+ ∩ B+)^{⊥⫫ U−}. However, both 〈True||µ̃[().c]〉 and 〈False||µ̃[().c]〉 are not in ⫫, since they are stuck on a type error. This means that the co-term µ̃[().c] is in neither A+^{⊥⫫ U−} nor B+^{⊥⫫ U−}, since µ̃[().c] fails to be type safe when run with some of the terms in A+ and B+. Therefore, we've stumbled onto a situation where a co-term is in the space orthogonal to an intersection but does not come from the union of the separate orthogonal spaces. In other words, taking the orthogonal of an intersection between two sets of terms permits more possible co-terms than just forming the orthogonal sets in isolation and putting them together. End example 7.3.

Computation, Worlds, and Types

With the basic building blocks of computational poles, interaction spaces, and orthogonality at hand, we can now set the stage for constructing models of programming languages using these concepts. In particular, we will be modeling some safety condition of the language, represented by a pole ⫫ which, in the context of the sequent calculus, contains the commands exhibiting the desired property. While we have a lot of leeway in choosing ⫫, it cannot be arbitrary, however. Because the purpose of a programming language is to compute, a safety condition must respect computation. For this reason, the safety condition is made up of three poles:
– The "top" pole ⫪, which is unsafe, and corresponds to everything that can be written,
– The "bottom" pole ⫫, which is the safe subset of ⫪, and corresponds to only those programs which pass our criteria, and
– The "middle" pole ⫫⫫, which is partially safe and lies between ⫪ and ⫫.
The purpose of ⫫⫫ is to act as a waypoint toward safety: the elements of ⫫⫫ are not quite safe yet; however, we have the assurance that all the elements of ⫫⫫ that step into ⫫ are safe. This is commonly known as closure under expansion, and in our notation is written as: for all c ∈ ⫫⫫, if c ⇝ c′ ∈ ⫫ then c ∈ ⫫.

Definition 7.6 (Safety condition). A safety condition S is a triple of poles (⫪, ⫫⫫, ⫫) such that ⫫ ⊆ ⫫⫫ ⊆ ⫪ and the following condition holds:
– Closure under expansion: for all c ∈ ⫫⫫, if c ⇝ c′ ∈ ⫫ then c ∈ ⫫.
We call ⫪ the unsafe pole, ⫫⫫ the demisafe pole, and ⫫ the safe pole of the safety condition.
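As a sanity check on Definition 7.6, the closure-under-expansion condition can be tested directly when the poles and the computation relation are finite. The following brute-force Haskell sketch (all names are ours) returns True exactly when every demisafe command that steps to a safe command is itself safe.

    -- Check closure under expansion: for all c in the demisafe pole,
    -- if c steps to some c' in the safe pole, then c is in the safe pole.
    closedUnderExpansion
      :: Eq c
      => [c]          -- the demisafe pole
      -> (c -> [c])   -- the computation relation (one-step reducts)
      -> [c]          -- the safe pole
      -> Bool
    closedUnderExpansion demisafe step safe =
      and [ c `elem` safe
          | c <- demisafe, c' <- step c, c' `elem` safe ]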
The fact that we are allowed to choose a ⫫⫫ which is smaller than ⫪ for the purposes of constraining closure under expansion is important for some applications, but not others. For instance, in the following Section 7.5 we can just take ⫫⫫ = ⫪ for the goal of proving logical consistency and type safety. However, to show strong normalization it is crucial that we constrain ⫫⫫ to include only the commands which may not be strongly normalizing themselves, but are formed by cutting together strongly normalizing (co-)terms. This restriction on ⫫⫫ gives us a key foothold for demonstrating the closure under expansion, which would be impossible otherwise.

In order to model the semantic meaning of types in terms of a chosen safety condition, we must delineate the world in which they reside. A world containing semantic types is represented as an interaction space which holds every possible element of all the types we're interested in. Therefore, it may allow for undesired interactions by mixing elements that belong to different types, but each inhabitant of the world should act as a well-behaved member of some potential type. To phrase this requirement, worlds are made up of three interaction spaces that represent the impact of the substitution strategy in the programming language:
– The "untyped" interaction space U corresponds to everything that can be written without any restrictions,
– The "value" interaction space V is contained within U and corresponds to the values and co-values of a substitution strategy, and
– The "well-behaved" interaction space W is contained within U and corresponds to the elements which pass the minimum criteria needed to be considered eligible for belonging to a type.

The definition of a world in this sense makes heavy use of (co-)value restriction, since in many places computation can only proceed with (co-)values and not general (co-)terms. For this purpose, we need to know what it means to restrict one interaction space by another.

Definition 7.7 (Restriction). Given two P-spaces A and B, the B-restriction of A, written A|B, is their intersection A ⊓ B. Likewise, we write A|B = A ∩ B for the B-restriction of A when A and B are sets. Note that in this notation, (A|V)^{⊥⫫W} means (A ⊓ V)^{⊥⫫W}, whereas A^{⊥⫫W}|V means A^{⊥⫫W} ⊓ V.

The semantic notion of worlds can now be defined in terms of two criteria: saturation, which forces a sufficient amount of elements from U into W by stating that the portion of U which steps to a safe command with all "benign" elements is well-behaved, and generation, which states that any interaction space contained in W has only safe interactions if all of its interactions with (co-)values are safe.

Definition 7.8 (Worlds). Given a safety condition S = (⫪, ⫫⫫, ⫫), an S-world is a triple T = (U, V, W) where both U and V are ⫪-spaces and W is a ⫫⫫-space such that V ⊑ U, W ⊑ U, and the following conditions hold:
– Saturation: for all v ∈ U, if 〈v||E〉 ⇝ c for some c ∈ ⫫ and all E ∈ ((W|V)^{⊥⫫W})|V, then v ∈ W. Dually, for all e ∈ U, if 〈V||e〉 ⇝ c for some c ∈ ⫫ and all V ∈ ((W|V)^{⊥⫫W})|V, then e ∈ W. In other words, ((((W|V)^{⊥⫫W})|V)^{⊥⫫₁ U}) ⊑ W, where ⫫₁ = {c ∈ ⫪ | c ⇝ c′ ∈ ⫫}.
– Generation: for all ⫫⫫-spaces A ⊑ W, if A = (A|V)^{⊥⫫W} then A = A^{⊥⫫W}.
We call U the untyped ⫪-space, V the value ⫪-space, and W the well-behaved ⫫⫫-space. As shorthand, for any ⫪-space A ⊑ U, we write V ∈ A to denote V ∈ A|V and E ∈ A to denote E ∈ A|V.
Note that the generation property is a rephrasing of Munch-Maccagnoni's (2009) generation lemma, where we take it as an assumption instead of proving that it holds for a particular setting. In a particular world T = (U, V, W), we can say that a semantic type is any space A contained in the well-behaved W where every positive element of A is safe when paired with every negative value element of A, and vice versa.

Definition 7.9 (Semantic types). Given a safety condition S = (⫪, ⫫⫫, ⫫) and S-world T = (U, V, W), a T-type is any ⫪-space A such that A = (A|V)^{⊥⫫W}. We denote the set of all T-types as SemType(T).

Note that by the definition of semantic types, each one must contain some minimum amount of "benign" elements that belong to every type that lives in its world. These benign elements are safe when paired with any well-behaved member of the world, and so they never cause any problems.

Lemma 7.1 (Type minimum). For any S-world T = (U, V, W) and T-type A, W^{⊥⫫W} ⊑ (W|V)^{⊥⫫W} ⊑ A.

Proof. First, note that W|V ⊑ W. Also, because A is a T-type, we know that A = (A|V)^{⊥⫫W}, so that by definition of orthogonality, A ⊑ W and thus A|V ⊑ W|V ⊑ W. Finally, by contrapositive (Property 7.1 (a)) we get W^{⊥⫫W} ⊑ (W|V)^{⊥⫫W} ⊑ (A|V)^{⊥⫫W} = A.

It sometimes happens that W^{⊥⫫W} is empty, and if that's the case then there is not necessarily anything in the positive or negative sides of a semantic type. However, in some applications, like strong normalization, we find that things like (co-)variables are benign, and can be safely assumed to inhabit every possible type. The existence of this minimum of every type lets us prove the type expansion property, which says that any "untyped" term which steps to a safe place for all co-values of a type must belong to that type, and vice versa.

Lemma 7.2 (Type expansion). For any safety condition S = (⫪, ⫫⫫, ⫫), S-world T = (U, V, W), and T-type A,
1. v ∈ A if 〈v||E〉 ⇝ c ∈ ⫫ for all E ∈ A|V, and
2. e ∈ A if 〈V||e〉 ⇝ c ∈ ⫫ for all V ∈ A|V.

Proof. 1. Note that (W|V)^{⊥⫫W} ⊑ A by Lemma 7.1, and so ((W|V)^{⊥⫫W})|V ⊑ A|V by monotonicity (Property 7.4 (a)), so that v ∈ W by saturation of T, because 〈v||E〉 ⇝ c ∈ ⫫ for all E ∈ ((W|V)^{⊥⫫W})|V ⊑ A|V. Thus, for all E ∈ A|V, 〈v||E〉 ∈ ⫫⫫ since W is a ⫫⫫-space, and so 〈v||E〉 ∈ ⫫ by closure under expansion of S. Therefore, v ∈ (A|V)^{⊥⫫W} = A because A is a T-type.
2. Analogous to part 1 by duality.

Note the two steps of this proof, which form a general procedure for justifying the presence of elements in a type. First, we must justify that we are dealing with something generally well-behaved that exists in the ⫫⫫-space W. Only then can we use closure under expansion of the safety condition to show that it is also safe with every (co-)value of the type.

The positive construction of types

We now consider two dual methods of constructing particular types inside of a world. The first is the positive method, which builds a type around a chosen set of values. In particular, given some world T = (U, V, W), where V = (V+, V−) and W = (W+, W−), and a chosen set of well-behaved value elements Acons ⊆ W+|V+ serving as the primitive constructions, we have the positive construction of the T-type PosT(Acons), defined as follows:

PosT(Acons) ≜ ((((Acons, Acons^{⊥⫫W−})|V)^{⊥⫫W})|V)^{⊥⫫W}

To show that PosT(Acons) is actually a T-type, we need to demonstrate that PosT(Acons) = (PosT(Acons)|V)^{⊥⫫W}. To do so, we rely on some facts about restriction, and how they generalize the basic properties of the orthogonality operation (Property 7.1).
Property 7.4. For all P-spaces A, B, and C,
a) restriction monotonicity: A ⊑ B implies A|C ⊑ B|C,
b) restriction containment: A|C ⊑ A, and
c) restriction idempotency: (A|C)|C = A|C.

Proof. Each property follows from the definition of restriction in terms of intersection, and the introduction and elimination facts in Property 7.2. In particular, A ⊑ B implies A|C = A ⊓ C ⊑ B ⊓ C = B|C; A|C = A ⊓ C ⊑ A; and (A|C)|C = A ⊓ C ⊓ C = A ⊓ C = A|C.

Property 7.5. For any two poles O ⊆ P and P-spaces A, B, C, and D,
a) restricted orthogonal: (A^{⊥OC})|D = A^{⊥O(C|D)},
b) restricted contrapositive: A ⊑ B implies ((B|D)^{⊥OC})|D ⊑ ((A|D)^{⊥OC})|D,
c) restricted double orthogonal introduction: A ⊑ C implies A|D ⊑ ((((A|D)^{⊥OC})|D)^{⊥OC})|D, and
d) restricted triple orthogonal elimination: A ⊑ C implies ((((((A|D)^{⊥OC})|D)^{⊥OC})|D)^{⊥OC})|D = ((A|D)^{⊥OC})|D.

Proof. The restricted orthogonal Property 7.5 (a) follows from the definitions of the restriction and orthogonality operations on interaction spaces. In particular, supposing A = (A+, A−), C = (C+, C−), and D = (D+, D−), we have:

(A^{⊥OC})|D = A^{⊥OC} ⊓ D
= (A−^{⊥OC+}, A+^{⊥OC−}) ⊓ (D+, D−)
= (A−^{⊥OC+} ∩ D+, A+^{⊥OC−} ∩ D−)
= ({v ∈ C+ | ∀e ∈ A−, 〈v||e〉 ∈ O} ∩ D+, {e ∈ C− | ∀v ∈ A+, 〈v||e〉 ∈ O} ∩ D−)
= ({v ∈ C+ ∩ D+ | ∀e ∈ A−, 〈v||e〉 ∈ O}, {e ∈ C− ∩ D− | ∀v ∈ A+, 〈v||e〉 ∈ O})
= (A−^{⊥O(C+∩D+)}, A+^{⊥O(C−∩D−)})
= A^{⊥O(C+∩D+, C−∩D−)}
= A^{⊥O(C⊓D)}
= A^{⊥O(C|D)}

The other properties follow from Property 7.5 (a), the intuitionistic facts of orthogonality in Property 7.1, and the monotonicity of restriction (Property 7.4 (a)). For the restricted contrapositive, A ⊑ B implies A|D ⊑ B|D by monotonicity (Property 7.4 (a)), which implies (B|D)^{⊥OC} ⊑ (A|D)^{⊥OC} by the contrapositive (Property 7.1 (a)), which implies ((B|D)^{⊥OC})|D ⊑ ((A|D)^{⊥OC})|D by the monotonicity of restriction again. For the restricted double orthogonal introduction, A ⊑ C implies A|D ⊑ C|D by monotonicity (Property 7.4 (a)), which implies A|D ⊑ (A|D)^{⊥O(C|D) ⊥O(C|D)} = ((((A|D)^{⊥OC})|D)^{⊥OC})|D by ordinary double orthogonal introduction (Property 7.1 (b)) and Property 7.5 (a). For the restricted triple orthogonal elimination, again A ⊑ C implies A|D ⊑ C|D by monotonicity (Property 7.4 (a)), which implies ((((((A|D)^{⊥OC})|D)^{⊥OC})|D)^{⊥OC})|D = (A|D)^{⊥O(C|D) ⊥O(C|D) ⊥O(C|D)} = (A|D)^{⊥O(C|D)} = ((A|D)^{⊥OC})|D by ordinary triple orthogonal elimination (Property 7.1 (c)) and Property 7.5 (a).

Lemma 7.3 (Positive semantic types). For any safety condition S, S-world T = (U, V, W), and Acons ⊆ W+|V+, it must be that PosT(Acons) is a T-type.

Proof. We must show that PosT(Acons) = (PosT(Acons)|V)^{⊥⫫W}. Unfolding the definition of PosT(Acons), the extra restriction-orthogonal added on the outside produces a triple restricted orthogonal around the space (Acons, Acons^{⊥⫫W−}). Since Acons ⊆ V+ is unchanged by restriction (Property 7.4 (c)), restricted triple orthogonal elimination (Property 7.5 (d)) collapses that triple back to the single restricted orthogonal, giving PosT(Acons) back.

The Pos construction of types, which involves three applications of orthogonality interspersed with value restrictions, is more complex than the traditional bi-orthogonal construction of types, which needs only two applications of orthogonality. This is because we do not assume anything about the chosen substitution strategy, so that there may be both non-values and non-co-values, making the value restriction necessary at every step and inducing an extra application of orthogonality to ensure that the restricted triple orthogonal elimination principle (used for showing that PosT(Acons) is indeed a semantic type) applies. However, if we assume that the negative side of V is universal (corresponding to the call-by-value substitution strategy V, where all co-terms are co-values), then we can greatly simplify the positive construction of types to the more traditional bi-orthogonal definition.

Lemma 7.4 (Positive bi-orthogonality). For any safety condition S, S-world T = (U, V, W), and Acons ⊆ W+|V+, if V− = U− then PosT(Acons) = (Acons, Acons^{⊥⫫W−})^{⊥⫫W} = (Acons^{⊥⫫W− ⊥⫫W+}, Acons^{⊥⫫W−}).

Proof. Because PosT(Acons) must be a T-type (Lemma 7.3), we know that PosT(Acons) = PosT(Acons)^{⊥⫫W} by generation of T (Definition 7.8). Unfolding the definition of PosT(Acons), the assumption Acons ⊆ V+ makes the positive value restrictions vanish, and the assumption V− = U− makes the negative value restrictions vanish, so the construction collapses to (Acons, Acons^{⊥⫫W−})^{⊥⫫W}. Expanding this orthogonal component-wise and applying triple orthogonal elimination (Property 7.1 (c)) to the negative side then yields (Acons^{⊥⫫W− ⊥⫫W+}, Acons^{⊥⫫W−}).

The negative construction of types

The dual to the positive method of constructing types is the negative method, which builds a type around a chosen set of co-values. In particular, given some world T = (U, V, W), where V = (V+, V−) and W = (W+, W−), and a chosen set of well-behaved co-value elements Aobs ⊆ W−|V− serving as the primitive observations, we have the negative construction of the T-type NegT(Aobs), defined as follows:

NegT(Aobs) ≜ ((((Aobs^{⊥⫫W+}, Aobs)|V)^{⊥⫫W})|V)^{⊥⫫W}

Lemma 7.5 (Negative semantic types). For any safety condition S, S-world T = (U, V, W), and Aobs ⊆ W−|V−, it must be that NegT(Aobs) is a T-type.

Proof. Analogous to the proof of Lemma 7.3: unfolding the definition of NegT(Aobs), the outer restriction-orthogonal forms a triple restricted orthogonal that collapses back to NegT(Aobs) by restriction idempotency (Property 7.4 (c)) and restricted triple orthogonal elimination (Property 7.5 (d)).

Similar to the fact that the positive construction of types Pos can be simplified to the traditional bi-orthogonal construction under certain assumptions about V, the same holds for the negative construction of types Neg. Unsurprisingly, this requires the dual assumption that the positive side of V is universal (corresponding to the call-by-name substitution strategy N, where all terms are values).

Lemma 7.6 (Negative bi-orthogonality). For any safety condition S, S-world T = (U, V, W), and Aobs ⊆ W−|V−, if V+ = U+ then NegT(Aobs) = (Aobs^{⊥⫫W+}, Aobs)^{⊥⫫W} = (Aobs^{⊥⫫W+}, Aobs^{⊥⫫W+ ⊥⫫W−}).

Proof. Analogous to the proof of Lemma 7.4 by duality.
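For the simplified call-by-value situation of Lemma 7.4, where two applications of orthogonality suffice, the positive construction can be sketched over finite sets as a literal bi-orthogonal. All names below are ours; the pole test, cut operation, and encompassing space are taken as parameters.

    -- Bi-orthogonal positive construction (the call-by-value case of
    -- Lemma 7.4): complete a set of primitive constructions into a type.
    posType :: (c -> Bool)      -- membership test for the safe pole
            -> (v -> e -> c)    -- the cut operation <v||e>
            -> ([v], [e])       -- the encompassing space (W+, W-)
            -> [v]              -- A_cons, the primitive constructions
            -> ([v], [e])
    posType inQ cut (wPos, wNeg) cons =
      let -- first orthogonal: co-terms safe against every construction
          obs  = [ e | e <- wNeg, all (\v -> inQ (cut v e)) cons ]
          -- second orthogonal: terms safe against every such co-term
          vals = [ v | v <- wPos, all (\e -> inQ (cut v e)) obs ]
      in (vals, obs)

The negative construction of Lemma 7.6 is the mirror image: start from the chosen observations, take their positive orthogonal, and then complete the negative side with a second orthogonal.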
Models

We now build a parameterized model for the µµ̃-calculus in earnest by partially instantiating the notions of safety condition and world. Since one of the applications of interest in Section 7.5 is the binary property of contextual equivalence, we will make a model out of pairs of commands and (co-)terms. More specifically, our model is parameterized by an arbitrary safety condition S = (⫪, ⫫⫫, ⫫) as well as a world TS = (US, VS, WS) for every base kind S such that

⫪ = Command × Command
US = (TermS × TermS, CoTermS × CoTermS)
VS = (ValueS × ValueS, CoValueS × CoValueS)

from the well-kinded (but untyped) syntax of the µµ̃-calculus, and where the computation relation for the top pole ⫪ is defined as

(c1, c2) ⇝ (c′1, c′2) if c1 ↦ c′1 and c2 ↦ c′2
(c1, c2) ⇝ (c′1, c2) if c1 ↦ c′1
(c1, c2) ⇝ (c1, c′2) if c2 ↦ c′2

and the cut operation for each ⫪-space US is defined as

〈(v1, v2)||(e1, e2)〉 ≜ (〈v1||e1〉, 〈v2||e2〉)

Therefore, the definitions of ⫫⫫, ⫫, and WS for each strategy S are arbitrary, so long as they satisfy the criteria imposed by the safety condition (⫪, ⫫⫫, ⫫) and the worlds (US, VS, WS). Since we are dealing with binary relations, not just unary predicates, we will use the following shorthand:
– c ⫫ c′ means (c, c′) ∈ ⫫, and c ⫫⫫ c′ means (c, c′) ∈ ⫫⫫,
– v A v′ means (v, v′) ∈ A for any A ⊑ US, and
– e A e′ means (e, e′) ∈ A for any A ⊑ US.

To accommodate the size indices Ix and Ord, the model is also parameterized by a size measurement, defined as follows.

Definition 7.10. A size measurement is a set of ordinals O equipped with two constants 0, ∞ ∈ O, a unary operation +1 : O → O, and a well-founded (partial) order < between elements of O such that the following conditions hold:
1. 0 is less than ∞: 0 < ∞,
2. +1 is monotonic: M < N implies +1(M) < +1(N) for all M, N ∈ O,
3. +1 is strictly increasing: M < +1(M) for all M ∈ O, and
4. ∞ is a limit of +1: M < ∞ implies +1(M) < ∞ for all M ∈ O.

Types and Kinds

First, we build a model for the kinds and sorts in the wholly static part of the higher-order µµ̃-calculus with structural recursion. Since the language of kinds includes functions and size indices in addition to base kinds, we need to form the Universe containing all the semantic representations of the different kinds of syntactic types. In particular, base kinds are interpreted as the set of semantic types of the corresponding world, the kind of size type indices is interpreted as the set of ordinals, and the kinds of type functions are interpreted as sets of partial functions (denoted by ⇀) between other members of the universe.

Definition 7.11 (Universe). The Universe is the smallest set such that
1. O ∈ Universe,
2. SemType(TS) ∈ Universe for all base kinds S,
3. (K ⇀ L) ∈ Universe for all K, L ∈ Universe, and
4. K ∈ Universe for all K ⊆ L ∈ Universe.
For any K, L ∈ Universe, A ∈ K, and B ∈ L, the partial function application A(B) is defined whenever there exist K1, K2 ∈ Universe such that A ∈ (K1 ⇀ K2) and B ∈ K1, and is undefined otherwise.

With the universe in place, we can now define the meaning of kinds as a relationship between the syntax of types and their semantics (Pitts, 1997) in the model.

Definition 7.12 (Semantic kinds). A semantic kind K ∈ SemKind is a pair (D, R) where D ∈ Universe and R ⊆ (Type × D). We refer to D as the domain of K and to R as the syntactic-semantic relationship of K. As shorthand, given K ∈ SemKind, A ∈ Type, and A ∈ Universe, we write A ∈ K to indicate A ∈ π₁(K), and A K A to indicate (A, A) ∈ π₂(K).
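Definition 7.10 has a small concrete instance worth noting. Since condition 3 demands that even ∞ have strictly larger successors, the naturals alone with a top element will not do; but the ordinals below ω·2 (that is, either a natural n or ω + n, with ∞ = ω) satisfy all four conditions. A Haskell sketch, with all names ours:

    -- One possible size measurement: ordinals below ω·2.
    data Size = Fin Integer | Omega Integer   -- Fin n is n; Omega n is ω + n
      deriving (Eq, Show)

    zero, infty :: Size
    zero  = Fin 0
    infty = Omega 0          -- the distinguished ∞ element is ω itself

    suc :: Size -> Size      -- the +1 operation keeps growing past ∞,
    suc (Fin n)   = Fin (n + 1)    -- as condition 3 demands
    suc (Omega n) = Omega (n + 1)

    lt :: Size -> Size -> Bool     -- the well-founded order <
    lt (Fin m)   (Fin n)   = m < n
    lt (Fin _)   (Omega _) = True  -- every finite size is below ω + n
    lt (Omega m) (Omega n) = m < n
    lt (Omega _) (Fin _)   = False

Here lt is well-founded because the order has type ω + ω, and condition 4 holds because the successor of any finite size is still finite.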
First, we give an interpretation of sorts in terms of semantic kinds. The sort of non-erasable kinds, , is interpreted as the whole set of all semantic kinds, and the sort of erasable kinds, , adds the restriction that the syntactic-semantic relation is closed under β expansion of the syntactic component. JK , SemKind JK , {K ∈ SemKind | ∀A′ K A.∀Aβ A′.A K A} 266 A type substitution σ is a partial function from syntactic type variables and connective names to semantic entities in some kind in the universe, and the set of all type substitutions is TypeSubstitution , (TypeVariable ∪ Connective) ⇀⋃Universe The interpretation of syntactic kinds and types is then a (partial) function from type substitutions to semantic entities in SemKind and Universe, respectively. Jk ∈ KindK : TypeSubstitution ⇀ SemKindJA ∈ TypeK : TypeSubstitution ⇀ Universe This interpretation is mutually defined by structural induction over the syntax. JXKσ , σ(X)J0Kσ , 0J∞Kσ ,∞JN + 1Kσ , +1(JNKσ)r F( #»A) z σ , σ(F)( # »JAKσ)JλX:k.BKσ , λX ∈ pi1(JkKσ).JBKσ{X/X}JA BKσ , JAKσ(JBKσ) JSKσ , (SemType(TS),Type × SemType(TS))Jk → lKσ , (pi1(JkKσ)→ pi1(JlKσ), {(A,A) | ∀B JkKσ B.A B JlKσ A(B)})JIxKσ , (N,{(M,M) |M ∼N M})JOrdKσ , (O,{(M,M) | ∃M ′ ∈ Type.M β M ′ ∼O M′ ≤M})J< NKσ , ({M ∈ O |M < JNKσ} ,{(M,M) | ∃M ′ ∈ Type.M β M ′ ∼O M′ ≤M}) Note that N is defined as the smallest subset of O containing 0 and closed under +1, the relation ∼N is defined as the smallest subset of Type ×O such that 1. 0 ∼N 0, and 2. M + 1 ∼N +1(M) for all M ∼N M. 267 and the relation ∼O used in is defined as ∼N ∪{(∞,∞)}. Declarations Using the interpretation of kinds above, each (co-)data declaration can be interpreted as the semantic type representing the connective it defines. Jdecl ∈ DeclarationK : TypeSubstitution ⇀ Universe The interpretations revolve around structures: data types are interpreted as the positive type built around their constructions, and co-data types are interpreted as the negative type built around their observations. We consider each different form of (co-)data type declaration introduced previously in Chapters V and VI, first warming up with simple (co-)data types before moving on to higher-order (co-)data types and recursive (co-)data types. Simple (co-)data types To interpret a data type, we must interpret the meaning of each of its constructors. In particular, given the signature of a constructor, K : ( # » A : T ` F( #»X ) | # »B : R ) in a data type declaration, we define its interpretation as the relation between the possible term constructions it can build, where the constructors agree and the sub- (co-)terms are related.r K : ( # » A : T ` F( #»X ) | # »B : R )z σ , {( K( #»e , #»v ),K( #» e′ , #» v′ ) ) ∣∣∣ # »e JBKσ e′ , # »v JAKσ v′} The interpretation of a full simple data type declaration is then the function returning the positive type built around the union of all of its constructions as follows:uwvdata F( # » X : k) : Swhere # » Ki : ( # » Aij : Tijj ` F( #»X ) | # »Bij : Rijj )i }~ σ , # »λX ∈ pi1(JkKσ).PosTS ⋃ i  s Ki : ( # » Aij : Tijj ` F( #»X ) | # »Bij : Rijj ){ σ{ # »X/X}   268 Co-data types are dual to data types, and follow the opposite approach. 
First we interpret the meaning of each of a co-data type’s observers, given its signature O : ( # » A : T | G( #»X ) ` # »B : R ) , as the relation between the possible co-term observations it can build where the observers agree and the sub-(co-)terms are related.r O : ( # » A : T | G( #»X ) ` # »B : R )z σ , {( O[ #»v , #»e ],O[ #» v′ , #» e′ ] ) ∣∣∣ # »v JAKσ v′ , # »e JBKσ e′} The interpretation of a full simple co-data type declaration is then the function returning the negative type built around the union of all of its observations as follows:uwv codataG( # » X : k) : Swhere # » Oi : ( # » Aij : Tijj | G( #»X ) ` # »Bij : Rijj )i }~ σ , # »λX ∈ pi1(JkKσ).NegTS ⋃ i  s Oi : ( # » Aij : Tijj | G( #»X ) ` # »Bij : Rijj ){ σ{ # »X/X}   Higher-order (co-)data types The only difference between simple and higher-order (co-)data types is that higher- order (co-)data types can also include hidden quantified types within their structures. Therefore, when interpreting the meaning of their constructors and observers, we must also quantify over the possible types that might be included. For a quantified type of kind l, we extend the relation to quantify over l[σ] twice, choosing a pair of syntactic types C and C ′ which are related to exactly the same semantic type C. The two syntactic types are used in syntactic term and co-term structures, whereas the single semantic type is used for to interpret the types of the remaining components as follows:r K : ( # » A : T ` # »Y :l F( #»X ) | # »B : R )z σ , {( K #» C ( #»e , #»v ),K # » C′ ( #» e′ , #» v′ ) ) ∣∣∣∣ # »C JlKσ C, # »C ′ JlKσ C, # »e JBKσ{ # »C/Y } e′ , # »v JAKσ{ # »C/Y } v′}r O : ( # » A : T | G( #»X ) ` # »Y :l # »B : R )z σ , {( O #» C [ #»v , #»e ],O # » C′ [ #» v′ , #» e′ ] ) ∣∣∣∣ # »C JlKσ C, # »C ′ JlKσ C, # »v JAKσ{ # »C/Y } v′ , # »e JBKσ{ # »C/Y } e′} With this extended interpretation of single constructors and observers, the interpretation of higher-order (co-)data types is effectively the same as the simple 269 case, defined as follows:uwvdata F( # » X : k) : Swhere # » Ki : ( # » Aij : Tijj ` # » Yij :lij j F( #»X ) | # »Bij : Rijj )i }~ σ , # »λX ∈ pi1(JkKσ).PosTS ⋃ i  s Ki : ( # » Aij : Tijj ` # » Yij :lij j F( #»X ) | # »Bij : Rijj ){ σ{ # »X/X}   uwv codataG( # » X : k) : Swhere # » Oi : ( # » Aij : Tijj | G( #»X ) ` # » Yij :lij j # » Bij : Rijj )i }~ σ , # »λX ∈ pi1(JkKσ).NegTS ⋃ i  s Oi : ( # » Aij : Tijj | G( #»X ) ` # » Yij :lij j # » Bij : Rijj ){ σ{ # »X/X}   Primitive recursive (co-)data types Interpreting recursively-defined (co-)data types is more interesting, since the interpretation itself must also be recursive. As it turns out, we can use the same recursion principle corresponding to the program to define the connective semantically. In particular, we can interpret primitive-recursive data type asuwwwwwv data F(i:Ix, # »X:k) : S by primitive recursion on i where i = 0 # » Ki : ( # » Ai:Ti ` # » Yi:li F(0, #»X ) | # »Bi:Ri )i where i = j+1 # » K′i : ( # » A′i:T ′i ` # » Y ′i :l′i F(j+1, #»X ) | # »B′i:R′i )i }~ σ , λJ ∈ N.FJσ where the family of FMσ is defined by primitive recursion on M ∈ N: F0σ , # » λX ∈ pi1(JkK). PosTS ⋃ i  s # » Ki : ( # » Ai:Ti ` # » Yi:li F(0, #»X ) | # »Bi:Ri )i{ σ{ # »X/X}   F+1(M)σ , # » λX ∈ pi1(JkK). 
PosTS ⋃ i  t # » K′i : ( # » A′i:T ′i ` # » Y ′i :l′i F(0, #»X ) | # »B′i:R′i )i| σ{ # »X/X,(λJ∈N.FMσ )/F}   270 Note that this is well-defined by primitive recursion whenever the declaration of F is well-formed, because the signature of the constructors K′i can only F(j, #» C ), which is defined by the previous F. Dually, we can interpret a primitive-recursive co-data type asuwwwwwv codataG(i:Ix, # »X:k) : S by primitive recursion on i where i = 0 # » Oi : ( # » Ai:Ti | G(0, #»X ) ` # » Yi:li # »Bi:Ri )i where i = j+1 # » O′i : ( # » A′i:T ′i | G(j+1, #» X ) ` # » Y ′i :l′i # » B′i:R′i )i }~ σ , λJ ∈ N.GJσ where the family of GMσ is defined by primitive recursion on M ∈ N: G0σ , # » λX ∈ pi1(JkK). NegTS ⋃ i  s # » Oi : ( # » Ai:Ti | G(0, #»X ) ` # » Yi:li # »Bi:Ri )i{ σ{ # »X/X}   G+1(M)σ , # » λX ∈ pi1(JkK). NegTS ⋃ i  t # » O′i : ( # » A′i:T ′i | G(0, #» X ) ` # » Y ′i :l′i # » B′i:R′i )i| σ{ # »X/X,(λJ∈N.GMσ )/G}   Noetherian recursive (co-)data types Data types defined by noetherian recursion are interpreted by the same recursion principle, so that we have the following interpretationt data F(i : Ord, # »X : k) : S by noetherian recursion on i where K : ( # » A : T ` # »Y :l F(i, #»X ) | # »B : R ) | σ , λJ ∈ O.FJσ where the family of FMσ is defined by noetherian recursion on M ∈ O: FMσ , # » λX ∈ pi1(JkK). PosTS ⋃ i  s # » Ki : ( # » Ai:Ti ` # » Yi:li F(i, #»X ) | # »Bi:Ri )i{ σ{ # »X/X,M/i,(λJ∈() alternatives. The multiplicative (co-)data types capture the use of multiple components within structures or observations, by giving a combination of two (⊗ and `) or no (1 and ⊥) parts. And finally the negation (co-)data types, which capture the ability for data structures to contain co-terms and co-data observations to contain terms. We might think that we have some flexibility in choosing the evaluation strategies for the declarations in Figure 8.1. But as it turns out, since we want to use these (co-)data types as the backbone of faithful encodings, our hand is forced. Intuitively, each of these declarations follows a simple rule of thumb for choosing the kinds for types: every type to the left (of `) is V and every type to the right is N , except for the active type whose kind is the reverse. This rule of thumb has a few consequences. The first is that every data type is call-by-value and every co-data type is call-by-name, which follows the general wisdom of polarization in computation (Zeilberger, 2009; Munch-Maccagnoni, 2013). The second consequence is that every data type constructor 311 Additive (co-)data types data (X : V)⊕ (Y : V) : V where ι1 : (X : V ` X ⊕ Y | ) ι2 : (Y : V ` X ⊕ Y | ) codata (X : N ) & (Y : N ) : N where pi1 : ( | X & Y ` X : N ) pi2 : ( | X & Y ` Y : N ) data 0 : V where codata> : N where Multiplicative (co-)data types data (X : V)⊗ (Y : V) : V where ( , ) : (X : V , Y : V ` X ⊗ Y | ) codata (X : N )` (Y : N ) : N where [ , ] : ( | X ` Y ` X : N , Y : N ) data 1 : V where () : ( ` 1 | ) codata⊥ : N where [] : ( | ⊥ ` ) Involutive negation (co-)data types data∼(X : N ) : V where ∼ : ( ` ∼X | X : N ) codata¬(X : V) : N where ¬ : (X : V | ¬X ` ) FIGURE 8.1. Declarations of the primitive polarized data and co-data types. builds on V types and every co-data type constructor builds on N types, except for the negation constructors which are reversed because their underlying (co-)terms are reversed. 
The last consequence is that the notion of data type values and co-data type co-values are hereditarily as restrictive as possible, where a structure or observation is only a (co-)value if it contains components that are (co-)values in the most restrictive sense. The basic (co-)data types from Figure 8.1 are still incomplete, though, for our purpose of encoding all (co-)data types expressible in the source language. In particular, how could we possibly represent a type like the call-by-name pair A×N B? The ⊗ data type constructor won’t do since it operates over the wrong kind of types. Therefore, we need a mechanism for plainly “shifting” between N and V kinds of types, and to do that we must break our rule of thumb. One way to do the conversion is with singleton (co-)data types, declared as follows, that wraps a component of the 312 data ↓S(X : S) : V where ↓S : (X : S ` ↓SX | ) codata ↑S(X : S) : N where ↑S : ( | ↑SX ` X : S) data S⇑(X : V) : Swhere S⇑ : (X : V ` S⇑X | ) codata S⇓(X : N ) : Swhere S⇓ : ( | S⇓X ` X : N ) FIGURE 8.2. Declarations of the shifts between strategies as data and co-data types. other strategy: data ↓(X : N ) : V where ↓ : (X : N ` ↓X | ) codata ↑(X : V) : N where ↑ : ( | ↑X ` X : V) The other possibility is a singleton (co-)data type that is of the other strategy, declared as follows: codata ⇓(X : N ) : V where ⇓ : ( | ⇓X ` X : N ) data ⇑(X : V) : N where ⇑ : (X : V ` ⇑X | ) As it turns out, we will use both styles of shifts because they are each useful in different situations for encoding complex (co-)data types. As a technical device, we will use a whole family of shifts parameterized by a kind of strategy as defined in Figure 8.2, with the above as defaults. The idea is that ↓S and ↑S shift to V and N (respectively) from S, whereas S⇑ and S⇓ shift from V and N (respectively) to S. These parameterized shifts include some redundancy (as we will see in Section 8.4), but they are useful notationally for generically manipulating types. By combining the polarized types from Figure 8.1 with the shifts from Figure 8.2, we get the polarized basis P for all user-defined (co-)data types. In particular, the polarized basis is expressive enough to translate programs using any collection G of user-defined (co-)data types as shown in Figure 8.3, so that if c : ( Γ `ΘG ∆ ) thenJcKG : (JΓKG `ΘP J∆KG) (where JΓKG and J∆KG are defined pointwise). We informally use deep pattern matching to aid writing the translation, with the understanding that it is desugared into several shallow patterns in the obvious way, and to express the repeated composition of the binary connectives, we define the (“big”) versions of the polarized additive, multiplicative, and quantifier connectives over n-ary vectors of 313 types and type variables as follows: ⊕  , 0 ⊕ (A, #»B) , A⊕ (⊕ #» B ) ⊗  , 1 ⊗ (A, #»B) , A⊗ (⊗ #» B ) ¯  , > ¯ (A, #»B) , A& (¯ #» B ) ¸  , ⊥ ¸ (A, #»B) , A` (¸ #»B) ιi (v) , ι2 ( i. . .ι1 (v)) pii [e] , pi2 [ i. . .pi1 [e]] (vn, . . . , v1) , (vn, (. . ., (v1, ()))) [e1, . . . , en] , [e1, [. . ., [en, []]]] This polarizing encoding is sound so that equalities in the source, including η, are preserved in the target. Theorem 8.1 (Polarization soundness). 
For any (composite) strategy S including V and N , and i = 1, 2: a) if ci : (Γ `ΘG ∆) and c1 =µS µ˜Sηµηµ˜βGηG c2 then JciKG : (JΓKG `ΘP J∆KG) andJc1KG =µS µ˜Sηµηµ˜βPηP Jc2KG, b) if Γ `ΘG vi : A|∆ v1 =µS µ˜Sηµηµ˜βGηG v2 then JΓKG `ΘP JviKG : JAKG|J∆KGJv1KG =µS µ˜Sηµηµ˜βPηP Jv2KG, and c) if Γ|ei : A `ΘG ∆ and e1 =µS µ˜Sηµηµ˜βGηG e2 then JΓKG|JeiKG : JAKG `ΘP J∆KG andJe1KG =µS µ˜Sηµηµ˜βPηP Je2KG. Proof. The polarizing encoding of (co-)data types as shown in Figure 8.3 is stated in terms of deep pattern matching on data structures and co-data observations, which avoids the terrifying bureaucracy of the many levels of shallow patterns needed to implement the translation. Thankfully, these deep patterns fit a certain form which makes them much easier to desugar compared to fully general patterns. In particular, every pattern used in the encoding begins with a match on a S⇑ or S⇓ shift, then several nested matches on the additive structure of type A⊕ B or A& B, and then 314 JXKG , XJ〈v||e〉KG , 〈JvKG||JeKG〉JxKG , x JαKG , αJµα.cKG , µα.JcKG Jµ˜x.cKG , µ˜x.JcKG Given data F(Θ) : Swhere # » Ki : ( # » Ai1 : Tijj ` F(Θ) | # »Bij : Rijj )i ∈ G: r F( #»C ) z G , S⇑ ⊕ # »⊗( # »∼(↑RijJBijKGθ)j, # »↓TijJAijKGθj)i  where θ = # »{JCKG/X}q Ki( # »eijj, # »vijj) y G , S⇑(ιi( # »∼(↑Rij [JeijKG])j, # »↓Tij(JvijKG)j))s µ˜[ # » Ki( # »αijj, # »xijj).ci i ] { G , µ˜[ # » S⇑(ιi( # »∼(↑Rij [αij]) j , # »↓Tij(xij) j )).JciKGi] Given codataG(Θ) : Swhere # » Oi : ( # » Aij : Tijj | G(Θ) ` # »Bij : Rijj )i ∈ G: r G( #»C ) z G , S⇓ ˘ # »˙( # »¬(↓TijJAijKGθ)j # »↑RijJBijKGθj)i  where θ = # »{JCKG/X}q Oi[ # »vijj, # »eijj] y G , S⇓[pii[ # »¬(↓Tij(JvijKG))j, # »↑Ri1 [JeijKG]j]]s µ( # » Oi[ # »xijj, # »αijj].ci i ) { G , µ( # » S⇓[pii[ # »¬[↓Tij(xij)] j , # »↑Tij [αij] j ]].JciKG i) FIGURE 8.3. A polarizing translation from G into P . 315 concludes with a match on the multiplicative structure of the following form: p ∈ Pattern ::= S⇑ ( p+ ) p+ ∈ AddPattern ::= p× | ι1 ( p+ ) | ι2 ( p+ ) p× ∈ MultPattern ::= x | () | ( x, p× ) | ∼ ( q× ) | ↓S(x) q ∈ CoPattern ::= S⇓ [ q+ ] q+ ∈ AddCoPattern ::= q× | pi1 [ q+ ] | pi2 [ q+ ] q× ∈ MultCoPattern ::= α | [] | [ α, q× ] | ¬p× | ↑S [α] We can then easily desugar (co-)patterns of this form by just un-nesting the pattern one level at a time within the alternatives of every pattern matching (co-)term as follows: µ˜ [ # » S⇑ ( p+i ) .ci i ] , µ˜ [ S⇑x. 〈 x ∣∣∣∣∣∣∣∣µ˜[ # »p∃i .cii]〉] µ˜  # » ι1 ( p+i ) .ci i # » ι2 ( p′+i ) .c′i i  , µ˜ ι1 (x). 〈 x ∣∣∣∣∣∣∣∣µ˜[ # »p+i .cii]〉 ι2 (x). 〈 x ∣∣∣∣∣∣∣∣µ˜[ # »p′+i .c′ii]〉  µ˜[x.c] , µ˜x.c µ˜ [( y, p× ) .c ] , µ˜ [ (y, x). 〈 x ∣∣∣∣∣∣µ˜[p×.c]〉] µ˜ [ ∼ ( q× ) .c ] , µ˜ [ ∼ (α). 〈 µ ( q×.c )∣∣∣∣∣∣α〉] µ ( # » S⇓ [ q+i ] .ci i ) , µ ( S⇓[α]. 〈 µ ( # » q∀i .ci i)∣∣∣∣∣∣∣∣α〉) µ  # » pi1 [ q+i ] .ci i # » pi2 [ q′+i ] .c′i i  , µ pi1 [α]. 〈 µ ( # » q+i .ci i )∣∣∣∣∣∣∣∣α〉 pi2 [α]. 〈 µ ( # » q′+i .c ′ i i )∣∣∣∣∣∣∣∣α〉  µ(α.c) , µα.c µ ([ β, q× ] .c ) , µ ( [β, α]. 〈 µ ( q×.c )∣∣∣∣∣∣α〉) µ ( ¬ [ p× ] .c ) , µ ( ¬ [x]. 〈 x ∣∣∣∣∣∣µ˜[p×.c]〉) Additionally, in order to prove the soundness of the η law for (co-)data types with respect to the encoding, we use a couple helpful tricks with η. 
First, note that the seemingly stronger versions of the η law, for co-data types applied to arbitrary values and for data types applied to arbitrary co-values,

(ηFS) E : F(#»C) = µ˜[#»K(#»α, #»x).〈K(#»α, #»x)||E〉]
(ηGS) V : G(#»C) = µ(#»O[#»x, #»α].〈V||O[#»x, #»α]〉)

can be derived from the η laws on (co-)variables by combining them with the ηµ and ηµ˜ rules for µ- and µ˜-abstractions, as follows:

E : F(#»C) =ηµηµ˜ µ˜y:F(#»C).〈µβ:F(#»C).〈y||β〉||E〉
=ηF µ˜y:F(#»C).〈µβ:F(#»C).〈y||µ˜[#»K(#»α, #»x).〈K(#»α, #»x)||β〉]〉||E〉
=µ µ˜y:F(#»C).〈y||µ˜[#»K(#»α, #»x).〈K(#»α, #»x)||E〉]〉
=ηµ˜ µ˜[#»K(#»α, #»x).〈K(#»α, #»x)||E〉]

V : G(#»C) =ηµηµ˜ µβ:G(#»C).〈V||µ˜y:G(#»C).〈y||β〉〉
=ηG µβ:G(#»C).〈V||µ˜y:G(#»C).〈µ(#»O[#»x, #»α].〈y||O[#»x, #»α]〉)||β〉〉
=µ˜ µβ:G(#»C).〈µ(#»O[#»x, #»α].〈V||O[#»x, #»α]〉)||β〉
=ηµ µ(#»O[#»x, #»α].〈V||O[#»x, #»α]〉)

Second, note that we have the following equalities:

(µ˜ηFS) c = 〈z||µ˜[#»K(#»α, #»x).c {K(#»α, #»x)/z}]〉    (when µ˜z.c : F(#»C) ∈ CoValueS)
(µηGS) c = 〈µ(#»H[#»x, #»α].c {H[#»x, #»α]/γ})||γ〉    (when µγ.c : G(#»C) ∈ ValueS)

the first of which is derived from the η law of F as follows:

c =µ˜S 〈z||µ˜z.c〉
=ηFS 〈z||µ˜[#»K(#»α, #»x).〈K(#»α, #»x)||µ˜z.c〉]〉    (µ˜z.c : F(#»C) ∈ CoValueS)
=µ˜S 〈z||µ˜[#»K(#»α, #»x).c {K(#»α, #»x)/z}]〉

and the second of which is likewise derived from the η law of G:

c =µS 〈µγ.c||γ〉
=ηGS 〈µ(#»H[#»x, #»α].〈µγ.c||H[#»x, #»α]〉)||γ〉    (µγ.c : G(#»C) ∈ ValueS)
=µS 〈µ(#»H[#»x, #»α].c {H[#»x, #»α]/γ})||γ〉

As examples, the particular instances of these rules for the polarized data types are:

(µ˜η⊗V) c = 〈z||µ˜[(x, y).c {(x, y)/z}]〉    (z : A⊗B)
(µ˜η⊕V) c = 〈z||µ˜[ι1(x).c {ι1(x)/z} | ι2(y).c {ι2(y)/z}]〉    (z : A⊕B)
(µ˜η1V) c = 〈z||µ˜[().c {()/z}]〉    (z : 1)
(µ˜η0V) c = 〈z||µ˜[]〉    (z : 0)
(µ˜η∼V) c = 〈z||µ˜[∼(α).c {∼(α)/z}]〉    (z : ∼A)

and the instances for the polarized co-data types are:

(µη⅋N) c = 〈µ([α, β].c {[α, β]/γ})||γ〉    (γ : A⅋B)
(µη&N) c = 〈µ(π1[α].c {π1[α]/γ} | π2[β].c {π2[β]/γ})||γ〉    (γ : A&B)
(µη⊥N) c = 〈µ([].c {[]/γ})||γ〉    (γ : ⊥)
(µη⊤N) c = 〈µ()||γ〉    (γ : ⊤)
(µη¬N) c = 〈µ(¬[x].c {¬[x]/γ})||γ〉    (γ : ¬A)

With the above observations about pattern matching and extensionality, we are now ready to prove that the translation is sound. The fact that well-typed commands and (co-)terms translate at the associated translated types follows straightforwardly by (mutual) induction on their typing derivations. More interesting is the preservation of equalities across the encoding. Note that since the translation is compositional and hygienic, the reflexive, symmetric, transitive, and (importantly) congruent closure of the equational theory is guaranteed (Downen & Ariola, 2014a). Therefore, we only need to check that each axiom is preserved by the translation. In that regard, it is important to note that (1) (co-)values translate to (co-)values, and (2) substitution distributes over translation (that is, JcKG {JVKG/x} =α Jc {V/x}KG, and so on), both of which can be confirmed by induction on the syntax of (co-)terms.

The axioms for µ- and µ˜-abstractions translate directly, without change, because of the two facts about (co-)values and substitution mentioned above, like so:

(ηµ) µα.〈v||α〉 = v translates to Jµα.〈v||α〉KG ≜ µα.〈JvKG||α〉 =ηµ JvKG
(ηµ˜) µ˜x.〈x||e〉 = e translates to Jµ˜x.〈x||e〉KG ≜ µ˜x.〈x||JeKG〉 =ηµ˜ JeKG
(µ) 〈µα.c||E〉 = c {E/α} translates to J〈µα.c||E〉KG ≜ 〈µα.JcKG||JEKG〉 =µ JcKG {JEKG/α} =α Jc {E/α}KG
(µ˜) 〈V||µ˜x.c〉 = c {V/x} translates to J〈V||µ˜x.c〉KG ≜ 〈JVKG||µ˜x.JcKG〉 =µ˜ JcKG {JVKG/x} =α Jc {V/x}KG

Given data F(Θ) : S where #»Ki : (#»Aij : Tij ⊢ F(Θ) | #»Bij : Rij) ∈ G, we have:

(βF) 〈Ki(#»eij, #»vij)||µ˜[#»Ki(#»αij, #»xij).ci]〉 = 〈µ#»αij.〈#»vij||µ˜#»xij.ci〉||#»eij〉

which translates, by induction on the pattern S⇑(ιi(#»∼(↑Rij [αij]), #»↓Tij (xij))), to:

J〈Ki(#»eij, #»vij)||µ˜[#»Ki(#»αij, #»xij).ci]〉KG
≜ 〈S⇑(ιi(#»∼(↑Rij [JeijKG]), #»↓Tij (JvijKG)))||µ˜[#»S⇑(ιi(#»∼(↑Rij [αij]), #»↓Tij (xij))).JciKG]〉
=βS⇑ηµ˜ 〈ιi(#»∼(↑Rij [JeijKG]), #»↓Tij (JvijKG))||µ˜[#»ιi(#»∼(↑Rij [αij]), #»↓Tij (xij)).JciKG]〉
=β⊕ηµ˜ 〈(#»∼(↑Rij [JeijKG]), #»↓Tij (JvijKG))||µ˜[(#»∼(↑Rij [αij]), #»↓Tij (xij)).JciKG]〉
=β⊗ηµ˜ 〈∼(↑Ri1 [Jei1KG])||µ˜[∼(↑Ri1 [αi1]). … 〈∼(↑Rim [JeimKG])||µ˜[∼(↑Rim [αim]).〈(#»↓Tij (JvijKG))||µ˜[(#»↓Tij (xij)).JciKG]〉]〉 …]〉
=β∼ηµ 〈µ(↑Ri1 [αi1]. … 〈µ(↑Rim [αim].〈(#»↓Tij (JvijKG))||µ˜[(#»↓Tij (xij)).JciKG]〉)||↑Rim [JeimKG]〉 …)||↑Ri1 [Jei1KG]〉
=β↑ηµ 〈µαi1. … 〈µαim.〈(#»↓Tij (JvijKG))||µ˜[(#»↓Tij (xij)).JciKG]〉||JeimKG〉 … ||Jei1KG〉
≜ 〈µ#»αij.〈(#»↓Tij (JvijKG))||µ˜[(#»↓Tij (xij)).JciKG]〉||#»JeijKG〉
=β⊗β1ηµ˜ 〈µ#»αij.〈↓Ti1 (Jvi1KG)||µ˜[↓Ti1 (xi1). … 〈↓Tin (JvinKG)||µ˜[↓Tin (xin).JciKG]〉 …]〉||#»JeijKG〉
=β↓ηµ˜ 〈µ#»αij.〈Jvi1KG||µ˜xi1. … 〈JvinKG||µ˜xin.JciKG〉 …〉||#»JeijKG〉
≜ 〈µ#»αij.〈#»JvijKG||µ˜#»xij.JciKG〉||#»JeijKG〉
≜ J〈µ#»αij.〈#»vij||µ˜#»xij.ci〉||#»eij〉KG

(ηF) β : F(#»C) = µ˜[#»Ki(#»αij, #»xij).〈Ki(#»αij, #»xij)||β〉]

translates, by induction on the same pattern, to:

Jµ˜[#»Ki(#»αij, #»xij).〈Ki(#»αij, #»xij)||β〉]KG
≜ µ˜[#»S⇑(ιi(#»∼(↑Rij [αij]), #»↓Tij (xij))).〈S⇑(ιi(#»∼(↑Rij [αij]), #»↓Tij (xij)))||β〉]
=µ˜η↓V µ˜[#»S⇑(ιi(#»∼(↑Rij [αij]), #»xij)).〈S⇑(ιi(#»∼(↑Rij [αij]), #»xij))||β〉]
=µη↑N µ˜[#»S⇑(ιi(#»∼(αij), #»xij)).〈S⇑(ιi(#»∼(αij), #»xij))||β〉]
=µ˜η∼V µ˜[#»S⇑(ιi(#»yij, #»xij)).〈S⇑(ιi(#»yij, #»xij))||β〉]
=µ˜η⊗Vη1V µ˜[#»S⇑(ιi(x)).〈S⇑(ιi(x))||β〉]
=µ˜η⊕V µ˜[S⇑(x).〈S⇑(x)||β〉]
=ηS⇑ β

Given codata G(Θ) : S where #»Oi : (#»Aij : Tij | G(Θ) ⊢ #»Bij : Rij) ∈ G, (βG) is analogous to the translation of βF by duality, and (ηG) is analogous to the translation of ηF by duality.

But is the converse statement of completeness, that if the encodings of two commands or (co-)terms are equal then they were equal to begin with, also true?
Unfortunately, not so directly; the polarizing encoding has the effect of “anonymizing” types by moving away from a nominal style, where different declarations lead to distinct types, to a more structural style, where differently declared types can be collapsed whenever they share a common underlying pattern of (co-)term structures. This collapse of types doesn't mean that all hope is lost, however, because (co-)terms are only collapsed between types, not within types: there is still a one-for-one correspondence between the typed (co-)terms of a given type in the source and the encoded (co-)terms in the target. To argue this case, we turn to the idea of isomorphisms between types (Di Cosmo, 1995).

Type Isomorphisms

Usually, we say that two types are isomorphic when there are mappings back and forth between them whose compositions are the identity. In the sequent setting, we interpret “mappings” as open commands with a free variable and co-variable, and the “identity” mapping is the simple command 〈x||α〉 connecting its free (co-)variables.

Definition 8.1 (Type isomorphism). Two closed types A and B are isomorphic, written A ≈ B, if and only if there exist commands c : (x : A ⊢ β : B) and c′ : (y : B ⊢ α : A), for any x, y, α, β, such that the following equalities hold:

〈µβ.c||µ˜y.c′〉 = 〈x||α〉 : (x : A ⊢ α : A)
〈µα.c′||µ˜x.c〉 = 〈y||β〉 : (y : B ⊢ β : B)

Moreover, two open types A and B with free type variables #»X : #»S are isomorphic, written #»X : #»S ⊩ A ≈ B, if and only if for all types #»C : #»S it follows that A {#»C/#»X} ≈ B {#»C/#»X}.

Note that this definition of isomorphism between types is equivalent to a more traditional presentation in terms of inverse functions within the language. In particular, two types A : S and B : S are isomorphic in the sense of Definition 8.1 if and only if there are two closed function values V : A →S B and V′ : B →S A such that V′ ∘ V = id : A →S A and V ∘ V′ = id : B →S B, because we can always abstract over the open commands to get a pair of closed functions, or call the functions to retrieve a pair of open commands, where one is an inverse whenever the other is. However, Definition 8.1 has the advantage of not assuming that our language has a primitive function type (functions are just user-defined co-data types like any other), and of avoiding the awkwardness of mapping between the different kinds of types that might be isomorphic to one another.

Having defined what an isomorphism between types is, we should ask whether it is actually an equivalence relation, as expected: are type isomorphisms closed under reflexivity, symmetry, and transitivity? Reflexivity and symmetry of the isomorphism relation between types are rather straightforward to show.

Theorem 8.2 (Reflexivity and Symmetry). For all types A and B, (a) A ≈ A, and (b) A ≈ B implies B ≈ A.

Proof. The symmetry of type isomorphism follows immediately from its symmetric definition. More interestingly, we can establish the reflexive isomorphism of any type with the extensionality laws of µ- and µ˜-abstractions. In particular, for a given A, we have the command 〈x||α〉 : (x : A ⊢ α : A), which serves as both open commands of the isomorphism A ≈ A. The fact that the self-composition of this command is equal to itself comes from the ηµ and ηµ˜ axioms: 〈µα.〈x||α〉||µ˜x.〈x||α〉〉 =ηµ 〈x||µ˜x.〈x||α〉〉 =ηµ˜ 〈x||α〉.

In contrast, transitivity of type isomorphisms is trickier, and in fact it is not guaranteed to hold in every possible situation.
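In a setting without computational effects, the inverse-function reading of Definition 8.1 can be rendered directly at the value level. Here is a minimal Haskell sketch (illustrative only; the Iso name is ours, not part of the calculus), in which composing isomorphisms is unproblematic precisely because nothing can observe how the compositions are sequenced:

    -- An isomorphism as a pair of maps whose compositions are the identity.
    data Iso a b = Iso { to :: a -> b, from :: b -> a }

    -- Reflexivity and symmetry, mirroring Theorem 8.2.
    refl :: Iso a a
    refl = Iso id id

    sym :: Iso a b -> Iso b a
    sym (Iso f g) = Iso g f

    -- Transitivity is composition.  In pure Haskell this always works;
    -- the χ side condition discussed next matters only when effects can
    -- observe the order in which the two halves are sequenced.
    trans :: Iso a b -> Iso b c -> Iso a c
    trans (Iso f g) (Iso f' g') = Iso (f' . f) (g . g')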
In particular, the transitivity of isomorphism relies on the exchange of µ- and µ˜-bindings, which reassociates the composition of commands, but this is not always valid in the multiple-strategy setting. Specifically, given any two (co-)terms of different kinds (recall the system for distinguishing between multiple base kinds in Figures 5.17 and 5.18 from Section 5.5), v :: S and e :: T, the exchange law χS⊢T is:

(χS⊢T) 〈v||µ˜x::S.〈µα::T.c||e〉〉 = 〈µα::T.〈v||µ˜x::S.c〉||e〉

And when exchanging bindings of the same S, we just write χS for χS⊢S. Thankfully, even though exchange is not guaranteed in general, it is still valid for many combinations of strategies. For any S, χN⊢S is derivable from the universal strength of the µ˜N axiom, and likewise χS⊢V is derivable from the µV axiom. So for all combinations of N and V, each of χN⊢N, χV⊢V, and χN⊢V holds, but χV⊢N is invalidated by the following counterexample (writing _ for a (co-)variable that is not used):³

〈µ_::V.c1||µ˜x::V.〈µα::N.c||µ˜_::N.c2〉〉 =µV c1 ≠ c2 =µ˜N 〈µα::N.〈µ_::V.c1||µ˜x::V.c〉||µ˜_::N.c2〉

³ The invalidity of χV⊢N exactly corresponds to the failure of associativity in categorical models of polarity (Munch-Maccagnoni, 2013).

As it turns out, transitivity of type isomorphisms can be built on χ. The consequence of χV⊢N being invalid is that, in general, isomorphisms between types of the same kind S are always transitive, because χS holds, but isomorphisms between different kinds of types might not be, because we cannot rely on both χS⊢T and χT⊢S.

Theorem 8.3 (Homogeneous transitivity). For all strategies S such that χS holds and types A : S, B : S, and C : S, if A ≈ B and B ≈ C then A ≈ C.

Proof. Let c1 : (x : A ⊢ β : B) and c2 : (y : B ⊢ α : A) be the commands from the isomorphism A ≈ B, and let c3 : (y′ : B ⊢ γ : C) and c4 : (z : C ⊢ β′ : B) be the commands from the isomorphism B ≈ C. We now establish the isomorphism A ≈ C by composing c1 and c3 to get c5, and composing c2 and c4 to get c6:

c5 ≜ 〈µβ.c1||µ˜y′.c3〉 : (x : A ⊢ γ : C)
c6 ≜ 〈µβ′.c4||µ˜y.c2〉 : (z : C ⊢ α : A)

With the help of χS, we get that the composition of c5 and c6 is the identity command 〈x||α〉 : (x : A ⊢ α : A):

〈µγ.c5||µ˜z.c6〉 ≜ 〈µγ.〈µβ.c1||µ˜y′.c3〉||µ˜z.〈µβ′.c4||µ˜y.c2〉〉
=χS 〈µβ′.〈µγ.〈µβ.c1||µ˜y′.c3〉||µ˜z.c4〉||µ˜y.c2〉
=χS 〈µβ′.〈µβ.c1||µ˜y′.〈µγ.c3||µ˜z.c4〉〉||µ˜y.c2〉
=Iso 〈µβ′.〈µβ.c1||µ˜y′.〈y′||β′〉〉||µ˜y.c2〉
=ηµ˜ 〈µβ′.〈µβ.c1||β′〉||µ˜y.c2〉
=ηµ 〈µβ.c1||µ˜y.c2〉
=Iso 〈x||α〉

In the other direction, the composition of c6 and c5 is the identity command 〈z||γ〉 : (z : C ⊢ γ : C):

〈µα.c6||µ˜x.c5〉 ≜ 〈µα.〈µβ′.c4||µ˜y.c2〉||µ˜x.〈µβ.c1||µ˜y′.c3〉〉
=χS 〈µβ.〈µα.〈µβ′.c4||µ˜y.c2〉||µ˜x.c1〉||µ˜y′.c3〉
=χS 〈µβ.〈µβ′.c4||µ˜y.〈µα.c2||µ˜x.c1〉〉||µ˜y′.c3〉
=Iso 〈µβ.〈µβ′.c4||µ˜y.〈y||β〉〉||µ˜y′.c3〉
=ηµ˜ 〈µβ.〈µβ′.c4||β〉||µ˜y′.c3〉
=ηµ 〈µβ′.c4||µ˜y′.c3〉
=Iso 〈z||γ〉

So isomorphisms between types of the same kind form an equivalence relation. But do they give us the right sense of a one-for-one correspondence between the (co-)terms of those types? As it turns out, an isomorphism A ≈ B of S-kinded types provides just enough structure to convert all equalities between A-typed (co-)terms into equalities between B-typed (co-)terms, and vice versa, which again relies on the χS axiom to exchange (co-)variable bindings.

Theorem 8.4. For all strategies S such that χS holds, types A : S and B : S with A ≈ B, and environments Γ and ∆, there are contexts C and C′ such that if Γ ⊢G vi : A | ∆ and Γ | ei : A ⊢G ∆ (for i = 1, 2), then Γ ⊢G C[vi] : B | ∆ and Γ | C′[ei] : B ⊢G ∆ (for i = 1, 2), v1 = v2 if and only if C[v1] = C[v2], and e1 = e2 if and only if C′[e1] = C′[e2].

Proof. Let c : (x : A ⊢ β : B) and c′ : (y : B ⊢ α : A) be the commands from the isomorphism A ≈ B, where x, y ∉ Γ and α, β ∉ ∆ (renaming c and c′ as necessary). The contexts are then C ≜ µβ.〈□||µ˜x.c〉 and C′ ≜ µ˜y.〈µα.c′||□〉, where □ marks the hole of the context. C[v1] = C[v2] follows from v1 = v2, and C′[e1] = C′[e2] follows from e1 = e2, by just applying the same equalities within the larger contexts. More interestingly, we can derive v1 = v2 from C[v1] = C[v2] via the definition of the isomorphism by placing them in a larger context, so that we have the following equality via χS:

µα.〈C[vi]||µ˜y.c′〉 ≜ µα.〈µβ.〈vi||µ˜x.c〉||µ˜y.c′〉 =χS µα.〈vi||µ˜x.〈µβ.c||µ˜y.c′〉〉 =Iso µα.〈vi||µ˜x.〈x||α〉〉 =ηµηµ˜ vi

And since we assumed C[v1] = C[v2], we have

v1 = µα.〈C[v1]||µ˜y.c′〉 = µα.〈C[v2]||µ˜y.c′〉 = v2

Similarly, e1 = e2 follows from C′[e1] = C′[e2] because of the fact that µ˜x.〈µβ.c||C′[ei]〉 = ei.

Finally, we extend the idea of isomorphisms between types to isomorphisms between (co-)data declarations.

Definition 8.2 (Declaration isomorphism). We say that two data declarations are isomorphic, written⁴

data F(Θ) : S where #»K : (Γ ⊢Θ′ F(Θ) | ∆) ≈ data F′(Θ) : S′ where #»K′ : (Γ′ ⊢Θ′′ F′(Θ) | ∆′)

if and only if Θ ⊩ F(Θ) ≈ F′(Θ). Dually, we say that two co-data declarations are isomorphic, written

codata G(Θ) : S where #»O : (Γ | G(Θ) ⊢Θ′ ∆) ≈ codata G′(Θ) : S′ where #»O′ : (Γ′ | G′(Θ) ⊢Θ′′ ∆′)

if and only if Θ ⊩ G(Θ) ≈ G′(Θ).

⁴ Note that we reuse Γ and ∆ as shorthand for the lists of types #»A : #»T and #»B : #»R within the signatures of constructors and observers.

Theorem 8.5 (Declaration isomorphism equivalence). The (co-)data declaration isomorphism relation is (a) reflexive, (b) symmetric, and (c) transitive for any (co-)data types of the same strategy S such that χS holds.

Proof. Follows from the reflexivity (Theorem 8.2 (a)), symmetry (Theorem 8.2 (b)), and transitivity (Theorem 8.3) of the type isomorphism relation underlying Definition 8.2.

This more specific notion of isomorphism between declarations is the backbone of the syntactic theory that we will develop for the purpose of reasoning more easily about (co-)data types in general, about the polarized basis P of (co-)data types, and eventually about the faithfulness of the polarization translation.

A Syntactic Theory of (Co-)Data Type Isomorphisms

Before turning to our main result—that every user-defined (co-)data type can be represented by an isomorphic type composed solely from the polarized basic connectives—we first explore a theory of type isomorphisms based on data and co-data declarations. The advantage of using (co-)data type declarations for studying type isomorphisms is that the declarations themselves provide a larger context for localized manipulations, surrounded by extra alternatives (of other constructors and observers) and extra components (within the same constructor or observer). The end result is that we only need to manually verify a few fundamental (co-)data type isomorphisms by hand, while the particular isomorphisms of interest can be easily composed out of these basic building blocks.
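As a value-level picture of a declaration isomorphism before we state the laws formally, two nominally different Haskell declarations whose alternatives and components match up to reordering are interconvertible. The type and constructor names here are hypothetical, and Iso is the sketch type introduced above:

    -- Two declarations of "the same" data type, differing only in the
    -- order of alternatives and in the field order of one constructor.
    data T  = K1 Int Bool | K2 Char
    data T' = K2' Char    | K1' Bool Int

    declIso :: Iso T T'
    declIso = Iso there back
      where
        there (K1 n b) = K1' b n
        there (K2 c)   = K2' c
        back (K1' b n) = K1 n b
        back (K2' c)   = K2 c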
Structural laws of declarations

We present an isomorphism theory for the structural laws of data and co-data types in Figures 8.4 and 8.5, which are exactly dual to one another and capture several facts about isomorphic ways to declare (co-)data types.

– Commute: The first group of laws states that the parts of any declaration may be reordered, including (1) the order of components within the signature of a constructor or observer, and (2) the order of constructor or observer alternatives within the declaration. These axioms show that the order in which the various parts of a declaration are listed doesn't matter.

– Mix: The second group of laws states how two isomorphisms between (co-)data type declarations may be combined. In particular, there are two ways to mix declaration isomorphisms: (1) an isomorphic pair of single-alternative declarations can have the components of their single constructor or observer mixed into the signatures of all the alternatives of another declaration isomorphism, forming a larger declaration isomorphism, and (2) a pair of declaration isomorphisms can have their respective alternatives mixed together to form a larger isomorphism. These inference rules let us reason locally within a small (co-)data type declaration, and then compose the results into one large declaration isomorphism that does everything (see the sketch after this list).

– Shift: The third group of laws states that every call-by-value (V) data declaration isomorphism and every call-by-name (N) co-data declaration isomorphism may be generalized to (co-)data types of any strategy.

– Interchange: The fourth group of laws shows how isomorphisms between data type declarations and co-data type declarations can be interchanged one-for-one with one another, so long as the data type is call-by-value (V) and the co-data type is call-by-name (N).

– Compatibility: The final group of laws states that an isomorphism between types can be lifted into an isomorphism between data and co-data type declarations whose constructors and observers contain a component of that type, as either an input or an output.

These laws let us derive other facts about isomorphisms between (co-)data types. As a simple example, applying the shift laws to the trivial cases of the commute laws for data declarations lets us rename constructor and type names, which effectively tells us that there is only one empty type and one unit type for any strategy S:

data F(Θ) : S where K : (⊢ F(Θ) |) ≈ data F′(Θ) : S where K′ : (⊢ F′(Θ) |)
data F(Θ) : S where ≈ data F′(Θ) : S where

Additionally, the mix laws let us extend an existing isomorphism by combining it with a reflexive isomorphism of any declaration, letting us add arbitrary other alternatives or components to two isomorphic data declarations:

data F1(Θ) : V where #»K1 : (Γ1 ⊢ F1(Θ) | ∆1) ≈ data F′1(Θ) : V where #»K′1 : (Γ′1 ⊢ F′1(Θ) | ∆′1)
data F2(Θ) : V where #»K2 : (Γ2 ⊢ F2(Θ) | ∆2) ≈ data F2(Θ) : V where #»K2 : (Γ2 ⊢ F2(Θ) | ∆2)
──────────────────────────────────────────────────────────────
data F(Θ) : V where #»K1 : (Γ1 ⊢ F(Θ) | ∆1), #»K2 : (Γ2 ⊢ F(Θ) | ∆2) ≈ data F′(Θ) : V where #»K′1 : (Γ′1 ⊢ F′(Θ) | ∆′1), #»K2 : (Γ2 ⊢ F′(Θ) | ∆2)

data F1(Θ) : V where #»K1 : (Γ1 ⊢ F1(Θ) | ∆1) ≈ data F′1(Θ) : V where #»K′1 : (Γ′1 ⊢ F′1(Θ) | ∆′1)
data F2(Θ) : V where K2 : (Γ2 ⊢ F2(Θ) | ∆2) ≈ data F2(Θ) : V where K2 : (Γ2 ⊢ F2(Θ) | ∆2)
──────────────────────────────────────────────────────────────
data F(Θ) : V where #»K : (Γ2, Γ1 ⊢ F(Θ) | ∆1, ∆2) ≈ data F′(Θ) : V where #»K′ : (Γ2, Γ′1 ⊢ F′(Θ) | ∆′1, ∆2)

We can justify the laws in Figures 8.4 and 8.5 in terms of the definitions of type and (co-)data declaration isomorphisms.
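Restated over ordinary sums and products, the two mix laws correspond to combining isomorphisms alternative-wise and component-wise; a sketch reusing the Iso type from before:

    -- Mixing alternative-wise: combine two isomorphisms across the
    -- alternatives of a sum.
    mixAlt :: Iso a a' -> Iso b b' -> Iso (Either a b) (Either a' b')
    mixAlt (Iso f g) (Iso f' g') =
      Iso (either (Left . f) (Right . f'))
          (either (Left . g) (Right . g'))

    -- Mixing component-wise: combine two isomorphisms within the
    -- components of a single constructor.
    mixComp :: Iso a a' -> Iso b b' -> Iso (a, b) (a', b')
    mixComp (Iso f g) (Iso f' g') =
      Iso (\(x, y) -> (f x, f' y)) (\(x, y) -> (g x, g' y))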
In particular, we can calculate when specific instances of two (co-)data types happen to be isomorphic, so that the laws for declaration isomorphisms are sound whenever the specific instances hold in general for any matching choice of types. These calculations can be done for the commute, mix, compatibility, interchange, and shift laws for data declarations as follows.

Lemma 8.1 (Data commute instance). For any types #»C : #»S and #»C′ : #»S′, F(#»C) ≈ F′(#»C′) for the declarations

a) data F(#»X : #»S) : V where K : (Γ2, Γ1 ⊢ F(#»X) | ∆1, ∆2) and data F′(#»X′ : #»S′) : V where K′ : (Γ′1, Γ′2 ⊢ F′(#»X′) | ∆′2, ∆′1), such that Γ1θ = Γ′1θ′, Γ2θ = Γ′2θ′, ∆1θ = ∆′1θ′, and ∆2θ = ∆′2θ′, where θ = {#»C/#»X} and θ′ = {#»C′/#»X′};

b) data F(#»X : #»S) : V where #»K1 : (Γ1 ⊢ F(#»X) | ∆1), #»K2 : (Γ2 ⊢ F(#»X) | ∆2) and data F′(#»X′ : #»S′) : V where #»K′2 : (Γ′2 ⊢ F′(#»X′) | ∆′2), #»K′1 : (Γ′1 ⊢ F′(#»X′) | ∆′1), such that #»Γ1θ = Γ′1θ′, #»Γ2θ = Γ′2θ′, #»∆1θ = ∆′1θ′, and #»∆2θ = ∆′2θ′, where θ = {#»C/#»X} and θ′ = {#»C′/#»X′}.

Proof. The isomorphisms between F(#»C) and F′(#»C′) are established by the commands c : (x : F(#»C) ⊢ α′ : F′(#»C′)) and c′ : (x′ : F′(#»C′) ⊢ α : F(#»C)) as follows:

a) c ≜ 〈x||µ˜[K(#»β1, #»β2, #»y2, #»y1).〈K′(#»β2, #»β1, #»y1, #»y2)||α′〉]〉
   c′ ≜ 〈x′||µ˜[K′(#»β2, #»β1, #»y1, #»y2).〈K(#»β1, #»β2, #»y2, #»y1)||α〉]〉

b) c ≜ 〈x||µ˜[#»K1(#»β1, #»y1).〈K′1(#»β1, #»y1)||α′〉, #»K2(#»β2, #»y2).〈K′2(#»β2, #»y2)||α′〉]〉
   c′ ≜ 〈x′||µ˜[#»K′2(#»β2, #»y2).〈K2(#»β2, #»y2)||α〉, #»K′1(#»β1, #»y1).〈K1(#»β1, #»y1)||α〉]〉

Data Commute
data F(Θ) : V where K : (Γ2, Γ1 ⊢ F(Θ) | ∆1, ∆2) ≈ data F′(Θ) : V where K′ : (Γ1, Γ2 ⊢ F′(Θ) | ∆2, ∆1)
data F(Θ) : V where #»K1 : (Γ1 ⊢ F(Θ) | ∆1), #»K2 : (Γ2 ⊢ F(Θ) | ∆2) ≈ data F′(Θ) : V where #»K′2 : (Γ2 ⊢ F′(Θ) | ∆2), #»K′1 : (Γ1 ⊢ F′(Θ) | ∆1)

Data Mix
data F1(Θ) : V where #»K1 : (Γ1 ⊢ F1(Θ) | ∆1) ≈ data F′1(Θ) : V where #»K′1 : (Γ′1 ⊢ F′1(Θ) | ∆′1)
data F2(Θ) : V where K2 : (Γ2 ⊢ F2(Θ) | ∆2) ≈ data F′2(Θ) : V where K′2 : (Γ′2 ⊢ F′2(Θ) | ∆′2)
──────────────────────────────────────────────────────────────
data F(Θ) : V where #»K : (Γ2, Γ1 ⊢ F(Θ) | ∆1, ∆2) ≈ data F′(Θ) : V where #»K′ : (Γ′2, Γ′1 ⊢ F′(Θ) | ∆′1, ∆′2)

data F1(Θ) : V where #»K1 : (Γ1 ⊢ F1(Θ) | ∆1) ≈ data F′1(Θ) : V where #»K′1 : (Γ′1 ⊢ F′1(Θ) | ∆′1)
data F2(Θ) : V where #»K2 : (Γ2 ⊢ F2(Θ) | ∆2) ≈ data F′2(Θ) : V where #»K′2 : (Γ′2 ⊢ F′2(Θ) | ∆′2)
──────────────────────────────────────────────────────────────
data F(Θ) : V where #»K1 : (Γ1 ⊢ F(Θ) | ∆1), #»K2 : (Γ2 ⊢ F(Θ) | ∆2) ≈ data F′(Θ) : V where #»K′1 : (Γ′1 ⊢ F′(Θ) | ∆′1), #»K′2 : (Γ′2 ⊢ F′(Θ) | ∆′2)

Data Shift
data F(Θ) : V where #»K : (Γ ⊢ F(Θ) | ∆) ≈ data F′(Θ) : V where #»K′ : (Γ′ ⊢ F′(Θ) | ∆′)
──────────────────────────────────────────────────────────────
data F(Θ) : S where #»K : (Γ ⊢ F(Θ) | ∆) ≈ data F′(Θ) : S′ where #»K′ : (Γ′ ⊢ F′(Θ) | ∆′)

Co-data–Data Interchange
codata G(Θ) : N where #»O : (Γ | G(Θ) ⊢ ∆) ≈ codata G′(Θ) : N where #»O′ : (Γ′ | G′(Θ) ⊢ ∆′)
──────────────────────────────────────────────────────────────
data F(Θ) : V where #»K : (Γ ⊢ F(Θ) | ∆) ≈ data F′(Θ) : V where #»K′ : (Γ′ ⊢ F′(Θ) | ∆′)

Data Compatibility
Θ ⊢ A : S    Θ ⊢ B : S    Θ ⊩ A ≈ B
──────────────────────────────────────────────────────────────
data F(Θ) : V where K : (A : S ⊢ F(Θ) |) ≈ data F′(Θ) : V where K′ : (B : S ⊢ F′(Θ) |)

Θ ⊢ A : S    Θ ⊢ B : S    Θ ⊩ A ≈ B
──────────────────────────────────────────────────────────────
data F(Θ) : V where K : (⊢ F(Θ) | A : S) ≈ data F′(Θ) : V where K′ : (⊢ F′(Θ) | B : S)

FIGURE 8.4. A theory for structural laws of data type declaration isomorphisms.
329 Co-data Commute codataG(Θ) :N where O : (Γ2,Γ1|G(Θ)`∆1,∆2) ≈ codataG ′(Θ) :N where O′ : ( Γ1,Γ2|G′(Θ)`∆2,∆1 ) codataG(Θ) :N where # »O1 : (Γ1|G(Θ)`∆1) # »O2 : (Γ2|G(Θ)`∆2) ≈ codataG′(Θ) :Swhere # » O′2 : ( Γ2|G′(Θ)`∆2 ) # » O′1 : ( Γ1|G′(Θ)`∆1 ) Co-data Mix codataG1(Θ) :N where # »O1 : (Γ1|G1(Θ)`∆1) ≈ codataG′1(Θ) :N where # » O′1 : ( Γ′1|G′1(Θ)`∆′1 ) codataG2(Θ) :N where O2 : (Γ2|G2(Θ)`∆2) ≈ dataG ′ 2(Θ) :N where O′2 : ( Γ′2|G′2(Θ)`∆′2 ) codataG(Θ) :N where # »O : (Γ2,Γ1|G(Θ)`∆1,∆2) ≈ codataG′(Θ) :N where # » O′ : ( Γ′2,Γ′1|G′(Θ)`∆′1,∆′2 ) codataG1(Θ) :N where # »O1 : (Γ1|G1(Θ)`∆1) ≈ codataG′1(Θ) :N where # » O′1 : ( Γ′1|G′1(Θ)`∆′1 ) codataG2(Θ) :N where O2 : (Γ2|G2(Θ)`∆2) ≈ dataG ′ 2(Θ) :N where O′2 : ( Γ′2|G′2(Θ)`∆′2 ) codataG(Θ) :N where # »O1 : (Γ1|G(Θ)`∆1) # »O2 : (Γ2|G(Θ)`∆2) ≈ codataG′(Θ) :N where # »O′1 : ( Γ′1|G′(Θ)`∆′1 ) # » O′2 : ( Γ′2|G′(Θ)`∆′2 ) Co-data Shift codataG(Θ) :N where # »O : (Γ|G(Θ)`∆) ≈ codataG′(Θ) :N where # »O′ : (Γ′|G′(Θ)`∆′) codataG(Θ) :Swhere # »O : (Γ|G(Θ)`∆) ≈ codataG′(Θ) :S ′where # »O′ : (Γ′|G′(Θ)`∆′) Data-Co-data Interchange data F(Θ) :Vwhere # »K : (Γ` F(Θ)|∆) ≈ data F′(Θ) :Vwhere # » K′ : ( Γ′ ` F′( # »X ′)|∆′ ) codataG(Θ) :N where # » O : ( Γ|G( #»X )`∆ ) ≈ codataG′(Θ) :N where # » O′ : ( Γ′|G′( # »X ′)`∆′ ) Co-data Compatibility Θ`A : S Θ`B : S Θ  A ≈ B codataG(Θ) :N whereO : (|G(Θ)`A :S) ≈ codataG′(Θ) :N whereO′ : (|G′(Θ)`B :S) Θ`A : S Θ`B : S Θ  A ≈ B codataG(Θ) :N whereO : (A :S|G(Θ)` ) ≈ codataG′(Θ) :N whereO′ : (B :S|G′(Θ)` ) FIGURE 8.5. A theory for structural laws of co-data type declaration isomorphisms. 330 For part (a), the composition of c′ and c along α and x of type F( #»C ) is equal to the identity command 〈x′||α′〉 via the βF and ηF′ axioms as follows: 〈 µα.c′ ∣∣∣∣µ˜x.c〉 , 〈 µα.〈x′||µ˜[K′( #»β2 , #»β1 , #»y1 , #»y2).〈K( #»β1 , #»β2 , #»y2 , #»y1)||α〉]〉∣∣∣∣∣∣µx.〈x||µ˜[K( #»β1 , #»β2 , #»y2 , #»y1).〈K′( #»β2 , #»β1 , #»y1 , #»y2)||α′〉]〉〉 =ηµ˜ 〈 µα.〈x′||µ˜[K′( #»β2 , #»β1 , #»y1 , #»y2).〈K( #»β1 , #»β2 , #»y2 , #»y1)||α〉]〉∣∣∣∣∣∣µ˜[K( #»β1 , #»β2 , #»y2 , #»y1).〈K′( #»β2 , #»β1 , #»y1 , #»y2)||α′〉]〉 =µV 〈 x′ ∣∣∣∣∣∣µ˜[K′( #»β2 , #»β1 , #»y1 , #»y2).〈K( #»β1 , #»β2 , #»y2 , #»y1)||µ˜[K( #»β1 , #»β2 , #»y2 , #»y1).〈K′( #»β2 , #»β1 , #»y1 , #»y2)||α′〉]〉]〉 =βFµαµ˜x 〈x′||µ˜[K′( #» β2 , #» β1 , #»y1 , #»y2).〈K′( #»β2 , #»β1 , #»y1 , #»y2)||α′〉]〉 =ηF′ 〈 x′ ∣∣∣∣α′〉 And the reverse composition of c and c′ along α′ and x′ of type F′( # »C ′) is similarly equal to the identity command 〈x||α〉 via the βF′ and ηF. For part (b), the composition of c′ and c along α and x of type F( #»C ) is equal to the identity command 〈x′||α′〉 via the βF and ηF′ axioms as follows: 〈 µα.c′ ∣∣∣∣µ˜x.c〉 , 〈 µα. 〈 x′ ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K′2( #»β2 , #»y2).〈K2( #»β2 , #»y2)||α〉 K′1( #» β1 , #»y1).〈K1( #»β1 , #»y1)||α〉 〉∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜x. 〈 x ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K1( #»β1 , #»y1).〈K′1( #»β1 , #»y1)||α′〉 K2( #» β2 , #»y2).〈K′2( #» β2 , #»y2)||α′〉 〉〉 =ηµ˜ 〈 µα. 〈 x′ ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K′2( #»β2 , #»y2).〈K2( #»β2 , #»y2)||α〉 K′1( #» β1 , #»y1).〈K1( #»β1 , #»y1)||α〉 〉∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K1( #»β1 , #»y1).〈K′1( #»β1 , #»y1)||α′〉 K2( #» β2 , #»y2).〈K′2( #» β2 , #»y2)||α′〉 〉 =µV 〈 x′ ∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  K′2( #» β2 , #»y2). 〈 K2( #» β2 , #»y2) ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K1( #»β1 , #»y1).〈K′1( #»β1 , #»y1)||α′〉 K2( #» β2 , #»y2).〈K′2( #» β2 , #»y2)||α′〉 〉 K′1( #» β1 , #»y1). 
〈 K1( #» β1 , #»y1) ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K1( #»β1 , #»y1).〈K′1( #»β1 , #»y1)||α′〉 K2( #» β2 , #»y2).〈K′2( #» β2 , #»y2)||α′〉 〉  〉 =βFµαµ˜x 〈 x′ ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜ K′2( #»β2 , #»y2).〈K′2( #»β2 , #»y2)||α′〉 K′1( #» β1 , #»y1).〈K′1( #» β1 , #»y1)||α′〉 〉 =ηF′ 〈x′∣∣∣∣α′〉 And the reverse composition of c and c′ along α′ and x′ of type F′( # »C ′) is equal to the identity command 〈x||α〉 via the βF′ and ηF. 331 Lemma 8.2 (Data mix instance). For any data types declared as data F1( # » X :S ) :V where # » K1 : ( Γ1 ` F1( #»X ) |∆1 ) data F ′ 1( # » X ′ :S ′) :V where # » K′1 : ( Γ′1 ` F′1( # » X ′) |∆′1 ) data F2( # » X :S ) :V where # » K2 : ( Γ2 ` F2( #»X ) |∆2 ) data F ′ 2( # » X ′ :S ′) :V where # » K′2 : ( Γ′2 ` F′2( # » X ′) |∆′2 ) data F3( # » X :S ) :V where K3 : ( Γ3 ` F3( #»X ) |∆3 ) data F′3( # » X ′ :S ′) :V where K′3 : ( Γ′3 ` F′3( # » X ′) |∆′3 ) data F( # »X :S ) :V where # » K4 : ( Γ3,Γ1 ` F( #»X ) |∆1,∆3 ) # » K5 : ( Γ3,Γ2 ` F( #»X ) |∆2,∆3 ) data F′( # » X ′ :S ) :V where # » K′4 : ( Γ′3,Γ′1 ` F′( # » X ′) |∆′1,∆′3 ) # » K′5 : ( Γ′3,Γ′2 ` F′( # » X ′) |∆′2,∆′3 ) and types # »C : S , # »C ′ : S ′, if F1( #»C ) ≈ F′1( # » C ′), F2( #» C ) ≈ F′2( # » C ′), and F3( #» C ) ≈ F′3( # » C ′), then F( #»C ) ≈ F′( # »C ′). Lemma 8.3 (Data compatibility instance). For types A : T , A′ : T , # »C : S , # »C ′ : S ′, if A # »{C/X} ≈ A′ # »{C ′/X ′} then F( #»C ) ≈ F′( # »C ′) for the following declarations: a) data F( # »X : S ) : V where K : ( A : T ` F( #»X ) | ) and data F′( # » X ′ : S ′) : V where K′ : ( A′ : T ` F′( # »X ′) | ) b) data F( # »X : S ) : V where K : ( ` F( #»X ) | A : T ) and data F′( # » X ′ : S ′) : V where K′ : ( ` F′( # »X ′) | A′ : T ) Proof. Suppose that the isomorphisms F1( #» C ) ≈ F′1( # » C ′), F2( #» C ) ≈ F′2( # » C ′), and F3( #» C ) ≈ F′3( # » C ′) are witnessed by the commands c1 : (x1 : F1( #» C ) ` α′1 : F′1( # » C ′)) c′1 : (x′1 : F′1( # » C ′) ` α1 : F1( #»C )) c2 : (x2 : F2( #» C ) ` α′2 : F′2( # » C ′)) c′2 : (x′2 : F′2( # » C ′) ` α2 : F2( #»C )) c3 : (x3 : F3( #» C ) ` α′3 : F′3( # » C ′)) c′3 : (x′3 : F′3( # » C ′) ` α3 : F3( #»C )) 332 respectively. Then isomorphisms between F( #»C ) and F′( # »C ′) are established by the commands c : (x : F( #»C ) ` α′ : F′( # »C ′)) and c′ : (x′ : F′( # »C ′) ` α : F( #»C )) as follows: c , 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , #» β3 , #»y3 , # »y1i).〈 v′1i ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜  # »K′1i( # »β′1j , # »y′1j ).〈v′3∣∣∣∣∣∣∣∣µ˜[K′3( #»β′3 , #»y′3).〈K′4j( # »β′1j , #»β′3 , #»y′3 , # »y′1j )∣∣∣∣∣∣∣∣α′〉]〉j 〉 i # » K5i( # » β2i , #» β3 , #»y3 , # »y2i).〈 v′2i ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜  # »K′2i( # »β′2j , # »y′2j ).〈v′3∣∣∣∣∣∣∣∣µ˜[K′3( #»β′3 , #»y′3).〈K′5j( # »β′2j , #»β′3 , #»y′3 , # »y′2j )∣∣∣∣∣∣∣∣α′〉]〉j 〉 i  〉 c′ , 〈 x′ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K′4i( # » β′1i , #» β′3 , #» y′3 , # » y′1i).〈 v3 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ K3( #» β3 , #»y3). 〈 v1i ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K1j( # » β1j , # »y1j ). 〈 K4j( # » β1j , #» β3 , #»y3 , # » β1j ) ∣∣∣∣∣∣α〉j]〉]〉 i # » K′5i( # » β′2i , #» β′3 , #» y′3 , # » y′2i).〈 v3 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ K3( #» β3 , #»y3). 〈 v2i ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K2j( # » β2j , # »y2j ). 〈 K5j( # » β1j , #» β3 , #»y3 , # » β1j ) ∣∣∣∣∣∣α〉j]〉]〉 i  〉 where we make use of the following shorthand: v1i , µα1. 〈 K′1i( # » β′1i , # » y′1i) ∣∣∣∣∣∣µ˜x′1.c′1〉 v′1i , µα′1. 〈K1i( # »β1i , # »y1i)∣∣∣∣∣∣µ˜x1.c1〉 v2i , µα2. 〈 K′2i( # » β′2i , # » y′2i) ∣∣∣∣∣∣µ˜x′2.c′2〉 v′2i , µα′2. 〈K2i( # »β2i , # »y2i)∣∣∣∣∣∣µ˜x2.c2〉 v3 , µα3. 
〈 K′3( #» β′3 , #» y′3) ∣∣∣∣∣∣µ˜x′3.c′3〉 v′3 , µα′3. 〈K3( #»β3 , #»y3)∣∣∣∣∣∣µ˜x3.c3〉 The composition of c and c′ along α′ and x′ of type F′( # »C ′) is equal to the identity command 〈x||α〉 via the combined strength of the µ˜ and η axioms for the call-by-value data types F′1, F′2, and F′3, as previously discussed in Section 8.1, as well as the call-by- value χ axiom to reassociate the bindings to bring the isomorphisms for those data types together, as follows: 〈µα′.c||µ˜x′.c′〉 333 =ηµ˜ 〈 µα′. 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈 v′1i ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜  # »K′1i( # »β′1j , # »y′1j ).〈v′3∣∣∣∣∣∣∣∣µ˜[K′3( # »β′3 , #»y′3).〈K′4j( # »β′1j , # »β′3 , #»y′3 , # »y′1j )∣∣∣∣∣∣∣∣α′〉]〉j 〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈 v′2i ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜  # »K′2i( # »β′2j , # »y′2j ).〈v′3∣∣∣∣∣∣∣∣µ˜[K′3( # »β′3 , #»y′3).〈K′5j( # »β′2j , # »β′3 , #»y′3 , # »y′2j )∣∣∣∣∣∣∣∣α′〉]〉j 〉 i  〉 ∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K′4i( # » β′1i , # » β′3 , #» y′3 , # » y′1i). 〈 v3 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ K3( # » β3 , #»y3). 〈 v1i ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K1j( # » β1j , # »y1j ). 〈 K4j( # » β1j , # » β3 , #»y3 , # » β1j ) ∣∣∣∣∣∣α〉j]〉]〉i # » K′5i( # » β′2i , # » β′3 , #» y′3 , # » y′2i). 〈 v3 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ K3( # » β3 , #»y3). 〈 v2i ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K2j( # » β2j , # »y2j ). 〈 K5j( # » β1j , # » β3 , #»y3 , # » β1j ) ∣∣∣∣∣∣α〉j]〉]〉i  〉 =µαµ˜xβF′ 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i). 〈v′1i|∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K′1i( # » β′1j , # » y′1j ).〈v′3|∣∣∣∣∣∣∣µ˜ K ′ 3( # » β′3 , #» y′3).〈v3|∣∣∣∣µ˜[K3( # »β3 , #»y3).〈v1j ||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉]〉 〉 j  〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i). 〈v′2i|∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K′2i( # » β′2j , # » y′2j ).〈v′3|∣∣∣∣∣∣∣µ˜ K ′ 3( # » β′3 , #» y′3).〈v3|∣∣∣∣µ˜[K3( # »β3 , #»y3).〈v2j ||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉]〉 〉 j  〉 i  〉 = µ˜Vη F′3 V 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′1i|∣∣∣∣∣∣∣∣µ˜  # » K′1i( # » β′1j , # » y′1j ).〈µα′3.c3|∣∣∣∣µ˜x′3.〈µα3.c′3||µ˜[K3( # »β3 , #»y3).〈v1j ||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉]〉〉 j 〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′2i|∣∣∣∣∣∣∣∣µ˜  # » K′2i( # » β′2j , # » y′2j ).〈µα′3.c3|∣∣∣∣µ˜x′3.〈µα3.c′3||µ˜[K3( # »β3 , #»y3).〈v2j ||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉]〉〉 j 〉 i  〉 334 =χV 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′1i|∣∣∣∣∣∣∣∣µ˜  # » K′1i( # » β′1j , # » y′1j ).〈µα3. 〈µα′3.c3||µ˜x′3.c3〉|∣∣∣∣µ˜[K3( # »β3 , #»y3).〈v1j ||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉]〉 j 〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′2i|∣∣∣∣∣∣∣∣µ˜  # » K′2i( # » β′2j , # » y′2j ).〈µα3. 
〈µα′3.c3||µ˜x′3.c′3〉|∣∣∣∣µ˜[K3( # »β3 , #»y3).〈v2j ||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉]〉 j 〉 i  〉 =Isoηµ 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′1i| |µ˜[ # » K′1i( # » β′1j , # » y′1j ).〈x3||µ˜[K3( # » β3 , #»y3).〈v1j ||µ˜[ # » K1j( # » β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉 k ]〉]〉 j ]〉〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈K3( # »β3 , #»y3)||µ˜x3. 〈v′1i| |µ˜[ # » K′2i( # » β′2j , # » y′2j ).〈x3||µ˜[K3( # » β3 , #»y3).〈v2j ||µ˜[ # » K2j( # » β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉 k ]〉]〉 j ]〉〉 i  〉 =µ˜xµαβF3 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i). 〈v′1i|∣∣∣∣∣∣µ˜[ # » K′1i( # » β′1j , # » y′1j ).〈v1j ||µ˜[ # » K1j( # » β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉 k ]〉 j ] 〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i). 〈v′2i|∣∣∣∣∣∣µ˜[ # » K′2i( # » β′2j , # » y′2j ).〈v2j ||µ˜[ # » K2j( # » β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉 k ]〉 j ] 〉 i  〉 = µ˜Vη F′1 V η F′2 V 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i). 〈K1i( # »β1i , # »y1i)|∣∣∣∣µ˜x1.〈µα′1.c1||µ˜x′1.〈µα1.c′1||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉〉〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i). 〈K2i( # »β2i , # »y2i)|∣∣∣∣µ˜x2.〈µα′2.c2||µ˜x′2.〈µα2.c′2||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉〉〉 i  〉 =χV 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i). 〈K1i( # »β1i , # »y1i)|∣∣∣∣µ˜x1.〈µα1. 〈µα′1.c1||µ˜x′1.c′1〉 ||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i). 〈K2i( # »β2i , # »y2i)|∣∣∣∣µ˜x2.〈µα2. 〈µα′2.c2||µ˜x′2.c′2〉 ||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉〉 i  〉 335 =Iso 〈 x ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i). 〈K1i( # »β1i , # »y1i)|∣∣∣∣µ˜x1.〈µα1. 〈x1||α1〉 ||µ˜[ # »K1j( # »β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉k]〉〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i). 〈K2i( # »β2i , # »y2i)|∣∣∣∣µ˜x2.〈µα2. 〈x2||α2〉 ||µ˜[ # »K2j( # »β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉k]〉〉 i  〉 =ηµηµ˜ 〈 x ∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈K1i( # »β1i , # »y1i)||µ˜[ # » K1j( # » β1k , # »y1k ).〈K4k( # »β1k , # »β3 , #»y3 , # »β1k )||α〉 k ]〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈K2i( # »β2i , # »y2i)||µ˜[ # » K2j( # » β2k , # »y2k ).〈K5k( # »β2k , # »β3 , #»y3 , # »β2k )||α〉 k ]〉 i  〉 =µµ˜βF1βF2 〈 x ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣µ˜  # » K4i( # » β1i , # » β3 , #»y3 , # »y1i).〈K4i( # »β1i , # »β3 , #»y3 , # »β1i)||α〉 i # » K5i( # » β2i , # » β3 , #»y3 , # »y2i).〈K5i( # »β2i , # »β3 , #»y3 , # »β2i)||α〉 i 〉 =ηF〈x||α〉 And the reverse composition of c′ and c along α and x of type F( #»C ) is equal to the identity command 〈x′||α′〉 similarly. Lemma 8.4 ((Co-)Data interchange shift instance). 
For any types # »C : S , # »C ′ : S ′ and (co-)data declarations data F( # »X : S ) : T where # » K : ( Γ ` F( # »X : S ) | ∆ ) data F ′( # » X ′ : S ′) : T where # » K′ : ( Γ′ ` F′( # »X ′ : S ′) | ∆′ ) codataG( # »X : S ) : Rwhere # » O : ( Γ | F( # »X : S ) ` ∆ ) codataG ′( # » X ′ : S ′) : Rwhere # » O′ : ( Γ′ | G′( # »X ′ : S ′) ` ∆′ ) F( #»C ) ≈ F′( # »C ′) implies G( #»C ) ≈ G′( # »C ′) when T = V and F( #»C ) ≈ F′( # »C ′) implies G( #»C ) ≈ G′( # »C ′) when R = N . Proof. First, suppose that the commands c1 : (x1 : F( #» C ) ` α′1 : F′( # » C ′)) and c′1 : (x′1 : F′( # » C ′) ` α1 : F( #»C )) 336 witness the isomorphism F( #»C ) ≈ F′( # »C ′). Then the isomorphism between G( #»C ) and G′( # »C ′) is established by: c2 , 〈 µ  # »O′i[ #»y′i , #»β′i ]. 〈 µα1. 〈 K′i( #» β′i , #» y′i ) ∣∣∣∣∣∣µ˜x′1.c′1〉 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » Kj( #» βj , #»yj ).〈x2||Oj[ #»yj , #»βj ]〉 j ]〉i ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣α′2 〉 : (x2:G( #» C ) ` α′2:G′( # » C ′)) c′2 , 〈 µ  # »Oi[ #»yi , #»βi ]. 〈 µα′1. 〈 Ki( #» βi , #»yi ) ∣∣∣∣∣∣µ˜x1.c1〉 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′j( #» β′j , #» y′j ).〈x′2||O′j[ #» y′j , #» β′j ]〉 j ]〉i ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣α2 〉 : (x′2:G′( # » C ′) ` α2:G( #»C )) The composition of c′2 and c2 along α2 and x2 of type G( #» C ) is equal to the identity command 〈x′2||α′2〉 via the βG, ηFV , βF′ , and ηG′ axioms as follows: 〈 µα2.c ′ 2 ∣∣∣∣µ˜x2.c2〉 , 〈 µα2. 〈 µ # »Oi[ #»yi , #»βi ]. 〈 µα′1. 〈 Ki( #» βi , #»yi) ∣∣∣∣∣∣µ˜x1.c1〉 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′j( #» β′j , #» y′j ).〈x′2||O′j [ #» y′j , #» β′j ]〉 j ]〉i ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣α2 〉 ∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜x2. 〈 µ # »O′i[ #»y′i , #»β′i ].〈µα1.〈K′i( #»β′i , #»y′i)∣∣∣∣∣∣µ˜x′1.c′1〉∣∣∣∣∣∣∣∣µ˜[ # »Kj( #»βj , #»yj ).〈x2||Oj [ #»yj , #»βj ]〉j]〉 i ∣∣∣∣∣∣ ∣∣∣∣∣∣α′2 〉〉 =ηµ 〈 µ # »Oi[ #»yi , #»βi ]. 〈 µα′1. 〈 Ki( #» βi , #»yi) ∣∣∣∣∣∣µ˜x1.c1〉 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′j( #» β′j , #» y′j ).〈x′2||O′j [ #» y′j , #» β′j ]〉 j ]〉i∣∣∣∣∣∣ ∣∣∣∣∣∣µ˜x2. 〈 µ # »O′i[ #»y′i , #»β′i ].〈µα1.〈K′i( #»β′i , #»y′i)∣∣∣∣∣∣µ˜x′1.c′1〉∣∣∣∣∣∣∣∣µ˜[ # »Kj( #»βj , #»yj ).〈x2||Oj [ #»yj , #»βj ]〉j]〉 i ∣∣∣∣∣∣ ∣∣∣∣∣∣α′2 〉〉 =µ˜N 〈 µ  # » O′i[ #» y′i , #» β′i ]. 〈 µα1. 〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣∣µ˜x′1.c′1〉∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ µ˜  # » Kj( #» βj , #»yj ). 〈 µ  # » Ok[ #»yk , # » βk ]. 〈 µα′1. 〈 Kk( # » βk , #»yk) ∣∣∣∣∣∣µ˜x1.c1〉∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉k ∣∣∣∣∣∣Oj [ #»yj , #»βj ]〉 j  〉 i ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣∣ α′2 〉 337 =βGµαµ˜x 〈 µ  # » O′i[ #» y′i , #» β′i ]. 〈 µα1. 〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣∣µ˜x′1.c′1〉∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣µ˜  # » Kj( #» βj , #»yj ). 〈 µα′1. 〈 Kj( #» βj , #»yj ) ∣∣∣∣∣∣µ˜x1.c1〉∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉 j  〉 i  ∣∣∣∣∣∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣∣∣∣∣∣ α′2 〉 =µ˜VηFV 〈 µ  # » O′i[ #» y′i , #» β′i ]. 〈 µα1. 〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣∣µ˜x′1.c′1〉∣∣∣∣∣ ∣∣∣∣∣µ˜x1. 〈 µα′1.c1 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉〉i  ∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣α ′ 2 〉 =χV 〈 µ  # » O′i[ #» y′i , #» β′i ]. 〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣ µ˜x′1. 〈µα′1. 〈µα1.c′1||µ˜x1.c1〉∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉〉i  ∣∣∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣∣∣α ′ 2 〉 =Iso 〈 µ  # »O′i[ #»y′i , #»β′i ]. 〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣ ∣∣∣∣∣µ˜x′1. 〈 µα′1. 〈 x′1 ∣∣∣∣α′1〉 ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉〉i ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣α′2 〉 =ηµηµ˜ 〈 µ  # »O′i[ #»y′i , #»β′i ]. 
〈 K′i( #» β′i , #» y′i) ∣∣∣∣∣ ∣∣∣∣∣µ˜ [ # » K′l( #» β′l , #» y′l ).〈x′2||O′l[ #» y′l , #» β′l ]〉 l ]〉i ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣α′2 〉 =βF′µµ˜ 〈 µ ( # » O′i[ #» y′i , #» β′i ].〈x′2||O′i[ #» y′i , #» β′i ]〉 i )∣∣∣∣∣ ∣∣∣∣∣α′2 〉 =ηG′ 〈 x′2 ∣∣∣∣α′2〉 The composition of c2 and c′2 along α′2 and x′2 of type G′( #» C ) is equal to the identity command 〈x2||α2〉 via the βG′ , ηF′V , βF, and ηG similarly. Second, suppose that the commands c2 : (x2 : G( #» C ) ` α′2 : G′( # » C ′)) and c′2 : (x′2 : G′( # »C ′) ` α2 : G( #»C )) witness the isomorphism G( #»C ) ≈ G′( # »C ′). Then the isomorphism between F( #»C ) and F′( # »C ′) is established by: c1 , 〈 x1 ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣µ˜  # »Ki( #»βi , #»yi ). 〈 µ ( # » O′j[ #» y′j #» β′j ].〈K′j( #» y′j , #» β′j )||α′1〉 j )∣∣∣∣∣ ∣∣∣∣∣µ˜x′2. 〈µ˜α2.c′2∣∣∣∣∣∣Oi[ #»yi , #»βi ]〉 〉i〉 : (x1:F( #» C ) ` α′1:F′( # » C ′)) c′1 , 〈 x′1 ∣∣∣∣∣∣∣ ∣∣∣∣∣∣∣µ˜  # »Ki( #»βi , #»yi ). 〈 µ ( # » Oj[ #»yj #» βj ].〈Kj( #»yj , #»βj )||α1〉 j )∣∣∣∣∣ ∣∣∣∣∣µ˜x2. 〈µ˜α′2.c2∣∣∣∣∣∣O′i[ #»y′i , #»β′i ]〉 〉i〉 : (x′1:F′( # » C ′) ` α1:F( #»C )) 338 Both compositions of c and c′ are equal to the identity command analogously to the previous part by duality. With the above two isomorphisms established, we now have enough to justify the soundness of all the proposed structural laws for (co-)data declarations. Theorem 8.6 (Structural law soundness). The declaration isomorphism laws in Figures 8.4 and 8.5 are all sound. Proof. The data laws all follow by generalizing the particular instances where the data types are isomorphic: the commute, interchange, and compatibility laws are all immediate consequence of Lemmas 8.1, 8.3 and 8.3, both mix laws follow from Lemma 8.2 by taking either F1 and F′1 to be the empty data declaration of no alternatives or taking F3 and F′3 to be the unit data declaration of one alternative with no components (both of which are isomorphic by reflexivity), and the shift law follows by applying Lemma 8.4 twice. The co-data laws follow from the data laws by Lemma 8.4. Finally, there is one more property about (co-)data declarations that will be extremely useful in the following section. Namely that certain singleton (co-)data types are just trivial wrappers around another type. In the right circumstances, these wrappers can be identified with their underlying types, up to isomorphism, which lets us connect the world of (co-)data declarations with the world of actual types. Lemma 8.5 ((Co-)Data identity). a) For any data F(Θ) : RwhereK : (A : T ` F(Θ) | ), if either T = V or R = V then #»Θ  F(Θ) ≈ A. b) For any codataG(Θ) : RwhereO : ( | G(Θ) ` A : T ), if either T = N or R = N then #»Θ  G( #»Θ) ≈ A. Proof. a) Let Θ = # »X : S , suppose # »C : S , and let B = A # »{C/X}. F( #»C ) ≈ B is established by the commands: c1 , 〈x||µ˜[K(y).〈y||β〉]〉 : (x : F( #»C ) ` β : B) c2 , 〈K(y)||α〉 : (y : B ` α : F( #»C )) First, the composition of c2 and c1 along α and x of type F( #» C ) : R is equal to the identity command 〈y||β〉 by using the ηµ and ηµ˜ axioms to reveal the βF 339 redex as follows: 〈µα::R.c2||µ˜x::R.c1〉 , 〈µα::R. 〈K(y)||α〉||µ˜x::R. 〈x||µ˜[K(y).〈y||β〉]〉〉 =ηµηµ˜ 〈K(y)||µ˜[K(y).〈y||β〉]〉 =βF 〈y||µ˜y::T . 〈y||β〉〉 =µ˜T 〈y||β〉 Next, suppose that T = V . The composition of c1 and c2 along β and y of type B : V is equal to the identity command 〈x||α〉 by using the strength of the µV axiom to reveal the ηF redex as follows: 〈µβ::V .c1||µ˜y::V .c2〉 , 〈µβ::V . 〈x||µ˜[K(y).〈y||β〉]〉||µ˜y::V . 〈K(y)||α〉〉 =µV 〈x||µ˜[K(y).〈y::V||µ˜y. 
〈K(y)||α〉〉]〉 =µ˜V 〈x||µ˜[K(y).〈K(y)||α〉]〉 =ηF 〈x||α〉

Otherwise, suppose that R = V. The composition of c1 and c2 along β and y of type B : T is equal to the identity command 〈x||α〉 by using the combined strength of the µV and ηF axioms to percolate out the case analysis on x and create an inner βF redex:

〈µβ::T.c1||µ˜y::T.c2〉 ≜ 〈µβ::T.〈x||µ˜[K(y).〈y||β〉]〉||µ˜y::T.〈K(y)||α〉〉
=µV 〈x||µ˜x.〈µβ::T.〈x||µ˜[K(y).〈y||β〉]〉||µ˜y::T.〈K(y)||α〉〉〉
=µVηF 〈x||µ˜[K(y).〈K(y)||µ˜x.〈µβ::T.〈x||µ˜[K(y).〈y||β〉]〉||µ˜y::T.〈K(y)||α〉〉〉]〉
=µ˜V 〈x||µ˜[K(y).〈µβ::T.〈K(y)||µ˜[K(y).〈y||β〉]〉||µ˜y::T.〈K(y)||α〉〉]〉
=βFµ˜T 〈x||µ˜[K(y).〈µβ::T.〈y||β〉||µ˜y::T.〈K(y)||α〉〉]〉
=ηµµ˜T 〈x||µ˜[K(y).〈K(y)||α〉]〉
=ηF 〈x||α〉

b) Analogous to the proof of Lemma 8.5 (a) by duality.

Internal polarized laws of declarations

Now that we have established some basic structural laws about isomorphisms between general user-defined (co-)data types, we can focus on some more specific laws about the polarized types in Figure 8.1. In particular, we can show that these polar types play a part in a family of isomorphisms that closely resemble some of the logical rules of the sequent calculus. Namely, each of the left rules for the positive data types and the right rules for the negative co-data types corresponds to an isomorphism between (co-)data declarations with signatures matching the premises and conclusion of the rule, as shown in Figures 8.6 and 8.7. The role of using declarations for this purpose is to give enough structural substrate for stating these rules: the sequents containing multiple inputs and multiple outputs in the rules can be expressed by the types of constructors or observers, and multiple premises can be expressed by multiple alternatives for constructors or observers. As a result, we can reason about the polarized types as sub-components within the structure of larger (co-)data types.

Additive laws
data F(Θ) : V where K1 : (A : V ⊢ F(Θ) |), K2 : (B : V ⊢ F(Θ) |) ≈⊕L data F′(Θ) : V where K′ : (A⊕B : V ⊢ F′(Θ) |)
data F(Θ) : V where ≈0L data F′(Θ) : V where K′ : (0 : V ⊢ F′(Θ) |)

Multiplicative laws
data F(Θ) : V where K : (A : V, B : V ⊢ F(Θ) |) ≈⊗L data F′(Θ) : V where K′ : (A⊗B : V ⊢ F′(Θ) |)
data F(Θ) : V where K : (⊢ F(Θ) |) ≈1L data F′(Θ) : V where K′ : (1 : V ⊢ F′(Θ) |)

Negation laws
data F(Θ) : V where K : (⊢ F(Θ) | A : N) ≈∼L data F′(Θ) : V where K′ : (∼A : V ⊢ F′(Θ) |)

Shift laws
data F(Θ) : V where K : (A : S ⊢ F(Θ) |) ≈↓SL data F′(Θ) : V where K′ : (↓SA : V ⊢ F′(Θ) |)

FIGURE 8.6. Isomorphism laws of positively polarized data sub-structures.

Theorem 8.7 (Polarized sub-structure laws). The declaration isomorphism laws in Figures 8.6 and 8.7 are all sound.

Additive laws
codata G(Θ) : N where O1 : (| G(Θ) ⊢ A : N), O2 : (| G(Θ) ⊢ B : N) ≈&R codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ A&B : N)
codata G(Θ) : N where ≈⊤R codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ ⊤ : N)

Multiplicative laws
codata G(Θ) : N where O : (| G(Θ) ⊢ A : N, B : N) ≈⅋R codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ A⅋B : N)
codata G(Θ) : N where O : (| G(Θ) ⊢ ) ≈⊥R codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ ⊥ : N)

Negation laws
codata G(Θ) : N where O : (A : V | G(Θ) ⊢ ) ≈¬R codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ ¬A : N)

Shift laws
codata G(Θ) : N where O : (| G(Θ) ⊢ A : S) ≈↑SR codata G′(Θ) : N where O′ : (| G′(Θ) ⊢ ↑SA : N)

FIGURE 8.7. Isomorphism laws of negatively polarized co-data sub-structures.

Proof.
Due to the (co-)data interchange laws from Figures 8.4 and 8.5, we only need to demonstrate half of the isomorphisms in Figures 8.6 and 8.7, since each side implies the other. So let us focus only on the more familiar data type declarations, because all the laws for polarized co-data sub-structures can be derived from those. In each case, the main technique for establishing these laws is that, for any substitution θ matching the environment Θ, the data type F′(Θ)θ on each right-hand side is isomorphic to the single component of its single alternative under the substitution θ, according to Lemma 8.5, because each of the data types is call-by-value (i.e., F′(Θ) : V). What remains is to demonstrate that, in each case, the data type F(Θ)θ is also isomorphic to that same type.

The sub-structure laws for the nullary data types (0, 1) are the easiest to show. Note how for the 0L law we directly have that F(Θ) ≈ 0 as a trivial case of Lemma 8.1 (b), and F′(Θ) ≈ 0 by Lemma 8.5 (a), so together we know F(Θ) ≈ 0 ≈ F′(Θ). Similarly, for the 1L law we have F(Θ) ≈ 1 as a trivial case of Lemma 8.1 (a), and so we get F(Θ) ≈ 1 ≈ F′(Θ) from Lemma 8.5 (a) as well.

The sub-structure laws for the unary data types (∼, ↓S) follow a different line of reasoning, but are not much more difficult to demonstrate. For instance, consider the negating ∼L law, where we know that F(Θ)θ ≈ ∼Aθ by Lemma 8.3 (b), because Aθ ≈ Aθ by reflexivity, and as usual F′(Θ)θ ≈ ∼Aθ by Lemma 8.5 (a). Additionally, the shifting ↓SL law is sound because we know that F(Θ)θ ≈ ↓SAθ by Lemma 8.3 (a), again via the reflexive isomorphism Aθ ≈ Aθ, and F′(Θ)θ ≈ ↓SAθ by Lemma 8.5 (a).

And finally, the sub-structure laws for the binary data types (⊕, ⊗) require the most effort. This is because each of these types has two parts, so we must relate one part at a time and then mix the results together. In particular, we know that F1(Θ)θ ≈ F′1(A,B)θ and F2(Θ)θ ≈ F′2(A,B)θ for the declarations

data F1(Θ) : V where K1 : (A : V ⊢ F1(Θ) |)    data F′1(X : V, Y : V) : V where K′1 : (X : V ⊢ F′1(X,Y) |)
data F2(Θ) : V where K2 : (B : V ⊢ F2(Θ) |)    data F′2(X : V, Y : V) : V where K′2 : (Y : V ⊢ F′2(X,Y) |)

by applying Lemma 8.3 (a) to the reflexive isomorphisms Aθ ≈ X {Aθ/X} and Bθ ≈ Y {Bθ/Y}. Now note the two different ways to mix these isomorphisms together with Lemma 8.2. First, we could mix the above F1(Θ)θ ≈ F′1(A,B)θ and F2(Θ)θ ≈ F′2(A,B)θ as the first two isomorphisms, while the third is F3(Θ)θ ≈ F′3(A,B)θ given by Lemma 8.1 (a) of the trivial data declarations

data F3(Θ) : V where K3 : (⊢ F3(Θ) |)    data F′3(X : V, Y : V) : V where K′3 : (⊢ F′3(X,Y) |)

which tells us that F(Θ)θ ≈ (A⊕B)θ, as required by the ⊕L law. Second, we could mix F1(Θ)θ ≈ F′1(A,B)θ and F2(Θ)θ ≈ F′2(A,B)θ as the second two isomorphisms, while the first is F0(Θ)θ ≈ F′0(A,B)θ by Lemma 8.1 (b) of the trivial data declarations data F0(Θ) : V where and data F′0(X : V, Y : V) : V where, which tells us that F(Θ)θ ≈ (A⊗B)θ, as required by the ⊗L law.

In addition to the specific laws of Figures 8.6 and 8.7, each of the polarized connectives is compatible with isomorphism. For example, if we have A ≈ A′, then we also have A⊕B ≈ A′⊕B and B⊕A ≈ B⊕A′. This fact lets us apply type isomorphisms within the context of certain larger types: if two types are isomorphic, then we can build on them with polarized connectives however we want and still have an isomorphism.
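For instance, over ordinary Haskell type constructors, compatibility means a single isomorphism B ≈ C can be pushed through a surrounding type expression; a sketch of one hypothetical context, reusing the Iso type from before:

    -- Lifting an isomorphism b ≈ c through the compound context
    -- A{X} = Either (X, Int) Bool (the context is invented for the example).
    liftThrough :: Iso b c -> Iso (Either (b, Int) Bool) (Either (c, Int) Bool)
    liftThrough (Iso f g) =
      Iso (either (\(x, n) -> Left (f x, n)) Right)
          (either (\(y, n) -> Left (g y, n)) Right)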
Said another way, for any type A made from polarized connectives and any other isomorphic types B ≈ C, we can substitute both B and C for X in A and still have the isomorphism A {B/X} ≈ A {C/X}.

Theorem 8.8 (Polarized isomorphism substitution). For any Θ, X : S ⊢P A : T, Θ ⊢G B : S, and Θ ⊢G C : S, if Θ ⊩ B ≈ C then Θ ⊩ A {B/X} ≈ A {C/X}.

Proof. By induction on the typing derivation of Θ, X : S ⊢P A : T, using the fact that each polarized connective is compatible with isomorphism. For example, in the case of ⊗, given that A1 ≈ A′1 and A2 ≈ A′2, we have that

data F1() : V where K : (A1⊗A2 : V ⊢ F1() |)
≈⊗L data F2() : V where K : (A1 : V, A2 : V ⊢ F2() |)
≈ data F3() : V where K : (A′1 : V, A′2 : V ⊢ F3() |)
≈⊗L data F4() : V where K : (A′1⊗A′2 : V ⊢ F4() |)

by the ⊗L, data compatibility, and data mix laws, and so A1⊗A2 ≈ F1() ≈ F4() ≈ A′1⊗A′2 by Lemma 8.5. The compatibility of the other polarized connectives follows similarly.

Laws of the Polarized Basis

We have just seen in the previous section that there is an encoding of user-defined (co-)data types solely in terms of the basic polarized connectives. However, how do we know that this encoding is canonical, or that there are not many different and unrelated encodings for the same purpose? Does it matter in which order the components of (co-)data types are put together, or in which way they are nested? Or could we instead encode (co-)data types in terms of the positive ⊕ and ⊗ connectives instead of the negative & and ⅋? As it turns out, none of these differences matter. The advantage of using the polarized connectives, as declared in Figure 8.1, as the basis for encodings is that they exhibit many pleasant—if none too surprising—properties, some of which have been explored previously by Zeilberger (2009) and Munch-Maccagnoni (2013). That is, in contrast with types like call-by-name tuples or call-by-value functions, the relationships between types that we should expect—corresponding to common and well-known relationships from algebra and logic—are actual isomorphisms between polarized types, even in the face of effects that let terms avoid giving a result.

Algebraic laws

Let's begin by first exploring the algebraic properties of the polarized connectives—in particular, the isomorphism relationships between the additive and multiplicative connectives from Figure 8.1:

– On the positive side, the ⊕ and 0 connectives form a commutative monoid of types up to isomorphism—meaning they satisfy commutativity, associativity, and unit laws as isomorphisms between types—and so do the ⊗ and 1 connectives. Furthermore, all four together form a commutative semiring up to isomorphism—meaning the “multiplication” ⊗ distributes over the “addition” ⊕ and is annihilated by the “zero” 0.

– On the negative side, the & and ⊤ connectives form a commutative monoid up to isomorphism, and ⅋ and ⊥ do as well. All four together form a commutative semiring, with & as addition and ⅋ as multiplication.

These properties of the additive and multiplicative connectives are summarized in Figure 8.8:

A⊕B ≈ B⊕A    (A⊕B)⊕C ≈ A⊕(B⊕C)    0⊕A ≈ A ≈ A⊕0
A⊗B ≈ B⊗A    (A⊗B)⊗C ≈ A⊗(B⊗C)    1⊗A ≈ A ≈ A⊗1
A⊗(B⊕C) ≈ (A⊗B)⊕(A⊗C)    (A⊕B)⊗C ≈ (A⊗C)⊕(B⊗C)    A⊗0 ≈ 0 ≈ 0⊗A

A&B ≈ B&A    (A&B)&C ≈ A&(B&C)    ⊤&A ≈ A ≈ A&⊤
A⅋B ≈ B⅋A    (A⅋B)⅋C ≈ A⅋(B⅋C)    ⊥⅋A ≈ A ≈ A⅋⊥
A⅋(B&C) ≈ (A⅋B)&(A⅋C)    (A&B)⅋C ≈ (A⅋C)&(B⅋C)    A⅋⊤ ≈ ⊤ ≈ ⊤⅋A

FIGURE 8.8. Algebraic laws of the polarized basis of types.
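Over ordinary Haskell sums and products, the commutativity rows of Figure 8.8 are the familiar swapping witnesses. This is a sketch reusing the Iso type from before; Haskell's lazy pairs only loosely model the call-by-value ⊗, so it is an analogy rather than a model of the polarized laws:

    swapSum :: Iso (Either a b) (Either b a)
    swapSum = Iso (either Right Left) (either Right Left)

    swapProd :: Iso (a, b) (b, a)
    swapProd = Iso (\(x, y) -> (y, x)) (\(y, x) -> (x, y))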
We can verify that each of these isomorphisms is, in fact, an isomorphism using the previously established laws of (co-)data declarations in general, and of internal polarized sub-structures in particular, from Figures 8.4, 8.5, 8.6 and 8.7. The general technique follows the observation that, because of Lemma 8.5, if we have either a singleton data declaration isomorphism or a singleton co-data declaration isomorphism of the form

data F() : V where K : (A : V ⊢ F() |) ≈ data F′() : V where K′ : (A′ : V ⊢ F′() |)

or

codata G() : N where O : (| G() ⊢ A : N) ≈ codata G′() : N where O′ : (| G′() ⊢ A′ : N)

then we have A ≈ A′ by composing A ≈ F() ≈ F′() ≈ A′ or A ≈ G() ≈ G′() ≈ A′. Therefore, we can prove isomorphism laws about the polarized (co-)data types by (1) placing both sides of the proposed isomorphism within a singleton data or co-data type, as appropriate, (2) “unpacking” the two sides within the structure of the containing (co-)data type declaration, and (3) using the laws of declaration isomorphisms to show that the two sides are indeed isomorphic. Each of these algebraic laws can be derived from the laws in Figures 8.6 and 8.7 as follows.

Commutativity

The commutativity laws for reordering the binary connectives, unsurprisingly, follow from the commutativity laws for reordering the parts of declarations. For the multiplicative ⊗ and ⅋, we use the first commute law to reorder the components within a single constructor or observer, as follows:

data F1() : V where K : (A⊗B : V ⊢ F1() |)
≈⊗L data F2() : V where K : (A : V, B : V ⊢ F2() |)
≈ data F3() : V where K : (B : V, A : V ⊢ F3() |)
≈⊗L data F4() : V where K : (B⊗A : V ⊢ F4() |)

codata G1() : N where O : (| G1() ⊢ A⅋B : N)
≈⅋R codata G2() : N where O : (| G2() ⊢ A : N, B : N)
≈ codata G3() : N where O : (| G3() ⊢ B : N, A : N)
≈⅋R codata G4() : N where O : (| G4() ⊢ B⅋A : N)

Whereas for the additive ⊕ and &, we use the second commute law to reorder the alternatives within a declaration, as shown in the following isomorphisms:

data F1() : V where K : (A⊕B : V ⊢ F1() |)
≈⊕L data F2() : V where K1 : (A : V ⊢ F2() |), K2 : (B : V ⊢ F2() |)
≈ data F3() : V where K2 : (B : V ⊢ F3() |), K1 : (A : V ⊢ F3() |)
≈⊕L data F4() : V where K : (B⊕A : V ⊢ F4() |)

codata G1() : N where O : (| G1() ⊢ A&B : N)
≈&R codata G2() : N where O1 : (| G2() ⊢ A : N), O2 : (| G2() ⊢ B : N)
≈ codata G3() : N where O2 : (| G3() ⊢ B : N), O1 : (| G3() ⊢ A : N)
≈&R codata G4() : N where O : (| G4() ⊢ B&A : N)

Unit

Combining the binary connectives with their corresponding units is an identity operation that leaves types unchanged, up to isomorphism. These unit laws rely on the fact that the right and left laws for the nullary connectives “cancel out,” in an appropriate way, any occurrence of the nullary connective within a (co-)data declaration, as described by the 1L, 0L, ⊥R, and ⊤R laws.
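Before walking through the declaration-level derivations, the unit laws can again be pictured over Haskell sums and products, with () standing in for 1 and Void for 0 (reusing Iso; again only an analogy):

    import Data.Void (Void, absurd)

    unitProd :: Iso ((), a) a
    unitProd = Iso snd ((,) ())

    unitSum :: Iso (Either Void a) a
    unitSum = Iso (either absurd id) Right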
For the multiplicative 1 and ⊥ connectives, we use the fact that 1 vanishes from the left-hand side of a constructor and ⊥ vanishes from the right-hand side of an observer:

data F1() : V where K : (1⊗A : V ⊢ F1() |)
≈⊗L data F2() : V where K : (1 : V, A : V ⊢ F2() |)
≈1L data F3() : V where K : (A : V ⊢ F3() |)
≈1L data F4() : V where K : (A : V, 1 : V ⊢ F4() |)
≈⊗L data F5() : V where K : (A⊗1 : V ⊢ F5() |)

codata G1() : N where O : (| G1() ⊢ ⊥⅋A : N)
≈⅋R codata G2() : N where O : (| G2() ⊢ ⊥ : N, A : N)
≈⊥R codata G3() : N where O : (| G3() ⊢ A : N)
≈⊥R codata G4() : N where O : (| G4() ⊢ A : N, ⊥ : N)
≈⅋R codata G5() : N where O : (| G5() ⊢ A⅋⊥ : N)

Note the use of the mix law to extend 1L and ⊥R to allow for an extra component alongside the unit connective. Alternatively, for the additive 0 and ⊤ connectives, we use the fact that any constructor containing a 0 on its left-hand side completely vanishes itself, whereas an observer containing a ⊤ on its right-hand side vanishes:

data F1() : V where K : (0⊕A : V ⊢ F1() |)
≈⊕L data F2() : V where K1 : (0 : V ⊢ F2() |), K2 : (A : V ⊢ F2() |)
≈0L data F3() : V where K : (A : V ⊢ F3() |)
≈0L data F4() : V where K1 : (A : V ⊢ F4() |), K2 : (0 : V ⊢ F4() |)
≈⊕L data F5() : V where K : (A⊕0 : V ⊢ F5() |)

codata G1() : N where O : (| G1() ⊢ ⊤&A : N)
≈&R codata G2() : N where O1 : (| G2() ⊢ ⊤ : N), O2 : (| G2() ⊢ A : N)
≈⊤R codata G3() : N where O : (| G3() ⊢ A : N)
≈⊤R codata G4() : N where O1 : (| G4() ⊢ A : N), O2 : (| G4() ⊢ ⊤ : N)
≈&R codata G5() : N where O : (| G5() ⊢ A&⊤ : N)

Again, the mix law is used to extend 0L and ⊤R to (co-)data declarations with another alternative.

Associativity

Nested applications of the same binary connective can be reassociated, up to isomorphism. This is because (co-)data declarations are “flat”: there is a single, flat list of alternatives, with each alternative containing a single, flat list of components on either side of the turnstile. Therefore, after we fully unpack a nested application of a connective, it flattens out, so that we may repack the same parts back together in the other order.

For the multiplicative ⊗ and ⅋, we have the following isomorphisms:

data F1() : V where K : ((A⊗B)⊗C : V ⊢ F1() |)
≈⊗L data F2() : V where K : (A⊗B : V, C : V ⊢ F2() |)
≈⊗L data F3() : V where K : (A : V, B : V, C : V ⊢ F3() |)
≈⊗L data F4() : V where K : (A : V, B⊗C : V ⊢ F4() |)
≈⊗L data F5() : V where K : (A⊗(B⊗C) : V ⊢ F5() |)

codata G1() : N where O : (| G1() ⊢ (A⅋B)⅋C : N)
≈⅋R codata G2() : N where O : (| G2() ⊢ A⅋B : N, C : N)
≈⅋R codata G3() : N where O : (| G3() ⊢ A : N, B : N, C : N)
≈⅋R codata G4() : N where O : (| G4() ⊢ A : N, B⅋C : N)
≈⅋R codata G5() : N where O : (| G5() ⊢ A⅋(B⅋C) : N)

Note that the mix law is used to extend ⊗L and ⅋R to allow for an extra component on either side of the main pair.
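At the value level, the same reassociation is witnessed as follows (reusing Iso); the additive derivations below follow the same flatten-and-repack pattern:

    assocProd :: Iso ((a, b), c) (a, (b, c))
    assocProd = Iso (\((x, y), z) -> (x, (y, z)))
                    (\(x, (y, z)) -> ((x, y), z))

    assocSum :: Iso (Either (Either a b) c) (Either a (Either b c))
    assocSum = Iso f g
      where
        f (Left (Left x))  = Left x
        f (Left (Right y)) = Right (Left y)
        f (Right z)        = Right (Right z)
        g (Left x)          = Left (Left x)
        g (Right (Left y))  = Left (Right y)
        g (Right (Right z)) = Right z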
For the additive ⊕ and &, we have the following isomorphisms, again using mix to extend ⊕L and &R to allow for an extra alternative before or after the main pair:

data F1() : V where K : ((A⊕B)⊕C : V ⊢ F1() |)
≈⊕L data F2() : V where K1 : (A⊕B : V ⊢ F2() |), K2 : (C : V ⊢ F2() |)
≈⊕L data F3() : V where K1 : (A : V ⊢ F3() |), K2 : (B : V ⊢ F3() |), K3 : (C : V ⊢ F3() |)
≈⊕L data F4() : V where K1 : (A : V ⊢ F4() |), K2 : (B⊕C : V ⊢ F4() |)
≈⊕L data F5() : V where K : (A⊕(B⊕C) : V ⊢ F5() |)

codata G1() : N where O : (| G1() ⊢ (A&B)&C : N)
≈&R codata G2() : N where O1 : (| G2() ⊢ A&B : N), O2 : (| G2() ⊢ C : N)
≈&R codata G3() : N where O1 : (| G3() ⊢ A : N), O2 : (| G3() ⊢ B : N), O3 : (| G3() ⊢ C : N)
≈&R codata G4() : N where O1 : (| G4() ⊢ A : N), O2 : (| G4() ⊢ B&C : N)
≈&R codata G5() : N where O : (| G5() ⊢ A&(B&C) : N)

Distributivity

Distributing a multiplication over an addition also arises from the flat nature of (co-)data declarations, much like reassociating a binary connective. The difference is that when the addition is flattened out into the structure of the declaration, the multiplied type is carried along for the ride (via the mix law) and copied across both alternatives, as shown in the following isomorphisms:

data F1() : V where K : (A⊗(B⊕C) : V ⊢ F1() |)
≈⊗L data F2() : V where K : (A : V, B⊕C : V ⊢ F2() |)
≈⊕L data F3() : V where K1 : (A : V, B : V ⊢ F3() |), K2 : (A : V, C : V ⊢ F3() |)
≈⊗L data F4() : V where K1 : (A⊗B : V ⊢ F4() |), K2 : (A : V, C : V ⊢ F4() |)
≈⊗L data F5() : V where K1 : (A⊗B : V ⊢ F5() |), K2 : (A⊗C : V ⊢ F5() |)
≈⊕L data F6() : V where K : ((A⊗B)⊕(A⊗C) : V ⊢ F6() |)

codata G1() : N where O : (| G1() ⊢ A⅋(B&C) : N)
≈⅋R codata G2() : N where O : (| G2() ⊢ A : N, B&C : N)
≈&R codata G3() : N where O1 : (| G3() ⊢ A : N, B : N), O2 : (| G3() ⊢ A : N, C : N)
≈⅋R codata G4() : N where O1 : (| G4() ⊢ A⅋B : N), O2 : (| G4() ⊢ A : N, C : N)
≈⅋R codata G5() : N where O1 : (| G5() ⊢ A⅋B : N), O2 : (| G5() ⊢ A⅋C : N)
≈&R codata G6() : N where O : (| G6() ⊢ (A⅋B)&(A⅋C) : N)

Annihilation

When a type is multiplied by the additive unit, it is cancelled out. This occurs because, unlike an addition, the multiplication places the type right next to the unit, where it is in harm's way. Thus, when 0L and ⊤R are extended (by the mix law) to allow for an extra component alongside the units, that component is swept aside as the entire alternative is deleted, as in the following isomorphisms:

data F1() : V where K : (A⊗0 : V ⊢ F1() |)
≈⊗L data F2() : V where K : (A : V, 0 : V ⊢ F2() |)
≈0L data F3() : V where
≈0L data F4() : V where K : (0 : V, A : V ⊢ F4() |)
≈⊗L data F5() : V where K : (0⊗A : V ⊢ F5() |)

codata G1() : N where O : (| G1() ⊢ A⅋⊤ : N)
≈⅋R codata G2() : N where O : (| G2() ⊢ A : N, ⊤ : N)
≈⊤R codata G3() : N where
≈⊤R codata G4() : N where O : (| G4() ⊢ ⊤ : N, A : N)
≈⅋R codata G5() : N where O : (| G5() ⊢ ⊤⅋A : N)

Duality laws

Isomorphism of types also gives us common logical properties of the polarized connectives based on duality, established with the same technique used in Section 8.4. In particular, we get two parallel copies of the De Morgan laws—one for ∼ negation and the other for ¬ negation—relating the positive data types with the negative co-data types, as shown in Figure 8.9:

∼(A&B) ≈ (∼A)⊕(∼B)    ∼⊤ ≈ 0    ∼(A⅋B) ≈ (∼A)⊗(∼B)    ∼⊥ ≈ 1    ∼(¬A) ≈ A
¬(A⊕B) ≈ (¬A)&(¬B)    ¬0 ≈ ⊤    ¬(A⊗B) ≈ (¬A)⅋(¬B)    ¬1 ≈ ⊥    ¬(∼A) ≈ A

FIGURE 8.9. De Morgan duality laws of the polarized basis of types.
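Before examining the De Morgan laws case by case, the distributivity and annihilation laws just derived also have direct value-level witnesses (reusing Iso and Data.Void from earlier sketches):

    distrib :: Iso (a, Either b c) (Either (a, b) (a, c))
    distrib = Iso f g
      where
        f (x, Left y)    = Left (x, y)
        f (x, Right z)   = Right (x, z)
        g (Left (x, y))  = (x, Left y)
        g (Right (x, z)) = (x, Right z)

    -- Annihilation: a product with the empty type is itself empty.
    annihilate :: Iso (a, Void) Void
    annihilate = Iso snd absurd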
∼(A & B) ≈ (∼A) ⊕ (∼B)    ∼⊤ ≈ 0    ∼(A ⅋ B) ≈ (∼A) ⊗ (∼B)    ∼⊥ ≈ 1    ∼(¬A) ≈ A
¬(A ⊕ B) ≈ (¬A) & (¬B)    ¬0 ≈ ⊤    ¬(A ⊗ B) ≈ (¬A) ⅋ (¬B)    ¬1 ≈ ⊥    ¬(∼A) ≈ A

FIGURE 8.9. De Morgan duality laws of the polarized basis of types.

Duality laws  Isomorphism of types also gives us common logical properties of the polarized connectives based on duality, established with the same technique used in Section 8.4. In particular, we get two parallel copies of the De Morgan laws—one for ∼ negation and the other for ¬ negation—that relate the positive data types with the negative co-data types, as shown in Figure 8.9. The positive "or" (⊕) is dualized into the negative "and" (&) and the positive "and" (⊗) is dualized into the negative "or" (⅋). Additionally, the two negation connectives cancel each other out, up to isomorphism. That is to say, they are a characterization of involutive negation as (co-)data types.5

Remark 8.1. Note that, while the polarized basis for (co-)data types from Figure 8.1 is nicely symmetric, it is also somewhat redundant. The fact that the negation connectives are involutive and follow the De Morgan laws means that we get the following derived type isomorphisms:

A ⊕ B ≈ ∼((¬A) & (¬B))    A & B ≈ ¬((∼A) ⊕ (∼B))
A ⊗ B ≈ ∼((¬A) ⅋ (¬B))    A ⅋ B ≈ ¬((∼A) ⊗ (∼B))

The consequence of these isomorphisms is that we only really need half the listed additive and multiplicative connectives. We could, for example, take only ⊕, 0, ⊗, 1, ∼, and ¬ as primitive connectives and encode &, ⊤, ⅋, and ⊥ in terms of them as above. Dually, we could take &, ⊤, ⅋, ⊥, ∼, and ¬ as primitive and encode ⊕, 0, ⊗, 1. Or we could instead mix and match between the positive (data) and negative (co-data) forms as desired. The only requirement is that we have at least one binary and nullary additive connective, one binary and nullary multiplicative connective, and both dual involutive negations. End remark 8.1.

5 This fact was noticed by Zeilberger (2009) and further brought to the forefront by Munch-Maccagnoni (2014). The key is to have two dual negations, where one can be encoded with implication (¬A ≈ A → ⊥) and its dual can be encoded with subtraction (∼A ≈ 1 − A).

Involutive negation  Double negation elimination (both the positive ∼(¬A) ≈ A and negative ¬(∼A) ≈ A forms) is perhaps deceptively simple: the ∼L and ¬R laws just flip the double-negated type back and forth across the turnstile until both negations disappear:

data F1() : V where K : (∼(¬A) : V ⊢ F1() | )
≈∼L  data F2() : V where K : ( ⊢ F2() | ¬A : N )
≈¬R  data F3() : V where K : (A : V ⊢ F3() | )

codata G1() : V where O : ( | G1() ⊢ ¬(∼A) : N )
≈¬R  codata G2() : V where O : (∼A : V | G2() ⊢ )
≈∼L  codata G3() : V where O : ( | G3() ⊢ A : N )

Note that in the case of ∼(¬A), ¬R must be used in a data declaration instead of a co-data declaration, and likewise ∼L must be used in a co-data declaration for ¬(∼A). This can be accomplished with the (co-)data interchange laws, which let us convert each data isomorphism from Figures 8.6 and 8.7 into a co-data isomorphism and vice versa.
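The asymmetry of functional languages makes one half of this picture familiar: the ¬ negation can be simulated with a function into an empty result type, but its dual ∼ cannot. A hedged Haskell sketch (Not and dni are our names) shows that double-negation introduction is programmable, while the eliminations ∼(¬A) ≈ A and ¬(∼A) ≈ A rely on having both dual negations, which pure Haskell lacks:

  import Data.Void (Void)

  -- Simulating the ¬ negation by a continuation into the empty type,
  -- following the footnote's encoding ¬A ≈ A → ⊥.
  type Not a = a -> Void

  -- Double-negation introduction is definable...
  dni :: a -> Not (Not a)
  dni x k = k x

  -- ...but its inverse is not:
  -- dne :: Not (Not a) -> a   -- not definable in pure Haskell; the
  -- involutive isomorphisms above depend on the dual ∼ negation (or,
  -- computationally, on first-class control).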
Constant negation  Involutive negation converts "true" into "false" and "false" into "true," but it also swaps between the data and co-data formulations of each. For the multiplicative units, the data type 1 for true is the negation of the co-data type ⊥ for false, because both represent a (co-)data type with one alternative containing nothing:

data F1() : V where K : (∼⊥ : V ⊢ F1() | )
≈∼L  data F2() : V where K : ( ⊢ F2() | ⊥ : N )
≈⊥R  data F3() : V where K : ( ⊢ F3() | )
≈1L  data F4() : V where K : (1 : V ⊢ F4() | )

codata G1() : N where O : ( | G1() ⊢ ¬1 : N )
≈¬R  codata G2() : N where O : (1 : V | G2() ⊢ )
≈1L  codata G3() : N where O : ( | G3() ⊢ )
≈⊥R  codata G4() : N where O : ( | G4() ⊢ ⊥ : N )

For the additive units, the data type 0 for false is the negation of the co-data type ⊤ for true, because both represent a (co-)data type with no alternatives:

data F1() : V where K : (∼⊤ : V ⊢ F1() | )
≈∼L  data F2() : V where K : ( ⊢ F2() | ⊤ : N )
≈⊤R  data F3() : V where (no alternatives)
≈0L  data F4() : V where K : (0 : V ⊢ F4() | )

codata G1() : V where O : ( | G1() ⊢ ¬0 : N )
≈¬R  codata G2() : V where O : (0 : V | G2() ⊢ )
≈0L  codata G3() : V where (no alternatives)
≈⊤R  codata G4() : V where O : ( | G4() ⊢ ⊤ : N )

De Morgan laws  Involutive negation also converts "and" into "or" and "or" into "and" while interchanging data with co-data. For the multiplicatives, the connective ⊗ is an "and" pair that amalgamates two pieces of data into a single structure, and this is the negation of ⅋, which is an "or" pair that conjoins two observers together:

data F1() : V where K : (∼(A ⅋ B) : V ⊢ F1() | )
≈∼L  data F2() : V where K : ( ⊢ F2() | A ⅋ B : N )
≈⅋R  data F3() : V where K : ( ⊢ F3() | A : N, B : N )
≈∼L  data F4() : V where K : (∼A : V ⊢ F4() | B : N )
≈∼L  data F5() : V where K : (∼A : V, ∼B : V ⊢ F5() | )
≈⊗L  data F6() : V where K : ((∼A) ⊗ (∼B) : V ⊢ F6() | )

codata G1() : N where O : ( | G1() ⊢ ¬(A ⊗ B) : N )
≈¬R  codata G2() : N where O : (A ⊗ B : V | G2() ⊢ )
≈⊗L  codata G3() : N where O : (A : V, B : V | G3() ⊢ )
≈¬R  codata G4() : N where O : (A : V | G4() ⊢ ¬B : N )
≈¬R  codata G5() : N where O : ( | G5() ⊢ ¬A : N, ¬B : N )
≈⅋R  codata G6() : N where O : ( | G6() ⊢ (¬A) ⅋ (¬B) : N )

For the additives, the connective ⊕ is an "or" that yields one of two possible alternative types of answers, and this is the negation of &, which gives observers the option of one of two possible types of questions:

data F1() : V where K : (∼(A & B) : V ⊢ F1() | )
≈∼L  data F2() : V where K : ( ⊢ F2() | A & B : N )
≈&R  data F3() : V where K1 : ( ⊢ F3() | A : N )  K2 : ( ⊢ F3() | B : N )
≈∼L  data F4() : V where K1 : (∼A : V ⊢ F4() | )  K2 : ( ⊢ F4() | B : N )
≈∼L  data F5() : V where K1 : (∼A : V ⊢ F5() | )  K2 : (∼B : V ⊢ F5() | )
≈⊕L  data F6() : V where K : ((∼A) ⊕ (∼B) : V ⊢ F6() | )

codata G1() : N where O : ( | G1() ⊢ ¬(A ⊕ B) : N )
≈¬R  codata G2() : N where O : (A ⊕ B : V | G2() ⊢ )
≈⊕L  codata G3() : N where O1 : (A : V | G3() ⊢ )  O2 : (B : V | G3() ⊢ )
≈¬R  codata G4() : N where O1 : ( | G4() ⊢ ¬A : N )  O2 : (B : V | G4() ⊢ )
≈¬R  codata G5() : N where O1 : ( | G5() ⊢ ¬A : N )  O2 : ( | G5() ⊢ ¬B : N )
≈&R  codata G6() : N where O : ( | G6() ⊢ (¬A) & (¬B) : N )
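One direction of these De Morgan laws is again visible in Haskell through the same Not encoding (a hedged sketch; all names are ours): a refutation of a sum is a product of refutations, matching ¬(A ⊕ B) ≈ (¬A) & (¬B), and a refutation of a pair is a curried function into a refutation, which is the ¬(A ⊗ B) law read through the function type that appears in Figure 8.11 below.

  import Data.Void (Void)

  type Not a = a -> Void

  -- ¬(A ⊕ B) ≈ (¬A) & (¬B), with (,) standing in for & on refutations:
  deMorgan :: Not (Either a b) -> (Not a, Not b)
  deMorgan k = (k . Left, k . Right)

  deMorgan' :: (Not a, Not b) -> Not (Either a b)
  deMorgan' (ka, kb) = either ka kb

  -- ¬(A ⊗ B) ≈ A → ¬B: a continuation for a pair is a (curried) function.
  curryNot :: Not (a, b) -> (a -> Not b)
  curryNot k x y = k (x, y)

  uncurryNot :: (a -> Not b) -> Not (a, b)
  uncurryNot f (x, y) = f x y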
Shift laws  The last group of polarized connectives, the shifts, have not appeared in any of the algebraic or duality laws so far. That is partially because their role is not to represent the structural aspects of (co-)data types—like the ability to contain several components or offer multiple alternatives—but instead to explicitly signal the mechanisms, like the ability to delay a computation and force it later, that integrate different evaluation strategies. In fact, the presence of shifts has the effect of prohibiting the usual algebraic and duality laws of polarized types, which is exactly what we observe in practice in functional programming languages. Returning to the examples of unfaithful encodings from the introduction, consider again the problem of encoding triples in terms of pairs in a call-by-name language like Haskell, where lazy pairs are described by the ×N data type declared previously in Section 8.1, and lazy triples are represented as:

data LazyTriple(X:N, Y:N, Z:N) : N where
  L3 : (X:N, Y:N, Z:N ⊢ LazyTriple(X, Y, Z) | )

By applying the polarization encoding from Figure 8.3 to a collection of declarations G containing both ×N and LazyTriple, we get that6

X : N, Y : N, Z : N ⊩ ⟦LazyTriple(X, Y, Z)⟧G ≈ ⇑(↓X ⊗ (↓Y ⊗ ↓Z))
X : N, Y : N, Z : N ⊩ ⟦X ×N (Y ×N Z)⟧G ≈ ⇑(↓X ⊗ ↓⇑(↓Y ⊗ ↓Z))

but these two types represent very different spaces of possible program behaviors, because of the extra shifts in the encoding of X ×N (Y ×N Z). In other words, the difference between the two is that the type X ×N (Y ×N Z) allows for extra values like PairN(x, µ_.⟨y||β⟩), where µ_.⟨y||β⟩ is a term that does not return any result, but LazyTriple(X, Y, Z) does not, which is explicitly expressed by the presence or absence of shifts in their encoding. Furthermore, whereas we can apply properties like associativity of ⊗ within the encoding of LazyTriple(X, Y, Z), where

X : N, Y : N, Z : N ⊩ ⇑(↓X ⊗ (↓Y ⊗ ↓Z)) ≈ ⇑((↓X ⊗ ↓Y) ⊗ ↓Z)

this is blocked by the extra shifts in ⇑(↓X ⊗ ↓⇑(↓Y ⊗ ↓Z)), which prevent the law from applying.

We can also view the troubles with currying in a call-by-value language like ML in terms of extra shifts, with the representation of call-by-value functions as the →V co-data type previously declared in Section 8.1, whose encoding simplifies to

X : V, Y : V ⊩ ⟦X →V Y⟧G ≈ ⇓(¬X ⅋ ↑Y)

Again, the shifts get in the way when we try to apply the algebraic or logical laws of the polarized connectives. The type of uncurried call-by-value functions is

X : V, Y : V, Z : V ⊩ ⟦(X ⊗ Y) →V Z⟧ ≈ ⇓(¬(X ⊗ Y) ⅋ ↑Z) ≈ ⇓(¬X ⅋ ¬Y ⅋ ↑Z)

whereas the type of curried call-by-value functions is

X : V, Y : V, Z : V ⊩ ⟦X →V (Y →V Z)⟧ ≈ ⇓(¬X ⅋ ↑⇓(¬Y ⅋ ↑Z))

which is not the same, because of the extra shifts.

6 More specifically, the immediate output of translation is ⟦LazyTriple(X, Y, Z)⟧G ≜ ⇑((↓X ⊗ (↓Y ⊗ (↓Z ⊗ 1))) ⊕ 0) and ⟦X ×N Y⟧G ≜ ⇑((↓X ⊗ (↓Y ⊗ 1)) ⊕ 0), which is cleaned up as shown by the laws in Section 8.4.
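The extra value allowed by the encoding can be exhibited concretely in Haskell, where pairs are lazy and lifted. The following hedged sketch (all names ours) builds an inhabitant of the encoded type whose inner pair never returns, so that matching it apart diverges, while no flat triple behaves this way:

  type Encoded a b c = (a, (b, c))   -- the unfaithful X ×N (Y ×N Z) encoding

  -- An "extra" value: its second component is a computation that never
  -- returns a pair, playing the role of PairN(x, µ_.⟨y||β⟩) above.
  stuck :: Encoded Int Int Int
  stuck = (1, loop) where loop = loop

  -- Matching the nested pair forces it, so this diverges on stuck:
  splitEnc :: Encoded Int Int Int -> Int
  splitEnc (_, (_, _)) = 0           -- splitEnc stuck never returns

  -- whereas the corresponding match on a flat lazy triple always succeeds,
  -- even when every component is left undefined:
  splitTri :: (Int, Int, Int) -> Int
  splitTri (_, _, _) = 0             -- splitTri (1, undefined, undefined) = 0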
This does not mean that the shifts are completely lawless, however. Since we began with a large family of shifts—singleton data and co-data type constructors mapping between any kind S and V or N—some of them turn out to be redundant, as shown in Figure 8.10.

↓V A ≈ A ≈ V⇑ A        ↑N A ≈ A ≈ N⇓ A

FIGURE 8.10. Identity laws of the redundant self-shift connectives.

The data shifts ↓V and V⇑ for wrapping a call-by-value type as another call-by-value type, and the co-data shifts ↑N and N⇓ for doing the same to call-by-name types, are all identity operations on types, up to isomorphism. In particular, the data declarations for ↓V and V⇑ are the simplest instance of Lemma 8.5 (a), which means that ↓V A ≈ A ≈ V⇑ A, and likewise ↑N A ≈ A ≈ N⇓ A because of Lemma 8.5 (b). This fact tells us that the polarizing translation on already-polarized types is actually an identity up to isomorphism, i.e. for any Θ ⊢P A : S, it follows that Θ ⊩ ⟦A⟧P ≈ A. For example, we have

X : V, Y : V ⊩ ⟦X ⊕ Y⟧P ≜ V⇑(((↓V X ⊗ 1) ⊕ (↓V Y ⊗ 1)) ⊕ 0) ≈ X ⊕ Y

for the additive data type, and

X : V ⊩ ⟦¬X⟧P ≜ N⇓(⊤ & (⊥ ⅋ (¬↓V X))) ≈ ¬X

for the negation co-data type, justifying our rule of thumb for deciding the appropriate strategies for the polarized basis P of (co-)data types.

Functional laws  So far, our attention has been largely focused on properties of the polarized (co-)data types from Figure 8.1, some of which, like ⅋, are unfamiliar as programming constructs. But what about a more familiar construct like functions? We have seen that call-by-value functions don't behave as nicely as we'd like, which can be understood as unfortunate extra shifts between evaluation strategies. So is there a type of function that avoids these problems? As it turns out, the mixed-polarity, "primordial" (Zeilberger, 2009) function type that we considered in Chapter IV captures the best of both the call-by-value and call-by-name worlds, and is represented by the co-data declaration:

codata (X:V → Y:N) : N where · : (X : V | X → Y ⊢ Y : N)

The particular placement of V and N again follows the rule of thumb from Section 8.1, so as a consequence the polarized encoding for A → B avoids any impactful shifts. Because of the identity laws for shifts from Figure 8.10, the polarizing encoding for the above declaration G simplifies down to just ¬ and ⅋:

X : V, Y : N ⊩ ⟦X → Y⟧G ≈ N⇓(¬(↓V X) ⅋ (↑N Y)) ≈ ¬X ⅋ Y

This gives us the most primitive expression of functions in our multi-strategy language; the rest can be encoded in terms of the above polarized function type by adding back the extra shifts. Alternatively, we could have chosen to replace the unfamiliar ⅋ with this function type. Because of the involutive nature of the ¬ and ∼ negations, we have the following encoding of ⅋ in terms of → and ∼: A ⅋ B ≈ ¬(∼A) ⅋ B ≈ (∼A) → B. Certainly functions are more familiar than ⅋ as a programming construct, but the cost of leaning on this familiarity is the loss of symmetry, because functions are a "half-negated or." In particular, we can recast all of the algebraic and logical laws about ⅋ in terms of →, as shown in Figure 8.11; some of these are familiar properties of implication, and all are derived from the encoding A → B ≈ ¬A ⅋ B.

A → B ≈ (∼B) → (¬A)                  (A ⊗ B) → C ≈ A → (B → C)
1 → A ≈ A                            A → ⊥ ≈ ¬A
A → (B & C) ≈ (A → B) & (A → C)      (A ⊕ B) → C ≈ (A → C) & (B → C)
A → ⊤ ≈ ⊤                            0 → A ≈ ⊤
∼(A → B) ≈ A ⊗ (∼B)                  A → (¬B) ≈ ¬(A ⊗ B)

FIGURE 8.11. Derived laws of polarized functions.
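Before walking through the derivations of Figure 8.11, note that two of its laws are everyday facts for functional programmers. A hedged Haskell sketch (names ours), with (,) standing in for & at the level of results:

  -- A → (B & C) ≈ (A → B) & (A → C): a function into a product is a
  -- product of functions.
  pairFun :: (a -> (b, c)) -> (a -> b, a -> c)
  pairFun f = (fst . f, snd . f)

  pairFun' :: (a -> b, a -> c) -> (a -> (b, c))
  pairFun' (g, h) = \x -> (g x, h x)

  -- (A ⊕ B) → C ≈ (A → C) & (B → C): a function out of a sum is a
  -- product of functions, namely the two branches of a case.
  caseFun :: (Either a b -> c) -> (a -> c, b -> c)
  caseFun f = (f . Left, f . Right)

  caseFun' :: (a -> c, b -> c) -> (Either a b -> c)
  caseFun' (g, h) = either g h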
The commutativity, associativity, and unit laws of the underlying ⅋ give us contrapositive, currying, thunking, and negating laws:

A → B ≈ (¬A) ⅋ B ≈ B ⅋ (¬A) ≈ (¬(∼B)) ⅋ (¬A) ≈ (∼B) → (¬A)
(A ⊗ B) → C ≈ (¬(A ⊗ B)) ⅋ C ≈ ((¬A) ⅋ (¬B)) ⅋ C ≈ (¬A) ⅋ ((¬B) ⅋ C) ≈ A → (B → C)
1 → A ≈ (¬1) ⅋ A ≈ ⊥ ⅋ A ≈ A
A → ⊥ ≈ (¬A) ⅋ ⊥ ≈ ¬A

Likewise, distributing ⅋ over & and annihilating it with ⊤ recognize certain functions as products or trivial units:

A → (B & C) ≈ (¬A) ⅋ (B & C) ≈ ((¬A) ⅋ B) & ((¬A) ⅋ C) ≈ (A → B) & (A → C)
(A ⊕ B) → C ≈ (¬(A ⊕ B)) ⅋ C ≈ ((¬A) & (¬B)) ⅋ C ≈ ((¬A) ⅋ C) & ((¬B) ⅋ C) ≈ (A → C) & (B → C)
A → ⊤ ≈ (¬A) ⅋ ⊤ ≈ ⊤
0 → A ≈ (¬0) ⅋ A ≈ ⊤ ⅋ A ≈ ⊤

And finally, the De Morgan duality between ⅋ and ⊗ tells us that the continuation of a function is a pair, and that a continuation for a pair is a function:

∼(A → B) ≈ ∼((¬A) ⅋ B) ≈ (∼(¬A)) ⊗ (∼B) ≈ A ⊗ (∼B)
A → (¬B) ≈ (¬A) ⅋ (¬B) ≈ ¬(A ⊗ B)

The Faithfulness of Polarization

Now that we have laid down some laws for declaration isomorphisms, we can put them to use for encoding user-defined (co-)data types in terms of the polarized connectives from Figure 8.1. In particular, we can extend the laws from Figures 8.6 and 8.7 for polarized sub-structures appearing within a simple singleton declaration to apply to any general (co-)data type using the mix laws from Figures 8.4 and 8.5. For example, given a declaration of the form

data F(Θ) : V where
  K0 : (Γ0, A : V, B : V ⊢ F(Θ) | ∆0)
  K : (Γ ⊢ F(Θ) | ∆)   (for each remaining constructor K)

we can combine the A and B components of the K0 constructor with the ⊗ connective by starting with the ⊗L law, and then building up to the full declaration of F by applying the mix law to the appropriate reflexive isomorphisms, as discussed in Section 8.3, as follows:

data F(Θ) : V where K : (A : V, B : V ⊢ F(Θ) | )
  ≈ data F′(Θ) : V where K′ : (A ⊗ B : V ⊢ F′(Θ) | )

data F(Θ) : V where K0 : (A : V, B : V, Γ0 ⊢ F(Θ) | ∆0)
  ≈ data F′(Θ) : V where K′0 : (A ⊗ B : V, Γ0 ⊢ F′(Θ) | ∆0)

data F(Θ) : V where K0 : (A : V, B : V, Γ0 ⊢ F(Θ) | ∆0); K : (Γ ⊢ F(Θ) | ∆) …
  ≈ data F′(Θ) : V where K′0 : (A ⊗ B : V, Γ0 ⊢ F′(Θ) | ∆0); K : (Γ ⊢ F′(Θ) | ∆) …

Other combinations of components at different positions in constructors of F can be targeted with the commute laws for data declarations. This idea is the central technique of the encoding, which just repeats the above procedure until we are left with only a singleton (co-)data type that "wraps" its encoding. First we consider how to encode just one (co-)data type declaration in terms of polarized connectives.

Theorem 8.9 (Polarizing (co-)data declarations). For any S validating χS,

a) given data F(Θ) : S where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢ F(Θ) | Bi1 : Ri1, …, Bim : Rim) (for each i) ∈ G, we have Θ ⊩ F(Θ) ≈ ⟦F(Θ)⟧G, and

b) given codata G(Θ) : S where Oi : (Ai1 : Ti1, …, Ain : Tin | G(Θ) ⊢ Bi1 : Ri1, …, Bim : Rim) (for each i) ∈ G, we have Θ ⊩ G(Θ) ≈ ⟦G(Θ)⟧G.

Proof.
a) Observe that we have the following data isomorphism by extending the polarized laws from Figure 8.6 with the mix and commute laws from Figure 8.4:

data F1(Θ) : V where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢ F1(Θ) | Bi1 : Ri1, …, Bim : Rim) (for each i)
≈↓L  data F2(Θ) : V where Ki : (↓Ti1 Ai1 : V, …, ↓Tin Ain : V ⊢ F2(Θ) | Bi1 : Ri1, …, Bim : Rim)
≈↑R  data F3(Θ) : V where Ki : (↓Ti1 Ai1 : V, …, ↓Tin Ain : V ⊢ F3(Θ) | ↑Ri1 Bi1 : N, …, ↑Rim Bim : N)
≈∼L  data F4(Θ) : V where Ki : (↓Ti1 Ai1 : V, …, ↓Tin Ain : V, ∼(↑Ri1 Bi1) : V, …, ∼(↑Rim Bim) : V ⊢ F4(Θ) | )
≈1L,⊗L  data F5(Θ) : V where Ki : (⊗(↓Ti1 Ai1, …, ↓Tin Ain, ∼(↑Ri1 Bi1), …, ∼(↑Rim Bim)) : V ⊢ F5(Θ) | )
≈0L,⊕L  data F6(Θ) : V where K : (⊕i ⊗(↓Ti1 Ai1, …, ∼(↑Rim Bim)) : V ⊢ F6(Θ) | )

where ⊗(C1, …, Cn) abbreviates the nested binary product C1 ⊗ (… ⊗ (Cn ⊗ 1)) ending in the unit 1, and ⊕i similarly builds the nested sum of all the alternatives ending in 0. With the above isomorphism between F1 and F6, it follows from the data shift law that:

data F(Θ) : S where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢ F(Θ) | Bi1 : Ri1, …, Bim : Rim) (for each i)
  ≈ data F′(Θ) : S where K : (⊕i ⊗(↓Ti1 Ai1, …, ∼(↑Rim Bim)) : V ⊢ F′(Θ) | )

We then get that

Θ ⊩ F′(Θ) ≈ S⇑(⊕i ⊗(↓Ti1 Ai1, …, ∼(↑Rim Bim))) ≜ ⟦F(Θ)⟧G

by applying Lemma 8.3 (a) to the reflexive isomorphism of ⊕i ⊗(↓Ti1 Ai1, …, ∼(↑Rim Bim)), so by transitivity Θ ⊩ F(Θ) ≈ ⟦F(Θ)⟧G.

b) Analogous to the proof of Theorem 8.9 (a) by duality.

Now that we know how to encode individual (co-)data types in isolation, we look to a global encoding of arbitrary types made out of a collection G of (co-)data declarations. The only limitation on the group of declarations G is that they be well-formed and non-cyclic, which is a consequence of the judgement (⊢ G) seq from Section 6.2. The non-cyclic requirement ensures that the dependency chains between declarations are well-founded, so the process of inlining the encodings of (co-)data types will eventually terminate and give a final, fully-expanded encoding.

Theorem 8.10 ((Co-)Data Polarization). Given derivations of both (⊢ G) seq and Θ ⊢G A : S, it follows that Θ ⊩ A ≈ ⟦A⟧G.

Proof. By lexicographic induction on (1) the derivation of (⊢ G) seq, and (2) the derivation of Θ ⊢G A : S. The case when A is a variable is immediate. The case where A = F(C⃗) for some F declared in G as

data F(X⃗ : S⃗′) : S where K : (A⃗ : T⃗ ⊢ F(X⃗) | B⃗ : R⃗) (for each constructor K)

follows from Theorems 8.9 and 8.8. In particular, we have

Θ ⊩ C ≈ ⟦C⟧G (for each C)    X⃗ : S⃗′ ⊩ Aij ≈ ⟦Aij⟧G′    X⃗ : S⃗′ ⊩ Bij ≈ ⟦Bij⟧G′

from the inductive hypothesis for some G′ strictly smaller than G. From Theorem 8.9, we have

F(C⃗) ≈ S⇑(⊕i ⊗(↓Tij Aij θ, …, ∼(↑Rij Bij θ)))

where θ = {C⃗/X⃗}, and from Theorem 8.8 we know that

Θ ⊩ S⇑(⊕i ⊗(↓Tij Aij θ, …, ∼(↑Rij Bij θ))) ≈ S⇑(⊕i ⊗(↓Tij ⟦Aij⟧G ⟦θ⟧G, …, ∼(↑Rij ⟦Bij⟧G ⟦θ⟧G)))

where ⟦θ⟧G = {⟦C⟧G/X⃗}. Therefore, we have Θ ⊩ F(C⃗) ≈ ⟦F(C⃗)⟧G by distributing the substitution θ over translation. The case where A = G(C⃗) for some G declared in G as co-data follows similarly.

Note that as an immediate consequence of the full (co-)data polarization encoding (Theorem 8.10), we can generalize the fact that isomorphism distributes over substitution into a type made from polarized connectives (Theorem 8.8) to conclude that isomorphism distributes over substitution into any type built from (non-cyclic) (co-)data type constructors. In particular, for any non-cyclically well-formed G, Θ, X : S ⊢G A : T, Θ ⊢G B : S, and Θ ⊢G C : S, if Θ ⊩ B ≈ C then Θ ⊩ A{B/X} ≈ A{C/X}. This fact means that we can apply any isomorphism within the context of any encodable (co-)data type.
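To see the polarizing encoding of Theorems 8.9 and 8.10 at work on a concrete declaration, consider the following hedged Haskell sketch (the type T and all names are ours): a data type with several constructors, each carrying several components, is isomorphic to a sum of products, just as the ⊕ of ⊗ in the theorem. What Haskell cannot show are the outer shift S⇑ and the per-component shifts ↓T, which are invisible in its uniformly lazy setting; those shifts are exactly what the faithfulness of the encoding depends on.

  data T = K1 Int Bool | K2 Char        -- a user-defined data type

  type EncT = Either (Int, Bool) Char   -- its sum-of-products encoding

  encode :: T -> EncT
  encode (K1 n b) = Left (n, b)
  encode (K2 c)   = Right c

  decode :: EncT -> T
  decode (Left (n, b)) = K1 n b
  decode (Right c)     = K2 c
  -- decode . encode = id and encode . decode = id, witnessing T ≈ EncT.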
CHAPTER IX

REPRESENTING FUNCTIONAL PROGRAMS

At this point, we have looked at the design and theory of many different programming language features in the setting of the sequent calculus. We looked at a symmetric mechanism for user-defined data and co-data types, and an abstract treatment of evaluation strategy in terms of substitution that lets us mix multiple evaluation orders within a program (Chapter V). We also looked at ways to capture type abstraction as higher-order data and co-data types, and well-founded recursion in both types and programs (Chapter VI). But how does what we have learned impact functional programming languages, which are based on natural deduction and the λ-calculus? Is there some way to transfer ideas born in the sequent calculus over to more traditional programming languages?

Yes! The sequent calculus was developed as a tool for studying natural deduction (Gentzen, 1935a), so the two already have a well-established relationship. Here, we will employ that canonical relationship between the two logics to develop the natural deduction, λ-based counterpart to everything we have done in the µµ˜ sequent calculus. This lets us talk about ideas like user-defined (co-)data types, multiple kinds of evaluation strategies, and mixed induction and co-induction in a core language that is much closer to pure functional programming. The pure λ-calculus counterpart is not just loosely based on the ideas developed in the µµ˜-calculus; the two are in a close correspondence between their static and dynamic semantics. In particular, by limiting ourselves to just one consequence, which eliminates the possibility of control effects, typability and equations between program fragments in the two languages are in a one-for-one correspondence.

The correspondence between sequent calculus and natural deduction has two applications. First, it lets us compile functional programs, which come from languages based on the λ-calculus, to a sequent calculus representation. Second, it lets us transfer results, such as strong normalization (in Chapter VII), found in the sequent-based language to the natural deduction one. So the single-consequence µµ˜-calculus can be seen as a more machine-like version of a pure λ-calculus core language, which serves as a compilation target as well as a good vehicle for studying the properties of functional programs.

Next, we consider what natural deduction language corresponds to the entire, multiple-consequence, µµ˜-calculus. Multiple consequences are neatly accommodated by Parigot's (1992) λµ-calculus, which is a form of natural deduction for classical logic, and a term language for first-class control effects. This extension sacrifices purity for greater expressivity (Felleisen, 1991), which has practical applications in compilers. In particular, optimizing compilers rely on the idea of join points—a representation of shared control flow in a program which joins back together after diverging across different branching paths, like after an if-then-else construct—for preventing code size explosion while transforming programs. In common compiler intermediate languages, join points are represented as φ-nodes in static single assignment (SSA) form (Cytron et al., 1991) and as just continuations in continuation-passing style (CPS) (Appel, 1992). However, the proper treatment of join points is typically a troublesome issue in languages based on a pure direct-style λ-calculus (Kennedy, 2007).
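To make the problem concrete, consider the following hedged Haskell sketch (function names ours). Naively commuting an outer case into the branches of an if duplicates the shared continuation, and repeating such transformations can blow up code size; binding the continuation as a local function j—a join point—keeps one copy that both branches jump to:

  -- Without a join point: pushing the case into both branches would
  -- duplicate the Nothing/Just continuation in each branch.
  f1 :: Bool -> Maybe Int -> Maybe Int -> Int
  f1 b x y = case (if b then x else y) of
               Nothing -> 0
               Just n  -> n + 1

  -- With a join point: the shared continuation is named once as j, and
  -- the two diverging paths join back together by calling it.
  f2 :: Bool -> Maybe Int -> Maybe Int -> Int
  f2 b x y = let j r = case r of
                         Nothing -> 0
                         Just n  -> n + 1
             in if b then j x else j y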
In contrast, the direct-style λµ-calculus presented here can be used as a basis for a compiler intermediate language for functional languages that allows for the proper expression of shared control flow in terms of general first-class control. If we are interested in compiling pure functional programs in particular, we can restrict this calculus down to the pure subset without losing the correct treatment of join points by limiting the types and occurrences of co-variables (Maurer et al., 2017; Downen et al., 2016). This restriction also makes it more direct to give a good account of recursive join points, which are helpful for implementing efficient loops for functional languages.

This chapter covers the following topics:

– A λ-calculus based natural deduction counterpart to the µµ˜ sequent calculus, λlet (Section 9.1), including all of the language features we have considered so far: user-defined, higher-order (co-)data types, mixed evaluation strategies, and well-founded recursion.

– The direct correspondence between the static and dynamic semantics of λlet and the single-consequence restriction of µµ˜ (Section 9.2), which is "direct" in the sense that the translation between the two languages is local, not global, so that the types of terms are exactly the same in both languages (unlike continuation-passing style transformations).

– The multiple-consequence extension of the pure natural deduction λlet-calculus, λµlet (Section 9.3), which heightens the correspondence to cover the full µµ˜ sequent calculus and allows for the direct representation of shared control flow.

Pure Data and Co-Data in Natural Deduction

So far, we have looked at a calculus in sequent style, which corresponds to a classical logic and thus includes control effects (Griffin, 1990). Let's now shift focus, and see how the intuition we gained from the sequent calculus can be reflected back into a more traditional core calculus for representing functional programs. The goal here is to see how the principles we have developed in the sequent setting can be incorporated into a λ-calculus based language: using the traditional connection between natural deduction and the sequent calculus, we show how to translate our primitive and noetherian recursive types and programs into natural deduction style. In essence, we will consider a functional calculus based on an effect-free subset of the µµ˜-calculus corresponding to Gentzen's (1935a) LJ sequent calculus for intuitionistic logic.

Static semantics  Essentially, the intuitionistic restriction of the µµ˜ sequent calculus for representing effect-free programs follows a single mantra, based on the connection between the classical and intuitionistic logics LK and LJ: there is always exactly one conclusion. In the type system, this means that the sequent for typing terms has the more restricted form Γ ⊢ΘG v : A, where the active type on the right is no longer ambiguous and does not need to be distinguished (with |), as is more traditional for functional languages. Notice that this limitation on the form of sequents impacts which data and co-data types we can express. For example, the common sums and products, which were declared as

data X ⊕ Y where
  ι1 : X ⊢ X ⊕ Y |
  ι2 : Y ⊢ X ⊕ Y |

codata X & Y where
  π1 : | X & Y ⊢ X
  π2 : | X & Y ⊢ Y

fit into this restricted typing discipline, because each of their constructors and observers involves exactly one type to the right of entailment.
However, the (co-)data types for 367 representing more exotic connectives like the two negations data∼X where ∼ : ` ∼X | X codata¬Y where ¬ : X | ¬Y ` or the binary and nullary disjunctive co-data types codataX ` Y where [ , ] : | X ` Y ` X, Y codata⊥where [] : | ⊥ ` do not fit, because they require placing zero or two types to the right of entailment. In sequent style, this means these pure data types can never contain a co-value, and pure co-data types must always involve exactly one co-value for returning the unique result. In functional style, the data types are exactly the algebraic data types used in functional languages, with the corresponding constructors and case expressions, and the co-data types can be thought of as merging functions with records into a notion of abstract “objects” which compute and return a value when observed. For example, to observe a value of type X & Y , we could access the first component as a record field, v pi1, and we describe an object of this type by saying how it responds to all possible observations, λ{pi1 ⇒ v1 | pi2 ⇒ v2}, with the typing rules: Γ ` v1 : A Γ ` v2 : B Γ ` λ{pi1 ⇒ v1 | pi2 ⇒ v2} : A&B Γ ` v : A&B Γ ` v pi1 : A Γ ` v : A&B Γ ` v pi2 : B Likewise, the traditional λ-abstractions and type abstractions from system F (as seen previously in Chapter II) can be expressed by objects of these form. Specifically, since they definable (as seen previously in Sections 5.2 and 6.2) as pure co-data types with one observer, · : (X | X → Y ` Y ) and @ : ( | ∀(X) `Y :k X Y ) respectively, so that the application of a function v to an argument v′ is written as v · v′, the specification of the polymorphic v to a type A is written as v @ A, and the basic λ-abstractions are syntactic sugar that removes the extra generality: λx.v , λ{ · x⇒ v} ΛY :k.v , λ{@ Y :k ⇒ v} Thus, these objects serve as “generalized λ-abstractions” (Abel & Pientka, 2013) defined by shallow case analysis rather than deep pattern-matching. 368 v ∈ Term ::= x | letx = v in v′ (core) | K( #»B, #»v ) | case v′ of # » K( # »Y :l, #»x )⇒ v (data) | λ { # » O[ # »Y :l, #»x ]⇒ v } | v′ O[ #»B, #»v ] (co-data) F ∈ FrameCxt ::=  | letx = F in v (core frames) | F O[ #»B, #»v ] (co-data frames) | caseF of # » K( # »Y :l, #»x )⇒ v (data frames) FIGURE 9.1. Untyped syntax for a natural deduction language of data and co-data. Putting this more formally, the untyped syntax of λlet , a natural deduction style pure λ-calculus, is given in Figure 9.1. At its core, the λlet -calculus includes variables and let expressions, which allow for the binding and reference of names without imposing any particular structure. In addition, the untyped syntax of λlet includes arbitrary data structures and case analysis on data structures (of the form K( #»B, #»v ) and case v′ of # » K( # »Y :l, #»x )⇒ v) as well as arbitrary co-data objects and observations of those objects (of the form λ { # » O[ # »Y :l, #»x ]⇒ v } and v′ O[ #»B, #»v ]). The observations of the results of terms are given by the syntax of frame contexts F . While these contexts are a meta-syntactic construct (that is, they are contexts of the syntax of terms, but not syntax themselves), they will soon play a crucial in the dynamic semantics of λlet to come. On top of the untyped syntax, we have the static typing rules. The rules for the type-level upward and well-formed sequents are exactly the same as in the µµ˜-calculus, so we do not repeat them here, and instead only present the typing rules for terms. 
On top of the untyped syntax, we have the static typing rules. The rules from the type level upward (kinds and well-formed sequents) are exactly the same as in the µµ˜-calculus, so we do not repeat them here, and instead only present the typing rules for terms. First, there are the core typing rules in Figure 9.2, which correspond to the core of the µµ˜-calculus: the Var rule corresponds to the right variable rule VR, the Let rule corresponds to the Cut rule, and the TC rule corresponds to the right type conversion rule TCR. Note that weakening and contraction are built into these rules, following the style of natural deduction, which makes structural inferences implicit.

Judgement ::= Γ ⊢ΘG v : A

──────────────────  Var
Γ, x : A ⊢ΘG x : A

Γ ⊢ΘG v : A    Θ ⊢G A : S    Γ, x : A ⊢ΘG v′ : C
──────────────────────────────────────────────  Let
Γ ⊢ΘG let x = v in v′ : C

Θ ⊢G A =βη B : k    Γ ⊢ΘG v : A
───────────────────────────────  TC
Γ ⊢ΘG v : B

FIGURE 9.2. A natural deduction language for the core calculus.

Next, we have the typing rules for pure data and co-data types in the λlet-calculus: the rules for simple multi-kinded (co-)data types are shown in Figure 9.3 and the more advanced rules for higher-order (co-)data types are shown in Figure 9.4. Intuitively, these rules generalize the typing rules from the λ-calculus in Chapter II. Also note that for a case expression which introduces type variables in its branches, the associated elimination rule implicitly imposes the restriction that the return type cannot reference those type variables. The implicit restriction comes from the requirement that the conclusion of the elimination rule be well formed. That is, if we know that the sequent corresponding to Γ ⊢ΘG case v of Ki(Y⃗:l⃗, x⃗) ⇒ vi : C is well formed, i.e. if we have a derivation of (Γ ⊢ΘG C) seq, then that implies that none of Y⃗ are free in C, since they are not already in Θ because we are able to extend the typing environment to Θ, Y⃗ : l⃗ in the premise.

Given data F(X⃗ : k⃗) : S where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢ F(X⃗)) (for each i) ∈ G, we have the rules:

Γ ⊢ΘG vj : Aij{B⃗/X⃗} (for each j)
────────────────────────────────  FIKi
Γ ⊢ΘG Ki(v⃗) : F(B⃗)

Θ ⊢G F(B⃗) : S    Γ ⊢ΘG v : F(B⃗)    Γ, x⃗ : A⃗i{B⃗/X⃗} ⊢ΘG vi : C (for each i)
─────────────────────────────────────────────────────────────────────────  FE
Γ ⊢ΘG case v of Ki(x⃗) ⇒ vi : C

Given codata G(X⃗:k⃗) : S where Oi : (Ai1 : Ti1, …, Ain : Tin | G(X⃗) ⊢ A′i : Ri) (for each i) ∈ G, we have the rules:

Γ, x⃗ : A⃗i{B⃗/X⃗} ⊢ΘG vi : A′i{B⃗/X⃗} (for each i)
───────────────────────────────────────────  GI
Γ ⊢ΘG λ{Oi[x⃗] ⇒ vi} : G(B⃗)

Θ ⊢G G(B⃗) : S    Γ ⊢ΘG v : G(B⃗)    Γ ⊢ΘG vj : Aij{B⃗/X⃗} (for each j)
────────────────────────────────────────────────────────────────────  GEOi
Γ ⊢ΘG v Oi[v⃗] : A′i

FIGURE 9.3. Natural deduction typing rules for simple (co-)data.

Given data F(X⃗ : k⃗) : S where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢Y⃗:l⃗i F(X⃗)) (for each i) ∈ G, we have the rules:

Θ ⊢G B′ : lij{B⃗/X⃗} (for each j)    Γ ⊢ΘG v : Aij{B⃗′/Y⃗, B⃗/X⃗} (for each j)
──────────────────────────────────────────────────────────────────────────  FIKi
Γ ⊢ΘG Ki(B⃗′, v⃗) : F(B⃗)

Θ ⊢G F(B⃗) : S    Γ ⊢ΘG v : F(B⃗)    Γ, x⃗ : A⃗i{B⃗/X⃗} ⊢Θ,Y⃗:l⃗i G vi : C (for each i)
─────────────────────────────────────────────────────────────────────────────────  FE
Γ ⊢ΘG case v of Ki(Y⃗:l⃗i, x⃗) ⇒ vi : C

Given codata G(X⃗:k⃗) : S where Oi : (Ai1 : Ti1, …, Ain : Tin | G(X⃗) ⊢Y⃗:l⃗i A′i : Ri) (for each i) ∈ G, we have the rules:

Γ, x⃗ : A⃗i{B⃗/X⃗} ⊢Θ,Y⃗:l⃗i G vi : A′i{B⃗/X⃗} (for each i)
──────────────────────────────────────────────────  GI
Γ ⊢ΘG λ{Oi[Y⃗:l⃗i, x⃗] ⇒ vi} : G(B⃗)

Θ ⊢G G(B⃗) : S    Γ ⊢ΘG v : G(B⃗)    Θ ⊢G B′ : lij{B⃗/X⃗}    Γ ⊢ΘG vj : Aij{B⃗′/Y⃗, B⃗/X⃗} (for each j)
──────────────────────────────────────────────────────────────────────────────────────────────  GEOi
Γ ⊢ΘG v Oi[B⃗′, v⃗] : A′i

FIGURE 9.4. Natural deduction typing rules for higher-order (co-)data.

Dynamic semantics  With the static semantics for how natural deduction programs are formed, we now consider the dynamic semantics for how programs behave. As with the µµ˜ sequent calculus, we will characterize the impact of evaluation strategy on substitution as a parameter to the language. In λlet, the corresponding notion of a substitution strategy T is a subset of terms called values (V ∈ ValueT) and a subset of frame contexts called co-values (E ∈ CoValueT), such that variables are values, the empty context is a co-value, and co-values compose (i.e. if E and E′ are co-values then so is E[E′]). Next, an evaluation strategy T includes a substitution strategy as well as a subset of all contexts called evaluation contexts (D ∈ EvalCxtT) such that every co-value is an evaluation context. Note that the scope of potential evaluation contexts is quite large, and co-values point out a special subset of all evaluation contexts. In essence, co-values are evaluation contexts with some additional properties: evaluation contexts in general are stationary, but co-values are mobile.
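The parameterization by strategy can itself be made concrete. In the hedged Haskell sketch below (a toy term fragment and names entirely of our own devising), a substitution strategy is just a predicate picking out the values among terms; the two definitions mirror the call-by-value and call-by-name strategies of Figures 9.8 and 9.9 to come:

  -- A toy term syntax: variables, lets, and pair structures.
  data Term = TVar String
            | TLet String Term Term
            | TPair Term Term

  -- Call-by-value: variables and structures built from values are values.
  isValueV :: Term -> Bool
  isValueV (TVar _)    = True
  isValueV (TPair a b) = isValueV a && isValueV b
  isValueV (TLet{})    = False

  -- Call-by-name: every term is a value, i.e. substitutable.
  isValueN :: Term -> Bool
  isValueN _ = True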
With the concept of λlet evaluation strategies in mind, let's look at the strategy-parametric rewriting rules for the λlet-calculus. The core rewriting rules, which correspond to the core theory of the µµ˜-calculus, are given in Figure 9.5.

(letT)   let x = V in v  →letT  v{V/x}                        (V ∈ ValueT)
(ηletT)  let x = v in E[x]  →ηletT  E[v]                       (E ∈ CoValueT, x ∉ FV(E))
(ccT)    E[let x = v′ in v]  →ccT  let x = v′ in E[v]          (□ ≠ E ∈ CoValueT, x ∉ FV(E))
(ccT)    E[case v′ of K(Y⃗:l⃗, x⃗) ⇒ v]  →ccT  case v′ of K(Y⃗:l⃗, x⃗) ⇒ E[v]

FIGURE 9.5. A core parametric theory for the natural deduction calculus.

These rules are responsible for interpreting let expressions: the letT rule substitutes a let-bound T-value for the bound variable, and the ηletT rule eliminates a trivial let expression of the form let x = v in E[x], which introduces a name only to use it exactly once in the eye of a T-co-value. The core theory also includes commuting conversions ccT, which push co-values inside of the block structures of let and case expressions. Intuitively, in the term E[let x = v′ in v], the result of v is passed to the co-value E; however, the two are separated by an intermediate let. A ccT reduction is thus needed to push the co-value inward and bring the question E in contact with the answer v, as in let x = v′ in E[v]. The same situation happens with a case in place of a let, so there is a commuting conversion for case, too. Unfortunately, this means that the core λlet theory must know about and manipulate language constructs revolving around data types, unlike the core theory of the µµ˜ sequent calculus, which made no assumptions about specific types.

Next, we have the rewriting rules for data and co-data in the natural deduction λlet-calculus. First are the untyped and strategy-parameterized β and ς laws in Figure 9.6, which mimic the similar β and ς laws in the µµ˜-calculus. The β laws generalize the β laws for the λ-calculus from Chapter II to accommodate arbitrary data and co-data types and arbitrary substitution strategies. The ς lift laws are necessary to keep evaluation moving forward when non-values are found in unfortunate contexts.

(βT)  case Ki(B⃗, V⃗) of Ki(Y⃗:l⃗, x⃗) ⇒ vi  →βT  vi{V⃗/x⃗, B⃗/Y⃗}
(βT)  λ{Oi[Y⃗:l⃗, x⃗] ⇒ vi} Oi[B⃗, V⃗]  →βT  vi{V⃗/x⃗, B⃗/Y⃗}
(ςT)  Ki(B⃗, V⃗, v′, v⃗)  →ςT  let x = v′ in Ki(B⃗, V⃗, x, v⃗)           (v′ ∉ ValueT, x fresh)
(ςT)  V′ O[B⃗, V⃗, v′, v⃗]  →ςT  let x = v′ in V′ O[B⃗, V⃗, x, v⃗]       (v′ ∉ ValueT, x fresh)

FIGURE 9.6. The untyped parametric βς laws for arbitrary data and co-data types.
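The commuting conversions ccT may look exotic, but for the data fragment they are the familiar case-of-case style transformation used by optimizing compilers. A hedged Haskell illustration (names ours), where the frame E = 1 + □ is pushed into the branches:

  -- Before: the frame 1 + [] observes the result of a case.
  f :: Either Int Int -> Int
  f e = 1 + (case e of { Left n -> n; Right m -> m * 2 })

  -- After a commuting conversion: the frame moves inside each branch,
  -- bringing the observation into contact with the branches' answers.
  f' :: Either Int Int -> Int
  f' e = case e of { Left n -> 1 + n; Right m -> 1 + (m * 2) }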
For example, call-by-value λ-calculi often have issues with getting stuck prematurely on open terms, where evaluation should still continue even though the value of everything isn't known yet. In the open λ-calculus term (λx.λy.x) (f 1) 2, plain call-by-value β-reduction is stuck because f 1 is not a value and cannot be substituted for x, even though the result of the term must be f 1 for any value of f. However, the ς rules can lift the inconveniently-placed f 1 out of the way, letting reduction proceed as follows:

(λx.λy.x) (f 1) 2
  →ςV    let z = f 1 in (λx.λy.x) z 2
  →βV    let z = f 1 in (λy.z) 2
  →βV    let z = f 1 in z
  →ηletV f 1

We also have the typed and strategy-independent β and η laws in Figure 9.7. The β laws work for any evaluation strategy by binding unevaluated sub-terms in let expressions, and the η laws expand terms based on their type.

(βF)  case Ki(B⃗, v⃗) of Ki(Y⃗:l⃗, x⃗) ⇒ vi  →βF  let x⃗ = v⃗ in vi{B⃗/Y⃗}
(βG)  λ{Oi[Y⃗:l⃗, x⃗] ⇒ vi} Oi[B⃗, v⃗]  →βG  let x⃗ = v⃗ in vi{B⃗/Y⃗}
(ηF)  v : F(C⃗)  ≺ηF  case v of K(Y⃗:l⃗, x⃗) ⇒ K(Y⃗:l⃗, x⃗)
(ηG)  y : G(C⃗)  ≺ηG  λ{O[Y⃗:l⃗, x⃗] ⇒ y O[Y⃗:l⃗, x⃗]}

FIGURE 9.7. The typed βη laws for declared data and co-data types.

Note that, as in the µµ˜-calculus, the η law for co-data types acts on variables, but the more commonly seen generalization to values is derivable with the help of the core theory for let:

V : G(C⃗) =letT  let y = V in y
          =ηG   let y = V in λ{O[Y⃗:l⃗, x⃗] ⇒ y O[Y⃗:l⃗, x⃗]}
          =letT λ{O[Y⃗:l⃗, x⃗] ⇒ V O[Y⃗:l⃗, x⃗]}

where the definition of capture-avoiding substitution enforces the side condition that the variables Y⃗ and x⃗ are not free in V. In this sense, the strategy-independent η laws in Figure 9.7 also generalize the η laws for the λ-calculus from Chapter II.

Let's now consider some example evaluation strategies, corresponding to the ones we defined for the µµ˜ sequent calculus. First, we have the call-by-value strategy V shown in Figure 9.8, which says that only variables, data structures made from values, and co-data objects are values. However, all frame contexts are co-values, which also exactly spell out the set of evaluation contexts.

V ∈ ValueV ::= x | K(A⃗, V⃗) | λ{O[X⃗:l⃗, x⃗] ⇒ v}
E ∈ CoValueV ::= F
D ∈ EvalCxtV ::= E

FIGURE 9.8. Call-by-value (V) strategy in natural deduction.

This definition essentially follows the normal notion of values and evaluation contexts in the call-by-value λ-calculus, except that evaluation does not descend into data structures (like pairs) or the arguments of co-data observations (like function calls). Instead, the ς rules lift unevaluated components out of these contexts and bind them to a variable with a let expression, which is a co-value, so that evaluation can continue on them. Next, we have the call-by-name strategy N shown in Figure 9.9, which says that every term is a value, but only the empty context, a case on a co-value, and an observation on a co-value are co-values.

V ∈ ValueN ::= v
E ∈ CoValueN ::= □ | case E of K(X⃗:l⃗, x⃗) ⇒ v | E O[A⃗, v⃗]
D ∈ EvalCxtN ::= E

FIGURE 9.9. Call-by-name (N) strategy in natural deduction.

This evaluation strategy more closely matches the call-by-name λ-calculus, where every term is substitutable, and evaluation contexts, which are exactly co-values, are only those contexts which force an answer to be given for computation to continue.
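The practical difference between these two strategies shows up with a binding that is never demanded. A hedged Haskell sketch (names ours), using Haskell's laziness for the call-by-name behavior and seq to force the call-by-value behavior:

  loop :: Int
  loop = loop                           -- a term with no value

  byName :: Int
  byName = let x = loop in 1            -- = 1: x is substitutable, never demanded

  byValue :: Int
  byValue = let x = loop in x `seq` 1   -- diverges: x must become a value first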
Finally, we have the most subtle strategy of the three: the call-by-need strategy LV shown in Figure 9.10. Note how the values of call-by-need are exactly the values of call-by-value. However, the co-values of call-by-need lie in between call-by-value and call-by-name. In particular, every call-by-name co-value is a call-by-need co-value, but there are extra co-values of the form let x = E in D[E′[x]], where the evaluation context D can include extra let-bindings around E′[x]. Note that this is the first evaluation strategy with an interesting difference between co-values and evaluation contexts: evaluation contexts can wrap a co-value with extra let-bindings, as in let x1 = v1 in … let xn = vn in E. This is because those bindings are delayed until their value is needed, as in the call-by-need λ-calculus. For example, if we have the term let x = 1 + 1 in v, the right-hand side of x = 1 + 1 is delayed, and instead v is evaluated in the context let x = 1 + 1 in □. If it happens that v reduces to a term that needs x, that is to say v ↦ D[E[x]], then let x = □ in D[E[x]] is an evaluation context, so that 1 + 1 is evaluated and substituted for x as in:

let x = 1 + 1 in v  ↦  let x = 1 + 1 in D[E[x]]  ↦  let x = 2 in D[E[x]]  ↦  D[E[x]]{2/x}

However, the bindings are not mobile; they should not be pushed inward by commuting conversions like co-values are.

V ∈ ValueLV ::= x | K(A⃗, V⃗) | λ{O[X⃗:l⃗, x⃗] ⇒ v}
E ∈ CoValueLV ::= □ | case E of K(X⃗:l⃗, x⃗) ⇒ v | E O[A⃗, V⃗] | let x = E in D[E′[x]]
D ∈ EvalCxtLV ::= E | let x = v in D

FIGURE 9.10. Call-by-need (LV) strategy in natural deduction.
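The delayed-but-shared behavior of LV is exactly what Haskell implements. A hedged sketch (names ours) using trace from Debug.Trace to observe when the bound computation is forced: the binding is delayed until the co-value x + x demands x, and then its result is shared between both uses, so the trace fires only once:

  import Debug.Trace (trace)

  shared :: Int
  shared = let x = trace "forcing 1 + 1" (1 + 1)
           in x + x
  -- Evaluating shared prints "forcing 1 + 1" a single time and yields 4:
  -- the let delays its right-hand side, and the demand D[E[x]] forces it once.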
Finally, we can combine multiple evaluation strategies within a single program using a similar technique as in the µµ˜ sequent calculus. That is to say, evaluation strategies can be combined by taking the disjoint union of the respective substitution strategies and composing together each of their evaluation contexts to get a single set of operational contexts. The disjointness of the union can be regulated by kinds, as a looser form of typing discipline, as shown in Figure 9.11. As before, the statement v :: T intuitively means that v : A and A : T for some unknown type A. We can then disjointly union several substitution strategies T⃗ based on kinds by associating each strategy with a kind, and distinguishing values based on the kind of their output and co-values based on the kind of their input. That is to say, the combined set of values ValueT⃗ contains any value V ∈ ValueTi such that V :: Ti. In contrast, the combined set of co-values CoValueT⃗ contains any co-value E ∈ CoValueTi such that E[v] :: Tj for all v :: Ti. For example, to combine the three strategies above, we would get the following composite strategy:

V :: V    V ∈ ValueV          V :: N    V ∈ ValueN          V :: LV    V ∈ ValueLV
────────────────────          ────────────────────          ──────────────────────
V ∈ ValueV,N,LV               V ∈ ValueV,N,LV               V ∈ ValueV,N,LV

E ∈ CoValueV    ∀v :: V. ∃R. E[v] :: R          E ∈ CoValueN    ∀v :: N. ∃R. E[v] :: R
──────────────────────────────────────          ──────────────────────────────────────
E ∈ CoValueV,N,LV                               E ∈ CoValueV,N,LV

E ∈ CoValueLV    ∀v :: LV. ∃R. E[v] :: R
────────────────────────────────────────
E ∈ CoValueV,N,LV

E ∈ CoValueV,N,LV          D ∈ EvalCxtV,N,LV
──────────────────         ─────────────────────────────────────
E ∈ EvalCxtV,N,LV          let x :: LV = v in D ∈ EvalCxtV,N,LV

Judgement ::= Γ ⊢ΘG v :: A

Core kinding rules:

───────────────────  Var
Γ, x :: T ⊢ΘG x :: T

Γ ⊢ΘG v :: T    Γ, x :: T ⊢ΘG v′ :: R
─────────────────────────────────────  Let
Γ ⊢ΘG let x = v in v′ :: R

Given data F(X⃗ : k⃗) : S where Ki : (Ai1 : Ti1, …, Ain : Tin ⊢Y⃗:l⃗i F(X⃗)) (for each i) ∈ G, we have the rules:

Γ ⊢G vj :: Tij (for each j)
───────────────────────────  FIKi
Γ ⊢G Ki(B⃗′, v⃗) :: S

Γ ⊢G v :: S    Γ, x⃗ :: T⃗i ⊢G vi :: R (for each i)
──────────────────────────────────────────────────  FE
Γ ⊢G case v of Ki(Y⃗:l⃗i, x⃗) ⇒ vi :: R

Given codata G(X⃗:k⃗) : S where Oi : (Ai1 : Ti1, …, Ain : Tin | G(X⃗) ⊢Y⃗:l⃗i A′i : Ri) (for each i) ∈ G, we have the rules:

Γ, x⃗ :: T⃗i ⊢G vi :: Ri (for each i)
────────────────────────────────────  GI
Γ ⊢G λ{Oi[Y⃗:l⃗i, x⃗] ⇒ vi} :: S

Γ ⊢G v :: S    Γ ⊢G vj :: Tij (for each j)
───────────────────────────────────────────  GEOi
Γ ⊢G v Oi[B⃗′, v⃗] :: Ri

FIGURE 9.11. Type-agnostic kind system for multi-kinded natural deduction terms.

Note that the statement of a combination of evaluation strategies is not as clean in the λlet natural deduction calculus as it was in the µµ˜ sequent calculus, because the term-heavy syntax of λlet makes it more indirect to talk about concepts such as co-values.

Remark 9.1. It is worth considering if the η laws in Figure 9.7 really say all that needs to be said about extensionality for data types. For example, the instance of the η law for sum types A ⊕ B is

v : A ⊕ B =η⊕ case v of ι1(x) ⇒ ι1(x) | ι2(y) ⇒ ι2(y)

and after all, there are apparently much stronger extensionality laws for sums, like

C[v : A ⊕ B] = case v of ι1(x) ⇒ C[ι1(x)] | ι2(y) ⇒ C[ι2(y)]

which generalizes η⊕ so that the term v : A ⊕ B may appear in any context C.1 Unfortunately, this strong sum η law is deeply troublesome when faced with computational effects like nontermination. For example, if we apply the strong η law for sums with C = λz.z □ and v = Ω : A ⊕ B, where Ω is a term which loops forever without returning a result, then the stronger η⊕ law is completely unsound with respect to contextual equivalence, since

λz.z Ω ≇ case Ω of ι1(x) ⇒ λz.z ι1(x) | ι2(y) ⇒ λz.z ι2(y) ≅ Ω

Thus, the exceptionally strong version of η⊕ only makes sense in a pure and normalizing language, where everything terminates and all terms evaluate to some result. Dealing with a strong extensionality law like this, which places very strict requirements on the language, like strong normalization, that can only be deep properties of the language as a whole, is difficult to handle directly.

1 This strong sum law is sometimes also written in terms of a substitution v′{v/z}, where v can occur in many places, instead of the context C[v], where v occurs in exactly one place, but these amount to the same thing since C might be let z = □ in v′ so that C[v] = let z = v in v′.
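The counter-example can be reproduced almost verbatim in Haskell, whose non-strict evaluation admits exactly this kind of nonterminating scrutinee. A hedged sketch (omega, lhs, and rhs are our names for the Ω, left, and right sides of the inequation above):

  omega :: Either a b
  omega = omega                 -- Ω: loops without returning a result

  lhs :: (Either a b -> c) -> c
  lhs = \z -> z omega           -- a λ-abstraction: already a value

  rhs :: (Either a b -> c) -> c
  rhs = case omega of           -- forcing the case forces omega
          Left x  -> \z -> z (Left x)
          Right y -> \z -> z (Right y)

  -- lhs `seq` () yields (), while rhs `seq` () never returns a value, so
  -- the two sides of the strong η⊕ law are contextually distinguishable.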
As an alternative approach, Munch-Maccagnoni & Scherer (2015) propose the use of polarization to tame the strong sum extensionality law, making it more manageable without losing anything important. In particular, the polarization hypothesis suggests that call-by-value is the most fitting evaluation strategy for a data type like A ⊕ B, and so it would be more appropriate in the strong η⊕ to restrict the term in question to be a (call-by-value) value, as in:

C[V : A ⊕ B] = case V of ι1(x) ⇒ C[ι1(x)] | ι2(y) ⇒ C[ι2(y)]

This equation does not suffer from the same troubles when effects like nontermination are introduced: an infinitely looping term is never a value, so the above extensionality law does not cause the same sort of counter-example. In other words, this restricted version of the strong extensionality law for sums is sound even in the presence of effects. And as Munch-Maccagnoni & Scherer note, if the language happens to be strongly normalizing, then every (closed) term reduces to a value anyway, and so the unrestricted sum extensionality law can be derived after the fact as a deeper property of the language.

So where does this leave our simplistic treatment of extensionality for data types taken here? As it turns out, the strong sum law is derivable from the simplistic η⊕ law in the equational theory with the help of commuting conversions. First, note that Munch-Maccagnoni & Scherer's extensionality law for call-by-value sums is derivable as follows:

case V of ι1(x) ⇒ C[ι1(x)] | ι2(y) ⇒ C[ι2(y)]
  =letV  case V of ι1(x) ⇒ let z = ι1(x) in C[z] | ι2(y) ⇒ let z = ι2(y) in C[z]
  =ccV   let z = (case V of ι1(x) ⇒ ι1(x) | ι2(y) ⇒ ι2(y)) in C[z]
  =η⊕    let z = V in C[z]
  =letV  C[V]

Γ ⊢ΘG v0 : A 0    Γ, x : A j ⊢Θ,j:Ix G v1 : A (j + 1)
──────────────────────────────────────────────────────  ∀IxRrec
Γ ⊢ΘG λ{@ 0:Ix ⇒ v0 | @x (j+1):Ix ⇒ v1} : ∀Ix(A)

Θ ⊢G ∃Ix(A) : S    Γ ⊢ΘG v : ∃Ix(A)    Γ, x : A 0 ⊢ΘG v0 : C    Γ, x : A (j+1) ⊢Θ,j:Ix G v1 : A j
───────────────────────────────────────────────────────────────────────────────────────────────  ∃IxLrec
Γ ⊢ΘG loop v of 0:Ix @ x ⇒ v0 | (j+1):Ix @ x ⇒ v1 : C

Γ, x : ∀