\[ \require{mathtools} \let\DeclarePairedDelimiter\DeclarePairedDelimiters % MathJax typo %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % symbol %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % sets \newcommand{\N}{\mathbb{N}} % natural numbers \newcommand{\Z}{\mathbb{Z}} % integers \newcommand{\Q}{\mathbb{Q}} % rational numbers \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\C}{\mathbb{C}} % complex numbers % category \newcommand{\catf}[1]{{\mathbf{#1}}} \newcommand{\cA}{\catf{A}} \newcommand{\cB}{\catf{B}} \newcommand{\cC}{\catf{C}} \newcommand{\cD}{\catf{D}} \newcommand{\cE}{\catf{E}} \newcommand{\cF}{\catf{F}} \newcommand{\cG}{\catf{G}} \newcommand{\cH}{\catf{H}} \newcommand{\cI}{\catf{I}} \newcommand{\cJ}{\catf{J}} \newcommand{\cK}{\catf{K}} \newcommand{\cL}{\catf{L}} \newcommand{\cM}{\catf{M}} \newcommand{\cN}{\catf{N}} \newcommand{\cO}{\catf{O}} \newcommand{\cP}{\catf{P}} \newcommand{\cQ}{\catf{Q}} \newcommand{\cR}{\catf{R}} \newcommand{\cS}{\catf{S}} \newcommand{\cT}{\catf{T}} \newcommand{\cU}{\catf{U}} \newcommand{\cV}{\catf{V}} \newcommand{\cW}{\catf{W}} \newcommand{\cX}{\catf{X}} \newcommand{\cY}{\catf{Y}} \newcommand{\cZ}{\catf{Z}} \newcommand{\cZero}{\catf{0}} \newcommand{\cOne}{\catf{1}} \newcommand{\cTwo}{\catf{2}} \newcommand{\cArr}{\catf{Arr}} % arrow \newcommand{\cPSh}{\catf{PSh}} % presheaf \newcommand{\cCone}{\catf{Cone}} % cone \newcommand{\cCocone}{\catf{Cocone}} % cocone \newcommand{\cFin}{\catf{Fin}} % finite prefix \newcommand{\cSet}{\catf{Set}} % functions \newcommand{\cRel}{\catf{Rel}} % relations \newcommand{\cPos}{\catf{Pos}} % posets \newcommand{\cMon}{\catf{Mon}} % monoids \newcommand{\cCMon}{\catf{CMon}} % commutative monoids \newcommand{\cGrp}{\catf{Grp}} % groups \newcommand{\cAb}{\catf{Ab}} % abelian groups \newcommand{\cCat}{\catf{Cat}} % categories \newcommand{\cGrpd}{\catf{Grpd}} % groupoids \newcommand{\cVect}{\catf{Vect}} % vector spaces \newcommand{\cMat}{\catf{Mat}} % matrices \newcommand{\cTop}{\catf{Top}} % topological spaces and continuous maps \newcommand{\cMet}{\catf{Met}} % metric spaces and metric maps \newcommand{\cMeas}{\catf{Meas}} % measurable spaces and measurable functions \newcommand{\cStoch}{\catf{Stoch}} % measurable spaces and stochastic maps \newcommand{\cProb}{\catf{Prob}} % probability measures and measure-preserving functions %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % supscript %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\op}{^\mathrm{op}} \newcommand{\inv}{^{-1}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % function %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \DeclareMathOperator{\lcm}{lcm} \DeclareMathOperator{\logsumexp}{log-sum-exp} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % arrow %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % morphism \newcommand{\xto}{\xrightarrow} % -> \newcommand{\xot}{\xleftarrow} % <- \newcommand{\toot}{\rightleftarrows} % <=> \newcommand{\xtoot}[2]{\xrightleftharpoons[#2]{#1}} \newcommand{\iso}{\cong} % ~= \newcommand{\klto}{\rightsquigarrow} % ~> \newcommand{\mono}{\rightarrowtail} % >-> \newcommand{\epi}{\twoheadrightarrow} % ->> \newcommand{\ihom}{\multimap} % -o % category \newcommand{\incl}{\hookrightarrow} \newcommand{\adjto}[2]{\overset{{}\xto[]{#1}{}}{\underset{{}\xot[#2]{}{}}{\bot}}} % functor \newcommand{\nat}{\Rightarrow} % => \newcommand{\xnat}{\xRightarrow} % => \newcommand{\comma}{\downarrow} \newcommand{\adj}{\dashv} % -| %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % delimiter 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \DeclarePairedDelimiter{\parens}{(}{)} % parentheses ( ) \DeclarePairedDelimiter{\bracks}{[}{]} % brackets [ ] \DeclarePairedDelimiter{\braces}{\{}{\}} % braces { } \DeclarePairedDelimiter{\angles}{\langle}{\rangle} % angles \DeclarePairedDelimiter{\floor}{\lfloor}{\rfloor} \DeclarePairedDelimiter{\ceil}{\lceil}{\rceil} \newcommand{\set}{\braces} \newcommand{\singleton}{{\set{*}}} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % operator %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\compL}{\mathbin{\circ}} \newcommand{\compR}{\mathbin{;}} \newcommand{\Kl}{\mathrm{Kl}} \DeclareMathOperator{\Obj}{Obj} \DeclareMathOperator{\Hom}{Hom} \DeclareMathOperator{\Sub}{Sub} \DeclareMathOperator{\Eq}{Eq} \DeclareMathOperator{\Coeq}{Coeq} \DeclareMathOperator{\Pull}{Pull} \DeclareMathOperator{\Push}{Push} \DeclareMathOperator{\Lan}{Lan} \DeclareMathOperator{\Ran}{Ran} \DeclareMathOperator{\Lift}{Lift} \DeclareMathOperator{\Rift}{Rift} \DeclareMathOperator{\id}{id} \DeclareMathOperator{\colim}{colim} % exponential \DeclareMathOperator{\partf}{partial} \DeclareMathOperator{\curry}{curry} % natural number \DeclareMathOperator{\zero}{zero} \DeclareMathOperator{\successor}{succ} % list \DeclareMathOperator{\nil}{nil} \DeclareMathOperator{\cons}{cons} \newcommand{\concat}{\mathbin{{+}\mspace{-8mu}{+}}} % fold \DeclareMathOperator{\fold}{fold} \DeclareMathOperator{\map}{map} \DeclareMathOperator{\filter}{filter} % bool \DeclareMathOperator{\cond}{cond} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % order %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\meet}{\wedge} \newcommand{\join}{\vee} \newcommand{\bigmeet}{\bigwedge} \newcommand{\bigjoin}{\bigvee} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % logic %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newcommand{\lT}{\top} \newcommand{\lF}{\bot} % logical connective \newcommand{\binop}[1]{\mathbin{#1}} % conjunction AND \newcommand{\lcon}{\binop{\wedge}} % not conjunction NAND \newcommand{\lncon}{\binop{\uparrow}} % disjunction OR \newcommand{\ldis}{\binop{\vee}} % not disjunction NOR \newcommand{\lndis}{\binop{\downarrow}} % implication \newcommand{\limp}{\binop{\rightarrow}} % not implication \newcommand{\lnimp}{\binop{\nrightarrow}} % converse implication \newcommand{\lcim}{\binop{\leftarrow}} % not converse implication \newcommand{\lncim}{\binop{\nleftarrow}} % equivalence XNOR \newcommand{\leqv}{\binop{\leftrightarrow}} % not equivalence XOR \newcommand{\lneqv}{\binop{\nleftrightarrow}} \newcommand{\lproves}{\binop{\vdash}} % |- \newcommand{\lmodels}{\binop{\vDash}} % |= \newcommand{\lnproves}{\binop{\nvdash}} % |/- \newcommand{\lnmodels}{\binop{\nvDash}} % |/= \newcommand{\truth}{{\set{\lT, \lF}}} \newcommand{\quant}{{[0, \infty]}} \newcommand{\inter}{{[0, 1]}} \newcommand{\qnot}{\mathord{\sim}} \newcommand{\qcon}{\binop{\otimes}} \newcommand{\qdis}{\binop{\oplus}} \newcommand{\qimp}{\binop{\multimap}} \DeclareMathOperator*{\qforall}{\triangledown} \DeclareMathOperator*{\qexists}{\vartriangle} \]
\[ \newcommand{\policy}{\pi} \newcommand{\transition}{\mathrm{p}} \newcommand{\reward}{\mathrm{r}} \newcommand{\statistic}{\tau} \newcommand{\svalue}{\mathrm{v}} % state value function \newcommand{\qvalue}{\mathrm{q}} % state-action value function \DeclareMathOperator{\gen}{gen} \DeclareMathOperator{\agg}{agg} \DeclareMathOperator{\fsum}{sum} \DeclareMathOperator{\init}{init} \DeclareMathOperator{\post}{post} \newcommand{\update}{\mathbin{\triangleright}} \]
- 張 一凡 · チョウ イーファン · [ʈʂāŋ īː fǽn]
- Assistant Professor, Department of Complexity Science and Engineering, Graduate School of Frontier Sciences, The University of Tokyo
- Visiting Scientist, Imperfect Information Learning Team, RIKEN Center for Advanced Intelligence Project
- [yivan.xyz@gmail.com] [yivan.zhang@k.u-tokyo.ac.jp]
- [CV] [Google Scholar] [OpenReview] [DBLP] [GitHub]
I’m an assistant professor at The University of Tokyo, where I work on the theory and application of machine learning. I was previously a postdoctoral researcher at the RIKEN Center for Advanced Intelligence Project. I obtained my Ph.D. from The University of Tokyo, where I was fortunate to be advised by Prof. Masashi Sugiyama.
Lately, I’ve been exploring algebra and applied category theory in machine learning. My long-term goal is to understand how the structure of learning systems gives rise to intelligent behavior. I aim to uncover how perception, abstraction, reasoning, and creativity emerge from first principles, and how algebraic tools can help manage their growing complexity. I’m especially interested in how intelligent agents can acquire, represent, and reuse knowledge and skills in structured ways, and how they can behave in alignment with human values, even under minimal supervision.
Over the years, I’ve become fascinated by both foundational formalisms—like formal definitions of disentangled representations or reward aggregation in reinforcement learning—and fundamental challenges, such as compositional generalization and limited supervision. I’m drawn to questions that appear across domains in different forms, and I seek unifying abstractions that reveal their shared structure. These are the kinds of problems that resist quick fixes, but reward careful formulation and deep theoretical insight. I believe that we must ask the right questions before looking for the right answers.
That’s why I value abstractions that clarify rather than obscure complexity. For me, category theory offers such a lens: it reveals deep structure, offers diagrammatic reasoning, highlights compositionality, and helps manage complexity by showing how parts relate to wholes. While much of my thinking is rooted in abstract mathematics, I strive to translate abstract theories into practical intuitions that resonate with the broader machine learning community.
Concretely, my research spans the following topics:
- Foundational machine learning
- algebraic formalisms and techniques
- explainable decision-making under uncertainty
- weak supervision
- Representation learning
- disentanglement
- symmetry and equivariance
- abstraction and structured representation
- neural-symbolic learning and reasoning
- Reinforcement learning
- non-standard problem formulations
- compositional and hierarchical learning
- safety and alignment
- exploration
I love things that are discovered, not invented.
My research has been generously supported by the RIKEN Center for Advanced Intelligence Project (RIKEN AIP), the Japan Society for the Promotion of Science (JSPS), the Japan Science and Technology Agency (JST), and Microsoft Research Asia (MSRA).
If you have a specific research idea you’d like to discuss, feel free to get in touch via email. I’m also happy to meet in person for a coffee if you’re in Tokyo. Prospective students are encouraged to first consult the Graduate School of Frontier Sciences or the Graduate School of Information Science and Technology at The University of Tokyo for further information before reaching out.
News
- August 2025: Co-organizing an IJCAI’25 tutorial, “AI Meets Algebra: Foundations and Frontiers” in Montréal, Canada
- May 2025: “Recursive Reward Aggregation” accepted to RLC’25 in Edmonton, Canada
Featured publications
Recursive Reward Aggregation
- reinforcement learning, Bellman equation, dynamic programming, recursion scheme, recursive coalgebra, algebra fusion, lenses and optics
- Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, Masashi Sugiyama
- Reinforcement Learning Conference 2025 (RLC’25)
- [arXiv] [OpenReview] [code] [poster]
Summation is a recursive function; so are many others.
The discounted sum \({\fsum}_\gamma(r_{1:t}) = r_1 + \gamma r_2 + \gamma^2 r_3 + \cdots + \gamma^{t-1} r_t = r_1 + \gamma \; (r_2 + \gamma \; (\cdots + \gamma \; (r_t + 0))) = r_1 + \gamma \; {\fsum}_\gamma(r_{2:t})\) is a recursive function. So are many others—length, min, max, or sum of squares. From these, we can compute common statistics like the range, mean, or variance. For example, to get the variance, we can keep track of the length, sum, and sum of squares, and then use a final formula to combine them.
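To make the pattern concrete, here is a minimal Python sketch (mine, not from the paper) that writes the discounted sum and the variance statistics as the same right fold, varying only the update function and the initial value.

```python
from functools import reduce

# A minimal sketch (mine, not the paper's code): the discounted sum and the
# variance statistics written as the same right fold, differing only in the
# update function and the initial value.

def fold(update, init, rewards):
    """fold(u, i, [r1, ..., rt]) = u(r1, u(r2, ... u(rt, i) ...))."""
    return reduce(lambda acc, r: update(r, acc), reversed(rewards), init)

def discounted_sum(rewards, gamma):
    return fold(lambda r, acc: r + gamma * acc, 0.0, rewards)

def variance(rewards):
    # Track (length, sum, sum of squares), then combine with a final formula.
    n, s, q = fold(lambda r, acc: (acc[0] + 1, acc[1] + r, acc[2] + r * r),
                   (0, 0.0, 0.0), rewards)
    return q / n - (s / n) ** 2

rewards = [1.0, 2.0, 3.0, 4.0]
print(discounted_sum(rewards, 0.9))  # 1 + 0.9*2 + 0.81*3 + 0.729*4 = 8.146
print(variance(rewards))             # 30/4 - (10/4)**2 = 1.25
```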
In reinforcement learning, rewards are generated step by step over time as an agent takes actions. If we also aggregate these rewards step by step, we can keep track of the reward statistics as we go. Using ideas from algebra, we can describe the recursive generation of rewards as a hylomorphism (the unique homomorphism from a recursive coalgebra, building a list from a seed value) and the recursive aggregation of rewards as a catamorphism (the unique homomorphism from the initial algebra, reducing a list to a single value). A technique called algebra fusion lets us merge these two steps into a single recursive statistic function, which gives us a generalized Bellman equation. This algebraic perspective allows us to extend many value-based and actor-critic methods to a broad class of recursive reward aggregations.
Concretely, let \(S\), \(A\), \(R\), \(T\) be the sets of states, actions, rewards, and statistics. For the variance, the statistics \(T = \N \times \R \times \R_{\geq 0}\) are the length \(n\), sum \(s\), and sum of squares \(q\) of a sequence of rewards. For a policy \(\policy: S \to A\), a transition function \(\transition: S \times A \to S\), and a reward function \(\reward: S \times A \to R\), the statistic function \(\statistic_\policy: S \to T\) can be computed recursively using
- an update function \(\update: R \times T \to T: (r, [n, s, q]) \mapsto [n + 1, s + r, q + r^2]\) for non-terminal states, and
- an initial value \(\init = [0, 0, 0] \in T\) for terminal states.
Then, the value function \(\svalue_\policy: S \to R\) can be obtained by applying a post-processing function \(\post: T \to R: [n, s, q] \mapsto \frac{q}{n} - \left(\frac{s}{n}\right)^2\) after the statistic function \(\statistic_\policy: S \to T\).
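Putting these pieces together, the statistic function satisfies a generalized Bellman equation: \(\statistic_\policy(s) = \init\) if \(s\) is terminal, and \(\statistic_\policy(s) = \reward(s, \policy(s)) \update \statistic_\policy(\transition(s, \policy(s)))\) otherwise. Below is a minimal sketch of this recursion on a toy deterministic chain; the environment and all names are hypothetical, not from the paper.

```python
# A minimal sketch of the generalized Bellman recursion for the variance:
# tau(s) = init if s is terminal, else update(r(s, pi(s)), tau(p(s, pi(s)))).
# The toy deterministic chain below is hypothetical, not from the paper.

def update(r, t):                      # update: R x T -> T
    n, s, q = t
    return (n + 1, s + r, q + r * r)

INIT = (0, 0.0, 0.0)                   # init in T, for terminal states

def post(t):                           # post: T -> R, variance from (n, s, q)
    n, s, q = t
    return q / n - (s / n) ** 2

def statistic(s, policy, transition, reward, terminal):
    if terminal(s):
        return INIT
    a = policy(s)
    return update(reward(s, a),
                  statistic(transition(s, a), policy, transition, reward, terminal))

# Chain: state k emits reward k + 1 and moves to k + 1; state 4 is terminal.
tau = statistic(0,
                policy=lambda s: 0,
                transition=lambda s, a: s + 1,
                reward=lambda s, a: float(s + 1),
                terminal=lambda s: s == 4)
print(tau, post(tau))  # (4, 10.0, 30.0), variance 1.25
```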
Enriching Disentanglement: From Logical Definitions to Quantitative Metrics
- disentangled representation learning, enriched category theory, logic-metric correspondence, endofunctor algebra (sub)homomorphism
- Yivan Zhang, Masashi Sugiyama
- Neural Information Processing Systems 2024 (NeurIPS’24)
- Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML) at International Conference on Machine Learning 2023 (ICML’23)
- [arXiv] [OpenReview] [NeurIPS’24] [TAG-ML’23] [poster]
"How constant" is a non-constant function?
A constant function maps all inputs to the same output, but “how constant” is a non-constant function? Intuitively, a function is “more constant” if its output changes little when its input changes. We can measure constancy by checking how “spread out” the outputs are. This gives us a way to turn a yes-or-no question (“is it constant?”) into a numeric score (“how constant is it?”). Can we do this for other properties of a function?
In this work, we show how to do this kind of thing more generally. We take logical definitions—written using things like “equals”, “implies”, and “for all”—and turn them into quantitative metrics that tell us “how true” the property is. We do this by replacing each part of the logical formula with a matching quantitative counterpart: we replace the true/false values \(\truth\) with non-negative real numbers \(\quant\), replace equality \({=}: A \times A \to \truth\) with a kind of distance \(d: A \times A \to \quant\), and replace things like “for all” \(\forall\) with ways of combining values, like taking the maximum or average. This gives us a way to turn logical formulas into loss functions.
For example, consider the property of a function \(f: A \to B\) being constant. Logically, this means either
- (p1) \(\exists b \in B.\; \forall a \in A.\; f(a) = b\): all outputs are equal to the same value, or equivalently,
- (p2) \(\forall a, a' \in A.\; f(a) = f(a')\): all outputs are equal to each other.
To turn them into quantitative metrics, we can replace equality with a distance and quantifiers with aggregations and measure
- (q1) \(\inf_{b \in B} \operatorname{agg}_{a \in A} d(f(a), b)\): how far the outputs are from a central point, or
- (q2) \(\operatorname{agg}_{a, a' \in A} d(f(a), f(a'))\): how far the outputs are from each other.
Here, \(\operatorname{agg}\) can be any aggregation function that outputs zero if and only if all the inputs are zero, like the maximum, sum, or sum of squares. Then, both metrics give us a real number that tells us how close \(f: A \to B\) is to being constant.
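As a quick illustration (my own sketch, not the paper's code), here are (q1) and (q2) in Python, with the absolute difference \(d(x, y) = |x - y|\) as the distance and the maximum as the aggregation:

```python
import itertools

# A minimal sketch (mine, not the paper's code) of the constancy metrics
# (q1) and (q2), with d(x, y) = |x - y| as the distance and the maximum
# as the aggregation function agg.

def d(x, y):
    return abs(x - y)

def q1(f, A, B, agg=max):
    # (q1): inf over centers b in B of agg over a in A of d(f(a), b)
    return min(agg(d(f(a), b) for a in A) for b in B)

def q2(f, A, agg=max):
    # (q2): agg over all pairs (a, a') of d(f(a), f(a'))
    return agg(d(f(a), f(a2)) for a, a2 in itertools.product(A, A))

A = B = range(5)
constant = lambda a: 2
almost = lambda a: 2 + (a == 0)  # deviates only at a = 0

print(q1(constant, A, B), q2(constant, A))  # 0 0: exactly constant
print(q1(almost, A, B), q2(almost, A))      # 1 1: nearly constant
```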
The theory behind the logic-metric correspondence is actually quite simple and intuitive. It starts from simple facts like “the sum of two non-negative numbers is zero if and only if both numbers are zero.” Concretely, we can view the set \(\quant\) of non-negative real numbers equipped with the addition \(+: \quant \times \quant \to \quant\) as an “algebra” and the set \(\truth\) of binary truth values equipped with the conjunction \(\lcon: \truth \times \truth \to \truth\) as another algebra. They both have the “type” or “signature” \((-) \times (-) \to (-)\). For two algebras \((A, \alpha: A \times A \to A)\) and \((B, \beta: B \times B \to B)\), we can define an algebra homomorphism \(f: A \to B\) as a function that preserves the algebraic structure, meaning
\[f(\alpha(a, a')) = \beta(f(a), f(a'))\]
for all \(a, a' \in A\). Here, since we are interested in whether the metric is zero or not, we can consider the zero predicate \(\zeta: \quant \to \truth: x \mapsto (x = 0)\)—an “\(\texttt{is-zero}\)” function that returns true if and only if its input is zero. Then, “the sum of two non-negative numbers is zero if and only if both numbers are zero” can be rewritten as
\[\texttt{is-zero}(a + a') \leqv \texttt{is-zero}(a) \lcon \texttt{is-zero}(a')\]
for all \(a, a' \in \quant\). This exactly means that the zero predicate \(\zeta: \quant \to \truth\) is an algebra homomorphism from the algebra \((\quant, +)\) of non-negative real numbers to the algebra \((\truth, \lcon)\) of binary truth values.
Similarly, for a set \(A\), a quantity \(q: A \to \quant\) and a predicate \(p: A \to \truth\) can be viewed as algebras \((\quant, q: A \to \quant)\) and \((\truth, p: A \to \truth)\) with the same signature \(A \to (-)\). Saying that the quantity is homomorphic to the predicate via a function \(f: \quant \to \truth\) means that
\[f(q(a)) = p(a)\]
for all \(a \in A\). For example, the distances \(d_A: A \times A \to \quant\) that we need are those that are zero if and only if the two inputs are equal, i.e.,
\[\texttt{is-zero}(d_A(a, a')) \leqv (a =_A a')\]
for all \(a, a' \in A\). We call them strict premetrics. Then, we can also say that strict premetrics are homomorphic to the equality predicate \(=_A: A \times A \to \truth\) via the zero predicate \(\zeta: \quant \to \truth\).
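Both statements are easy to check mechanically. Here is a small sanity check (my own), with a finite sample of numbers standing in for \(\quant\):

```python
import itertools

# A small numerical sanity check (mine) of the two homomorphism statements:
# the zero predicate turns + into AND, and turns a strict premetric into equality.

def is_zero(x):            # the zero predicate zeta: [0, inf] -> {T, F}
    return x == 0

def d(a, b):               # a strict premetric on numbers
    return abs(a - b)

xs = [0, 0.5, 1, 2]
pairs = list(itertools.product(xs, xs))

# is-zero(a + a')  <->  is-zero(a) AND is-zero(a')
assert all(is_zero(a + b) == (is_zero(a) and is_zero(b)) for a, b in pairs)

# is-zero(d(a, a'))  <->  a = a'
assert all(is_zero(d(a, b)) == (a == b) for a, b in pairs)

print("both squares commute on all sampled pairs")
```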
Why do we care about this? Why bother rewriting simple facts as algebra homomorphisms? Because this gives us a way to build up more complex metrics from simple ones, compositionally. We can prove that if each part of a logical formula is homomorphic to a part of a metric, then the whole formula will still match the whole metric. This lets us turn logical definitions into quantitative metrics in a principled, compositional way.
A Category-theoretical Meta-analysis of Definitions of Disentanglement
- disentangled representation learning, category theory, universal construction, monoidal product
- Yivan Zhang, Masashi Sugiyama
- International Conference on Machine Learning 2023 (ICML’23)
- International Workshop on Symbolic-Neural Learning 2023 (SNL’23)
- [arXiv] [OpenReview] [ICML’23] [slides] [poster]
Why are there so many definitions of disentanglement?
Why are there so many definitions of disentanglement? Some are based on functions, some on group theory, others on statistics, information theory, or even topology. At first, they seem totally different. But they all try to capture a similar idea: things should stay separate in a meaningful way, so different parts of a system should depend on different parts of the input. So, are these definitions really different? Or are they secretly connected?
In this work, we show that many of these definitions actually use different versions of the same structure—product. More precisely, we look at morphisms like \(f: A \otimes B \to C\) from a product \(A \otimes B\) of objects and ask: is it essentially a morphism \(f': A \to C\) that depends only on \(A\), not \(B\)? Depending on the setting, the product \(\otimes\) can be the Cartesian product \(\times\) of sets and functions, the direct product of groups and their actions, the joint distribution of random variables, or the product manifold in topology. In each case, a “disentangled encoder” is one that reconstructs the product structure.
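In the Cartesian case, “depends only on \(A\)” can be checked directly: \(f\) factors through the projection \(A \times B \to A\) exactly when \(f(a, b)\) does not vary with \(b\). A toy illustration of this check (my own, not from the paper):

```python
# A toy illustration (mine, not from the paper) of the Cartesian case:
# f: A x B -> C "depends only on A" iff f(a, b) does not vary with b,
# i.e. f factors as f' after the projection A x B -> A.

def depends_only_on_first(f, A, B):
    return all(len({f(a, b) for b in B}) == 1 for a in A)

A, B = range(3), range(4)
print(depends_only_on_first(lambda a, b: a * a, A, B))  # True: ignores b
print(depends_only_on_first(lambda a, b: a + b, A, B))  # False: entangled
```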
We use category theory to explain this in a general and unified way. This helps clarify how different definitions relate to each other and gives researchers a common language to talk about them. It also makes it easier to choose definitions or design methods that match the structure of a given problem.
[product, product morphism, monoidal product]
Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization
- weakly supervised learning, noisy label learning, transition matrix, contraction mapping
- Yivan Zhang, Gang Niu, Masashi Sugiyama
- International Conference on Machine Learning 2021 (ICML’21)
- [arXiv] [ICML’21] [code] [slides] [poster]
Learning from Aggregate Observations
- weakly supervised learning, multiple instance learning, consistency up to an equivalence
- Yivan Zhang, Nontawat Charoenphakdee, Zhenguo Wu, Masashi Sugiyama
- Neural Information Processing Systems 2020 (NeurIPS’20)
- [arXiv] [NeurIPS’20] [code]