Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribePhysics-informed cluster analysis and a priori efficiency criterion for the construction of local reduced-order bases
Nonlinear model order reduction has opened the door to parameter optimization and uncertainty quantification in complex physics problems governed by nonlinear equations. In particular, the computational cost of solving these equations can be reduced by means of local reduced-order bases. This article examines the benefits of a physics-informed cluster analysis for the construction of cluster-specific reduced-order bases. We illustrate that the choice of the dissimilarity measure for clustering is fundamental and highly affects the performances of the local reduced-order bases. It is shown that clustering with an angle-based dissimilarity on simulation data efficiently decreases the intra-cluster Kolmogorov N-width. Additionally, an a priori efficiency criterion is introduced to assess the relevance of a ROM-net, a methodology for the reduction of nonlinear physics problems introduced in our previous work in [T. Daniel, F. Casenave, N. Akkari, D. Ryckelynck, Model order reduction assisted by deep neural networks (ROM-net), Advanced Modeling and Simulation in Engineering Sciences 7 (16), 2020]. This criterion also provides engineers with a very practical method for ROM-nets' hyperparameters calibration under constrained computational costs for the training phase. On five different physics problems, our physics-informed clustering strategy significantly outperforms classic strategies for the construction of local reduced-order bases in terms of projection errors.
Improving Physics Reasoning in Large Language Models Using Mixture of Refinement Agents
Large Language Models (LLMs) demonstrate remarkable capabilities in various reasoning tasks. However, they encounter significant challenges when it comes to scientific reasoning, particularly in physics, which requires not only mathematical reasoning but also factual and conceptual understanding. When addressing complex physics problems, LLMs typically face three key issues: problem miscomprehension, incorrect concept application, and computational errors. While each of these problems can be addressed individually, there is a need for a generalized approach that can tackle all three issues simultaneously. To address this, we introduce Mixture of Refinement Agents (MoRA), a novel agentic refinement framework that iteratively refines the LLM generated base solution by correcting the aforementioned errors, resulting in a significant performance improvement for open-source LLMs. Our approach aims to bridge the gap between opensource LLMs and GPT-4o by utilizing the latter as error identifier to guide these refinement agents. We evaluate our approach on the SciEval and MMLU subsets along with our own physics dataset (PhysicsQA). MoRA significantly improves the performance of Llama-3-70B and Gemma-2-27B on these datasets, achieving up to a 16% increase in final answer accuracy.
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
We introduce a benchmark to evaluate the capability of AI to solve problems in theoretical physics, focusing on high-energy theory and cosmology. The first iteration of our benchmark consists of 57 problems of varying difficulty, from undergraduate to research level. These problems are novel in the sense that they do not come from public problem collections. We evaluate our data set on various open and closed language models, including o3-mini, o1, DeepSeek-R1, GPT-4o and versions of Llama and Qwen. While we find impressive progress in model performance with the most recent models, our research-level difficulty problems are mostly unsolved. We address challenges of auto-verifiability and grading, and discuss common failure modes. While currently state-of-the art models are still of limited use for researchers, our results show that AI assisted theoretical physics research may become possible in the near future. We discuss the main obstacles towards this goal and possible strategies to overcome them. The public problems and solutions, results for various models, and updates to the data set and score distribution, are available on the website of the dataset tpbench.org.
PhysicsMinions: Winning Gold Medals in the Latest Physics Olympiads with a Coevolutionary Multimodal Multi-Agent System
Physics is central to understanding and shaping the real world, and the ability to solve physics problems is a key indicator of real-world physical intelligence. Physics Olympiads, renowned as the crown of competitive physics, provide a rigorous testbed requiring complex reasoning and deep multimodal understanding, yet they remain largely underexplored in AI research. Existing approaches are predominantly single-model based, and open-source MLLMs rarely reach gold-medal-level performance. To address this gap, we propose PhysicsMinions, a coevolutionary multi-agent system for Physics Olympiad. Its architecture features three synergistic studios: a Visual Studio to interpret diagrams, a Logic Studio to formulate solutions, and a Review Studio to perform dual-stage verification. The system coevolves through an iterative refinement loop where feedback from the Review Studio continuously guides the Logic Studio, enabling the system to self-correct and converge towards the ground truth. Evaluated on the HiPhO benchmark spanning 7 latest physics Olympiads, PhysicsMinions delivers three major breakthroughs: (i) Strong generalization: it consistently improves both open-source and closed-source models of different sizes, delivering clear benefits over their single-model baselines; (ii) Historic breakthroughs: it elevates open-source models from only 1-2 to 6 gold medals across 7 Olympiads, achieving the first-ever open-source gold medal in the latest International Physics Olympiad (IPhO) under the average-score metric; and (iii) Scaling to human expert: it further advances the open-source Pass@32 score to 26.8/30 points on the latest IPhO, ranking 4th of 406 contestants and far surpassing the top single-model score of 22.7 (ranked 22nd). Generally, PhysicsMinions offers a generalizable framework for Olympiad-level problem solving, with the potential to extend across disciplines.
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Large language models (LLMs) have demonstrated remarkable capabilities in solving complex reasoning tasks, particularly in mathematics. However, the domain of physics reasoning presents unique challenges that have received significantly less attention. Existing benchmarks often fall short in evaluating LLMs' abilities on the breadth and depth of undergraduate-level physics, underscoring the need for a comprehensive evaluation. To fill this gap, we introduce UGPhysics, a large-scale and comprehensive benchmark specifically designed to evaluate UnderGraduate-level Physics (UGPhysics) reasoning with LLMs. UGPhysics includes 5,520 undergraduate-level physics problems in both English and Chinese, covering 13 subjects with seven different answer types and four distinct physics reasoning skills, all rigorously screened for data leakage. Additionally, we develop a Model-Assistant Rule-based Judgment (MARJ) pipeline specifically tailored for assessing answer correctness of physics problems, ensuring accurate evaluation. Our evaluation of 31 leading LLMs shows that the highest overall accuracy, 49.8% (achieved by OpenAI-o1-mini), emphasizes the necessity for models with stronger physics reasoning skills, beyond math abilities. We hope UGPhysics, along with MARJ, will drive future advancements in AI for physics reasoning.
Scaling Physical Reasoning with the PHYSICS Dataset
Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to facilitate this issue. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model's physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations on current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics.
Enhancing LLMs for Physics Problem-Solving using Reinforcement Learning with Human-AI Feedback
Large Language Models (LLMs) have demonstrated strong capabilities in text-based tasks but struggle with the complex reasoning required for physics problems, particularly in advanced arithmetic and conceptual understanding. While some research has explored ways to enhance LLMs in physics education using techniques such as prompt engineering and Retrieval Augmentation Generation (RAG), not enough effort has been made in addressing their limitations in physics reasoning. This paper presents a novel approach to improving LLM performance on physics questions using Reinforcement Learning with Human and Artificial Intelligence Feedback (RLHAIF). We evaluate several reinforcement learning methods, including Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Remax optimization. These methods are chosen to investigate RL policy performance with different settings on the PhyQA dataset, which includes challenging physics problems from high school textbooks. Our RLHAIF model, tested on leading LLMs like LLaMA2 and Mistral, achieved superior results, notably with the MISTRAL-PPO model, demonstrating marked improvements in reasoning and accuracy. It achieved high scores, with a 58.67 METEOR score and a 0.74 Reasoning score, making it a strong example for future physics reasoning research in this area.
P1: Mastering Physics Olympiads with Reinforcement Learning
Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning-the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift, which binds symbols to reality in a fundamental way, serving as the cornerstone of most modern technologies. In this work, we manage to advance physics research by developing large language models with exceptional physics reasoning capabilities, especially excel at solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model with Gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and wins 12 gold medals out of 13 international/regional physics competitions in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, getting a silver medal. Further equipped with an agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions achieves overall No.1 on IPhO 2025, and obtains the highest average score over the 13 physics competitions. Besides physics, P1 models also present great performance on other reasoning tasks like math and coding, showing the great generalibility of P1 series.
PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models
Physics problem-solving is a challenging domain for large AI models, requiring integration of conceptual understanding, mathematical reasoning, and interpretation of physical diagrams. Current evaluation methodologies show notable limitations in capturing the breadth and complexity of undergraduate-level physics, underscoring the need for more rigorous assessments. To this end, we present PhysUniBench, a large-scale multimodal benchmark designed to evaluate and improve the reasoning capabilities of multimodal large language models (MLLMs) specifically on undergraduate-level physics problems. PhysUniBench consists of 3,304 physics questions spanning 8 major sub-disciplines of physics, each accompanied by one visual diagrams. The benchmark includes both open-ended and multiple-choice questions, systematically curated and difficulty-rated through an iterative model-in-the-loop process. The benchmark's construction involved a rigorous multi-stage process, including multiple roll-outs, expert-level evaluation, automated filtering of easily solved problems, and a nuanced difficulty grading system with five levels. Through extensive experiments, we observe that current state-of-the-art models encounter substantial challenges in physics reasoning. For example, GPT-4o mini achieves only about 34.2\% accuracy in the proposed PhysUniBench. These results highlight that current MLLMs struggle with advanced physics reasoning, especially on multi-step problems and those requiring precise diagram interpretation. By providing a broad and rigorous assessment tool, PhysUniBench aims to drive progress in AI for Science, encouraging the development of models with stronger physical reasoning, problem-solving skills, and multimodal understanding. The benchmark and evaluation scripts are available at https://prismax-team.github.io/PhysUniBenchmark/.
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems
Recent advancements have seen Large Language Models (LLMs) and Large Multimodal Models (LMMs) surpassing general human capabilities in various tasks, approaching the proficiency level of human experts across multiple domains. With traditional benchmarks becoming less challenging for these models, new rigorous challenges are essential to gauge their advanced abilities. In this work, we present OlympiadBench, an Olympiad-level bilingual multimodal scientific benchmark, featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. Each problem is detailed with expert-level annotations for step-by-step reasoning. Evaluating top-tier models on OlympiadBench, we implement a comprehensive assessment methodology to accurately evaluate model responses. Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench, with a mere 10.74% in physics, highlighting the benchmark rigor and the intricacy of physical reasoning. Our analysis orienting GPT-4V points out prevalent issues with hallucinations, knowledge omissions, and logical fallacies. We hope that our challenging benchmark can serve as a valuable resource for helping future AGI research endeavors. The data and evaluation code are available at https://github.com/OpenBMB/OlympiadBench
Neural Operator: Is data all you need to model the world? An insight into the impact of Physics Informed Machine Learning
Numerical approximations of partial differential equations (PDEs) are routinely employed to formulate the solution of physics, engineering and mathematical problems involving functions of several variables, such as the propagation of heat or sound, fluid flow, elasticity, electrostatics, electrodynamics, and more. While this has led to solving many complex phenomena, there are some limitations. Conventional approaches such as Finite Element Methods (FEMs) and Finite Differential Methods (FDMs) require considerable time and are computationally expensive. In contrast, data driven machine learning-based methods such as neural networks provide a faster, fairly accurate alternative, and have certain advantages such as discretization invariance and resolution invariance. This article aims to provide a comprehensive insight into how data-driven approaches can complement conventional techniques to solve engineering and physics problems, while also noting some of the major pitfalls of machine learning-based approaches. Furthermore, we highlight, a novel and fast machine learning-based approach (~1000x) to learning the solution operator of a PDE operator learning. We will note how these new computational approaches can bring immense advantages in tackling many problems in fundamental and applied physics.
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
We introduce PHYBench, a novel, high-quality benchmark designed for evaluating reasoning capabilities of large language models (LLMs) in physical contexts. PHYBench consists of 500 meticulously curated physics problems based on real-world physical scenarios, designed to assess the ability of models to understand and reason about realistic physical processes. Covering mechanics, electromagnetism, thermodynamics, optics, modern physics, and advanced physics, the benchmark spans difficulty levels from high school exercises to undergraduate problems and Physics Olympiad challenges. Additionally, we propose the Expression Edit Distance (EED) Score, a novel evaluation metric based on the edit distance between mathematical expressions, which effectively captures differences in model reasoning processes and results beyond traditional binary scoring methods. We evaluate various LLMs on PHYBench and compare their performance with human experts. Our results reveal that even state-of-the-art reasoning models significantly lag behind human experts, highlighting their limitations and the need for improvement in complex physical reasoning scenarios. Our benchmark results and dataset are publicly available at https://phybench-official.github.io/phybench-demo/.
CFDBench: A Large-Scale Benchmark for Machine Learning Methods in Fluid Dynamics
In recent years, applying deep learning to solve physics problems has attracted much attention. Data-driven deep learning methods produce fast numerical operators that can learn approximate solutions to the whole system of partial differential equations (i.e., surrogate modeling). Although these neural networks may have lower accuracy than traditional numerical methods, they, once trained, are orders of magnitude faster at inference. Hence, one crucial feature is that these operators can generalize to unseen PDE parameters without expensive re-training.In this paper, we construct CFDBench, a benchmark tailored for evaluating the generalization ability of neural operators after training in computational fluid dynamics (CFD) problems. It features four classic CFD problems: lid-driven cavity flow, laminar boundary layer flow in circular tubes, dam flows through the steps, and periodic Karman vortex street. The data contains a total of 302K frames of velocity and pressure fields, involving 739 cases with different operating condition parameters, generated with numerical methods. We evaluate the effectiveness of popular neural operators including feed-forward networks, DeepONet, FNO, U-Net, etc. on CFDBnech by predicting flows with non-periodic boundary conditions, fluid properties, and flow domain shapes that are not seen during training. Appropriate modifications were made to apply popular deep neural networks to CFDBench and enable the accommodation of more changing inputs. Empirical results on CFDBench show many baseline models have errors as high as 300% in some problems, and severe error accumulation when performing autoregressive inference. CFDBench facilitates a more comprehensive comparison between different neural operators for CFD compared to existing benchmarks.
ARB: Advanced Reasoning Benchmark for Large Language Models
Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reasoning and knowledge benchmarks. However, many of these benchmarks are losing utility as LLMs get increasingly high scores, despite not yet reaching expert performance in these domains. We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields. ARB presents a more challenging test than prior benchmarks, featuring problems in mathematics, physics, biology, chemistry, and law. As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge. We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks. In order to improve both automatic and assisted evaluation capabilities, we introduce a rubric-based evaluation approach, allowing GPT-4 to score its own intermediate reasoning steps. Further, we conduct a human evaluation of the symbolic subset of ARB, finding promising agreement between annotators and GPT-4 rubric evaluation scores.
DOoM: Difficult Olympiads of Math
This paper introduces DOoM, a new open-source benchmark designed to assess the capabilities of language models in solving mathematics and physics problems in Russian. The benchmark includes problems of varying difficulty, ranging from school-level tasks to university Olympiad and entrance exam questions. In this paper we discuss the motivation behind its creation, describe dataset's structure and evaluation methodology, and present initial results from testing various models. Analysis of the results shows a correlation between model performance and the number of tokens used, and highlights differences in performance between mathematics and physics tasks.
CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
We introduce CMPhysBench, designed to assess the proficiency of Large Language Models (LLMs) in Condensed Matter Physics, as a novel Benchmark. CMPhysBench is composed of more than 520 graduate-level meticulously curated questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, strongly correlated systems, etc. To ensure a deep understanding of the problem-solving process,we focus exclusively on calculation problems, requiring LLMs to independently generate comprehensive solutions. Meanwhile, leveraging tree-based representations of expressions, we introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of similarity between prediction and ground-truth. Our results show that even the best models, Grok-4, reach only 36 average SEED score and 28% accuracy on CMPhysBench, underscoring a significant capability gap, especially for this practical and frontier domain relative to traditional physics. The code anddataset are publicly available at https://github.com/CMPhysBench/CMPhysBench.
Solving Quantitative Reasoning Problems with Language Models
Language models have achieved remarkable performance on a wide range of tasks that require natural language understanding. Nevertheless, state-of-the-art models have generally struggled with tasks that require quantitative reasoning, such as solving mathematics, science, and engineering problems at the college level. To help close this gap, we introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content. The model achieves state-of-the-art performance on technical benchmarks without the use of external tools. We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences that require quantitative reasoning, and find that the model can correctly answer nearly a third of them.
Have LLMs Advanced Enough? A Challenging Problem Solving Benchmark For Large Language Models
The performance of large language models (LLMs) on existing reasoning benchmarks has significantly improved over the past years. In response, we present JEEBench, a considerably more challenging benchmark dataset for evaluating the problem solving abilities of LLMs. We curate 515 challenging pre-engineering mathematics, physics and chemistry problems from the highly competitive IIT JEE-Advanced exam. Long-horizon reasoning on top of deep in-domain knowledge is essential for solving problems in this benchmark. Our evaluation on various open-source and proprietary models reveals that the highest performance, even after using techniques like self-consistency, self-refinement and chain-of-thought prompting, is less than 40%. The typical failure modes of GPT-4, the best model, are errors in algebraic manipulation, difficulty in grounding abstract concepts into mathematical equations accurately and failure in retrieving relevant domain-specific concepts. We also observe that by mere prompting, GPT-4 is unable to assess risk introduced by negative marking for incorrect answers. For this, we develop a post-hoc confidence-thresholding method over self-consistency, which enables effective response selection. We hope that our challenging benchmark will guide future re-search in problem-solving using LLMs.
Multi-stage Neural Networks: Function Approximator of Machine Precision
Deep learning techniques are increasingly applied to scientific problems, where the precision of networks is crucial. Despite being deemed as universal function approximators, neural networks, in practice, struggle to reduce the prediction errors below O(10^{-5}) even with large network size and extended training iterations. To address this issue, we developed the multi-stage neural networks that divides the training process into different stages, with each stage using a new network that is optimized to fit the residue from the previous stage. Across successive stages, the residue magnitudes decreases substantially and follows an inverse power-law relationship with the residue frequencies. The multi-stage neural networks effectively mitigate the spectral biases associated with regular neural networks, enabling them to capture the high frequency feature of target functions. We demonstrate that the prediction error from the multi-stage training for both regression problems and physics-informed neural networks can nearly reach the machine-precision O(10^{-16}) of double-floating point within a finite number of iterations. Such levels of accuracy are rarely attainable using single neural networks alone.
A multifidelity approach to continual learning for physical systems
We introduce a novel continual learning method based on multifidelity deep neural networks. This method learns the correlation between the output of previously trained models and the desired output of the model on the current training dataset, limiting catastrophic forgetting. On its own the multifidelity continual learning method shows robust results that limit forgetting across several datasets. Additionally, we show that the multifidelity method can be combined with existing continual learning methods, including replay and memory aware synapses, to further limit catastrophic forgetting. The proposed continual learning method is especially suited for physical problems where the data satisfy the same physical laws on each domain, or for physics-informed neural networks, because in these cases we expect there to be a strong correlation between the output of the previous model and the model on the current training domain.
Solving physics-based initial value problems with unsupervised machine learning
Initial value problems -- a system of ordinary differential equations and corresponding initial conditions -- can be used to describe many physical phenomena including those arise in classical mechanics. We have developed a novel approach to solve physics-based initial value problems using unsupervised machine learning. We propose a deep learning framework that models the dynamics of a variety of mechanical systems through neural networks. Our framework is flexible, allowing us to solve non-linear, coupled, and chaotic dynamical systems. We demonstrate the effectiveness of our approach on systems including a free particle, a particle in a gravitational field, a classical pendulum, and the H\'enon--Heiles system (a pair of coupled harmonic oscillators with a non-linear perturbation, used in celestial mechanics). Our results show that deep neural networks can successfully approximate solutions to these problems, producing trajectories which conserve physical properties such as energy and those with stationary action. We note that probabilistic activation functions, as defined in this paper, are required to learn any solutions of initial value problems in their strictest sense, and we introduce coupled neural networks to learn solutions of coupled systems.
FlamePINN-1D: Physics-informed neural networks to solve forward and inverse problems of 1D laminar flames
Given the existence of various forward and inverse problems in combustion studies and applications that necessitate distinct methods for resolution, a framework to solve them in a unified way is critically needed. A promising approach is the integration of machine learning methods with governing equations of combustion systems, which exhibits superior generality and few-shot learning ability compared to purely data-driven methods. In this work, the FlamePINN-1D framework is proposed to solve the forward and inverse problems of 1D laminar flames based on physics-informed neural networks. Three cases with increasing complexity have been tested: Case 1 are freely-propagating premixed (FPP) flames with simplified physical models, while Case 2 and Case 3 are FPP and counterflow premixed (CFP) flames with detailed models, respectively. For forward problems, FlamePINN-1D aims to solve the flame fields and infer the unknown eigenvalues (such as laminar flame speeds) under the constraints of governing equations and boundary conditions. For inverse problems, FlamePINN-1D aims to reconstruct the continuous fields and infer the unknown parameters (such as transport and chemical kinetics parameters) from noisy sparse observations of the flame. Our results strongly validate these capabilities of FlamePINN-1D across various flames and working conditions. Compared to traditional methods, FlamePINN-1D is differentiable and mesh-free, exhibits no discretization errors, and is easier to implement for inverse problems. The inverse problem results also indicate the possibility of optimizing chemical mechanisms from measurements of laboratory 1D flames. Furthermore, some proposed strategies, such as hard constraints and thin-layer normalization, are proven to be essential for the robust learning of FlamePINN-1D. The code for this paper is partially available at https://github.com/CAME-THU/FlamePINN-1D.
Physics-Informed Neural Networks for One-Dimensional Quantum Well Problems
We implement physics-informed neural networks (PINNs) to solve the time-independent Schr\"odinger equation for three canonical one-dimensional quantum potentials: an infinite square well, a finite square well, and a finite barrier. The PINN models incorporate trial wavefunctions that exactly satisfy boundary conditions (Dirichlet zeros at domain boundaries), and they optimize a loss functional combining the PDE residual with a normalization constraint. For the infinite well, the ground-state energy is known (E = pi^2 in dimensionless units) and held fixed in training, whereas for the finite well and barrier, the eigenenergy is treated as a trainable parameter. We use fully-connected neural networks with smooth activation functions to represent the wavefunction and demonstrate that PINNs can learn the ground-state eigenfunctions and eigenvalues for these quantum systems. The results show that the PINN-predicted wavefunctions closely match analytical solutions or expected behaviors, and the learned eigenenergies converge to known values. We present training logs and convergence of the energy parameter, as well as figures comparing the PINN solutions to exact results. The discussion addresses the performance of PINNs relative to traditional numerical methods, highlighting challenges such as convergence to the correct eigenvalue, sensitivity to initialization, and the difficulty of modeling discontinuous potentials. We also discuss the importance of the normalization term to resolve the scaling ambiguity of the wavefunction. Finally, we conclude that PINNs are a viable approach for quantum eigenvalue problems, and we outline future directions including extensions to higher-dimensional and time-dependent Schr\"odinger equations.
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but focus on understanding the usefulness of incorporating "error-correction" data directly into the pretraining stage. This data consists of erroneous solution steps immediately followed by their corrections. Using a synthetic math dataset, we show promising results: this type of pretrain data can help language models achieve higher reasoning accuracy directly (i.e., through simple auto-regression, without multi-round prompting) compared to pretraining on the same amount of error-free data. We also delve into many details, such as (1) how this approach differs from beam search, (2) how such data can be prepared, (3) whether masking is needed on the erroneous tokens, (4) the amount of error required, (5) whether such data can be deferred to the fine-tuning stage, and many others.
DGNO: A Novel Physics-aware Neural Operator for Solving Forward and Inverse PDE Problems based on Deep, Generative Probabilistic Modeling
Solving parametric partial differential equations (PDEs) and associated PDE-based, inverse problems is a central task in engineering and physics, yet existing neural operator methods struggle with high-dimensional, discontinuous inputs and require large amounts of {\em labeled} training data. We propose the Deep Generative Neural Operator (DGNO), a physics-aware framework that addresses these challenges by leveraging a deep, generative, probabilistic model in combination with a set of lower-dimensional, latent variables that simultaneously encode PDE-inputs and PDE-outputs. This formulation can make use of unlabeled data and significantly improves inverse problem-solving, particularly for discontinuous or discrete-valued input functions. DGNO enforces physics constraints without labeled data by incorporating as virtual observables, weak-form residuals based on compactly supported radial basis functions (CSRBFs). These relax regularity constraints and eliminate higher-order derivatives from the objective function. We also introduce MultiONet, a novel neural operator architecture, which is a more expressive generalization of the popular DeepONet that significantly enhances the approximating power of the proposed model. These innovations make DGNO particularly effective for challenging forward and inverse, PDE-based problems, such as those involving multi-phase media. Numerical experiments demonstrate that DGNO achieves higher accuracy across multiple benchmarks while exhibiting robustness to noise and strong generalization to out-of-distribution cases. Its adaptability, and the ability to handle sparse, noisy data while providing probabilistic estimates, make DGNO a powerful tool for scientific and engineering applications.
PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving
We introduce PHYSICS, a comprehensive benchmark for university-level physics problem solving. It contains 1297 expert-annotated problems covering six core areas: classical mechanics, quantum mechanics, thermodynamics and statistical mechanics, electromagnetism, atomic physics, and optics. Each problem requires advanced physics knowledge and mathematical reasoning. We develop a robust automated evaluation system for precise and reliable validation. Our evaluation of leading foundation models reveals substantial limitations. Even the most advanced model, o3-mini, achieves only 59.9% accuracy, highlighting significant challenges in solving high-level scientific problems. Through comprehensive error analysis, exploration of diverse prompting strategies, and Retrieval-Augmented Generation (RAG)-based knowledge augmentation, we identify key areas for improvement, laying the foundation for future advancements.
EquiNO: A Physics-Informed Neural Operator for Multiscale Simulations
Multiscale problems are ubiquitous in physics. Numerical simulations of such problems by solving partial differential equations (PDEs) at high resolution are computationally too expensive for many-query scenarios, e.g., uncertainty quantification, remeshing applications, topology optimization, and so forth. This limitation has motivated the application of data-driven surrogate models, where the microscale computations are substituted with a surrogate, usually acting as a black-box mapping between macroscale quantities. These models offer significant speedups but struggle with incorporating microscale physical constraints, such as the balance of linear momentum and constitutive models. In this contribution, we propose Equilibrium Neural Operator (EquiNO) as a complementary physics-informed PDE surrogate for predicting microscale physics and compare it with variational physics-informed neural and operator networks. Our framework, applicable to the so-called multiscale FE^{,2}, computations, introduces the FE-OL approach by integrating the finite element (FE) method with operator learning (OL). We apply the proposed FE-OL approach to quasi-static problems of solid mechanics. The results demonstrate that FE-OL can yield accurate solutions even when confronted with a restricted dataset during model development. Our results show that EquiNO achieves speedup factors exceeding 8000-fold compared to traditional methods and offers an optimal balance between data-driven and physics-based strategies.
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning
Large language models demonstrate remarkable capabilities across various domains, especially mathematics and logic reasoning. However, current evaluations overlook physics-based reasoning - a complex task requiring physics theorems and constraints. We present PhysReason, a 1,200-problem benchmark comprising knowledge-based (25%) and reasoning-based (75%) problems, where the latter are divided into three difficulty levels (easy, medium, hard). Notably, problems require an average of 8.1 solution steps, with hard requiring 15.6, reflecting the complexity of physics-based reasoning. We propose the Physics Solution Auto Scoring Framework, incorporating efficient answer-level and comprehensive step-level evaluations. Top-performing models like Deepseek-R1, Gemini-2.0-Flash-Thinking, and o3-mini-high achieve less than 60% on answer-level evaluation, with performance dropping from knowledge questions (75.11%) to hard problems (31.95%). Through step-level evaluation, we identified four key bottlenecks: Physics Theorem Application, Physics Process Understanding, Calculation, and Physics Condition Analysis. These findings position PhysReason as a novel and comprehensive benchmark for evaluating physics-based reasoning capabilities in large language models. Our code and data will be published at https:/dxzxy12138.github.io/PhysReason.
Machine Learning Algebraic Geometry for Physics
We review some recent applications of machine learning to algebraic geometry and physics. Since problems in algebraic geometry can typically be reformulated as mappings between tensors, this makes them particularly amenable to supervised learning. Additionally, unsupervised methods can provide insight into the structure of such geometrical data. At the heart of this programme is the question of how geometry can be machine learned, and indeed how AI helps one to do mathematics. This is a chapter contribution to the book Machine learning and Algebraic Geometry, edited by A. Kasprzyk et al.
Interpretable Physics Reasoning and Performance Taxonomy in Vision-Language Models
As Vision-Language Models (VLMs) grow in sophistication, their ability to perform reasoning is coming under increasing supervision. While they excel at many tasks, their grasp of fundamental scientific principles, such as physics, remains an underexplored frontier. To reflect the advancements in these capabilities, we introduce a novel and accessible framework designed to rigorously evaluate VLMs on their understanding of 2D physics. Our framework features a pragmatic scenario generator that creates a diverse testbed of over 400 problems across four core domains: Projectile Motion, Collision Dynamics, Mechanics, and Fluid Dynamics. Through comprehensive evaluation of four state-of-the-art VLMs, we demonstrate a strong correlation between model scale and reasoning ability, with our top-performing model, Qwen2.5-VL-7B, achieving an overall score of 0.815. We find that while models excel at formulaic problems, they struggle significantly with domains requiring abstract spatial reasoning. By designing this framework, we aim to democratize the study of scientific reasoning in VLMs and foster deeper insights into their capabilities and limitations.
Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
While large language models (LLMs) with reasoning capabilities are progressing rapidly on high-school math competitions and coding, can they reason effectively through complex, open-ended challenges found in frontier physics research? And crucially, what kinds of reasoning tasks do physicists want LLMs to assist with? To address these questions, we present the CritPt (Complex Research using Integrated Thinking - Physics Test, pronounced "critical point"), the first benchmark designed to test LLMs on unpublished, research-level reasoning tasks that broadly covers modern physics research areas, including condensed matter, quantum physics, atomic, molecular & optical physics, astrophysics, high energy physics, mathematical physics, statistical physics, nuclear physics, nonlinear dynamics, fluid dynamics and biophysics. CritPt consists of 71 composite research challenges designed to simulate full-scale research projects at the entry level, which are also decomposed to 190 simpler checkpoint tasks for more fine-grained insights. All problems are newly created by 50+ active physics researchers based on their own research. Every problem is hand-curated to admit a guess-resistant and machine-verifiable answer and is evaluated by an automated grading pipeline heavily customized for advanced physics-specific output formats. We find that while current state-of-the-art LLMs show early promise on isolated checkpoints, they remain far from being able to reliably solve full research-scale challenges: the best average accuracy among base models is only 4.0% , achieved by GPT-5 (high), moderately rising to around 10% when equipped with coding tools. Through the realistic yet standardized evaluation offered by CritPt, we highlight a large disconnect between current model capabilities and realistic physics research demands, offering a foundation to guide the development of scientifically grounded AI tools.
Lagrangian PINNs: A causality-conforming solution to failure modes of physics-informed neural networks
Physics-informed neural networks (PINNs) leverage neural-networks to find the solutions of partial differential equation (PDE)-constrained optimization problems with initial conditions and boundary conditions as soft constraints. These soft constraints are often considered to be the sources of the complexity in the training phase of PINNs. Here, we demonstrate that the challenge of training (i) persists even when the boundary conditions are strictly enforced, and (ii) is closely related to the Kolmogorov n-width associated with problems demonstrating transport, convection, traveling waves, or moving fronts. Given this realization, we describe the mechanism underlying the training schemes such as those used in eXtended PINNs (XPINN), curriculum regularization, and sequence-to-sequence learning. For an important category of PDEs, i.e., governed by non-linear convection-diffusion equation, we propose reformulating PINNs on a Lagrangian frame of reference, i.e., LPINNs, as a PDE-informed solution. A parallel architecture with two branches is proposed. One branch solves for the state variables on the characteristics, and the second branch solves for the low-dimensional characteristics curves. The proposed architecture conforms to the causality innate to the convection, and leverages the direction of travel of the information in the domain. Finally, we demonstrate that the loss landscapes of LPINNs are less sensitive to the so-called "complexity" of the problems, compared to those in the traditional PINNs in the Eulerian framework.
ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning
Designing de novo proteins beyond those found in nature holds significant promise for advancements in both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, limiting their flexibility when incorporating out-of-domain knowledge into the design process or comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures and obtaining new first-principles data -- natural vibrational frequencies -- via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on one hand, and their capacity in autonomous collaboration through the dynamic LLM-based multi-agent environment on the other hand, unleashes great potentials of LLMs in addressing multi-objective materials problems and opens up new avenues for autonomous materials discovery and design.
HyperTrack: Neural Combinatorics for High Energy Physics
Combinatorial inverse problems in high energy physics span enormous algorithmic challenges. This work presents a new deep learning driven clustering algorithm that utilizes a space-time non-local trainable graph constructor, a graph neural network, and a set transformer. The model is trained with loss functions at the graph node, edge and object level, including contrastive learning and meta-supervision. The algorithm can be applied to problems such as charged particle tracking, calorimetry, pile-up discrimination, jet physics, and beyond. We showcase the effectiveness of this cutting-edge AI approach through particle tracking simulations. The code is available online.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.
MeshMask: Physics-Based Simulations with Masked Graph Neural Networks
We introduce a novel masked pre-training technique for graph neural networks (GNNs) applied to computational fluid dynamics (CFD) problems. By randomly masking up to 40\% of input mesh nodes during pre-training, we force the model to learn robust representations of complex fluid dynamics. We pair this masking strategy with an asymmetric encoder-decoder architecture and gated multi-layer perceptrons to further enhance performance. The proposed method achieves state-of-the-art results on seven CFD datasets, including a new challenging dataset of 3D intracranial aneurysm simulations with over 250,000 nodes per mesh. Moreover, it significantly improves model performance and training efficiency across such diverse range of fluid simulation tasks. We demonstrate improvements of up to 60\% in long-term prediction accuracy compared to previous best models, while maintaining similar computational costs. Notably, our approach enables effective pre-training on multiple datasets simultaneously, significantly reducing the time and data required to achieve high performance on new tasks. Through extensive ablation studies, we provide insights into the optimal masking ratio, architectural choices, and training strategies.
Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark
Existing physical commonsense reasoning benchmarks predominantly focus on Western contexts, overlooking cultural variations in physical problem-solving. To address this gap, we introduce EPiK (Everyday Physics in Korean Contexts), a novel benchmark comprising 181 binary-choice problems that test physical reasoning within Korean cultural contexts, ranging from kimchi (Korean food) to traditional fermentation. EPiK is constructed using a two-stage generation and verification pipeline to create culturally-authentic problems across 9 reasoning subtasks and 84 scenarios. Unlike approaches based on simple translation, our method generates problems organically from Korean contexts while upholding rigorous physical reasoning standards. Our evaluations show that Korean-specialized models consistently outperform general-purpose models of comparable size. This performance gap highlights the limitations of culturally-agnostic models and demonstrates the critical need for culturally-aware benchmarks to truly measure language understanding. Our EPiK is publicly available at https://huggingface.co/datasets/jjae/EPiK.
HyPINO: Multi-Physics Neural Operators via HyperPINNs and the Method of Manufactured Solutions
We present HyPINO, a multi-physics neural operator designed for zero-shot generalization across a broad class of parametric PDEs without requiring task-specific fine-tuning. Our approach combines a Swin Transformer-based hypernetwork with mixed supervision: (i) labeled data from analytical solutions generated via the Method of Manufactured Solutions (MMS), and (ii) unlabeled samples optimized using physics-informed objectives. The model maps PDE parametrizations to target Physics-Informed Neural Networks (PINNs) and can handle linear elliptic, hyperbolic, and parabolic equations in two dimensions with varying source terms, geometries, and mixed Dirichlet/Neumann boundary conditions, including interior boundaries. HyPINO achieves strong zero-shot accuracy on seven benchmark problems from PINN literature, outperforming U-Nets, Poseidon, and Physics-Informed Neural Operators (PINO). Further, we introduce an iterative refinement procedure that compares the physics of the generated PINN to the requested PDE and uses the discrepancy to generate a "delta" PINN. Summing their contributions and repeating this process forms an ensemble whose combined solution progressively reduces the error on six benchmarks and achieves over 100x gain in average L_2 loss in the best case, while retaining forward-only inference. Additionally, we evaluate the fine-tuning behavior of PINNs initialized by HyPINO and show that they converge faster and to lower final error than both randomly initialized and Reptile-meta-learned PINNs on five benchmarks, performing on par on the remaining two. Our results highlight the potential of this scalable approach as a foundation for extending neural operators toward solving increasingly complex, nonlinear, and high-dimensional PDE problems with significantly improved accuracy and reduced computational cost.
ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks
The loss functions of many learning problems contain multiple additive terms that can disagree and yield conflicting update directions. For Physics-Informed Neural Networks (PINNs), loss terms on initial/boundary conditions and physics equations are particularly interesting as they are well-established as highly difficult tasks. To improve learning the challenging multi-objective task posed by PINNs, we propose the ConFIG method, which provides conflict-free updates by ensuring a positive dot product between the final update and each loss-specific gradient. It also maintains consistent optimization rates for all loss terms and dynamically adjusts gradient magnitudes based on conflict levels. We additionally leverage momentum to accelerate optimizations by alternating the back-propagation of different loss terms. We provide a mathematical proof showing the convergence of the ConFIG method, and it is evaluated across a range of challenging PINN scenarios. ConFIG consistently shows superior performance and runtime compared to baseline methods. We also test the proposed method in a classic multi-task benchmark, where the ConFIG method likewise exhibits a highly promising performance. Source code is available at https://tum-pbs.github.io/ConFIG
MultiAdam: Parameter-wise Scale-invariant Optimizer for Multiscale Training of Physics-informed Neural Networks
Physics-informed Neural Networks (PINNs) have recently achieved remarkable progress in solving Partial Differential Equations (PDEs) in various fields by minimizing a weighted sum of PDE loss and boundary loss. However, there are several critical challenges in the training of PINNs, including the lack of theoretical frameworks and the imbalance between PDE loss and boundary loss. In this paper, we present an analysis of second-order non-homogeneous PDEs, which are classified into three categories and applicable to various common problems. We also characterize the connections between the training loss and actual error, guaranteeing convergence under mild conditions. The theoretical analysis inspires us to further propose MultiAdam, a scale-invariant optimizer that leverages gradient momentum to parameter-wisely balance the loss terms. Extensive experiment results on multiple problems from different physical domains demonstrate that our MultiAdam solver can improve the predictive accuracy by 1-2 orders of magnitude compared with strong baselines.
VORTEX: Physics-Driven Data Augmentations Using Consistency Training for Robust Accelerated MRI Reconstruction
Deep neural networks have enabled improved image quality and fast inference times for various inverse problems, including accelerated magnetic resonance imaging (MRI) reconstruction. However, such models require a large number of fully-sampled ground truth datasets, which are difficult to curate, and are sensitive to distribution drifts. In this work, we propose applying physics-driven data augmentations for consistency training that leverage our domain knowledge of the forward MRI data acquisition process and MRI physics to achieve improved label efficiency and robustness to clinically-relevant distribution drifts. Our approach, termed VORTEX, (1) demonstrates strong improvements over supervised baselines with and without data augmentation in robustness to signal-to-noise ratio change and motion corruption in data-limited regimes; (2) considerably outperforms state-of-the-art purely image-based data augmentation techniques and self-supervised reconstruction methods on both in-distribution and out-of-distribution data; and (3) enables composing heterogeneous image-based and physics-driven data augmentations. Our code is available at https://github.com/ad12/meddlr.
Flow Matching Meets PDEs: A Unified Framework for Physics-Constrained Generation
Generative machine learning methods, such as diffusion models and flow matching, have shown great potential in modeling complex system behaviors and building efficient surrogate models. However, these methods typically learn the underlying physics implicitly from data. We propose Physics-Based Flow Matching (PBFM), a novel generative framework that explicitly embeds physical constraints, both PDE residuals and algebraic relations, into the flow matching objective. We also introduce temporal unrolling at training time that improves the accuracy of the final, noise-free sample prediction. Our method jointly minimizes the flow matching loss and the physics-based residual loss without requiring hyperparameter tuning of their relative weights. Additionally, we analyze the role of the minimum noise level, sigma_{min}, in the context of physical constraints and evaluate a stochastic sampling strategy that helps to reduce physical residuals. Through extensive benchmarks on three representative PDE problems, we show that our approach yields up to an 8times more accurate physical residuals compared to FM, while clearly outperforming existing algorithms in terms of distributional accuracy. PBFM thus provides a principled and efficient framework for surrogate modeling, uncertainty quantification, and accelerated simulation in physics and engineering applications.
PhySense: Principle-Based Physics Reasoning Benchmarking for Large Language Models
Large language models (LLMs) have rapidly advanced and are increasingly capable of tackling complex scientific problems, including those in physics. Despite this progress, current LLMs often fail to emulate the concise, principle-based reasoning characteristic of human experts, instead generating lengthy and opaque solutions. This discrepancy highlights a crucial gap in their ability to apply core physical principles for efficient and interpretable problem solving. To systematically investigate this limitation, we introduce PhySense, a novel principle-based physics reasoning benchmark designed to be easily solvable by experts using guiding principles, yet deceptively difficult for LLMs without principle-first reasoning. Our evaluation across multiple state-of-the-art LLMs and prompt types reveals a consistent failure to align with expert-like reasoning paths, providing insights for developing AI systems with efficient, robust and interpretable principle-based scientific reasoning.
PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs
While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.
Solving Key Challenges in Collider Physics with Foundation Models
Foundation Models are neural networks that are capable of simultaneously solving many problems. Large Language Foundation Models like ChatGPT have revolutionized many aspects of daily life, but their impact for science is not yet clear. In this paper, we use a new Foundation Model for hadronic jets to solve three key challenges in collider physics. In particular, we show how experiments can (1) save significant computing power when developing reconstruction algorithms, (2) perform a complete uncertainty quantification for high-dimensional measurements, and (3) search for new physics with model agnostic methods using low-level inputs. In each case, there are significant computational or methodological challenges with current methods that limit the science potential of deep learning algorithms. By solving each problem, we take jet Foundation Models beyond proof-of-principle studies and into the toolkit of practitioners.
Ultra Fast Transformers on FPGAs for Particle Physics Experiments
This work introduces a highly efficient implementation of the transformer architecture on a Field-Programmable Gate Array (FPGA) by using the hls4ml tool. Given the demonstrated effectiveness of transformer models in addressing a wide range of problems, their application in experimental triggers within particle physics becomes a subject of significant interest. In this work, we have implemented critical components of a transformer model, such as multi-head attention and softmax layers. To evaluate the effectiveness of our implementation, we have focused on a particle physics jet flavor tagging problem, employing a public dataset. We recorded latency under 2 mus on the Xilinx UltraScale+ FPGA, which is compatible with hardware trigger requirements at the CERN Large Hadron Collider experiments.
The Physics-Informed Neural Network Gravity Model: Generation III
Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) show considerable potential in their capacity to identify solutions to complex differential equations. Over the past two years, much work has gone into the development of PINNs capable of solving the gravity field modeling problem -- i.e.\ learning a differentiable form of the gravitational potential from position and acceleration estimates. While the past PINN gravity models (PINN-GMs) have demonstrated advantages in model compactness, robustness to noise, and sample efficiency; there remain key modeling challenges which this paper aims to address. Specifically, this paper introduces the third generation of the Physics-Informed Neural Network Gravity Model (PINN-GM-III) which solves the problems of extrapolation error, bias towards low-altitude samples, numerical instability at high-altitudes, and compliant boundary conditions through numerous modifications to the model's design. The PINN-GM-III is tested by modeling a known heterogeneous density asteroid, and its performance is evaluated using seven core metrics which showcases its strengths against its predecessors and other analytic and numerical gravity models.
Transformer Meets Boundary Value Inverse Problems
A Transformer-based deep direct sampling method is proposed for electrical impedance tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A real-time reconstruction is achieved by evaluating the learned inverse operator between carefully designed data and the reconstructed images. An effort is made to give a specific example to a fundamental question: whether and how one can benefit from the theoretical structure of a mathematical problem to develop task-oriented and structure-conforming deep neural networks? Specifically, inspired by direct sampling methods for inverse problems, the 1D boundary data in different frequencies are preprocessed by a partial differential equation-based feature map to yield 2D harmonic extensions as different input channels. Then, by introducing learnable non-local kernels, the direct sampling is recast to a modified attention mechanism. The new method achieves superior accuracy over its predecessors and contemporary operator learners and shows robustness to noises in benchmarks. This research shall strengthen the insights that, despite being invented for natural language processing tasks, the attention mechanism offers great flexibility to be modified in conformity with the a priori mathematical knowledge, which ultimately leads to the design of more physics-compatible neural architectures.
Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling
Despite the success of physics-informed neural networks (PINNs) in approximating partial differential equations (PDEs), PINNs can sometimes fail to converge to the correct solution in problems involving complicated PDEs. This is reflected in several recent studies on characterizing the "failure modes" of PINNs, although a thorough understanding of the connection between PINN failure modes and sampling strategies is missing. In this paper, we provide a novel perspective of failure modes of PINNs by hypothesizing that training PINNs relies on successful "propagation" of solution from initial and/or boundary condition points to interior points. We show that PINNs with poor sampling strategies can get stuck at trivial solutions if there are propagation failures, characterized by highly imbalanced PDE residual fields. To mitigate propagation failures, we propose a novel Retain-Resample-Release sampling (R3) algorithm that can incrementally accumulate collocation points in regions of high PDE residuals with little to no computational overhead. We provide an extension of R3 sampling to respect the principle of causality while solving time-dependent PDEs. We theoretically analyze the behavior of R3 sampling and empirically demonstrate its efficacy and efficiency in comparison with baselines on a variety of PDE problems.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Recent advances in deep learning have inspired numerous works on data-driven solutions to partial differential equation (PDE) problems. These neural PDE solvers can often be much faster than their numerical counterparts; however, each presents its unique limitations and generally balances training cost, numerical accuracy, and ease of applicability to different problem setups. To address these limitations, we introduce several methods to apply latent diffusion models to physics simulation. Firstly, we introduce a mesh autoencoder to compress arbitrarily discretized PDE data, allowing for efficient diffusion training across various physics. Furthermore, we investigate full spatio-temporal solution generation to mitigate autoregressive error accumulation. Lastly, we investigate conditioning on initial physical quantities, as well as conditioning solely on a text prompt to introduce text2PDE generation. We show that language can be a compact, interpretable, and accurate modality for generating physics simulations, paving the way for more usable and accessible PDE solvers. Through experiments on both uniform and structured grids, we show that the proposed approach is competitive with current neural PDE solvers in both accuracy and efficiency, with promising scaling behavior up to sim3 billion parameters. By introducing a scalable, accurate, and usable physics simulator, we hope to bring neural PDE solvers closer to practical use.
Exploring a Physics-Informed Decision Transformer for Distribution System Restoration: Methodology and Performance Analysis
Driven by advancements in sensing and computing, deep reinforcement learning (DRL)-based methods have demonstrated significant potential in effectively tackling distribution system restoration (DSR) challenges under uncertain operational scenarios. However, the data-intensive nature of DRL poses obstacles in achieving satisfactory DSR solutions for large-scale, complex distribution systems. Inspired by the transformative impact of emerging foundation models, including large language models (LLMs), across various domains, this paper explores an innovative approach harnessing LLMs' powerful computing capabilities to address scalability challenges inherent in conventional DRL methods for solving DSR. To our knowledge, this study represents the first exploration of foundation models, including LLMs, in revolutionizing conventional DRL applications in power system operations. Our contributions are twofold: 1) introducing a novel LLM-powered Physics-Informed Decision Transformer (PIDT) framework that leverages LLMs to transform conventional DRL methods for DSR operations, and 2) conducting comparative studies to assess the performance of the proposed LLM-powered PIDT framework at its initial development stage for solving DSR problems. While our primary focus in this paper is on DSR operations, the proposed PIDT framework can be generalized to optimize sequential decision-making across various power system operations.
MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge
Solving mechanics problems using numerical methods requires comprehensive intelligent capability of retrieving relevant knowledge and theory, constructing and executing codes, analyzing the results, a task that has thus far mainly been reserved for humans. While emerging AI methods can provide effective approaches to solve end-to-end problems, for instance via the use of deep surrogate models or various data analytics strategies, they often lack physical intuition since knowledge is baked into the parametric complement through training, offering less flexibility when it comes to incorporating mathematical or physical insights. By leveraging diverse capabilities of multiple dynamically interacting large language models (LLMs), we can overcome the limitations of conventional approaches and develop a new class of physics-inspired generative machine learning platform, here referred to as MechAgents. A set of AI agents can solve mechanics tasks, here demonstrated for elasticity problems, via autonomous collaborations. A two-agent team can effectively write, execute and self-correct code, in order to apply finite element methods to solve classical elasticity problems in various flavors (different boundary conditions, domain geometries, meshes, small/finite deformation and linear/hyper-elastic constitutive laws, and others). For more complex tasks, we construct a larger group of agents with enhanced division of labor among planning, formulating, coding, executing and criticizing the process and results. The agents mutually correct each other to improve the overall team-work performance in understanding, formulating and validating the solution. Our framework shows the potential of synergizing the intelligence of language models, the reliability of physics-based modeling, and the dynamic collaborations among diverse agents, opening novel avenues for automation of solving engineering problems.
AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
Recent advances in neural surrogate modeling offer the potential for transformative innovations in applications such as automotive aerodynamics. Yet, industrial-scale problems often involve volumetric meshes with cell counts reaching the 100 millions, presenting major scalability challenges. Complex geometries further complicate modeling through intricate surface-volume interactions, while quantities such as vorticity are highly nonlinear and must satisfy strict divergence-free constraints. To address these requirements, we introduce AB-UPT as a novel modeling scheme for building neural surrogates for CFD simulations. AB-UPT is designed to: (i) decouple geometry encoding and prediction tasks via multi-branch operators; (ii) enable scalability to high-resolution outputs via neural simulation in a low-dimensional latent space, coupled with anchored neural field decoders to predict high-fidelity outputs; (iii) enforce physics consistency by a novel divergence-free formulation. We show that AB-UPT yields state-of-the-art predictive accuracy of surface and volume fields on automotive CFD simulations ranging from 33 thousand up to 150 million mesh cells. Furthermore, our anchored neural field architecture enables the enforcement of hard physical constraints on the physics predictions without degradation in performance, exemplified by modeling divergence-free vorticity fields. Notably, the proposed models can be trained on a single GPU in less than a day and predict industry-standard surface and volume fields within seconds. Additionally, we show that the flexible design of our method enables neural simulation from a CAD geometry alone, omitting the need for costly CFD meshing procedures.
Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving PDEs based on Kolmogorov Arnold Networks
AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in various forms, such as strong form, energy form, and inverse form. While mathematically equivalent, these forms are not computationally equivalent, making the exploration of different PDE formulations significant in computational physics. Thus, we propose different PDE forms based on KAN instead of MLP, termed Kolmogorov-Arnold-Informed Neural Network (KINN). We systematically compare MLP and KAN in various numerical examples of PDEs, including multi-scale, singularity, stress concentration, nonlinear hyperelasticity, heterogeneous, and complex geometry problems. Our results demonstrate that KINN significantly outperforms MLP in terms of accuracy and convergence speed for numerous PDEs in computational solid mechanics, except for the complex geometry problem. This highlights KINN's potential for more efficient and accurate PDE solutions in AI for PDEs.
Learning to Fly -- a Gym Environment with PyBullet Physics for Reinforcement Learning of Multi-agent Quadcopter Control
Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments -- simple simulations coupled with a problem specification in the form of a reward function -- are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Vice versa, many reinforcement learning environments trade-off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop -- and fairly compare -- control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems
Scientists often infer abstract procedures from specific instances of problems and use the abstractions to generate new, related instances. For example, programs encoding the formal rules and properties of a system have been useful in fields ranging from RL (procedural environments) to physics (simulation engines). These programs can be seen as functions which execute to different outputs based on their parameterizations (e.g., gridworld configuration or initial physical conditions). We introduce the term EFA (Executable Functional Abstraction) to denote such programs for math problems. EFA-like constructs have been shown to be useful for math reasoning as problem generators for stress-testing models. However, prior work has been limited to abstractions for grade-school math (whose simple rules are easy to encode in programs), while generating EFAs for advanced math has thus far required human engineering. We explore the automatic construction of EFAs for advanced math problems. We operationalize the task of automatically constructing EFAs as a program synthesis task, and develop EFAGen, which conditions an LLM on a seed math problem and its step-by-step solution to generate candidate EFA programs that are faithful to the generalized problem and solution class underlying the seed problem. Furthermore, we formalize properties any valid EFA must possess in terms of executable unit tests, and show how the tests can be used as verifiable rewards to train LLMs to become better writers of EFAs. We demonstrate that EFAs constructed by EFAGen behave rationally by remaining faithful to seed problems, produce learnable problem variations, and that EFAGen can infer EFAs across multiple diverse sources of competition-level math problems. Finally, we show downstream uses of model-written EFAs e.g. finding problem variations that are harder or easier for a learner to solve, as well as data generation.
SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
We present SeePhys, a large-scale multimodal benchmark for LLM reasoning grounded in physics questions ranging from middle school to PhD qualifying exams. The benchmark covers 7 fundamental domains spanning the physics discipline, incorporating 21 categories of highly heterogeneous diagrams. In contrast to prior works where visual elements mainly serve auxiliary purposes, our benchmark features a substantial proportion of vision-essential problems (75\%) that mandate visual information extraction for correct solutions. Through extensive evaluation, we observe that even the most advanced visual reasoning models (e.g., Gemini-2.5-pro and o4-mini) achieve sub-60\% accuracy on our benchmark. These results reveal fundamental challenges in current large language models' visual understanding capabilities, particularly in: (i) establishing rigorous coupling between diagram interpretation and physics reasoning, and (ii) overcoming their persistent reliance on textual cues as cognitive shortcuts.
Cheetah: Bridging the Gap Between Machine Learning and Particle Accelerator Physics with High-Speed, Differentiable Simulations
Machine learning has emerged as a powerful solution to the modern challenges in accelerator physics. However, the limited availability of beam time, the computational cost of simulations, and the high-dimensionality of optimisation problems pose significant challenges in generating the required data for training state-of-the-art machine learning models. In this work, we introduce Cheetah, a PyTorch-based high-speed differentiable linear-beam dynamics code. Cheetah enables the fast collection of large data sets by reducing computation times by multiple orders of magnitude and facilitates efficient gradient-based optimisation for accelerator tuning and system identification. This positions Cheetah as a user-friendly, readily extensible tool that integrates seamlessly with widely adopted machine learning tools. We showcase the utility of Cheetah through five examples, including reinforcement learning training, gradient-based beamline tuning, gradient-based system identification, physics-informed Bayesian optimisation priors, and modular neural network surrogate modelling of space charge effects. The use of such a high-speed differentiable simulation code will simplify the development of machine learning-based methods for particle accelerators and fast-track their integration into everyday operations of accelerator facilities.
NMR-Solver: Automated Structure Elucidation via Large-Scale Spectral Matching and Physics-Guided Fragment Optimization
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the most powerful and widely used tools for molecular structure elucidation in organic chemistry. However, the interpretation of NMR spectra to determine unknown molecular structures remains a labor-intensive and expertise-dependent process, particularly for complex or novel compounds. Although recent methods have been proposed for molecular structure elucidation, they often underperform in real-world applications due to inherent algorithmic limitations and limited high-quality data. Here, we present NMR-Solver, a practical and interpretable framework for the automated determination of small organic molecule structures from ^1H and ^{13}C NMR spectra. Our method introduces an automated framework for molecular structure elucidation, integrating large-scale spectral matching with physics-guided fragment-based optimization that exploits atomic-level structure-spectrum relationships in NMR. We evaluate NMR-Solver on simulated benchmarks, curated experimental data from the literature, and real-world experiments, demonstrating its strong generalization, robustness, and practical utility in challenging, real-life scenarios. NMR-Solver unifies computational NMR analysis, deep learning, and interpretable chemical reasoning into a coherent system. By incorporating the physical principles of NMR into molecular optimization, it enables scalable, automated, and chemically meaningful molecular identification, establishing a generalizable paradigm for solving inverse problems in molecular science.
Respecting causality is all you need for training physics-informed neural networks
While the popularity of physics-informed neural networks (PINNs) is steadily rising, to this date PINNs have not been successful in simulating dynamical systems whose solution exhibits multi-scale, chaotic or turbulent behavior. In this work we attribute this shortcoming to the inability of existing PINNs formulations to respect the spatio-temporal causal structure that is inherent to the evolution of physical systems. We argue that this is a fundamental limitation and a key source of error that can ultimately steer PINN models to converge towards erroneous solutions. We address this pathology by proposing a simple re-formulation of PINNs loss functions that can explicitly account for physical causality during model training. We demonstrate that this simple modification alone is enough to introduce significant accuracy improvements, as well as a practical quantitative mechanism for assessing the convergence of a PINNs model. We provide state-of-the-art numerical results across a series of benchmarks for which existing PINNs formulations fail, including the chaotic Lorenz system, the Kuramoto-Sivashinsky equation in the chaotic regime, and the Navier-Stokes equations in the turbulent regime. To the best of our knowledge, this is the first time that PINNs have been successful in simulating such systems, introducing new opportunities for their applicability to problems of industrial complexity.
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks
Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, applying different PINNs to solve the equation in each subdomain and aligning the solution at the interface of the subdomains. Hence, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of the multi-domain PINNs is sensitive to the choice of the interface conditions for solution alignment. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine the optimal interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit models. The first one applies to the entire training procedure, and online updates a Gaussian process (GP) reward surrogate that given the PDE parameters and interface conditions predicts the solution error. The second one partitions the training into two stages, one is the stochastic phase and the other deterministic phase; we update a GP surrogate for each phase to enable different condition selections at the two stages so as to further bolster the flexibility and performance. We have shown the advantage of METALIC on four bench-mark PDE families.
HiPhO: How Far Are (M)LLMs from Humans in the Latest High School Physics Olympiad Benchmark?
Recently, the physical capabilities of (M)LLMs have garnered increasing attention. However, existing benchmarks for physics suffer from two major gaps: they neither provide systematic and up-to-date coverage of real-world physics competitions such as physics Olympiads, nor enable direct performance comparison with humans. To bridge these gaps, we present HiPhO, the first benchmark dedicated to high school physics Olympiads with human-aligned evaluation. Specifically, HiPhO highlights three key innovations. (1) Comprehensive Data: It compiles 13 latest Olympiad exams from 2024-2025, spanning both international and regional competitions, and covering mixed modalities that encompass problems spanning text-only to diagram-based. (2) Professional Evaluation: We adopt official marking schemes to perform fine-grained grading at both the answer and step level, fully aligned with human examiners to ensure high-quality and domain-specific evaluation. (3) Comparison with Human Contestants: We assign gold, silver, and bronze medals to models based on official medal thresholds, thereby enabling direct comparison between (M)LLMs and human contestants. Our large-scale evaluation of 30 state-of-the-art (M)LLMs shows that: across 13 exams, open-source MLLMs mostly remain at or below the bronze level; open-source LLMs show promising progress with occasional golds; closed-source reasoning MLLMs can achieve 6 to 12 gold medals; and most models still have a significant gap from full marks. These results highlight a substantial performance gap between open-source models and top students, the strong physical reasoning capabilities of closed-source reasoning models, and the fact that there is still significant room for improvement. HiPhO, as a rigorous, human-aligned, and Olympiad-focused benchmark for advancing multimodal physical reasoning, is open-source and available at https://github.com/SciYu/HiPhO.
PILArNet: Public Dataset for Particle Imaging Liquid Argon Detectors in High Energy Physics
Rapid advancement of machine learning solutions has often coincided with the production of a test public data set. Such datasets reduce the largest barrier to entry for tackling a problem -- procuring data -- while also providing a benchmark to compare different solutions. Furthermore, large datasets have been used to train high-performing feature finders which are then used in new approaches to problems beyond that initially defined. In order to encourage the rapid development in the analysis of data collected using liquid argon time projection chambers, a class of particle detectors used in high energy physics experiments, we have produced the PILArNet, first 2D and 3D open dataset to be used for a couple of key analysis tasks. The initial dataset presented in this paper contains 300,000 samples simulated and recorded in three different volume sizes. The dataset is stored efficiently in sparse 2D and 3D matrix format with auxiliary information about simulated particles in the volume, and is made available for public research use. In this paper we describe the dataset, tasks, and the method used to procure the sample.
Condensed matter and AdS/CFT
I review two classes of strong coupling problems in condensed matter physics, and describe insights gained by application of the AdS/CFT correspondence. The first class concerns non-zero temperature dynamics and transport in the vicinity of quantum critical points described by relativistic field theories. I describe how relativistic structures arise in models of physical interest, present results for their quantum critical crossover functions and magneto-thermoelectric hydrodynamics. The second class concerns symmetry breaking transitions of two-dimensional systems in the presence of gapless electronic excitations at isolated points or along lines (i.e. Fermi surfaces) in the Brillouin zone. I describe the scaling structure of a recent theory of the Ising-nematic transition in metals, and discuss its possible connection to theories of Fermi surfaces obtained from simple AdS duals.
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.
Source-Guided Flow Matching
Guidance of generative models is typically achieved by modifying the probability flow vector field through the addition of a guidance field. In this paper, we instead propose the Source-Guided Flow Matching (SGFM) framework, which modifies the source distribution directly while keeping the pre-trained vector field intact. This reduces the guidance problem to a well-defined problem of sampling from the source distribution. We theoretically show that SGFM recovers the desired target distribution exactly. Furthermore, we provide bounds on the Wasserstein error for the generated distribution when using an approximate sampler of the source distribution and an approximate vector field. The key benefit of our approach is that it allows the user to flexibly choose the sampling method depending on their specific problem. To illustrate this, we systematically compare different sampling methods and discuss conditions for asymptotically exact guidance. Moreover, our framework integrates well with optimal flow matching models since the straight transport map generated by the vector field is preserved. Experimental results on synthetic 2D benchmarks, physics-informed generative tasks, and imaging inverse problems demonstrate the effectiveness and flexibility of the proposed framework.
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A ``wake-sleep'' learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferrable to new tasks, while still growing scalably and flexibly with experience.
On Neural Differential Equations
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equation are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations. NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation). We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
Scientific problem-solving involves synthesizing information while applying expert knowledge. We introduce CURIE, a scientific long-Context Understanding,Reasoning and Information Extraction benchmark to measure the potential of Large Language Models (LLMs) in scientific problem-solving and assisting scientists in realistic workflows. This benchmark introduces ten challenging tasks with a total of 580 problems and solution pairs curated by experts in six disciplines - materials science, condensed matter physics, quantum computing, geospatial analysis, biodiversity, and proteins - covering both experimental and theoretical work-flows in science. We evaluate a range of closed and open LLMs on tasks in CURIE which requires domain expertise, comprehension of long in-context information,and multi-step reasoning. While Gemini Flash 2.0 and Claude-3 show consistent high comprehension across domains, the popular GPT-4o and command-R+ fail dramatically on protein sequencing tasks. With the best performance at 32% there is much room for improvement for all models. We hope that insights gained from CURIE can guide the future development of LLMs in sciences. Evaluation code and data are in https://github.com/google/curie
Random Grid Neural Processes for Parametric Partial Differential Equations
We introduce a new class of spatially stochastic physics and data informed deep latent models for parametric partial differential equations (PDEs) which operate through scalable variational neural processes. We achieve this by assigning probability measures to the spatial domain, which allows us to treat collocation grids probabilistically as random variables to be marginalised out. Adapting this spatial statistics view, we solve forward and inverse problems for parametric PDEs in a way that leads to the construction of Gaussian process models of solution fields. The implementation of these random grids poses a unique set of challenges for inverse physics informed deep learning frameworks and we propose a new architecture called Grid Invariant Convolutional Networks (GICNets) to overcome these challenges. We further show how to incorporate noisy data in a principled manner into our physics informed model to improve predictions for problems where data may be available but whose measurement location does not coincide with any fixed mesh or grid. The proposed method is tested on a nonlinear Poisson problem, Burgers equation, and Navier-Stokes equations, and we provide extensive numerical comparisons. We demonstrate significant computational advantages over current physics informed neural learning methods for parametric PDEs while improving the predictive capabilities and flexibility of these models.
Geometric Algebra Attention Networks for Small Point Clouds
Much of the success of deep learning is drawn from building architectures that properly respect underlying symmetry and structure in the data on which they operate - a set of considerations that have been united under the banner of geometric deep learning. Often problems in the physical sciences deal with relatively small sets of points in two- or three-dimensional space wherein translation, rotation, and permutation equivariance are important or even vital for models to be useful in practice. In this work, we present rotation- and permutation-equivariant architectures for deep learning on these small point clouds, composed of a set of products of terms from the geometric algebra and reductions over those products using an attention mechanism. The geometric algebra provides valuable mathematical structure by which to combine vector, scalar, and other types of geometric inputs in a systematic way to account for rotation invariance or covariance, while attention yields a powerful way to impose permutation equivariance. We demonstrate the usefulness of these architectures by training models to solve sample problems relevant to physics, chemistry, and biology.
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
Physics-Informed Neural Networks (PINNs), which incorporate PDEs as soft constraints, train with a composite loss function that contains multiple training point types: different types of collocation points chosen during training to enforce each PDE and initial/boundary conditions, and experimental points which are usually costly to obtain via experiments or simulations. Training PINNs using this loss function is challenging as it typically requires selecting large numbers of points of different types, each with different training dynamics. Unlike past works that focused on the selection of either collocation or experimental points, this work introduces PINN Adaptive ColLocation and Experimental points selection (PINNACLE), the first algorithm that jointly optimizes the selection of all training point types, while automatically adjusting the proportion of collocation point types as training progresses. PINNACLE uses information on the interaction among training point types, which had not been considered before, based on an analysis of PINN training dynamics via the Neural Tangent Kernel (NTK). We theoretically show that the criterion used by PINNACLE is related to the PINN generalization error, and empirically demonstrate that PINNACLE is able to outperform existing point selection methods for forward, inverse, and transfer learning problems.
FluidLab: A Differentiable Environment for Benchmarking Complex Fluid Manipulation
Humans manipulate various kinds of fluids in their everyday life: creating latte art, scooping floating objects from water, rolling an ice cream cone, etc. Using robots to augment or replace human labors in these daily settings remain as a challenging task due to the multifaceted complexities of fluids. Previous research in robotic fluid manipulation mostly consider fluids governed by an ideal, Newtonian model in simple task settings (e.g., pouring). However, the vast majority of real-world fluid systems manifest their complexities in terms of the fluid's complex material behaviors and multi-component interactions, both of which were well beyond the scope of the current literature. To evaluate robot learning algorithms on understanding and interacting with such complex fluid systems, a comprehensive virtual platform with versatile simulation capabilities and well-established tasks is needed. In this work, we introduce FluidLab, a simulation environment with a diverse set of manipulation tasks involving complex fluid dynamics. These tasks address interactions between solid and fluid as well as among multiple fluids. At the heart of our platform is a fully differentiable physics simulator, FluidEngine, providing GPU-accelerated simulations and gradient calculations for various material types and their couplings. We identify several challenges for fluid manipulation learning by evaluating a set of reinforcement learning and trajectory optimization methods on our platform. To address these challenges, we propose several domain-specific optimization schemes coupled with differentiable physics, which are empirically shown to be effective in tackling optimization problems featured by fluid system's non-convex and non-smooth properties. Furthermore, we demonstrate reasonable sim-to-real transfer by deploying optimized trajectories in real-world settings.
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Recent works successfully leveraged Large Language Models' (LLM) abilities to capture abstract knowledge about world's physics to solve decision-making problems. Yet, the alignment between LLMs' knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Mathematical modeling is a cornerstone of scientific discovery and engineering practice, enabling the translation of real-world problems into formal systems across domains such as physics, biology, and economics. Unlike mathematical reasoning, which assumes a predefined formulation, modeling requires open-ended problem analysis, abstraction, and principled formalization. While Large Language Models (LLMs) have shown strong reasoning capabilities, they fall short in rigorous model construction, limiting their utility in real-world problem-solving. To this end, we formalize the task of LLM-powered real-world mathematical modeling, where agents must analyze problems, construct domain-appropriate formulations, and generate complete end-to-end solutions. We introduce MM-Bench, a curated benchmark of 111 problems from the Mathematical Contest in Modeling (MCM/ICM), spanning the years 2000 to 2025 and across ten diverse domains such as physics, biology, and economics. To tackle this task, we propose MM-Agent, an expert-inspired framework that decomposes mathematical modeling into four stages: open-ended problem analysis, structured model formulation, computational problem solving, and report generation. Experiments on MM-Bench show that MM-Agent significantly outperforms baseline agents, achieving an 11.88\% improvement over human expert solutions while requiring only 15 minutes and \$0.88 per task using GPT-4o. Furthermore, under official MCM/ICM protocols, MM-Agent assisted two undergraduate teams in winning the Finalist Award (top 2.0\% among 27,456 teams) in MCM/ICM 2025, demonstrating its practical effectiveness as a modeling copilot. Our code is available at https://github.com/usail-hkust/LLM-MM-Agent
On the saddle point problem for non-convex optimization
A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of local minima with much higher error than the global minimum. Here we argue, based on results from statistical physics, random matrix theory, and neural network theory, that a deeper and more profound difficulty originates from the proliferation of saddle points, not local minima, especially in high dimensional problems of practical interest. Such saddle points are surrounded by high error plateaus that can dramatically slow down learning, and give the illusory impression of the existence of a local minimum. Motivated by these arguments, we propose a new algorithm, the saddle-free Newton method, that can rapidly escape high dimensional saddle points, unlike gradient descent and quasi-Newton methods. We apply this algorithm to deep neural network training, and provide preliminary numerical evidence for its superior performance.
MRI Super-Resolution with Deep Learning: A Comprehensive Survey
High-resolution (HR) magnetic resonance imaging (MRI) is crucial for many clinical and research applications. However, achieving it remains costly and constrained by technical trade-offs and experimental limitations. Super-resolution (SR) presents a promising computational approach to overcome these challenges by generating HR images from more affordable low-resolution (LR) scans, potentially improving diagnostic accuracy and efficiency without requiring additional hardware. This survey reviews recent advances in MRI SR techniques, with a focus on deep learning (DL) approaches. It examines DL-based MRI SR methods from the perspectives of computer vision, computational imaging, inverse problems, and MR physics, covering theoretical foundations, architectural designs, learning strategies, benchmark datasets, and performance metrics. We propose a systematic taxonomy to categorize these methods and present an in-depth study of both established and emerging SR techniques applicable to MRI, considering unique challenges in clinical and research contexts. We also highlight open challenges and directions that the community needs to address. Additionally, we provide a collection of essential open-access resources, tools, and tutorials, available on our GitHub: https://github.com/mkhateri/Awesome-MRI-Super-Resolution. IEEE keywords: MRI, Super-Resolution, Deep Learning, Computational Imaging, Inverse Problem, Survey.
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
Building precise simulations of the real world and invoking numerical solvers to answer quantitative problems is an essential requirement in engineering and science. We present FEABench, a benchmark to evaluate the ability of large language models (LLMs) and LLM agents to simulate and solve physics, mathematics and engineering problems using finite element analysis (FEA). We introduce a comprehensive evaluation scheme to investigate the ability of LLMs to solve these problems end-to-end by reasoning over natural language problem descriptions and operating COMSOL Multiphysics^circledR, an FEA software, to compute the answers. We additionally design a language model agent equipped with the ability to interact with the software through its Application Programming Interface (API), examine its outputs and use tools to improve its solutions over multiple iterations. Our best performing strategy generates executable API calls 88% of the time. LLMs that can successfully interact with and operate FEA software to solve problems such as those in our benchmark would push the frontiers of automation in engineering. Acquiring this capability would augment LLMs' reasoning skills with the precision of numerical solvers and advance the development of autonomous systems that can tackle complex problems in the real world. The code is available at https://github.com/google/feabench
A parallel Basis Update and Galerkin Integrator for Tree Tensor Networks
Computing the numerical solution to high-dimensional tensor differential equations can lead to prohibitive computational costs and memory requirements. To reduce the memory and computational footprint, dynamical low-rank approximation (DLRA) has proven to be a promising approach. DLRA represents the solution as a low-rank tensor factorization and evolves the resulting low-rank factors in time. A central challenge in DLRA is to find time integration schemes that are robust to the arising small singular values. A robust parallel basis update & Galerkin integrator, which simultaneously evolves all low-rank factors, has recently been derived for matrix differential equations. This work extends the parallel low-rank matrix integrator to Tucker tensors and general tree tensor networks, yielding an algorithm in which all bases and connecting tensors are evolved in parallel over a time step. We formulate the algorithm, provide a robust error bound, and demonstrate the efficiency of the new integrators for problems in quantum many-body physics, uncertainty quantification, and radiative transfer.
ConDiff: A Challenging Dataset for Neural Solvers of Partial Differential Equations
We present ConDiff, a novel dataset for scientific machine learning. ConDiff focuses on the parametric diffusion equation with space dependent coefficients, a fundamental problem in many applications of partial differential equations (PDEs). The main novelty of the proposed dataset is that we consider discontinuous coefficients with high contrast. These coefficient functions are sampled from a selected set of distributions. This class of problems is not only of great academic interest, but is also the basis for describing various environmental and industrial problems. In this way, ConDiff shortens the gap with real-world problems while remaining fully synthetic and easy to use. ConDiff consists of a diverse set of diffusion equations with coefficients covering a wide range of contrast levels and heterogeneity with a measurable complexity metric for clearer comparison between different coefficient functions. We baseline ConDiff on standard deep learning models in the field of scientific machine learning. By providing a large number of problem instances, each with its own coefficient function and right-hand side, we hope to encourage the development of novel physics-based deep learning approaches, such as neural operators, ultimately driving progress towards more accurate and efficient solutions of complex PDE problems.
NUNO: A General Framework for Learning Parametric PDEs with Non-Uniform Data
The neural operator has emerged as a powerful tool in learning mappings between function spaces in PDEs. However, when faced with real-world physical data, which are often highly non-uniformly distributed, it is challenging to use mesh-based techniques such as the FFT. To address this, we introduce the Non-Uniform Neural Operator (NUNO), a comprehensive framework designed for efficient operator learning with non-uniform data. Leveraging a K-D tree-based domain decomposition, we transform non-uniform data into uniform grids while effectively controlling interpolation error, thereby paralleling the speed and accuracy of learning from non-uniform data. We conduct extensive experiments on 2D elasticity, (2+1)D channel flow, and a 3D multi-physics heatsink, which, to our knowledge, marks a novel exploration into 3D PDE problems with complex geometries. Our framework has reduced error rates by up to 60% and enhanced training speeds by 2x to 30x. The code is now available at https://github.com/thu-ml/NUNO.
Continuous control with deep reinforcement learning
We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
Humans often use visual aids, for example diagrams or sketches, when solving complex problems. Training multimodal models to do the same, known as Visual Chain of Thought (Visual CoT), is challenging due to: (1) poor off-the-shelf visual CoT performance, which hinders reinforcement learning, and (2) the lack of high-quality visual CoT training data. We introduce Zebra-CoT, a diverse large-scale dataset with 182,384 samples, containing logically coherent interleaved text-image reasoning traces. We focus on four categories of tasks where sketching or visual reasoning is especially natural, spanning scientific questions such as geometry, physics, and algorithms; 2D visual reasoning tasks like visual search and jigsaw puzzles; 3D reasoning tasks including 3D multi-hop inference, embodied and robot planning; visual logic problems and strategic games like chess. Fine-tuning the Anole-7B model on the Zebra-CoT training corpus results in an improvement of +12% in our test-set accuracy and yields up to +13% performance gain on standard VLM benchmark evaluations. Fine-tuning Bagel-7B yields a model that generates high-quality interleaved visual reasoning chains, underscoring Zebra-CoT's effectiveness for developing multimodal reasoning abilities. We open-source our dataset and models to support development and evaluation of visual CoT.
Multiphysics Bench: Benchmarking and Investigating Scientific Machine Learning for Multiphysics PDEs
Solving partial differential equations (PDEs) with machine learning has recently attracted great attention, as PDEs are fundamental tools for modeling real-world systems that range from fundamental physical science to advanced engineering disciplines. Most real-world physical systems across various disciplines are actually involved in multiple coupled physical fields rather than a single field. However, previous machine learning studies mainly focused on solving single-field problems, but overlooked the importance and characteristics of multiphysics problems in real world. Multiphysics PDEs typically entail multiple strongly coupled variables, thereby introducing additional complexity and challenges, such as inter-field coupling. Both benchmarking and solving multiphysics problems with machine learning remain largely unexamined. To identify and address the emerging challenges in multiphysics problems, we mainly made three contributions in this work. First, we collect the first general multiphysics dataset, the Multiphysics Bench, that focuses on multiphysics PDE solving with machine learning. Multiphysics Bench is also the most comprehensive PDE dataset to date, featuring the broadest range of coupling types, the greatest diversity of PDE formulations, and the largest dataset scale. Second, we conduct the first systematic investigation on multiple representative learning-based PDE solvers, such as PINNs, FNO, DeepONet, and DiffusionPDE solvers, on multiphysics problems. Unfortunately, naively applying these existing solvers usually show very poor performance for solving multiphysics. Third, through extensive experiments and discussions, we report multiple insights and a bag of useful tricks for solving multiphysics with machine learning, motivating future directions in the study and simulation of complex, coupled physical systems.
Introduction to Holographic Superconductors
These lectures give an introduction to the theory of holographic superconductors. These are superconductors that have a dual gravitational description using gauge/gravity duality. After introducing a suitable gravitational theory, we discuss its properties in various regimes: the probe limit, the effects of backreaction, the zero temperature limit, and the addition of magnetic fields. Using the gauge/gravity dictionary, these properties reproduce many of the standard features of superconductors. Some familiarity with gauge/gravity duality is assumed. A list of open problems is included at the end.
Learning a Neural Solver for Parametric PDE to Enhance Physics-Informed Methods
Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach extends to parametric PDEs. Specifically, we integrate the physical loss gradient with PDE parameters, allowing our method to solve over a distribution of PDE parameters, including coefficients, initial conditions, and boundary conditions. We demonstrate the effectiveness of our approach through empirical experiments on multiple datasets, comparing both training and test-time optimization performance. The code is available at https://github.com/2ailesB/neural-parametric-solver.
Do generative video models learn physical principles from watching videos?
AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn ``world models'' that discover laws of physics -- or, alternatively, are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.
Why is AI hard and Physics simple?
We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.
Aharonov-Bohm effects on the GUP framework
Modifying the fundamental commutation relation of quantum mechanics to reflect the influence of gravity is an important approach to reconcile the contradiction between quantum field theory and general relativity. In the past two decades, researchers have conducted extensive research on geometric phase problems in non-commutative spaces, but few have mentioned the correction of geometric phase problems using the Generalized Uncertainty Principle (GUP). This paper is the first to study the phase correction of Aharonov-Bohm (AB) effect by GUP.
Five open problems in quantum information
We identify five selected open problems in the theory of quantum information, which are rather simple to formulate, were well-studied in the literature, but are technically not easy. As these problems enjoy diverse mathematical connections, they offer a huge breakthrough potential. The first four concern existence of certain objects relevant for quantum information, namely a family of symmetric informationally complete generalized measurements in an infinite sequence of dimensions, mutually unbiased bases in dimension six, absolutely maximally entangled states for four subsystems with six levels each and bound entangled states with negative partial transpose. The fifth problem requires checking whether a certain state of a two-ququart system is 2-copy distillable. An award for solving each of them is announced.
Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
Video generators are increasingly evaluated as potential world models, which requires them to encode and understand physical laws. We investigate their representation of a fundamental law: gravity. Out-of-the-box video generators consistently generate objects falling at an effectively slower acceleration. However, these physical tests are often confounded by ambiguous metric scale. We first investigate if observed physical errors are artifacts of these ambiguities (e.g., incorrect frame rate assumptions). We find that even temporal rescaling cannot correct the high-variance gravity artifacts. To rigorously isolate the underlying physical representation from these confounds, we introduce a unit-free, two-object protocol that tests the timing ratio t_1^2/t_2^2 = h_1/h_2, a relationship independent of g, focal length, and scale. This relative test reveals violations of Galileo's equivalence principle. We then demonstrate that this physical gap can be partially mitigated with targeted specialization. A lightweight low-rank adaptor fine-tuned on only 100 single-ball clips raises g_{eff} from 1.81,m/s^2 to 6.43,m/s^2 (reaching 65% of terrestrial gravity). This specialist adaptor also generalizes zero-shot to two-ball drops and inclined planes, offering initial evidence that specific physical laws can be corrected with minimal data.
Focus on conceptual ideas in quantum mechanics for teacher training
In this work, we describe strategies and provide case-study activities that can be used to examine the properties of superposition, entanglement, tagging, complementarity, and measurement in quantum curricula geared for teacher training. Having a solid foundation in these conceptual ideas is critical for educators who will be adopting quantum ideas within the classroom. Yet they are some of the most difficult concepts to master. We show how one can systematically develop these conceptual foundations with thought experiments on light and with thought experiments that employ the Stern-Gerlach experiment. We emphasize the importance of computer animations in aiding the instruction on these concepts.
Quantum algorithm for collisionless Boltzmann simulation of self-gravitating systems
The collisionless Boltzmann equation (CBE) is a fundamental equation that governs the dynamics of a broad range of astrophysical systems from space plasma to star clusters and galaxies. It is computationally expensive to integrate the CBE directly in a multi-dimensional phase space, and thus the applications to realistic astrophysical problems have been limited so far. Recently, Todorova & Steijl (2020) proposed an efficient quantum algorithm to solve the CBE with significantly reduced computational complexity. We extend the algorithm to perform quantum simulations of self-gravitating systems, incorporating the method to calculate gravity with the major Fourier modes of the density distribution extracted from the solution-encoding quantum state. Our method improves the dependency of time and space complexities on Nv , the number of grid points in each velocity coordinate, compared to the classical simulation methods. We then conduct some numerical demonstrations of our method. We first run a 1+1 dimensional test calculation of free streaming motion on 64*64 grids using 13 simulated qubits and validate our method. We then perform simulations of Jeans collapse, and compare the result with analytic and linear theory calculations. It will thus allow us to perform large-scale CBE simulations on future quantum computers.
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models
Video generation models have achieved remarkable progress in creating high-quality, photorealistic content. However, their ability to accurately simulate physical phenomena remains a critical and unresolved challenge. This paper presents PhyWorldBench, a comprehensive benchmark designed to evaluate video generation models based on their adherence to the laws of physics. The benchmark covers multiple levels of physical phenomena, ranging from fundamental principles like object motion and energy conservation to more complex scenarios involving rigid body interactions and human or animal motion. Additionally, we introduce a novel ""Anti-Physics"" category, where prompts intentionally violate real-world physics, enabling the assessment of whether models can follow such instructions while maintaining logical consistency. Besides large-scale human evaluation, we also design a simple yet effective method that could utilize current MLLM to evaluate the physics realism in a zero-shot fashion. We evaluate 12 state-of-the-art text-to-video generation models, including five open-source and five proprietary models, with a detailed comparison and analysis. we identify pivotal challenges models face in adhering to real-world physics. Through systematic testing of their outputs across 1,050 curated prompts-spanning fundamental, composite, and anti-physics scenarios-we identify pivotal challenges these models face in adhering to real-world physics. We then rigorously examine their performance on diverse physical phenomena with varying prompt types, deriving targeted recommendations for crafting prompts that enhance fidelity to physical principles.
Exact Solution of the Frustrated Potts Model with Next-Nearest-Neighbor Interactions in One Dimension: An AI-Aided Discovery
The one-dimensional J_1-J_2 q-state Potts model is solved exactly for arbitrary q, based on using OpenAI's latest reasoning model o3-mini-high to exactly solve the q=3 case. The exact results provide insights to outstanding physical problems such as the stacking of atomic or electronic orders in layered materials and the formation of a T_c-dome-shaped phase often seen in unconventional superconductors. The work is anticipated to fuel both the research in one-dimensional frustrated magnets for recently discovered finite-temperature application potentials and the fast moving topic area of AI for sciences.
A non-geometrical approach to quantum gravity
Some results of author's work in a non-geometrical approach to quantum gravity are reviewed here, among them: a quantum mechanism of classical gravity giving a possibility to compute the Newton constant; asymptotic freedom at short distances; interaction of photons with the graviton background leading to the important cosmological consequences; the time delay of photons due to interactions with gravitons; deceleration of massive bodies in the graviton background which may be connected with the Pioneer anomaly and with the problem of dark matter.
Elliptical orbits in the phase-space quantization
The energy levels of hydrogen-like atoms are obtained from the phase-space quantization, one of the pillars of the old quantum theory, by three different methods - (i) direct integration, (ii) Sommerfeld's original method, and (iii) complex integration. The difficulties come from the imposition of elliptical orbits to the electron, resulting in a variable radial component of the linear momentum. Details of the calculation, which constitute a recurrent gap in textbooks that deal with phase-space quantization, are shown in depth in an accessible fashion for students of introductory quantum mechanics courses.
Conservation Laws and the Quantization of Gravity
Adopting general frameworks for quantum-classical dynamics, we analyze the interaction between quantum matter and a classical gravitational field. We point out that, assuming conservation of momentum or energy, and assuming that the dynamics obeys Hamiltonian formalism or a particular decomposition property set out in the paper, the classical gravitational field cannot change the momentum or energy of the quantum system, whereas the quantum gravitational field can do so. Drawing upon the fundamental relationship between conservation laws and the quantum properties of objects, our analysis offers new perspectives for the study of quantum gravity and provides a novel interpretation of existing experimental observations, such as free fall.
Data augmentation and feature selection for automatic model recommendation in computational physics
Classification algorithms have recently found applications in computational physics for the selection of numerical methods or models adapted to the environment and the state of the physical system. For such classification tasks, labeled training data come from numerical simulations and generally correspond to physical fields discretized on a mesh. Three challenging difficulties arise: the lack of training data, their high dimensionality, and the non-applicability of common data augmentation techniques to physics data. This article introduces two algorithms to address these issues, one for dimensionality reduction via feature selection, and one for data augmentation. These algorithms are combined with a wide variety of classifiers for their evaluation. When combined with a stacking ensemble made of six multilayer perceptrons and a ridge logistic regression, they enable reaching an accuracy of 90% on our classification problem for nonlinear structural mechanics.
Towards Cross Domain Generalization of Hamiltonian Representation via Meta Learning
Recent advances in deep learning for physics have focused on discovering shared representations of target systems by incorporating physics priors or inductive biases into neural networks. While effective, these methods are limited to the system domain, where the type of system remains consistent and thus cannot ensure the adaptation to new, or unseen physical systems governed by different laws. For instance, a neural network trained on a mass-spring system cannot guarantee accurate predictions for the behavior of a two-body system or any other system with different physical laws. In this work, we take a significant leap forward by targeting cross domain generalization within the field of Hamiltonian dynamics. We model our system with a graph neural network and employ a meta learning algorithm to enable the model to gain experience over a distribution of tasks and make it adapt to new physics. Our approach aims to learn a unified Hamiltonian representation that is generalizable across multiple system domains, thereby overcoming the limitations of system-specific models. Our results demonstrate that the meta-trained model not only adapts effectively to new systems but also captures a generalized Hamiltonian representation that is consistent across different physical domains. Overall, through the use of meta learning, we offer a framework that achieves cross domain generalization, providing a step towards a unified model for understanding a wide array of dynamical systems via deep learning.
Completely Discretized, Finite Quantum Mechanics
I propose a version of quantum mechanics featuring a discrete and finite number of states that is plausibly a model of the real world. The model is based on standard unitary quantum theory of a closed system with a finite-dimensional Hilbert space. Given certain simple conditions on the spectrum of the Hamiltonian, Schr\"odinger evolution is periodic, and it is straightforward to replace continuous time with a discrete version, with the result that the system only visits a discrete and finite set of state vectors. The biggest challenges to the viability of such a model come from cosmological considerations. The theory may have implications for questions of mathematical realism and finitism.
Reinforcement Learning for Adaptive Time-Stepping in the Chaotic Gravitational Three-Body Problem
Many problems in astrophysics cover multiple orders of magnitude in spatial and temporal scales. While simulating systems that experience rapid changes in these conditions, it is essential to adapt the (time-) step size to capture the behavior of the system during those rapid changes and use a less accurate time step at other, less demanding, moments. We encounter three problems with traditional methods. Firstly, making such changes requires expert knowledge of the astrophysics as well as of the details of the numerical implementation. Secondly, some parameters that determine the time-step size are fixed throughout the simulation, which means that they do not adapt to the rapidly changing conditions of the problem. Lastly, we would like the choice of time-step size to balance accuracy and computation effort. We address these challenges with Reinforcement Learning by training it to select the time-step size dynamically. We use the integration of a system of three equal-mass bodies that move due to their mutual gravity as an example of its application. With our method, the selected integration parameter adapts to the specific requirements of the problem, both in terms of computation time and accuracy while eliminating the expert knowledge needed to set up these simulations. Our method produces results competitive to existing methods and improve the results found with the most commonly-used values of time-step parameter. This method can be applied to other integrators without further retraining. We show that this extrapolation works for variable time-step integrators but does not perform to the desired accuracy for fixed time-step integrators.
Deep Learning solutions to singular ordinary differential equations: from special functions to spherical accretion
Singular regular points often arise in differential equations describing physical phenomena such as fluid dynamics, electromagnetism, and gravitation. Traditional numerical techniques often fail or become unstable near these points, requiring the use of semi-analytical tools, such as series expansions and perturbative methods, in combination with numerical algorithms; or to invoke more sophisticated methods. In this work, we take an alternative route and leverage the power of machine learning to exploit Physics Informed Neural Networks (PINNs) as a modern approach to solving ordinary differential equations with singular points. PINNs utilize deep learning architectures to approximate solutions by embedding the differential equations into the loss function of the neural network. We discuss the advantages of PINNs in handling singularities, particularly their ability to bypass traditional grid-based methods and provide smooth approximations across irregular regions. Techniques for enhancing the accuracy of PINNs near singular points, such as adaptive loss weighting, are used in order to achieve high efficiency in the training of the network. We exemplify our results by studying four differential equations of interest in mathematics and gravitation -- the Legendre equation, the hypergeometric equation, the solution for black hole space-times in theories of Lorentz violating gravity, and the spherical accretion of a perfect fluid in a Schwarzschild geometry.
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions
Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in diverse reasoning tasks, yet their application to complex physics reasoning remains underexplored. Physics reasoning presents unique challenges, requiring grounding in physical conditions and the interpretation of multimodal information. Current physics benchmarks are limited, often focusing on text-only inputs or solely on problem-solving, thereby overlooking the critical intermediate steps of variable identification and process formulation. To address these limitations, we introduce PhysicsArena, the first multimodal physics reasoning benchmark designed to holistically evaluate MLLMs across three critical dimensions: variable identification, physical process formulation, and solution derivation. PhysicsArena aims to provide a comprehensive platform for assessing and advancing the multimodal physics reasoning abilities of MLLMs.
First observation of the Josephson-Anderson relation in experiments on hydrodynamic drag
We verify a recent prediction (Eq. 3.50 in G. L. Eyink, Phys. Rev. X 11, 031054 (2021)) for the drag on an object moving through a fluid. In this prediction the velocity field is decomposed into a nonvortical (potential) and vortical contribution, and so is the associated drag force. In the Josephson-Anderson relation the vortical contribution of the drag force follows from the flux of vorticity traversing the streamlines of the corresponding potential flow. The potential component is directly determined by the plate acceleration and its added mass. The Josephson-Anderson relation is derived from the quantum description of superfluids, but remarkably applies to the classical fluid in our experiment. In our experiment a flat plate is accelerated through water using a robotic arm. This geometry is simple enough to allow analytic potential flow streamlines. The monitored plate position shows an oscillatory component of the acceleration, which adds an additional test of the Josephson-Anderson relation. The instantaneous velocity field is measured using particle image velocimetry. It enables us to evaluate Eq. 3.50 from [1] and compare its prediction to the measured drag force. We find excellent agreement, and, most remarkably find that the added mass contribution to the drag force still stands out after the flow has turned vortical. We finally comment on the requirements on the experimental techniques for evaluating the Josephson-Anderson relation.
Learning towards Minimum Hyperspherical Energy
Neural networks are a powerful class of nonlinear functions that can be trained end-to-end on various applications. While the over-parametrization nature in many neural networks renders the ability to fit complex functions and the strong representation power to handle challenging tasks, it also leads to highly correlated neurons that can hurt the generalization ability and incur unnecessary computation cost. As a result, how to regularize the network to avoid undesired representation redundancy becomes an important issue. To this end, we draw inspiration from a well-known problem in physics -- Thomson problem, where one seeks to find a state that distributes N electrons on a unit sphere as evenly as possible with minimum potential energy. In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks. We also propose a few novel variants of MHE, and provide some insights from a theoretical point of view. Finally, we apply neural networks with MHE regularization to several challenging tasks. Extensive experiments demonstrate the effectiveness of our intuition, by showing the superior performance with MHE regularization.
The Duality of Whittaker Potential Theory: Fundamental Representations of Electromagnetism and Gravity, and Their Orthogonality
E. T. Whittaker produced two papers in 1903 and 1904 that, although sometimes considered mere mathematical statements (Barrett, 1993), held important implications for physical theory. The Whittaker 1903 paper united electrostatic and gravitational attraction as resulting from longitudinal waves - waves whose wavefronts propagate parallel to their direction. The Whittaker 1904 paper showed that electromagnetic waves resulted from the interference of two such longitudinal waves or scalar potential functions. Although unexplored, the implications of these papers are profound: gravitational lensing, gravitational waves, the Aharonov-Bohm effect, the existence of a hyperspace above or behind normal space, the elimination of gravitational and point charge singularities, MOND, and the expansion of the universe. This last implication can be related to the recent finding that black holes with posited vacuum energy interior solutions alongside cosmological boundaries have a cosmological coupling constant of k=3, meaning that black holes gain mass-proportional to a3 in a parameterization equation within a Robertson-Walker cosmology and are a cosmological accelerated expansion species (Farrah et al., 2023). This expansion and many features of General Relativity can be explained by the mass-proportionality and preferred direction of the longitudinal waves within the two underlying non-local Whittaker potentials (Titleman, 2022). Whittaker potential theory also offers a simple explanation for expansion of the universe - it is produced as longitudinal motion within the Whittaker potentials only when dynamic electromagnetism is separate from time-static gravity in intergalactic space.
Lectures on holographic methods for condensed matter physics
These notes are loosely based on lectures given at the CERN Winter School on Supergravity, Strings and Gauge theories, February 2009 and at the IPM String School in Tehran, April 2009. I have focused on a few concrete topics and also on addressing questions that have arisen repeatedly. Background condensed matter physics material is included as motivation and easy reference for the high energy physics community. The discussion of holographic techniques progresses from equilibrium, to transport and to superconductivity.
An operator preconditioning perspective on training in physics-informed machine learning
In this paper, we investigate the behavior of gradient descent algorithms in physics-informed machine learning methods like PINNs, which minimize residuals connected to partial differential equations (PDEs). Our key result is that the difficulty in training these models is closely related to the conditioning of a specific differential operator. This operator, in turn, is associated to the Hermitian square of the differential operator of the underlying PDE. If this operator is ill-conditioned, it results in slow or infeasible training. Therefore, preconditioning this operator is crucial. We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
Seeing the Wind from a Falling Leaf
A longstanding goal in computer vision is to model motions from videos, while the representations behind motions, i.e. the invisible physical interactions that cause objects to deform and move, remain largely unexplored. In this paper, we study how to recover the invisible forces from visual observations, e.g., estimating the wind field by observing a leaf falling to the ground. Our key innovation is an end-to-end differentiable inverse graphics framework, which jointly models object geometry, physical properties, and interactions directly from videos. Through backpropagation, our approach enables the recovery of force representations from object motions. We validate our method on both synthetic and real-world scenarios, and the results demonstrate its ability to infer plausible force fields from videos. Furthermore, we show the potential applications of our approach, including physics-based video generation and editing. We hope our approach sheds light on understanding and modeling the physical process behind pixels, bridging the gap between vision and physics. Please check more video results in our https://chaoren2357.github.io/seeingthewind/{project page}.
General-relativistic resistive-magnetohydrodynamics simulations of self-consistent magnetized rotating neutron stars
We present the first general-relativistic resistive magnetohydrodynamics simulations of self-consistent, rotating neutron stars with mixed poloidal and toroidal magnetic fields. Specifically, we investigate the role of resistivity in the dynamical evolution of neutron stars over a period of up to 100 ms and its effects on their quasi-equilibrium configurations. Our results demonstrate that resistivity can significantly influence the development of magnetohydrodynamic instabilities, resulting in markedly different magnetic field geometries. Additionally, resistivity suppresses the growth of these instabilities, leading to a reduction in the amplitude of emitted gravitational waves. Despite the variations in magnetic field geometries, the ratio of poloidal to toroidal field energies remains consistently 9:1 throughout the simulations, for the models we investigated.
Physics-Integrated Variational Autoencoders for Robust and Interpretable Generative Modeling
Integrating physics models within machine learning models holds considerable promise toward learning robust models with improved interpretability and abilities to extrapolate. In this work, we focus on the integration of incomplete physics models into deep generative models. In particular, we introduce an architecture of variational autoencoders (VAEs) in which a part of the latent space is grounded by physics. A key technical challenge is to strike a balance between the incomplete physics and trainable components such as neural networks for ensuring that the physics part is used in a meaningful manner. To this end, we propose a regularized learning method that controls the effect of the trainable components and preserves the semantics of the physics-based latent variables as intended. We not only demonstrate generative performance improvements over a set of synthetic and real-world datasets, but we also show that we learn robust models that can consistently extrapolate beyond the training distribution in a meaningful manner. Moreover, we show that we can control the generative process in an interpretable manner.
Unsteady and inertial dynamics of an active particle in a fluid
It is well known that the reversibility of Stokes flow makes it difficult for small microorganisms to swim. Inertial effects break this reversibility, allowing new mechanisms of propulsion and feeding. Therefore it is important to understand the effects of unsteady and fluid inertia on the dynamics of microorganisms in flow. In this work, we show how to translate known inertial effects for non-motile organisms to motile ones, from passive to active particles. The method relies on a principle used earlier by Legendre and Magnaudet (1997) to deduce inertial corrections to the lift force on a bubble from the inertial drag on a solid sphere, using the fact that small inertial effects are determined by the far field of the disturbance flow. The method allows for example to compute the inertial effect of unsteady fluid accelerations on motile organisms, and the inertial forces such organisms experience in steady shear flow. We explain why the method fails to describe the effect of convective fluid inertia.
Note: Stokes-Einstein relation without hydrodynamic diameter in the TIP4P/Ice water model
It is demonstrated that self-diffusion and shear viscosity data for the TIP4P/Ice water model reported recently [L. Baran, W. Rzysko and L. MacDowell, J. Chem. Phys. {\bf 158}, 064503 (2023)] obey the microscopic version of the Stokes-Einstein relation without the hydrodynamic diameter.
New type of solutions for a critical Grushin-type problem with competing potentials
In this paper, we consider a critical Grushin-type problem with double potentials. By applying the reduction argument and local Pohozaev identities, we construct a new family of solutions to this problem, which are concentrated at points lying on the top and the bottom circles of a cylinder.
Dynamic processes in superconductors and the laws of thermodynamics
The transition from the superconducting to the normal state in a magnetic field was considered as a irreversible thermodynamic process before 1933 because of Joule heating. But all physicists became to consider this transition as reversible after 1933 because of the obvious contradiction of the Meissner effect with the second law of thermodynamics if this transition is considered as a irreversible process. This radical change of the opinion contradicted logic since the dissipation of the kinetic energy of the surface screening current into Joule heat in the normal state cannot depend on how this current appeared in the superconducting state. The inconsistency of the conventional theory of superconductivity, created in the framework of the equilibrium thermodynamics, with Joule heating, on which Jorge Hirsch draws reader's attention, is a consequence of this history. In order to avoid contradiction with the second law of thermodynamics, physicists postulated in the thirties of the last century that the surface screening current is damped without the generation of Joule heat. This postulate contradicts not only logic and the conventional theory of superconductivity but also experimental results.
Unbalanced Stückelberg Holographic Superconductors with Backreaction
We numerically investigate some properties of unbalanced St\"{u}ckelberg holographic superconductors, by considering backreaction effects of fields on the background geometry. More precisely, we study the impacts of the chemical potential mismatch and St\"{u}ckelberg mechanism on the condensation and conductivity types (electrical, spin, mixed, thermo-electric, thermo-spin and thermal conductivity). Our results show that the St\"{u}ckelberg's model parameters C_{alpha} and alpha not only have significant impacts on the phase transition, but also affect the conductivity pseudo-gap and the strength of conductivity fluctuations. Moreover, the effects of these parameters on a system will be gradually reduced as the imbalance grows. We also find that the influence of alpha on the amplitude of conductivity fluctuations depends on the magnitude of the both C_{alpha} and deltamu/mu in the electric and thermal conductivity cases. This results in that increasing alpha can damp the conductivity fluctuations of an unbalanced system in contrast to balanced ones.
