Basic Information

Name: Su Jianlin

Birthday: Month X, Day Y, 1993

Master's: School of Mathematics, Sun Yat-sen University

Bachelor's: School of Mathematical Sciences, South China Normal University

Location: Guangzhou, Guangdong

Hometown: Yunfu, Guangdong

Hobbies: reading, research, tinkering

Idol: Richard Feynman

Email: bojone@spaces.ac.cn

Homepage: https://jianlin.su

Weibo: https://weibo.com/bojone

Twitter: https://x.com/Jianlin_S

Code: https://github.com/bojone

Academic: Google Scholar

Idle Ramblings

A graduate student in pure mathematics at Sun Yat-sen University, with an undergraduate degree from South China Normal University. Emigrated from the Oort Cloud to Earth in '93; having forgotten the way home, he gazes at the starry sky in hope of finding a road back through spacetime.

Loves all the sciences and is fond of drilling into dead ends, so he often hits walls, but once in a while the drill breaks through, and that alone is a joy. Partial to physics, astronomy, and computing; likes to think, and schemes to crack open the nutshell of science. Good at rational analysis, yet prone to acting on sentiment; worships Feynman. In idle hours he reads Jin Yong to feign refinement, plays Chinese chess to pass the time, now and then simmers a pot of kaishui baicai on a whim, and occasionally, when his hands itch, fires up the data-mining excavator and looks toward Lanxiang.

Supposedly studying pure mathematics, yet forever off-task: immersed in neural networks, daydreaming about artificial intelligence, and having published several papers at venues such as ACL, AAAI, CVPR, and ICLR. Currently focused on natural language processing, trying to crack the secrets of language. Fond of writing, he often spins tall tales on his blog and has, fortunately, not yet been disowned by his readers. Scientific Spaces (https://kexue.fm) awaits your visit; insincere callers are welcome too.

Micro-musings

  • 2026-04-11 23:59

    Mark: 90k followers on Zhihu.

  • 2026-04-06 01:33

    Now that I am entering middle age, the point of Qingming tomb-sweeping is no longer "paying respects to the ancestors" so much as a once-a-year visit to the relatives I was once close to. Years from now, the people who were once by my side day and night will all have turned into a handful of yellow earth buried in the hills, and we will meet only once a year. Whenever I imagine that, I can hardly hold back my feelings.

  • 2026-03-27 17:02

    Recommended papers:

    Attention Sinks Induce Gradient Sinks

    Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control

    ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer

    Column Normalization and Hyperparameter Transfer

    Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

    Lost in the Middle at Birth: An Exact Theory of Transformer Position Bias

    On the Role of Batch Size in Stochastic Conditional Gradient Methods

    Scale Space Diffusion

    Spectral Condition for μP under Width-Depth Scaling

    Test-Time Training with KV Binding Is Secretly Linear Attention

    Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

    https://papers.cool/arxiv/2603.17771,2603.09221,2603.03583,2603.09952,2603.15958,2603.10123,2603.21191,2603.08709,2603.00541,2602.21204,2603.03099

  • 2026-02-24 21:43

    Recommended papers:

    Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent

    ARO: A New Lens On Matrix Optimization For Large Models

    Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate

    Fast Catch-Up, Late Switching: Optimal Batch Size Scheduling via Functional Scaling Laws

    Generative Modeling via Drifting

    Improving Flow Matching by Aligning Flow Divergence

    Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning

    PRISM: Structured Optimization via Anisotropic Spectral Shaping

    Rotary Positional Embeddings as Phase Modulation: Theoretical Bounds on the RoPE Base for Long-Context Transformers

    Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation

    TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training

    The Effect of Mini-Batch Noise on the Implicit Bias of Adam

    https://papers.cool/arxiv/2602.03001,2602.09006,2602.07145,2602.14208,2602.04770,2602.00869,2602.10273,2602.03096,2602.10959,2601.22679,2601.23261,2602.01642

  • 2026-01-31 15:46

    Recommended papers:

    ECO: Quantized Training without Full-Precision Master Weights

    How Do Transformers Learn to Associate Tokens? Gradient Leading Terms Bring Mechanistic Interpretability

    Improved Convergence Rates of Muon Optimizer for Nonconvex Optimization

    In Search of Adam's Secret Sauce

    L3: Large Lookup Layers

    Manifold constrained steepest descent

    R2k is Theoretically Large Enough for Embedding-based Top-k Retrieval

    Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening

    The Depth Delusion: Why Transformers Should Be Wider, Not Deeper

    Why Adam Works Better with β1=β2: The Missing Gradient Scale Invariance Principle

    https://papers.cool/arxiv/2601.22101,2601.19208,2601.19400,2505.21829,2601.21461,2601.21487,2601.20844,2601.21590,2601.20994,2601.21739

  • 2026-01-18 21:47

    Recommended papers:

    Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

    Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs

    Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD

    Sliding Window Recurrences for Sequence Models

    The Affine Divergence: Aligning Activation Updates Beyond

    https://papers.cool/arxiv/2512.22382,2512.13898,2601.01465,2512.13921,2512.22247

  • 2025-12-13 23:42

    Recommended papers:

    Dimension-free error estimate for diffusion model and optimal scheduling

    Equivalence of Context and Parameter Updates in Modern Transformer Blocks

    Optimizing Optimizers for Fast Gradient-Based Learning

    Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tailed Class Imbalance

    When do spectral gradient updates help in deep learning?

    https://papers.cool/arxiv/2512.01820,2511.17864,2512.06370,2512.00763,2512.04299

  • 2025-11-19 01:16

    Recommended papers:

    An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

    Back to Basics: Let Denoising Generative Models Denoise

    Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions

    Decoupling Positional and Symbolic Attention Behavior in Transformers

    How Memory in Optimization Algorithms Implicitly Modifies the Loss

    Isotropic Curvature Model for Understanding Deep Learning Optimization: Is Gradient Orthogonalization Optimal?

    L2M: Mutual Information Scaling Law for Long-Context Language Modeling

    Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression

    On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication

    On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling

    Scaling Laws and In-Context Learning: A Unified Theoretical Framework

    https://papers.cool/arxiv/2510.09827,2511.13720,2511.09465,2511.11579,2502.02132,2511.00674,2503.04725,2511.13421,2511.00025,2505.22491,2511.06232

  • 2025-11-13 13:00

    Mom: "Why do kids these days have so many illnesses?"
    Son: "It's not that there are more illnesses; it's that more of them can be treated now. In the old days those kids simply wouldn't have survived."

    What an incisive answer. Lesson learned!

    Source: comment section of https://www.zhihu.com/question/1926923396882621109/answer/1970943451643224638

  • 2025-11-10 17:06

    More than two years into the job, and this is my second time at the Beijing headquarters.

Selected Work

Jianlin Su. Variational Inference: A Unified Framework of Generative Models and Some Revelations. arXiv preprint arXiv:1807.05936, 2018.

Rui Li, Yiping Shu, Jianlin Su, Haicheng Feng, Guobao Zhang, Jiancheng Wang, Hongtao Liu. Using deep Residual Networks to search for galaxy-Lyα emitter lens candidates based on spectroscopic selection. Monthly Notices of the Royal Astronomical Society, 482(1): 313-320, 2018.

Jianlin Su, Guang Wu. f-VAEs: Improve VAEs with Conditional Flows. arXiv preprint arXiv:1809.05861, 2018.

Jianlin Su. Training Generative Adversarial Networks via Turing Test. arXiv preprint arXiv:1810.10948, 2018.

Jianlin Su. GAN-QP: A Novel GAN Framework without Gradient Vanishing and Lipschitz Constraint. arXiv preprint arXiv:1811.07296, 2018.

Hao Ren, Jianlin Su, Hong Lu. Evaluating Generalization Ability of Convolutional Neural Networks and Capsule Networks for Image Classification via Top-2 Classification. arXiv preprint arXiv:1901.10112, 2019.

Rahul Bhalley, Jianlin Su. Artist Style Transfer via Quadratic Potential. arXiv preprint arXiv:1902.11108, 2019.

Jianlin Su. O-GAN: Extremely Concise Approach for Auto-Encoding Generative Adversarial Networks. arXiv preprint arXiv:1903.01931, 2019.

Yao Ying, Jianlin Su, Peng Shan, Ligang Miao, Xiaolian Wang, Silong Peng. Rectified Exponential Units for Convolutional Neural Networks. IEEE Access, 2019.

Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian, Yi Chang. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.

Jianlin Su, Jiarun Cao, Weijie Liu, Yangyiwen Ou. Whitening Sentence Representations for Better Semantics and Faster Retrieval. arXiv preprint arXiv:2103.15316, 2021.

Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu. RoFormer: Enhanced Transformer with Rotary Position Embedding. arXiv preprint arXiv:2104.09864, 2021.

Bygone Days

Su Jianlin, just turned 16 this year (2009), living in a small village in Yunfu, Guangdong Province.

I have been interested in science since I was little. Mathematics is my strong suit, though since the third year of junior high I have had to add "chemistry" to that list.

I first got my hands on a computer in September 2006 and got online in January 2007; looking back, that was fairly quick progress (before the computer I knew nothing at all). In April 2007 I discovered BBS forums and later set up an IT-themed BBS of my own, and for a while IT drew me away from science. From September 2008 on, I refocused on science, and with some effort this blog was born.

Now (July 2012) I have graduated from high school. I have been through a lot and matured a lot; I feel I know better how to cherish things, and I have picked up all sorts of things I like. I used to be very introverted and shy, but I am much more outgoing now, and I know how to mess around and go a little crazy with friends. My passion for science has only grown, of course, though my interests have shifted: mathematics is still my core, I love physics and am enchanted by astronomy, while chemistry and biology have become side hobbies. ^_^ I hope to keep sharing my scientific life with readers here at Scientific Spaces.

At present (January 2018) I am a second-year graduate student at Sun Yat-sen University, majoring in pure mathematics (specializing in applied mathematics for biology), but I spend much of my time on machine learning, especially natural language processing. I want to learn everything and figure everything out, but the spirit is willing and the strength falls short~ Keep going; just a little further.

Now (July 2019) I have finally graduated, and I have fallen completely down the machine learning rabbit hole. These days I do odd jobs on the machine learning algorithm team at Zhuiyi Technology~

(Unfinished, but let's not say "to be continued"~)