
Commit cb6c8b1 (1 parent: 91b44eb)

Add support for GPT OSS

File tree: 6 files changed (+80 −2 lines)

README.md

Lines changed: 1 addition & 0 deletions
@@ -320,6 +320,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
 1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://huggingface.co/papers/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
 1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
 1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://huggingface.co/papers/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
+1. **[GPT OSS](https://huggingface.co/docs/transformers/model_doc/gpt_oss)** (from OpenAI) released with the blog [Introducing gpt-oss](https://openai.com/index/introducing-gpt-oss/) by Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, Marat Dukhan, Casey Dvorak, Kevin Fives, Vlad Fomenko, Timur Garipov, Kristian Georgiev, Mia Glaese, Tarun Gogineni, Adam Goucher, Lukas Gross, Katia Gil Guzman, John Hallman, Jackie Hehir, Johannes Heidecke, Alec Helyar, Haitang Hu, Romain Huet, Jacob Huh, Saachi Jain, Zach Johnson, Chris Koch, Irina Kofman, Dominik Kundel, Jason Kwon, Volodymyr Kyrylov, Elaine Ya Le, Guillaume Leclerc, James Park Lennon, Scott Lessans, Mario Lezcano-Casado, Yuanzhi Li, Zhuohan Li, Ji Lin, Jordan Liss, Lily (Xiaoxuan) Liu, Jiancheng Liu, Kevin Lu, Chris Lu, Zoran Martinovic, Lindsay McCallum, Josh McGrath, Scott McKinney, Aidan McLaughlin, Song Mei, Steve Mostovoy, Tong Mu, Gideon Myles, Alexander Neitz, Alex Nichol, Jakub Pachocki, Alex Paino, Dana Palmie, Ashley Pantuliano, Giambattista Parascandolo, Jongsoo Park, Leher Pathak, Carolina Paz, Ludovic Peran, Dmitry Pimenov, Michelle Pokrass, Elizabeth Proehl, Huida Qiu, Gaby Raila, Filippo Raso, Hongyu Ren, Kimmy Richardson, David Robinson, Bob Rotsted, Hadi Salman, Suvansh Sanjeev, Max Schwarzer, D. Sculley, Harshit Sikchi, Kendal Simon, Karan Singhal, Yang Song, Dane Stuckey, Zhiqing Sun, Philippe Tillet, Sam Toizer, Foivos Tsimpourlas, Nikhil Vyas, Eric Wallace, Xin Wang, Miles Wang, Olivia Watkins, Kevin Weil, Amy Wendling, Kevin Whinnery, Cedric Whitney, Hannah Wong, Lin Yang, Yu Yang, Michihiro Yasunaga, Kristen Ying, Wojciech Zaremba, Wenting Zhan, Cyril Zhang, Brian Zhang, Eddie Zhang, Shengjia Zhao.
 1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.

docs/snippets/5_supported-models.snippet

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@
 1. **[GLPN](https://huggingface.co/docs/transformers/model_doc/glpn)** (from KAIST) released with the paper [Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth](https://huggingface.co/papers/2201.07436) by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim.
 1. **[GPT Neo](https://huggingface.co/docs/transformers/model_doc/gpt_neo)** (from EleutherAI) released in the repository [EleutherAI/gpt-neo](https://github.com/EleutherAI/gpt-neo) by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
 1. **[GPT NeoX](https://huggingface.co/docs/transformers/model_doc/gpt_neox)** (from EleutherAI) released with the paper [GPT-NeoX-20B: An Open-Source Autoregressive Language Model](https://huggingface.co/papers/2204.06745) by Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, Samuel Weinbach
+1. **[GPT OSS](https://huggingface.co/docs/transformers/model_doc/gpt_oss)** (from OpenAI) released with the blog [Introducing gpt-oss](https://openai.com/index/introducing-gpt-oss/) by Sandhini Agarwal, Lama Ahmad, Jason Ai, Sam Altman, Andy Applebaum, Edwin Arbus, Rahul K. Arora, Yu Bai, Bowen Baker, Haiming Bao, Boaz Barak, Ally Bennett, Tyler Bertao, Nivedita Brett, Eugene Brevdo, Greg Brockman, Sebastien Bubeck, Che Chang, Kai Chen, Mark Chen, Enoch Cheung, Aidan Clark, Dan Cook, Marat Dukhan, Casey Dvorak, Kevin Fives, Vlad Fomenko, Timur Garipov, Kristian Georgiev, Mia Glaese, Tarun Gogineni, Adam Goucher, Lukas Gross, Katia Gil Guzman, John Hallman, Jackie Hehir, Johannes Heidecke, Alec Helyar, Haitang Hu, Romain Huet, Jacob Huh, Saachi Jain, Zach Johnson, Chris Koch, Irina Kofman, Dominik Kundel, Jason Kwon, Volodymyr Kyrylov, Elaine Ya Le, Guillaume Leclerc, James Park Lennon, Scott Lessans, Mario Lezcano-Casado, Yuanzhi Li, Zhuohan Li, Ji Lin, Jordan Liss, Lily (Xiaoxuan) Liu, Jiancheng Liu, Kevin Lu, Chris Lu, Zoran Martinovic, Lindsay McCallum, Josh McGrath, Scott McKinney, Aidan McLaughlin, Song Mei, Steve Mostovoy, Tong Mu, Gideon Myles, Alexander Neitz, Alex Nichol, Jakub Pachocki, Alex Paino, Dana Palmie, Ashley Pantuliano, Giambattista Parascandolo, Jongsoo Park, Leher Pathak, Carolina Paz, Ludovic Peran, Dmitry Pimenov, Michelle Pokrass, Elizabeth Proehl, Huida Qiu, Gaby Raila, Filippo Raso, Hongyu Ren, Kimmy Richardson, David Robinson, Bob Rotsted, Hadi Salman, Suvansh Sanjeev, Max Schwarzer, D. Sculley, Harshit Sikchi, Kendal Simon, Karan Singhal, Yang Song, Dane Stuckey, Zhiqing Sun, Philippe Tillet, Sam Toizer, Foivos Tsimpourlas, Nikhil Vyas, Eric Wallace, Xin Wang, Miles Wang, Olivia Watkins, Kevin Weil, Amy Wendling, Kevin Whinnery, Cedric Whitney, Hannah Wong, Lin Yang, Yu Yang, Michihiro Yasunaga, Kristen Ying, Wojciech Zaremba, Wenting Zhan, Cyril Zhang, Brian Zhang, Eddie Zhang, Shengjia Zhao.
 1. **[GPT-2](https://huggingface.co/docs/transformers/model_doc/gpt2)** (from OpenAI) released with the paper [Language Models are Unsupervised Multitask Learners](https://blog.openai.com/better-language-models/) by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
 1. **[GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj)** (from EleutherAI) released in the repository [kingoflolz/mesh-transformer-jax](https://github.com/kingoflolz/mesh-transformer-jax/) by Ben Wang and Aran Komatsuzaki.
 1. **[GPTBigCode](https://huggingface.co/docs/transformers/model_doc/gpt_bigcode)** (from BigCode) released with the paper [SantaCoder: don't reach for the stars!](https://huggingface.co/papers/2301.03988) by Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra.

src/configs.js

Lines changed: 1 addition & 0 deletions
@@ -108,6 +108,7 @@ function getNormalizedConfig(config) {
             mapping['num_layers'] = 'num_hidden_layers';
             mapping['hidden_size'] = 'hidden_size';
             break;
+        case 'gpt_oss':
         case 'llama':
         case 'llama4_text':
         case 'nanochat':

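The one-line configs.js change works because of JavaScript `switch` fall-through: an empty `case 'gpt_oss':` label shares the branch below it, so GPT OSS configs are normalized with the same key mapping as Llama-family models. A minimal sketch of that pattern — the key names in the `'llama'` branch and the `normalizeConfig` helper are illustrative assumptions, not copied from the repository:

```javascript
// Sketch of the fall-through dispatch in getNormalizedConfig.
// Only the structure mirrors the actual source; the llama-branch
// key names below are assumptions for illustration.
function normalizeConfig(config) {
    const mapping = {};
    switch (config.model_type) {
        case 'gpt_oss': // new: falls through to the llama-style branch
        case 'llama':
            mapping['num_heads'] = 'num_attention_heads';
            mapping['num_layers'] = 'num_hidden_layers';
            mapping['hidden_size'] = 'hidden_size';
            break;
        default:
            throw new Error(`Unsupported model type: ${config.model_type}`);
    }
    // Copy config values under the normalized key names
    const normalized = {};
    for (const [to, from] of Object.entries(mapping)) {
        normalized[to] = config[from];
    }
    return normalized;
}
```

Because the new case carries no body of its own, GPT OSS automatically picks up any future changes to the shared branch.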
src/models.js

Lines changed: 15 additions & 1 deletion
@@ -2051,7 +2051,10 @@ export class PreTrainedModel extends Callable {
         // In most cases, this will be [batch_size, 1, vocab_size]
         // So, we select the last token's logits:
         // (equivalent to `logits = outputs.logits[:, -1, :]`)
-        const logits = outputs.logits.slice(null, -1, null);
+        // The `.to('float32')` is necessary for models with float16 logits,
+        // and is a no-op for float32 logits.
+        // TODO: Support float16 sampling in the sampler directly
+        const logits = outputs.logits.slice(null, -1, null).to('float32');

         const next_tokens_scores = prepared_logits_processor(all_input_ids, logits);

@@ -4676,6 +4679,15 @@ export class GPT2LMHeadModel extends GPT2PreTrainedModel {}
 // }
 //////////////////////////////////////////////////

+
+//////////////////////////////////////////////////
+// GPT OSS models
+export class GptOssPreTrainedModel extends PreTrainedModel {}
+export class GptOssModel extends GptOssPreTrainedModel {}
+export class GptOssForCausalLM extends GptOssPreTrainedModel {}
+//////////////////////////////////////////////////
+
+
 //////////////////////////////////////////////////
 // JAIS models
 export class JAISPreTrainedModel extends PreTrainedModel {}
@@ -8267,6 +8279,7 @@ const MODEL_MAPPING_NAMES_DECODER_ONLY = new Map([
     ['bloom', ['BloomModel', BloomModel]],
     ['jais', ['JAISModel', JAISModel]],
     ['gpt2', ['GPT2Model', GPT2Model]],
+    ['gpt_oss', ['GptOssModel', GptOssModel]],
     ['gptj', ['GPTJModel', GPTJModel]],
     ['gpt_bigcode', ['GPTBigCodeModel', GPTBigCodeModel]],
     ['gpt_neo', ['GPTNeoModel', GPTNeoModel]],
@@ -8380,6 +8393,7 @@ const MODEL_FOR_SEQ_TO_SEQ_CAUSAL_LM_MAPPING_NAMES = new Map([
 const MODEL_FOR_CAUSAL_LM_MAPPING_NAMES = new Map([
     ['bloom', ['BloomForCausalLM', BloomForCausalLM]],
     ['gpt2', ['GPT2LMHeadModel', GPT2LMHeadModel]],
+    ['gpt_oss', ['GptOssForCausalLM', GptOssForCausalLM]],
     ['jais', ['JAISLMHeadModel', JAISLMHeadModel]],
     ['gptj', ['GPTJForCausalLM', GPTJForCausalLM]],
     ['gpt_bigcode', ['GPTBigCodeForCausalLM', GPTBigCodeForCausalLM]],

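The models.js registrations follow the library's Map-based dispatch: a config's `model_type` string is looked up in a mapping table to find the class to instantiate, so exposing GPT OSS only requires the three new (empty) classes plus two Map entries. A self-contained sketch of that pattern, using stub classes in place of the real `PreTrainedModel` hierarchy:

```javascript
// Minimal sketch of the model_type → class dispatch that the two new
// Map entries hook into. The stub classes below stand in for the real
// ones; only the lookup pattern mirrors the library.
class PreTrainedModel {}
class GPT2LMHeadModel extends PreTrainedModel {}
class GptOssForCausalLM extends PreTrainedModel {}

const MODEL_FOR_CAUSAL_LM_MAPPING = new Map([
    ['gpt2', GPT2LMHeadModel],
    ['gpt_oss', GptOssForCausalLM], // the kind of entry this commit adds
]);

function modelClassFor(modelType) {
    const cls = MODEL_FOR_CAUSAL_LM_MAPPING.get(modelType);
    if (!cls) throw new Error(`Unsupported model type: ${modelType}`);
    return cls;
}
```

Since the architecture-specific behavior lives in the exported ONNX graph and the normalized config, the new classes can stay empty: registration alone is enough to route `gpt_oss` checkpoints through the shared causal-LM code path.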
src/utils/maths.js

Lines changed: 58 additions & 0 deletions
@@ -1044,3 +1044,61 @@ export function dynamic_time_warping(matrix) {

     return [text_indices, time_indices];
 }
+
+/**
+ * Efficiently converts a Uint16Array of float16 values to a Float32Array.
+ * This implementation uses a lazily initialized lookup table (LUT) for fast conversion.
+ */
+export const uint16_to_float32 = (function () {
+    let float16LUT = null; // The Lookup Table
+
+    return function (/** @type {Uint16Array} */ u16Array) {
+        if (!float16LUT) {
+            // Lazily initialize LUT
+            float16LUT = new Float32Array(65536);
+            const buffer = new ArrayBuffer(4);
+            const u32 = new Uint32Array(buffer);
+            const f32 = new Float32Array(buffer);
+
+            for (let i = 0; i < float16LUT.length; ++i) {
+                let outBits = 0;
+                const sign = (i & 0x8000) << 16;
+                const exp = (i & 0x7c00) >> 10;
+                let mantissa = i & 0x03ff;
+
+                if (exp === 0x1f) {
+                    // Infinity or NaN
+                    outBits = sign | 0x7f800000 | (mantissa << 13);
+                } else if (exp === 0) {
+                    // Zero or Subnormal
+                    if (mantissa === 0) {
+                        outBits = sign;
+                    } else {
+                        let renormExp = 113;
+                        while ((mantissa & 0x0400) === 0) {
+                            mantissa <<= 1;
+                            --renormExp;
+                        }
+                        mantissa &= ~0x0400;
+                        outBits = sign | (renormExp << 23) | (mantissa << 13);
+                    }
+                } else {
+                    // Normal
+                    outBits = sign | ((exp + 112) << 23) | (mantissa << 13);
+                }
+
+                u32[0] = outBits;
+                float16LUT[i] = f32[0];
+            }
+        }
+
+        const length = u16Array.length;
+        const lut = float16LUT;
+        const out = new Float32Array(length);
+        for (let i = 0; i < length; ++i) {
+            out[i] = lut[u16Array[i]];
+        }
+
+        return out;
+    };
+})();

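The LUT above simply precomputes, for every 16-bit pattern, the IEEE 754 binary16 → binary32 widening: copy the sign, rebias the exponent (bias 15 → bias 127, hence the `+ 112`), shift the 10-bit mantissa into the 23-bit field, and treat zero/subnormal/Inf/NaN specially. A scalar version of the same bit manipulation, useful for checking individual values (`float16BitsToFloat32` is a hypothetical helper name, not part of the library):

```javascript
// Scalar float16 → float32 conversion using the same bit logic as the
// LUT initializer above: sign copy, exponent rebias by 112, mantissa
// shift by 13, with renormalization for subnormals.
function float16BitsToFloat32(h) {
    const sign = (h & 0x8000) << 16;
    const exp = (h & 0x7c00) >> 10;
    let mantissa = h & 0x03ff;
    let outBits;
    if (exp === 0x1f) {
        // Infinity (mantissa 0) or NaN (mantissa non-zero)
        outBits = sign | 0x7f800000 | (mantissa << 13);
    } else if (exp === 0) {
        if (mantissa === 0) {
            outBits = sign; // signed zero
        } else {
            // Subnormal: shift the mantissa up until its implicit bit appears
            let renormExp = 113;
            while ((mantissa & 0x0400) === 0) {
                mantissa <<= 1;
                --renormExp;
            }
            mantissa &= ~0x0400;
            outBits = sign | (renormExp << 23) | (mantissa << 13);
        }
    } else {
        // Normal number: rebias exponent from 15 to 127
        outBits = sign | ((exp + 112) << 23) | (mantissa << 13);
    }
    // Reinterpret the assembled 32-bit pattern as a float
    const buffer = new ArrayBuffer(4);
    new Uint32Array(buffer)[0] = outBits;
    return new Float32Array(buffer)[0];
}
```

For example, `0x3C00` decodes to `1.0` and `0xC000` to `-2.0`. Every float16 value is exactly representable in float32, so the widening is lossless, which is why a 65536-entry table can cover the whole type.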
src/utils/tensor.js

Lines changed: 4 additions & 1 deletion
@@ -7,7 +7,7 @@
  * @module utils/tensor
  */

-import { interpolate_data, max, min, permute_data } from './maths.js';
+import { interpolate_data, max, min, permute_data, uint16_to_float32 } from './maths.js';

 import { Tensor as ONNXTensor, isONNXTensor } from '../backends/onnx.js';

@@ -835,6 +835,9 @@ export class Tensor {
         } else {
             map_fn = BigInt;
         }
+    } else if (this.type === 'float16' && type == 'float32' && this.data instanceof Uint16Array) {
+        // Certain runtimes do not support Float16Array, so the values are stored in Uint16Array
+        return new Tensor(type, uint16_to_float32(this.data), this.dims);
     }

     // @ts-ignore

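The tensor.js branch is what makes the `.to('float32')` call in the generation loop work for float16 models: when the runtime lacks `Float16Array`, the half-precision output arrives as raw bits in a `Uint16Array`, and the cast decodes those bits in bulk instead of applying an element-wise `map_fn`. A stub sketch of that dispatch — `FakeTensor` and the inline arithmetic decoder are illustrative stand-ins, not the library's real classes:

```javascript
// Decode one float16 bit pattern arithmetically (a slower but compact
// alternative to the bit-twiddling/LUT approach used by the library).
function decodeHalf(h) {
    const sign = h & 0x8000 ? -1 : 1;
    const exp = (h & 0x7c00) >> 10;
    const mantissa = h & 0x03ff;
    if (exp === 0) return sign * (mantissa / 1024) * 2 ** -14; // zero/subnormal
    if (exp === 0x1f) return mantissa ? NaN : sign * Infinity; // NaN/Inf
    return sign * (1 + mantissa / 1024) * 2 ** (exp - 15);     // normal
}

// Sketch of the new cast branch in Tensor.to: float16 data stored as
// raw Uint16Array bits is decoded wholesale into a Float32Array.
class FakeTensor {
    constructor(type, data, dims) {
        this.type = type;
        this.data = data;
        this.dims = dims;
    }
    to(type) {
        if (this.type === type) return this; // no-op, as for float32 logits
        if (this.type === 'float16' && type === 'float32' && this.data instanceof Uint16Array) {
            return new FakeTensor(type, Float32Array.from(this.data, decodeHalf), this.dims);
        }
        throw new Error(`Unsupported cast: ${this.type} -> ${type}`);
    }
}
```

For instance, `new FakeTensor('float16', new Uint16Array([0x3c00, 0x4000]), [2]).to('float32')` yields float32 data `[1, 2]` while preserving the dims.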