XGenerationLab/XiYan-SQL

Chinese version | English version

XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL

Here is the new official Alibaba repository for XiYan-SQL. For now, we will keep the two repositories synchronized.

News🔥

  • Nov. 21, 2025 🌟 New SOTA on BIRD-CRITIC-Open: The XiYan-SQL-CRITIC technique achieved a 44.37% success rate on BIRD-CRITIC-Open, a highly challenging multi-dialect benchmark, taking the top position.

  • Oct. 30, 2025 🌟 We are excited to release the XiYan-SQL training framework XiYan-SQLTraining! This framework is primarily designed for the training of SQL/general LLMs and includes capabilities such as SQL data processing, model training, and evaluation as proposed by XiYan. We will continue to enhance the framework in the future.

  • Oct. 20, 2025 🌟 New SOTA on BIRD-CRITIC: The XiYan-SQL-CRITIC technique achieved a 44.53% success rate on the BIRD-CRITIC-PG benchmark, taking the top position. It also recorded a 48.5% success rate on the BIRD-CRITIC-Flash benchmark, another new SOTA.

  • Oct. 20, 2025 🌟 The training framework of XiYan-SQL, XiYan-SQLTraining, will soon be released as open source in the official Alibaba repository of XiYan-SQL. Stay tuned!

  • ...

  • May 22, 2025 🌟 New SOTA on BIRD-CRITIC-Flash: The XiYanSQL-CRITIC algorithm achieves a 41% pass rate on the BIRD-CRITIC-Flash benchmark, setting a new SOTA.

  • Apr. 29, 2025 🌟 We are excited to release the updated XiYanSQL-QwenCoder-2504 series models. This version includes significant optimizations over the previous version and achieves new SOTA performance for single models. We welcome everyone to try it out.

  • Mar. 21, 2025 🌟 Want high-security data access? XiYan-MCP-server now supports local mode, which runs XiYanSQL-QwenCoder-3B on your own PC/Mac.

  • Mar. 20, 2025 🌟 We are excited to announce that XiYanSQL-QwenCoder-32B can be used directly through the ModelScope API. For details, see XiYanSQL-QwenCoder-32B-2412.

  • Mar. 13, 2025 🌟 We released an MCP server for XiYanSQL: XiYan-MCP-server.

  • Mar. 10, 2025 🌟 We have fully released our XiYanSQL-QwenCoder series models on the HuggingFace platform: HuggingFace.

  • Mar. 04, 2025 🌟 We released the method and source code for automatically generating database descriptions for Text-to-SQL: Database Description Generation.

  • Feb. 26, 2025 🌟 We are excited to open source the XiYanSQL-QwenCoder series models, dedicated to advancing the development of LLMs in the text-to-SQL domain. XiYanSQL-QwenCoder currently covers four mainstream model sizes (3B, 7B, 14B, and 32B parameters) to meet the needs of different developers.

  • Jan. 22, 2025 🌟 We released XiYanSQL-QwenCoder-32B and open sourced the model weights at the same time.

  • Jan. 09, 2025 🌟 XiYanSQL-QwenCoder-32B achieves an EX score of 69.03% on the BIRD test set, setting a new SOTA for a single fine-tuned model.

  • Dec. 17, 2024 🌟 New SOTA on Bird: XiYan-SQL reaches the top of the Bird leaderboard with an EX score of 75.63%, outperforming second place by 0.84 points. It also achieves a new SOTA with an R-VES score of 71.41%.

  • Dec. 13, 2024 We release the model and source code of DateResolver.

  • Dec. 12, 2024 Try our model: ModelScope

Introduction in Short.

XiYan-SQL is an innovative framework for LLM-based Text-to-SQL.

It contains:

  • XiYanSQL-QwenCoders: XiYanSQL models in multiple sizes for SQL generation.

  • XiYan-SQLTraining: a post-training framework developed by XiYan specifically for the Text-to-SQL task.

  • XiYan-mcp: an MCP server that enables natural-language queries to databases, powered by XiYan-SQL, the SOTA for Text-to-SQL on open benchmarks.

  • M-schema: a semi-structured schema representation method.

  • Database Description Generation: a method and corresponding code for automatically generating database descriptions for Text-to-SQL.

  • DateResolver: a model with enhanced date understanding and reasoning, mainly for Chinese.

  • MoMQ: a multi-dialect Text-to-SQL MoE model based on Qwen.

  • ...

🌟 We welcome everyone to contribute to the XiYanSQL project!

Full Intro.

To tackle the challenges of large language model performance in natural-language-to-SQL tasks, we introduce XiYan-SQL, an innovative framework that employs a multi-generator ensemble strategy to improve candidate generation. We introduce M-Schema, a semi-structured schema representation method designed to enhance the understanding of database structures. To enhance the quality and diversity of generated candidate SQL queries, XiYan-SQL integrates the significant potential of in-context learning (ICL) with the precise control of supervised fine-tuning. On one hand, we propose a series of training strategies to fine-tune models to generate high-quality candidates with diverse preferences. On the other hand, we implement the ICL approach with an example selection method based on named entity recognition to prevent overemphasis on entities. The refiner optimizes each candidate by correcting logical or syntactical errors. To address the challenge of identifying the best candidate, we fine-tune a selection model to distinguish nuances of candidate SQL queries. The experimental results on multiple dialect datasets demonstrate the robustness of XiYan-SQL in addressing challenges across different scenarios. Overall, XiYan-SQL achieves state-of-the-art execution accuracy of 75.63% on the Bird test set, 89.65% on the Spider test set, 69.86% on SQL-Eval, and 41.20% on NL2GQL. The proposed framework not only enhances the quality and diversity of SQL queries but also outperforms previous methods.
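The generate, refine, and select flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the generators are hard-coded lambdas standing in for fine-tuned and ICL generators, `toy_refiner` repairs a single known typo instead of correcting logical or syntactic errors in general, and `select_by_vote` swaps in a simple majority vote over execution results in place of the fine-tuned selection model. None of these names come from the XiYan-SQL code base.

```python
import sqlite3
from collections import Counter
from typing import Callable, List, Optional, Tuple

Rows = Tuple[tuple, ...]


def run_sql(conn: sqlite3.Connection, sql: str) -> Optional[Rows]:
    """Execute a candidate; return order-insensitive rows, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))


def toy_refiner(sql: str) -> str:
    """Hypothetical stand-in for the refiner: repairs one known typo only."""
    return sql.replace("SELCT", "SELECT")


def select_by_vote(conn: sqlite3.Connection, candidates: List[str]) -> Optional[str]:
    """Stand-in for the selection model: majority vote over execution results."""
    executed = [(sql, run_sql(conn, sql)) for sql in candidates]
    valid = [(sql, rows) for sql, rows in executed if rows is not None]
    if not valid:
        return None
    top_rows, _ = Counter(rows for _, rows in valid).most_common(1)[0]
    # Return the first candidate that produced the most common result.
    return next(sql for sql, rows in valid if rows == top_rows)


def answer(
    conn: sqlite3.Connection,
    question: str,
    generators: List[Callable[[str], str]],
    refiner: Callable[[str], str] = toy_refiner,
) -> Optional[str]:
    """Generate one candidate per generator, refine each, then select one."""
    candidates = [refiner(generate(question)) for generate in generators]
    return select_by_vote(conn, candidates)


# Toy database and three hypothetical generators with different "preferences".
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.5), (3, 4.5);
    """
)
generators = [
    lambda q: "SELECT SUM(amount) FROM orders",
    lambda q: "SELCT SUM(amount) FROM orders",  # syntax error, fixed by the refiner
    lambda q: "SELECT COUNT(*) FROM orders",    # runs, but answers a different question
]
best = answer(conn, "What is the total order amount?", generators)
print(best)
```

In this toy run, two of the three refined candidates agree on the execution result, so the vote picks the `SUM` query. A learned selector, as described in the paragraph above, would presumably be needed where the majority of candidates is wrong.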

Timeline

The major events.

| Date | Event |
|---|---|
| 2025-11 | The XiYan-SQL-CRITIC technique achieved a 44.37% success rate on BIRD-CRITIC-Open, a highly challenging real-world multi-dialect benchmark, taking the top position. |
| 2025-10 | We are excited to release the XiYan-SQL training framework, XiYan-SQLTraining. This framework is primarily designed for the training of SQL/general LLMs and includes capabilities such as SQL data processing, model training, and evaluation as proposed by XiYan. We will continue to enhance the framework in the future. |
| 2025-10 | The XiYan-SQL-CRITIC technique achieved a 44.53% success rate on the BIRD-CRITIC-PG benchmark, taking the top position. It also recorded a 48.5% success rate on the BIRD-CRITIC-Flash benchmark, another new SOTA. |
| 2025-09 | The download count for the XiYanSQL-QwenCoder series models on ModelScope has exceeded 100k, making it the most influential SQL model in the field. |
| 2025-05 | The XiYanSQL-CRITIC algorithm achieves a 41% pass rate on the BIRD-CRITIC-Flash benchmark, setting a new SOTA. |
| 2025-04 | We have released version 2504 of the XiYanSQL-QwenCoder series models, which improves on the previous version. It still includes four parameter sizes: 3B, 7B, 14B, and 32B. We encourage everyone to use these models. |
| 2025-02 | We have released the XiYanSQL-QwenCoder series models, which include four sizes (3B, 7B, 14B, and 32B parameters) to meet the needs of different developers. |
| | XiYanSQL-QwenCoder-32B has been released. |
| 2025-01 | XiYanSQL-QwenCoder-32B achieves an EX score of 69.03% on the BIRD test set, a new SOTA using only a single fine-tuned model. |
| 2024-12 | Reaching the top of the Bird leaderboard with an EX score of 75.63% and an R-VES of 71.41% (new SOTA). |
| 2024-11 | Proposing the XiYanSQL technology: A Multi-Generator Ensemble Framework for Text-to-SQL. |
| | Achieving 41.20% on NL2GQL, and a competitive score of 72.23% on Bird dev (bird). |
| | Achieving 89.65% on the Spider test set (new SOTA) and 69.86% on SQL-Eval (new SOTA). |
| 2024-10 | Proposing MoMQ, an SQL MoE model. |
| 2024-09 | Proposing the DateResolver module. |
| 2024-05 | Proposing M-schema and involving ICL in SQL generation. |
| | Achieving 86.98% on the Spider test set (SOTA 86.6%). |

Application

We invite everyone to try XiYan GBI, the intelligent data querying solution based on XiYan-SQL. We welcome feedback on the product and suggestions for improvement.

For product introduction, please visit: https://help.aliyun.com/zh/model-studio/user-guide/brief-introduction-of-gbi-products

To try the product, please visit: https://bailian.console.aliyun.com/xiyan

Product DingTalk Group: 94725009401

Contact us:

If you are interested in our research or products, please feel free to contact us.

Contact Information:

Yifu Liu, zhencang.lyf@alibaba-inc.com

Join Our DingTalk Group

Ding Group (DingTalk group)

Citation

If you find our work helpful, please consider citing it.

@article{XiYanSQL,
      title={XiYan-SQL: A Novel Multi-Generator Framework For Text-to-SQL}, 
      author={Yifu Liu and Yin Zhu and Yingqi Gao and Zhiling Luo and Xiaoxia Li and Xiaorong Shi and Yuntao Hong and Jinyang Gao and Yu Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2507.04701},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.04701}, 
}
@article{xiyansql_pre,
      title={A Preview of XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL}, 
      author={Yingqi Gao and Yifu Liu and Xiaoxia Li and Xiaorong Shi and Yin Zhu and Yiming Wang and Shiqi Li and Wei Li and Yuntao Hong and Zhiling Luo and Jinyang Gao and Liyu Mou and Yu Li},
      year={2024},
      journal={arXiv preprint arXiv:2411.08599},
      url={https://arxiv.org/abs/2411.08599},
      primaryClass={cs.AI}
}
