DeepSeek today released a new model, DeepSeek-Prover-V2-671B, on the open-source AI community Hugging Face. The model reportedly uses the more efficient safetensors file format and supports multiple computation precisions, making it faster and less resource-intensive to train and deploy. With 671 billion parameters, it is an upgraded version of the Prover-V1.5 mathematical model released last year. Architecturally, it is built on DeepSeek-V3 and adopts an MoE (Mixture of Experts) design, with 61 Transformer layers and a hidden dimension of 7168. It also supports very long contexts, with maximum position embeddings reaching 163,840, which allows it to handle complex mathematical proofs, and it uses FP8 quantization to reduce model size and improve inference efficiency. (Jin10)
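For readers who want to verify the architecture figures above, a minimal sketch using the Hugging Face transformers library to inspect the published configuration is shown below. The repository id and the attribute names are assumptions based on the model name and standard transformers conventions, not details confirmed by the article.

```python
# Sketch: inspect the model configuration published on Hugging Face.
# Assumptions: repo id "deepseek-ai/DeepSeek-Prover-V2-671B" and standard
# DeepSeek-V3-style config attribute names.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",  # repo id assumed from the model name
    trust_remote_code=True,                 # DeepSeek repos ship custom config classes
)

print(config.num_hidden_layers)        # expected: 61 Transformer layers
print(config.hidden_size)              # expected: 7168 hidden dimension
print(config.max_position_embeddings)  # expected: 163,840 maximum positions
```

Downloading the config alone is lightweight; actually loading the 671B-parameter weights requires a multi-GPU setup and is not shown here.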