Inferact, a startup focused on accelerating AI inference, launched on January 23, 2024, with $150 million in seed funding. The round was led by Andreessen Horowitz and Lightspeed, with participation from Sequoia, Databricks, and others. The company is built around vLLM, an open-source inference engine already used by Amazon, major cloud providers, and thousands of developers.
What is vLLM?
vLLM addresses a critical bottleneck in AI deployment: the speed and cost of inference. When you interact with a large language model (LLM) like ChatGPT, the process of generating a response, known as inference, can be slow and resource-intensive. vLLM aims to transform this process from a congested "traffic jam" into a streamlined "AI highway system." It achieves this through two key innovations:
- PagedAttention: This technique optimizes GPU memory management for the attention key-value cache, much as an operating system manages RAM with virtual-memory paging. By allocating and freeing memory in small fixed-size blocks rather than one large contiguous region per request, PagedAttention sharply reduces the memory waste of traditional methods, contributing to throughput gains of up to 24x.
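The block-allocation idea behind PagedAttention can be illustrated with a toy allocator. This is a minimal sketch, not vLLM's actual implementation or API: the class name, block size, and bookkeeping here are all hypothetical. The point it demonstrates is that handing out fixed-size blocks on demand, and returning them to a shared pool when a request finishes, bounds per-request waste to less than one block instead of an entire pre-reserved maximum-length buffer.

```python
# Toy sketch of paged KV-cache allocation (hypothetical, not vLLM's API).
# Instead of reserving one contiguous region sized for the maximum possible
# sequence length, memory is split into fixed-size blocks handed out on
# demand; a per-sequence "block table" records which blocks it owns.

class PagedKVCache:
    def __init__(self, num_blocks, block_size=16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # shared pool of free block ids
        self.tables = {}    # seq_id -> list of block ids (the block table)
        self.lengths = {}   # seq_id -> number of tokens stored so far

    def append_token(self, seq_id):
        """Grab a new block only when the sequence's current block is full."""
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # first token, or current block is full
            self.tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.lengths[seq_id] = n + 1

    def free(self, seq_id):
        """Return all of a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8, block_size=4)
for _ in range(5):                      # 5 tokens fit in 2 blocks (4 + 1)
    cache.append_token("req-0")
print(len(cache.tables["req-0"]))       # 2 blocks; waste is under one block
cache.free("req-0")
print(len(cache.free_blocks))           # 8; every block is reusable again
```

Because sequences of different lengths draw from the same pool, no request holds memory it is not using, which is what lets far more requests be batched onto one GPU.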