Apple GPT in your pocket? It could be a reality sooner than you think. Apple AI researchers say they have made a key breakthrough in deploying large language models (LLMs) on iPhones and other Apple devices with limited memory by inventing an innovative flash memory utilization technique.

LLM-based chatbots like ChatGPT and Claude are incredibly data- and memory-intensive, typically requiring vast amounts of memory to function, which is a challenge for devices like iPhones that have limited memory capacity. To tackle this issue, Apple researchers have developed a novel technique that uses flash memory – the same memory where your apps and photos live – to store the AI model's data.

In a new research paper titled 'LLM in a flash: Efficient Large Language Model Inference with Limited Memory,' the authors note that flash storage is more abundant in mobile devices than the RAM traditionally used for running LLMs. Their method cleverly bypasses that limitation using two key techniques that minimize data transfer and maximize flash memory throughput. The first, windowing, works like recycling: instead of loading new data for every token, the model reuses some of the data it already processed, which reduces the need for constant memory fetching and makes the process faster and smoother. The second, row-column bundling, groups related data so it can be read from flash in larger contiguous chunks, increasing read throughput.
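To make the windowing idea concrete, here is a minimal sketch in Python of how such a sliding-window cache might behave. This is an illustration of the general caching pattern, not Apple's actual implementation (which is not public); the names `WindowedCache`, `load_rows_from_flash`, and `rows_for_token` are hypothetical, and the weight data is mocked:

```python
from collections import deque

WINDOW = 5  # how many recent tokens' active-row sets to keep cached


def load_rows_from_flash(row_ids):
    """Stand-in for a slow flash read; returns {row_id: weights}."""
    return {r: f"weights[{r}]" for r in row_ids}


class WindowedCache:
    """Sliding-window cache over the weight rows a model activates."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.history = deque()  # active row ids for the last `window` tokens
        self.cache = {}         # row id -> weights currently resident in RAM

    def rows_for_token(self, active_rows):
        # Only rows not already resident trigger a flash transfer.
        missing = [r for r in active_rows if r not in self.cache]
        self.cache.update(load_rows_from_flash(missing))

        # Slide the window forward and evict rows that no token in the
        # window still uses, keeping RAM usage bounded.
        self.history.append(set(active_rows))
        if len(self.history) > self.window:
            expired = self.history.popleft()
            still_needed = set().union(*self.history)
            for r in expired - still_needed:
                del self.cache[r]

        return {r: self.cache[r] for r in active_rows}


cache = WindowedCache()
cache.rows_for_token([1, 2, 3])  # rows 1-3 read from flash
cache.rows_for_token([2, 3, 4])  # only row 4 read from flash
```

Because consecutive tokens tend to activate overlapping sets of neurons, most lookups in a scheme like this hit the in-RAM cache, and only a small delta has to be fetched from flash for each new token.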