Concatenating retrieved documents with the query becomes infeasible as the sequence length and sample size grow. LLMs require extensive computation and memory for inference: deploying the GPT-3 175B model needs at least 5x80GB A100 GPUs and 350GB of memory to store the weights in FP16 format [281]. Such demanding requirements f
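The 350GB figure follows from a simple back-of-the-envelope calculation: number of parameters times bytes per parameter. A minimal sketch (the helper name is illustrative, not from the source):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to store the model weights.

    Ignores activations, optimizer state, and KV cache, which add
    substantially to the footprint during actual inference.
    """
    return num_params * bytes_per_param / 1e9

# GPT-3 175B in FP16 (2 bytes per parameter):
print(weight_memory_gb(175e9))  # 350.0 GB
```

With 80GB per A100, the weights alone exceed four GPUs' worth of memory (4x80GB = 320GB < 350GB), hence the minimum of five GPUs quoted above.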