OPTIMIZING ATTENTION AND INFERENCE IN LARGE LANGUAGE MODELS: BALANCING EFFICIENCY, INTERPRETABILITY, AND ENERGY CONSUMPTION