Modern General Purpose Graphic Processing Unit (GPGPU) demands a large Register File (RF), which is typically organized into multiple banks to support the massive parallelism. Although heavy banking benefits RF throughput, its associated area and energy costs with diminishing performance gains greatly limit future RF s-caling. In this paper, we propose an improved RF design with a bank stealing technique, which enables a high RF throughput with compact area. By deeply investigating the GPGPU microarchitecture, we identify the deficiency in the state-of-the-art RF designs as the bank conflict problem, while the majority of conflicts can be eliminated leveraging the fact that the highly-banked RF oftentimes experiences under-utilization. This is especially true in GPGPU where multiple ready warps are available at the scheduling stage with their operands to be wisely coordinated. Our lightweight bank stealing technique can opportunistically fill the idle banks for better operand service, and the average GPGPU performance can be improved under smaller energy budget with significant area saving, which makes it promising for sustainable RF scaling.