You're right, we don't listen to paper. That was just an analogy.
A square-wave representation of sound is less accurate than a sine wave. To convert a square wave to a sine wave you have to fill in the missing information, and the digital-to-analog process does that in various ways. Upsampling and resampling are just two ways to smooth out the signal further, but all they do is create smaller squares (a higher sampling frequency), the same as a higher-res digital source.
Higher-resolution digital comes closer to a sine wave, since the conversion process can refine the signal better as it is turned into an analog signal. The missing information that must be filled in to create a smooth sine wave, and hence the analog signal, is a best guess by the algorithm. If the digital signal were perfect and complete, there would be no need to convert it to an analog sine wave, since it would already be there.
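To put a rough number on the "smaller squares" idea, here is a minimal Python sketch. It assumes a 1 kHz test tone and a plain sample-and-hold staircase. The function name `staircase_error` and the two sample rates are my own illustrative choices, and a real DAC follows this stage with a reconstruction filter that the sketch leaves out.

```python
import math

def staircase_error(sample_rate_hz, tone_hz=1000.0, fine_steps=64):
    """Worst-case gap between a sine tone and its sample-and-hold staircase.

    Hypothetical helper for illustration only; it does not model any real DAC.
    """
    worst = 0.0
    # Number of whole samples that fit in one period of the tone.
    n_samples = int(sample_rate_hz / tone_hz)
    for n in range(n_samples):
        t_sample = n / sample_rate_hz
        # Value the staircase holds until the next sample arrives.
        held = math.sin(2 * math.pi * tone_hz * t_sample)
        # Compare the held value with the true sine at points inside the step.
        for k in range(fine_steps):
            t = t_sample + (k / fine_steps) / sample_rate_hz
            true = math.sin(2 * math.pi * tone_hz * t)
            worst = max(worst, abs(true - held))
    return worst

# Assumed rates: CD (44.1 kHz) and a 4x higher rate (176.4 kHz).
err_44k = staircase_error(44100.0)
err_176k = staircase_error(176400.0)
print(err_44k, err_176k)
```

In this toy setup the staircase at the higher rate sits closer to the sine, which is the "smaller squares" effect described above.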
Yes, the digital signal is converted to analog, and that requires a sine wave. But converting squares to curves requires those missing areas to be filled in with best-guess information, as when a DAC converts a CD.
A digital signal can be converted to analog very accurately, but the converter is still making an educated guess to fill in the gaps of missing information and create a sine wave for reproducing sound. It's very close, but not perfect or exact in computer-science terms.