If you already have the commentary and it is timed correctly for the video, you can simply combine the video and audio. This has the advantage of copying the video, so there is no re-encoding and no loss of video quality. This keeps the commentary as its own audio stream rather than mixing it with an existing audio stream in video.mp4. Note that without -map options ffmpeg selects only one audio stream for the output, so add -map 0:v -map 0:a -map 1:a if you want to keep both.
ffmpeg -i video.mp4 -i audio.wav -c:v copy -c:a libfaac -q:a 100 -shortest output.mp4
If you want to merge the existing audio with the commentary into one audio stream:
ffmpeg -i video.mp4 -i audio.wav -filter_complex "amerge,pan=stereo:c0<c0+c2:c1<c1+c3" -c:a libfaac -q:a 100 -shortest output.mp4
The pan mapping assumes both inputs are stereo, so amerge produces four channels (c0 to c3) that are then mixed back down to two. See the documentation for more info on amerge and pan.
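Both variants can also be written with explicit stream mapping, which makes it obvious which streams end up in the output. The following is only a sketch under a few assumptions: add_commentary is a made-up helper name, both inputs are assumed to be stereo, and it uses ffmpeg's native aac encoder and the | separator for pan instead of what the commands above use. By default it only prints the command for review (without shell quoting, so don't paste the output back into a shell); set RUN=1 to actually run it.

```shell
# Hypothetical helper (not part of ffmpeg): prints the command by default;
# set RUN=1 to execute it. Assumes stereo inputs and the native "aac" encoder.
add_commentary() {
    mode=$1 video=$2 audio=$3 out=$4
    case $mode in
        separate)
            # Keep the original audio and add the commentary as a second stream.
            set -- -map 0:v -map 0:a -map 1:a ;;
        merged)
            # Merge both stereo inputs (4 channels) and mix down to one stereo stream.
            set -- -filter_complex \
                '[0:a][1:a]amerge=inputs=2,pan=stereo|c0<c0+c2|c1<c1+c3[a]' \
                -map 0:v -map '[a]' ;;
        *)
            echo "usage: add_commentary separate|merged VIDEO AUDIO OUT" >&2
            return 1 ;;
    esac
    # Video is stream-copied in both modes; only the audio is encoded.
    set -- ffmpeg -i "$video" -i "$audio" "$@" -c:v copy -c:a aac -shortest "$out"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"   # review only: arguments are printed unquoted
    fi
}

# Dry run: show what would be executed for each variant.
add_commentary separate video.mp4 audio.wav output.mp4
add_commentary merged video.mp4 audio.wav output.mp4
```

Labelling the filter inputs and output ([0:a], [1:a], [a]) and mapping them by hand avoids relying on ffmpeg's automatic stream selection, and -c:v copy keeps the video untouched in the merged case as well.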
The input audio does not have to be WAV. This is just an example and you can use any format that ffmpeg can decode.
Note that these examples are unlikely to work with the libav fake "ffmpeg" from the repo. I'm using a recently compiled, real ffmpeg. Get a static build if you don't feel like compiling ffmpeg.
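If you're not sure whether your build can do this, a rough check is to ask the binary which filters it has. This is a sketch, not a definitive test: has_filter is a made-up helper name, and it assumes the binary accepts -hide_banner and -filters (very old builds may not, in which case the filter is simply reported as missing).

```shell
# Hypothetical helper: succeeds if the given ffmpeg binary lists a filter.
# Usage: has_filter NAME [BINARY]   (BINARY defaults to "ffmpeg")
has_filter() {
    name=$1
    bin=${2:-ffmpeg}
    # Return 2 if the binary is not installed or not executable.
    command -v "$bin" >/dev/null 2>&1 || return 2
    # The filter name is the first or second column, depending on the version.
    "$bin" -hide_banner -filters 2>/dev/null |
        awk -v n="$name" '$1 == n || $2 == n { found = 1 } END { exit !found }'
}

# Report on the two filters used in the merge command.
for f in amerge pan; do
    if has_filter "$f"; then
        echo "$f: available"
    else
        echo "$f: missing (or ffmpeg not found)"
    fi
done
```

You can point it at a static build by passing the path as the second argument, e.g. has_filter amerge ./ffmpeg.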
I tested these commands using somewhat incongruent sources which made an interesting video...