I have zero clue about Google's infra. I think they run an optimized, stripped-down KVM hypervisor.
You've asked a question that requires detailed knowledge of what your program(s) do. Is the hash table in RAM, in a commonly used file-based DB, in a custom file-based DB, in NoSQL, or in some RDBMS? The solution to improve performance would be vastly different for each.
A little background about the tools used would help greatly.
We have to figure out what is causing the slowdown. My eyes cannot read the colored output in those images, sorry; blue, red, and green are particularly bad for me. I use bright green/yellow on a black background myself, with color output disabled for everything.
If you could post using code-tags and text, that would help greatly (while saving bandwidth for people who pay by the bit). Then I can scale the text here to read it. Or not - your choice.
Do you have system monitoring installed on the VM? Munin, monit, sysusage, nagios, cacti, anything? Then we could see where any throttling is happening:
CPU
RAM
Disk I/O
Network I/O
something else?
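If none of those monitoring tools are installed yet, a rough first pass at the four resources above can be done with nothing but /proc. This is just a sketch of where to look, not a substitute for a real monitor that graphs trends over time:

```shell
#!/bin/sh
# One-shot snapshot of CPU, RAM, disk I/O, and network I/O using only /proc,
# so nothing extra needs to be installed.

echo "=== CPU: load average (runnable tasks vs. core count) ==="
cat /proc/loadavg

echo "=== RAM: low MemAvailable or shrinking SwapFree means memory pressure ==="
grep -E 'MemTotal|MemAvailable|SwapFree' /proc/meminfo

echo "=== Disk I/O: sector counters per real device (sample twice and diff) ==="
awk '$3 !~ /loop|ram/ {print $3, "read_sectors:", $6, "written_sectors:", $10}' /proc/diskstats

echo "=== Network I/O: byte counters per interface (sample twice and diff) ==="
cat /proc/net/dev
```

The counters are cumulative since boot, so take two snapshots a few seconds apart and subtract to get a rate. If the rate for one resource is pegged while the others idle, that's your bottleneck.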
Often, if you are CPU bound, the easiest answer is splitting the task into separate processes or threads; it just depends on the data involved. That's why Facebook can scale out so easily compared to shared transactional DBs, which effectively have to go through a single writing process. Splitting into separate processes can have the added benefit of helping with any per-CPU memory access limitations or performance issues. Ubuntu's kernel doesn't limit per-process RAM from what I can see; I checked some of my 16.04 systems today, and none have limits.
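If the work can be partitioned (e.g., by key range or input file), xargs -P is the simplest way to fan it across all cores without writing any threading code. This is a sketch; the chunk names and the echo stand in for whatever slice of the real workload each worker would handle:

```shell
#!/bin/sh
# Sketch: run a CPU-bound job as N parallel processes, N = visible cores.
# The "chunkNN" items are hypothetical stand-ins for real work units.

NPROC=$(nproc)    # number of CPUs the VM exposes

# You can also confirm the per-process RAM limit mentioned above:
ulimit -v         # typically prints "unlimited" on stock Ubuntu

# Feed work items to xargs; -n1 gives one item per worker, -P caps concurrency.
printf 'chunk%02d\n' 1 2 3 4 5 6 7 8 |
  xargs -n1 -P"$NPROC" sh -c 'echo "worker $$ handling $0"'
```

Each worker is a separate process, so the kernel is free to schedule them on different CPUs, and there's no shared-memory locking to get wrong.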
On servers, different RAM chips are closer to different CPUs, so accessing RAM attached to a foreign CPU is slower. To know the details, you'd need access to the physical hardware and to how the VM was set up for RAM-to-CPU affinity. I've never had to deal with this myself, but I know it exists. In a prior job, looking at detailed server architectures and picking the best for our needs was something I did, though not for PC servers, just PA-RISC, PowerX, UltraSPARC, and a few other commercial Unix vendors.
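You can at least see whether the VM exposes any NUMA topology at all from sysfs, without installing anything (numactl --hardware and lscpu give friendlier views if they happen to be installed). A sketch:

```shell
#!/bin/sh
# Sketch: list NUMA nodes and which CPUs belong to each, via sysfs only.
# On many VMs the hypervisor presents a single node, hiding the real layout.

if [ -d /sys/devices/system/node ]; then
  for n in /sys/devices/system/node/node*; do
    [ -d "$n" ] || continue
    echo "$(basename "$n"): CPUs $(cat "$n/cpulist")"
  done
else
  echo "No NUMA topology exposed (single node, or hidden by the hypervisor)"
fi
```

If it reports more than one node, pinning a process and its memory to the same node (e.g., with numactl) avoids the foreign-CPU RAM penalty described above.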
Tuning is about handling each issue as it is discovered, then the next, and the next, until either time or money runs out.

OTOH, we often spend a week trying to optimize a task that, if we'd just let it run on the first day, would have completed in under 2 days. Always consider the trade-off for "fastest solution time." But if you will need to do this task monthly for the next 3 years, then tuning it would be very worthwhile.