Applications are generally built out of libraries which provide generic functionality. These libraries need to be flexible enough to play their necessary parts in your application, but not so generic that doing so requires excessive amounts of plumbing with complex abstractions.
It is this last point which contributes chiefly to the disk/memory 'bloat' of modern applications, rather than excessive user-facing functionality.
Developing abstractions which strike this balance is tricky, and perhaps a social/psychological issue as much as a technical one. Software which fits the bill needs to be able to expand and evolve as users demand more and different things from it.
I personally think it's worse for the software to be simple yet rigid, rather than complex yet comprehensive, but only slightly - both are undesirable.
Linux actually provides lots of useful abstractions to build software with, such as processes, threads, network sockets, firewall rules, files, mountpoints...
But even if the interactions of these entities with the outside world are quite simple to explain and reason about, they can be highly complex from the application programmer and system integrator's point of view.
The Centralisation Problem
Something which contributes significantly to this complexity is the centralisation of resources in Linux.
Taking the filesystem as an example, a typical Linux user cannot know ahead of time what files or directories a given program is going to need access to. There's no limit to the scope of what an unknown program may want to do, and no way to manage that from a position of ignorance.
The developer of an application is similarly in the dark about what they may or may not be able to do in a given target environment.
The Filesystem Hierarchy Standard (FHS) and similar standards give guidelines, which are generally adhered to, about what's 'polite' for an application to do and what's mandatory for an environment to provide.
Unfortunately, there are still questions without good answers until you have the system in front of you, such as:
- Is /usr read-only?
- Can I make a directory under /tmp and mount things to it?
- Are extended attributes available?
- Can I open /dev/ttyACM0?
- Is usbfs mounted somewhere I can use it?
- What local address and port should I bind to?
The result is that in order to be robust, applications must do a multitude of checks in their installers and binaries to figure out how the target system is configured and then, hopefully, 'reach an agreement'.
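To give a feel for what such checks look like, here is a minimal Python sketch that probes a few of the questions from the list above. The function names and the `user.probe` attribute name are made up for illustration, and the probes are best-effort only: a positive answer can be stale by the time the application acts on it (the free port in particular may be taken in the meantime), and opening a serial device can have side effects on the attached hardware.

```python
import os
import socket
import tempfile


def is_writable_dir(path):
    """True if the current user appears able to create entries in path.

    os.access also reports False when the filesystem is mounted read-only.
    """
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


def xattrs_available(directory):
    """Try to set a user.* extended attribute on a scratch file in directory."""
    if not hasattr(os, "setxattr"):  # os.setxattr only exists on Linux
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as scratch:
            os.setxattr(scratch.name, "user.probe", b"1")  # hypothetical name
        return True
    except OSError:
        return False


def can_open_device(path):
    """True if the device node can be opened read/write right now."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_NONBLOCK | os.O_NOCTTY)
    except OSError:
        return False
    os.close(fd)
    return True


def find_free_port(host="127.0.0.1"):
    """Ask the kernel for an unused TCP port on host; None if binding fails."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, 0))  # port 0 means "pick one for me"
            return sock.getsockname()[1]
    except OSError:
        return None


def probe_environment():
    """Collect the answers into one dictionary for the caller to act on."""
    tmp = tempfile.gettempdir()
    return {
        "usr_writable": is_writable_dir("/usr"),
        "tmp_writable": is_writable_dir(tmp),
        "xattrs_in_tmp": xattrs_available(tmp),
        "ttyACM0_usable": can_open_device("/dev/ttyACM0"),
        "free_local_port": find_free_port(),
    }


if __name__ == "__main__":
    for question, answer in probe_environment().items():
        print(f"{question}: {answer}")
```

Even this small sketch only covers a handful of questions, and every answer still has to be turned into a decision somewhere else in the application.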
Complementary to checks is the use of hefty configuration files or environment variables, where many things which are irrelevant to the actual usage of the application must be explicitly specified to make sure the program can do whatever it wants to do, such as what user Apache should run as, or where wpa_supplicant should put its control socket.
So to make things easier for both developers and users, functionality present in Linux is duplicated in the application, or libraries/servers the application depends on. Some examples I can think of:
- App lifecycle management (vs xinetd/systemd)
- Userspace packet-processing/load-balancing (vs iptables)
- Document storage in servers (vs the filesystem).
Now, I'm not saying that Linux's support for these kinds of things is totally easy to use or that it's always flexible enough for any given application to leverage. Where it falls short, however, I think that is at least partially due to the lack of usage that these features see, thanks to the above centralisation problem - if nobody tries to use it, nobody complains and nothing improves.
So we find ourselves where we are today, effectively giving up on native services, moving the problem of compatibility and conflict avoidance to new servers and frameworks to ensure that a given application will 'just work'.
I call it moving the problem, because these replacements for native services must of course be themselves configured by some poor soul - often the package maintainer - to play nicely with other things sitting directly on the system (including alternative replacements).
A way around the unfortunate situation outlined above, where we can better leverage built-in features of Linux and make application development and deployment easier, is the idea of the VM.
To package your application in a VM, you can start with a plain OS image, write a script to configure the environment exactly how you want it, add your own stuff and ship the VM image.
Though a VM may have lots of complexity inside it, the interface it presents to the outside world is indeed quite simple - one of storage volumes, network ports and some allocation of CPU time and memory - and entirely under the developer's control.
A user can deploy this application with peace of mind, with just a basic understanding of these high-level concepts and application documentation written in those terms.
They don't have to treat the application like a special snowflake which must be started and stopped in a specific way (it's just power on and ACPI off), nor do they need to worry about whether an untrusted application must run as root, or whether it can access all the files it needs.
In short: VMs allow you to avoid conflicts over centralised resources, turning deployment into a relatively high-level activity.
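As a rough illustration of how small that outward-facing interface is, it can be written down as a handful of plain data types. This is only a sketch: `Volume`, `PortMapping`, `VmInterface` and the example 'wiki' application are hypothetical names, not the format of any real hypervisor or tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Volume:
    """A storage volume attached to the VM (hypothetical type)."""
    name: str
    size_gib: int


@dataclass(frozen=True)
class PortMapping:
    """A network port exposed from the guest to the host (hypothetical type)."""
    host_port: int
    guest_port: int
    protocol: str = "tcp"


@dataclass
class VmInterface:
    """Everything a user needs to know about in order to deploy the VM."""
    name: str
    vcpus: int
    memory_mib: int
    volumes: list = field(default_factory=list)
    ports: list = field(default_factory=list)

    def describe(self):
        """Render the interface in the terms the documentation would use."""
        lines = [f"{self.name}: {self.vcpus} vCPU, {self.memory_mib} MiB RAM"]
        lines += [f"  volume {v.name} ({v.size_gib} GiB)" for v in self.volumes]
        lines += [
            f"  port {p.host_port} -> {p.guest_port}/{p.protocol}"
            for p in self.ports
        ]
        return "\n".join(lines)


# Example: an imaginary wiki application shipped as a VM image.
wiki = VmInterface(
    name="wiki",
    vcpus=2,
    memory_mib=1024,
    volumes=[Volume("data", 20)],
    ports=[PortMapping(8080, 80)],
)
print(wiki.describe())
```

Nothing in this description mentions users, file permissions, init systems or directory layouts; those all live inside the image and remain the developer's business.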
So why aren't we all just using VMs for most things? Some reasons I can think of right now:
- Speed - VM needs to boot up
- Resource requirements - you need a good chunk of memory and throughput can be impacted
- Some modes of desirable interaction are hard to achieve, such as using X11 and exposing host directories to the guest
- Difficulty of development - every time you change the source code, you can do one of the following undesirable things to test the result:
  - Rebuild and reboot the VM
  - Try to completely 'clean up' and rerun your VM-customising script
  - Install the application directly into your dev machine
  - Develop inside the VM
It feels like some additional tooling could perhaps be used to bridge this gap...
I think containers are supposed to be the solution to this; they're just like VMs, offering the decentralisation of resources, providing a high-level application view to the user whilst simultaneously resolving developer unknowns about the target environment.
A popular solution for containers at the moment is the use of Docker with Linux's relatively recent namespacing functionality. This provides a very performant alternative to VMs, with Docker laying out the high-level view of a container in terms quite similar to those of a VM.
It should be stated that alternative backends for Docker exist which are based on VM technology, providing a far more easily securable form of isolation. Security is a whole other thing however, and here I am intentionally focusing on UX for developers and users.