Silverstripe in the Wild: Crawling the Web for Silverstripe Statistics

Some months ago, I was working on upgrading multiple Silverstripe websites from version 3 to 4. Depending on how complicated a website's code is, that process can be a major challenge. Even so, if you value security and performance, there's no real way around upgrading. But the whole process made me think: What is actually being used out in the wild? In this post, we will look at data about public Silverstripe installations. To do that, we will parse through 72TB of data to find public Silverstripe instances and then analyze them for more information.

What Installs Don't Tell You

A first clue can be found in Silverstripe's Packagist statistics. Packagist counts all installs of the Silverstripe Framework over time.

As you can see, installs of version 3 dropped dramatically over time, while version 4 accounts for most installs today. Sadly, this is not the whole picture. First of all, Composer also counts updates as installs, so we don't know how many instances are actually running each version. What if users of Silverstripe 4 just update three times more often than users of version 3? Next, Composer can't track all installations. Ten years ago, it was normal to extract a .zip file onto your server instead of using a package manager. What if those instances were never updated? In the end, Composer only gives you a small peek at a much bigger picture. So how do we get a better look?

Well, if you've used Silverstripe before, you might have noticed a little HTML tag in your code.

<meta name="generator" content="SilverStripe - http://silverstripe.org" />

The generator tag tells you which system was used to generate the content on a website. In this case, it tells us the website is running Silverstripe. So how does that help us? Surely we can't crawl every website and check its generator tag. What we can do is parse data that has already been collected by big crawlers. One of those crawlers provides an archive of 72TB of data that you can scan for things like generator tags. By doing so, we can collect the URLs of websites that use Silverstripe and then crawl only those for more information.
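To make the idea concrete, here is a minimal sketch of the generator check in Python. It is not the Rust code used for this analysis; the class and function names are made up for illustration, and it only relies on the standard library's html.parser module.

```python
from html.parser import HTMLParser


class GeneratorTagParser(HTMLParser):
    """Collects the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "generator":
            self.generators.append(attr_map.get("content", ""))


def is_silverstripe(html: str) -> bool:
    """Return True if any generator tag mentions Silverstripe."""
    parser = GeneratorTagParser()
    parser.feed(html)
    return any("silverstripe" in g.lower() for g in parser.generators)


page = (
    '<html><head>'
    '<meta name="generator" content="SilverStripe - http://silverstripe.org" />'
    '</head></html>'
)
print(is_silverstripe(page))  # True
```

A check like this only looks at the meta tag, so a site that removed or changed its generator tag would go unnoticed.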

Parsing 72TB of data is a challenge in itself, but 500 lines of efficient, parallelizable Rust code later, we're able to jump straight in!

Silverstripe Versions in the Wild

In the following chart, we see the result of checking 11,469 Silverstripe instances for their corresponding version.

Interestingly, the share of Silverstripe 3 is still higher than that of 4. More surprising might be the fact that there are still at least 872 installs running version 2.x out there. While I will not name any domains, for obvious reasons, some of them stand out:

  • A security company that claims to provide "cutting edge technology" to protect your home and office from burglars
  • IT/Media lawyer websites
  • Medical providers collecting sensitive user information via forms
  • Recently made websites that were implemented with 2.x instead of going with 4.x

But why is that a problem? Well, the latest Silverstripe 2 release is almost ten years old. And it's not just the age: multiple exploits already exist that could be used against such an instance. Would you feel comfortable entering your medical information into such a website? If you're living in the EU, this would even be a violation of the GDPR.

Sadly, in my experience, most clients don't really care about update schedules as long as their websites work. In addition, not all agencies plan to upgrade those websites once the initial contract is fulfilled. If you create websites for clients, you might charge an additional fee every so often for upgrades, or just include them in the monthly hosting fee. While not caring is certainly an option, at least think about hardening your server's security, so that a hacker cannot gain access to other clients' websites stored on the same server.

Okay, we've talked a lot about Silverstripe 2.x, but what about Silverstripe 3? The problem is that Silverstripe 3 has been end-of-life since October 2021. While that is more recent than 2.4, in a few years Silverstripe 3 will become what Silverstripe 2 is today: an outdated security risk.
Even worse, the amount of work you have to put in to upgrade a version 3 instance to 4 can be really high. I've personally upgraded around 30 Silverstripe instances from 3 to 4 using Silverstripe's excellent upgrade tool, but there is still too much else to consider, which makes the barrier to upgrading higher than necessary.

Hopefully, Silverstripe 4 will soon break the 50% mark, maybe with your help?

PHP Versions in the Wild

Speaking of outdated versions, why don't we take a look at the PHP version data that we are able to collect?

Back in the day, and sometimes even today, the PHP version in use was often sent to the client in an HTTP header.

'Server': 'Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 PHP/5.4.16'
'X-Powered-By': 'PHP/5.4.16'

This, for example, is a response from a Silverstripe 2 instance reporting PHP 5.4.16. By parsing these headers, we can paint a small, somewhat distorted picture. I will explain the distortion in the paragraph after the chart.
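Pulling the version out of such headers can be sketched in a few lines of Python. This is only an illustration, not the code behind the chart. The function name is hypothetical, and it assumes the version shows up as a PHP/x.y.z token in either the Server header or the X-Powered-By header that PHP normally emits.

```python
import re
from typing import Mapping, Optional

# Matches tokens such as "PHP/5.4.16" or "PHP/7.4".
PHP_TOKEN = re.compile(r"PHP/(\d+\.\d+(?:\.\d+)?)", re.IGNORECASE)


def exposed_php_version(headers: Mapping[str, str]) -> Optional[str]:
    """Return the PHP version advertised in the response headers, if any."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("x-powered-by", "server"):
        match = PHP_TOKEN.search(lowered.get(name, ""))
        if match:
            return match.group(1)
    return None


response_headers = {
    "Server": "Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 PHP/5.4.16",
}
print(exposed_php_version(response_headers))  # 5.4.16
```

An instance that hides its version simply yields None, which is why hidden versions end up as their own bucket in the statistics.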

The good part: two-thirds of all Silverstripe instances I've analyzed don't expose their PHP version. The rest of the data is not very conclusive either. While there are almost no reported users of PHP 8, that doesn't have to reflect reality, because users of PHP 8 are probably more likely to hide their version. Still, there is one statement we can make with confidence:

At least 28% of all instances use an end-of-life PHP version. Furthermore, if it were already November 2022, this number would rise to over 32%. Are you still using an outdated PHP version? Then you should think about making the switch. Updating your PHP version is often pretty simple and removes another security risk.
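The jump in November 2022 comes from one more PHP branch reaching end-of-life. The following Python sketch shows how such a share can be computed for a given date. The end-of-life dates in the table are assumptions that should be double-checked against php.net, and the sample list is hypothetical, not the data behind the chart.

```python
from datetime import date

# End-of-life dates per PHP branch (assumed; verify against php.net).
# Branches older than 7.3 were already end-of-life before any date used here.
EOL_DATES = {
    (7, 3): date(2021, 12, 6),
    (7, 4): date(2022, 11, 28),
    (8, 0): date(2023, 11, 26),
}
OLDEST_LISTED = min(EOL_DATES)


def is_end_of_life(version: str, today: date) -> bool:
    """Check whether the branch of `version` is end-of-life on `today`."""
    major, minor = (int(part) for part in version.split(".")[:2])
    branch = (major, minor)
    if branch < OLDEST_LISTED:
        return True
    eol = EOL_DATES.get(branch)
    # Branches missing from the table are treated as still supported.
    return eol is not None and today > eol


def eol_share(versions, today):
    """Fraction of the given versions that are end-of-life on `today`."""
    if not versions:
        return 0.0
    return sum(is_end_of_life(v, today) for v in versions) / len(versions)


# Hypothetical sample of reported versions.
sample = ["5.4.16", "7.3.33", "7.4.30", "8.1.9"]
print(eol_share(sample, date(2022, 10, 1)))  # 0.5
print(eol_share(sample, date(2022, 12, 1)))  # 0.75
```

Note that instances hiding their version cannot be counted either way, which is why the figure above is a lower bound.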

Top-Level Domains in the Wild

When you think about Silverstripe, where do you think it is most popular? New Zealand, where it comes from? Here, we will not check where the websites are hosted, only how they are distributed across top-level domains.

Leading the board overall is .com, closely followed by .de. Only in third place do we start to see the .co.nz top-level domain. Keep in mind that this data doesn't tell us that New Zealand isn't the biggest user of Silverstripe. If we look further down the list, we see multiple *.nz top-level domains like govt.nz that are not summed together. Moreover, .com doesn't tell us where a website is coming from. To find out, we would have to check where the website is hosted, which we will do in the upcoming part 2 of this analysis.
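The difference between counting co.nz and govt.nz separately and summing them up under nz can be shown with a small Python sketch. The URLs are hypothetical, and the list of two-label suffixes is a tiny illustrative subset. A real analysis would more likely rely on the Public Suffix List.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative subset of suffixes that consist of two labels.
TWO_LABEL_SUFFIXES = {"co.nz", "govt.nz", "org.nz", "net.nz", "ac.nz", "co.uk", "com.au"}


def domain_suffix(url: str) -> str:
    """Return the suffix as it would be counted, e.g. 'com' or 'co.nz'."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    last_two = ".".join(labels[-2:])
    if last_two in TWO_LABEL_SUFFIXES:
        return last_two
    return labels[-1]


def country_code(suffix: str) -> str:
    """Collapse suffixes such as 'co.nz' and 'govt.nz' into plain 'nz'."""
    return suffix.rsplit(".", 1)[-1]


# Hypothetical URLs.
urls = [
    "https://example.com/",
    "https://www.example.co.nz/",
    "https://example.govt.nz/about",
    "https://example.de/",
]
by_suffix = Counter(domain_suffix(u) for u in urls)
by_country = Counter(country_code(s) for s in by_suffix.elements())
print(by_suffix["co.nz"], by_suffix["govt.nz"])  # 1 1
print(by_country["nz"])  # 2
```

Counted per suffix, each *.nz entry looks small. Summed per country code, they can move noticeably up the list.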

Dev Mode in the Wild

Have you ever heard of Silverstripe's environment types? If not, you might want to check yours. The environment type lets you tell the CMS whether it is running in a production or a development environment. If Silverstripe is instructed to run in dev mode, it exposes a /dev route to any visitor. There, you can rebuild the database and run pre-defined tasks. Older versions can even expose a route that will clear the whole database. If the environment type is set to live, you have no concerns, because only administrators can call /dev.

When your setup has gone completely wrong, you also expose /dev/config to any visitor. This route lists all config variables and values set through YAML files. For example, if you are using a module such as S3, you risk exposing your AWS secret key.

As we can see above, 85.6% of all instances do it the right way. On the other hand, 14.4% expose the /dev route, while only 1.9% expose the /dev/config route. This is good and bad at the same time. Exposing /dev can make you vulnerable to DoS attacks, because some of the tasks can be very resource-intensive. On the other hand, if none of your tasks is sensitive, there's not that much to worry about. Still, try to change your environment type to live if possible.
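The three groups can be expressed as a small classification function. The Python sketch below is only an illustration. It assumes that an anonymous request returns status 200 only when the route is open, and that a protected route answers with a redirect to the login form or an error status (with redirects not being followed). The names are hypothetical.

```python
from enum import Enum


class DevExposure(Enum):
    SAFE = "safe"
    DEV_EXPOSED = "/dev exposed"
    CONFIG_EXPOSED = "/dev and /dev/config exposed"


def classify(dev_status: int, config_status: int) -> DevExposure:
    """Classify an instance from the status codes of anonymous GET requests.

    Assumption: only an open route answers an anonymous visitor with 200.
    """
    if dev_status != 200:
        return DevExposure.SAFE
    if config_status == 200:
        return DevExposure.CONFIG_EXPOSED
    return DevExposure.DEV_EXPOSED


print(classify(302, 302).value)  # safe
print(classify(200, 403).value)  # /dev exposed
print(classify(200, 200).value)  # /dev and /dev/config exposed
```

Keeping the classification separate from the HTTP requests makes it easy to check the rule without touching a real website.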

Webservers in the Wild

Moving on to a less controversial topic: Which webservers do Silverstripe users use?

Not surprisingly, Apache comes out on top by a large margin. Due to its .htaccess system, it is especially popular for shared hosting and with developers who want a quick and easy setup. Nginx, which is normally more performant than Apache, comes in second place with 30%. Next we have Cloudflare's proxy, taking 6.7% of the chart. While Cloudflare itself is not a webserver, a website placed behind Cloudflare will report Cloudflare as its webserver.
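Grouping webservers boils down to reducing each Server header to its product name. Here is a minimal Python sketch of that step, using hypothetical header values and a made-up function name. It also shows why a proxy like Cloudflare ends up in the list: the header is all we get to see.

```python
from collections import Counter


def server_product(server_header: str) -> str:
    """Reduce a Server header to its lower-cased product name."""
    tokens = server_header.strip().split()
    if not tokens:
        return "unknown"
    # "Apache/2.4.6" becomes "apache"; a bare "cloudflare" stays as it is.
    return tokens[0].split("/", 1)[0].lower()


# Hypothetical header values.
headers = [
    "Apache/2.4.6 (CentOS) OpenSSL/1.0.2k-fips mod_fcgid/2.3.9 PHP/5.4.16",
    "nginx/1.18.0",
    "cloudflare",
    "Apache",
]
counts = Counter(server_product(h) for h in headers)
print(counts.most_common())  # [('apache', 2), ('nginx', 1), ('cloudflare', 1)]
```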

Modules in the Wild

In today's web development and web design, modules are often needed to fulfill specific requirements. Here, we will look at three of them.

  • Elemental lets you create blocks of differently structured content that can be easily arranged in the backend.
  • Fluent allows you to localize your website into different languages.
  • Translatable does the same as Fluent. It was very popular for Silverstripe 3 but has been deprecated.

The following chart shows on what percentage of all instances the modules are installed:

As we can see, Elemental is a very popular module. That's not too surprising when you check its install statistics on Packagist. Beyond that, more than one in ten Silverstripe instances has Fluent installed, and slightly fewer run Translatable. Translatable's share might be a little more impressive, because it's not available for Silverstripe 4, while Fluent has been available since Silverstripe 3. Keep in mind that this data only shows how often the modules are installed, not how often they are actively used.

Conclusion

As we have seen, there are still many outdated instances running today. While Silverstripe 3 has been end-of-life for a year, limited resources and the complicated upgrade path from Silverstripe 3 to 4 do not exactly incentivize developers to upgrade. I've tried my best to provide an objective view of the numbers, but of course, they rely on the availability of the meta tag. Because of that, the data does not represent a definitive and complete picture, but it tries to fill in as many spots as possible.