CERN EOS information

As many know, CloudStor is built upon CERN’s EOS filesystem, which underpins the LHC in Geneva. Many people ask about it in further detail and want to give it a test run. Many of you met Jakub (aka Kuba), the head of storage systems at CERN, last week at the eResearch conference.

Because EOS exists primarily to support the science behind the LHC, its public information and outreach aren’t at the same level as those of many other systems, such as Ceph. Googling EOS together with a problem rarely produces a useful answer.

https://eos.web.cern.ch/ is the primary website of EOS at CERN, and https://github.com/cern-eos/eos is their GitHub code repo.

Within CloudStor, we run multiple namespaces via Kubernetes, with Raft consensus metadata masters spread across multiple machines. Our newer deployments have boot times on the order of seconds for petabytes. The older in-memory system (which CloudStor itself runs, and is migrating from) takes longer, but boots hot. The newer Raft consensus system gains its boot speed by not being hot with data at the point of availability; it warms up as data is loaded through use.
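To make the "boots cold, warms up with use" idea concrete, here is a minimal sketch in Python. It is not how EOS implements its namespace; the class name, the `stat` method, the `/eos/demo/...` paths, and the dict standing in for the persistent store are all made up for illustration.

```python
class LazyNamespace:
    """Toy metadata service: available immediately, warms up as it is used."""

    def __init__(self, backing_store):
        # A plain dict stands in for the persistent store (assumption).
        # Nothing is preloaded, so "boot" costs nothing.
        self._store = backing_store
        self._cache = {}
        self.cold_reads = 0

    def stat(self, path):
        # Serve from memory if this entry has been touched before.
        if path in self._cache:
            return self._cache[path]
        # Otherwise fetch from the persistent store and keep it hot.
        self.cold_reads += 1
        entry = self._store[path]
        self._cache[path] = entry
        return entry


# Hypothetical metadata for 1000 files.
store = {f"/eos/demo/file{i}": {"size": i * 1024} for i in range(1000)}

ns = LazyNamespace(store)        # ready straight away, cache is empty
ns.stat("/eos/demo/file7")       # first access: cold read from the store
ns.stat("/eos/demo/file7")       # second access: served from memory
print(ns.cold_reads)             # 1
```

An in-memory design would instead load every entry before answering its first request, which is why it takes longer to boot but is hot from the start.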

As for scale, it’ll readily expand out to tens of PB of presented storage per namespace, and it also has tape integration. The tape integration is not an HSM system; however, with a little forethought it can be made to behave like one.

It also handles latency well, and it’s quite possible to run an EOS cluster with a single namespace spread across 340 ms of latency and make it feel as if it’s local. A paper on this topic, jointly written by myself, L. Mascetti, and C-Y Hsu, was published at CHEP2016: https://www.researchgate.net/publication/321236974_Global_EOS_exploring_the_300-ms-latency_region

It’s not an HPC filesystem replacement. Instead, it works on the premise that many data problems are embarrassingly parallel, and it thus obtains aggregate throughputs ranging from hundreds of GB/s up to terabytes per second. For many operations, it can sustain rates in the MHz range (millions of operations per second) per core.
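As a back-of-the-envelope illustration of why embarrassingly parallel workloads reach those aggregate figures: if streams are independent and don’t contend with each other, their throughputs simply add up. The stream counts and per-stream rates below are hypothetical numbers picked for the arithmetic, not measurements from CloudStor or CERN.

```python
import math


def aggregate_throughput_gb_s(num_streams, per_stream_gb_s):
    """Aggregate throughput of independent streams, assuming no contention."""
    return num_streams * per_stream_gb_s


def streams_needed(target_gb_s, per_stream_gb_s):
    """Independent streams required to reach a target aggregate rate."""
    return math.ceil(target_gb_s / per_stream_gb_s)


# Hypothetical: 2,000 independent streams at 0.25 GB/s each.
total = aggregate_throughput_gb_s(2000, 0.25)
print(total)    # 500.0 GB/s

# Hypothetical: streams needed for 1 TB/s (1000 GB/s) at 0.25 GB/s each.
needed = streams_needed(1000, 0.25)
print(needed)   # 4000
```

No single stream needs to be fast here; the aggregate comes from the number of streams running side by side, which is the opposite trade-off to a tightly coupled HPC filesystem.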

CERN are keen to see this solution grow: despite the number of competing platforms within CERN to support the science, EOS is the only one to date that has truly managed to handle the big data problem without data loss.

I’m quite happy to answer any questions and make any introductions as desired. The cloud services team of AARNet has spent this week collaborating with CERN in our Brisbane office, working through roadmaps and other efforts to further the capability and deployment of EOS. Outreach is something both groups wish to do.
