ACSA SysAdmin report

iBug
May 27, 2023

Overview

  • Server administration
    • IP and hostname
    • Remote management (IPMI)
    • Internet access
  • NFS and storage
    • ZFS
    • Proxmox VE
  • Server authentication
  • Administrative policies
  • Miscellaneous

Server administration

Problem 1: Looking up server IP addresses every once in a while

  • A redundant step before starting work
  • Need to keep the IP info page up-to-date
  • Not friendly to automation

IP address and hostname

Solution 1: Assign static IP addresses to servers

  • Permission from USTCnet architecture
  • Slightly more reliable
  • Still not easy to remember
  • Still requires intervention in certain cases

IP address and hostname

Solution 2: Assign DNS names to servers

  • Minimal technical barrier
  • Our internal domain: acsalab.com
  • Easy to remember
  • Friendly to automation
  • IPv6 enabled

Current state: Both solutions applied
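
With DNS names in place, automation can address servers by name instead of a looked-up IP. A minimal Python sketch, assuming short hostnames such as snode6 and rosemary (taken from the shell prompts elsewhere in this report) live directly under acsalab.com. The helper names are illustrative, and whether those exact records exist is an assumption:

```python
import socket

DOMAIN = "acsalab.com"  # internal domain named in the slides


def fqdn(short_name: str) -> str:
    """Turn a short hostname into a fully qualified name under DOMAIN."""
    return f"{short_name.strip().lower()}.{DOMAIN}"


def resolve(short_name: str) -> list[str]:
    """Return all IPv4 and IPv6 addresses of a host, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(fqdn(short_name), None, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Calling resolve("snode6") from a script would then return whatever A and AAAA records the name currently has, so no address needs to be hard-coded.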

Server administration

Problem 2: Any server outage requires a visit to the datacenter

  • Tedious for humans
    • Extra transportation expenses
    • Pandemic control policies further aggravate the problem
  • Extended downtime

IPMI

Intelligent Platform Management Interface: Computer interface for remote management

  • Independent from host CPU, firmware and OS
  • Two-way access with the main system
  • Literally everything you need to manage a server
    • Remote control (KVM or serial)
    • Virtual Media
    • Event logging
    • SNMP

Usually implemented through a Baseboard Management Controller (BMC)

  • Network access through IPMI or web
  • Comes with dedicated NIC
root@rosemary:~# ipmitool lan print 1
Set in Progress         : Set Complete
Auth Type Support       : MD5
Auth Type Enable        : Callback : MD5
                : User     : MD5
                : Operator : MD5
                : Admin    : MD5
                : OEM      : MD5
IP Address Source       : Static Address
IP Address              : 10.38.79.1
Subnet Mask             : 255.255.255.0
MAC Address             : d0:50:99:f1:92:d4
SNMP Community String   : AMI
IP Header               : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control         : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl   : 0.0 seconds
Default Gateway IP      : 10.38.79.254
Default Gateway MAC     : d8:67:d9:70:e9:41
Backup Gateway IP       : 0.0.0.0
Backup Gateway MAC      : 00:00:00:00:00:00
802.1q VLAN ID          : Disabled
802.1q VLAN Priority    : 0
Bad Password Threshold  : 0
Invalid password disable: no
Attempt Count Reset Int.: 0
User Lockout Interval   : 0
root@rosemary:~#
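
Output like the above is easy to consume from scripts, for example to keep an inventory of BMC addresses. A small sketch that parses the text of ipmitool lan print into a dictionary. parse_lan_print and SAMPLE are illustrative names, and the parser assumes it is given only the command's output, without the shell prompt lines:

```python
SAMPLE = """\
IP Address Source       : Static Address
IP Address              : 10.38.79.1
Subnet Mask             : 255.255.255.0
MAC Address             : d0:50:99:f1:92:d4
Auth Type Enable        : Callback : MD5
                        : User     : MD5
Default Gateway IP      : 10.38.79.254
"""


def parse_lan_print(text: str) -> dict[str, str]:
    """Parse 'Key : Value' lines; lines with an empty key continue the previous key."""
    info: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split at the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key = key.strip()
        value = " ".join(value.split())  # collapse column padding
        if key:
            info[key] = value
            last_key = key
        elif last_key is not None:
            info[last_key] += "; " + value
    return info


lan = parse_lan_print(SAMPLE)
print(lan["IP Address"], lan["MAC Address"])
```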
      

Problem: Remote management

Solution: Obvious

Benefits:

  • Physical access required only for hardware maintenance
    • ... plus the air conditioner
  • Access hardware information without powering on the main system
  • Provides additional information for troubleshooting
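
As an illustration of remote management without a datacenter visit, the sketch below wraps ipmitool to query a BMC over the network. It assumes ipmitool is installed on the admin machine, that the BMC accepts the lanplus (IPMI v2.0) interface, and that the password is supplied through the IPMI_PASSWORD environment variable. The user name in the usage note is hypothetical:

```python
import subprocess


def ipmi_command(bmc_host: str, user: str, *args: str) -> list[str]:
    """Build an ipmitool command line for a remote BMC.

    -I lanplus selects the IPMI v2.0 LAN interface; -E makes ipmitool read
    the password from the IPMI_PASSWORD environment variable.
    """
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *args]


def power_status(bmc_host: str, user: str) -> str:
    """Ask the BMC whether the main system is powered on."""
    result = subprocess.run(
        ipmi_command(bmc_host, user, "chassis", "power", "status"),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

For example, power_status("10.38.79.1", "admin") would query the BMC address shown in the earlier output. The same builder works for other subcommands such as "sel", "list" to read the event log.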

Server administration

Problem 3: Internet access for servers

  • A wide range of tasks on the server requires internet access
    • System updates (although USTC Mirrors provides some of them)
    • Installing environments
    • Downloading datasets
    • Cloning Git repositories
  • WLT is limited to 1 IP per user

Internet access

  • (Almost) All servers are connected to NFS
  • Why not use NFS as a gateway?

Internet access

Linux has a full-fledged network stack with routing and NAT capabilities.

Care must be taken when setting up the network

  • Server-initiated connections go through the NFS server
  • Replies to incoming connections leave through the interface they arrived on
  • Nice to have: USTCnet is reachable directly
  • Nice to have: Google and GitHub access
ibug@snode6:~$ ip ru
0:      from all lookup local
2:      from all lookup main
3:      from 202.38.72.23 lookup 1
10:     from all lookup 2
32766:  from all lookup main
32767:  from all lookup default
ibug@snode6:~$ ip r s t 1
default via 202.38.72.126 dev enp0 proto static
ibug@snode6:~$ ip r s t 2
114.214.160.0/19 via 202.38.72.126 dev enp0 proto static
114.214.192.0/18 via 202.38.72.126 dev enp0 proto static
202.38.64.0/19 via 202.38.72.126 dev enp0 proto static
210.45.64.0/20 via 202.38.72.126 dev enp0 proto static
210.45.112.0/20 via 202.38.72.126 dev enp0 proto static
211.86.144.0/20 via 202.38.72.126 dev enp0 proto static
222.195.64.0/19 via 202.38.72.126 dev enp0 proto static
ibug@snode6:~$ ip r s t default
default via 10.1.13.1 dev ibs1 proto static metric 50
default via 202.38.72.126 dev enp0 proto static metric 100
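
To make the rule ordering above easier to follow, here is a simplified Python model of the lookup: rules are tried in priority order, and within a table the longest matching prefix wins, with ties broken by the lowest metric. It is only a sketch of the decision logic, not how the kernel implements it. The local table is omitted, the main table is not shown in the slides and is modeled as empty (an assumption), and the source address 10.1.13.6 used in the usage note is hypothetical:

```python
import ipaddress

CAMPUS_GW = "202.38.72.126"
CAMPUS_PREFIXES = [
    "114.214.160.0/19", "114.214.192.0/18", "202.38.64.0/19", "210.45.64.0/20",
    "210.45.112.0/20", "211.86.144.0/20", "222.195.64.0/19",
]

# (priority, source selector or None for "from all", table name)
RULES = [
    (2, None, "main"),
    (3, "202.38.72.23", "1"),
    (10, None, "2"),
    (32766, None, "main"),
    (32767, None, "default"),
]

# table name -> list of (prefix, gateway, metric)
TABLES = {
    "main": [],  # assumption: contents not shown in the slides
    "1": [("0.0.0.0/0", CAMPUS_GW, 0)],
    "2": [(prefix, CAMPUS_GW, 0) for prefix in CAMPUS_PREFIXES],
    "default": [("0.0.0.0/0", "10.1.13.1", 50), ("0.0.0.0/0", CAMPUS_GW, 100)],
}


def lookup(src: str, dst: str):
    """Return (table, gateway) chosen for a packet, or None if nothing matches."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for _priority, selector, table in RULES:
        if selector is not None and src_ip != ipaddress.ip_address(selector):
            continue
        candidates = [
            (ipaddress.ip_network(prefix), gateway, metric)
            for prefix, gateway, metric in TABLES[table]
            if dst_ip in ipaddress.ip_network(prefix)
        ]
        if candidates:
            # Longest prefix wins; ties are broken by the lowest metric.
            _net, gateway, _metric = max(candidates, key=lambda c: (c[0].prefixlen, -c[2]))
            return table, gateway
    return None


print(lookup("10.1.13.6", "8.8.8.8"))      # general internet traffic
print(lookup("202.38.72.23", "8.8.8.8"))   # traffic sourced from the campus address
print(lookup("10.1.13.6", "202.38.64.1"))  # destination inside a campus prefix
```

In this model, general traffic falls through to the default table and leaves via 10.1.13.1, traffic sourced from 202.38.72.23 is pinned to the campus gateway by table 1, and campus prefixes are caught by table 2.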

Internet access

Solution: Route server internet access through the NFS server. Further routing and splitting only need to be done there.

NFS and storage

Problem: NFS was slow and frequently ran out of space

  • Two-step migration (October 2022)
  • Old setup: 8× 4 TB SAS HDD
  • New setup: 6× 18 TB SATA HDD + 2× 4 TB SSD
  • Current usage (May 22): 14.3 TiB / 49.1 TiB (29%)
    • Compression ratio: 1.72
    • Used: 14.3T / Logical used: 24.2T

NFS: Old setup

  • 8 spinny boi:
  • 4 TB each
  • RAID 10 using built-in RAID controller (13.4 TiB usable)
  • Single-partition layout, using ext4

NFS: New setup

  • 6 spinny boi + 2 SSD:
  • HDDs are 18 TB each, SSDs are 4 TB each
  • HDD : RAID 10 using ZFS (49.1 TiB usable)
  • SSD : OS (16 GiB), Read cache (4 TB), Write cache (64 GiB)
  • NFS over RDMA
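
The 49.1 TiB figure follows from the disk count once decimal terabytes are converted to binary tebibytes. A quick check in Python, assuming two-way mirrors and ignoring ZFS metadata and reserved space (the function name is illustrative):

```python
TB = 10**12   # decimal terabyte, as printed on drive labels
TIB = 2**40   # binary tebibyte, as reported by most tools


def mirrored_capacity_tib(num_disks: int, disk_tb: float, copies: int = 2) -> float:
    """Raw capacity in TiB of striped mirrors (RAID 10 style), ignoring filesystem overhead."""
    return (num_disks // copies) * disk_tb * TB / TIB


capacity = mirrored_capacity_tib(6, 18)
print(f"usable: {capacity:.1f} TiB")
print(f"in use: {14.3 / capacity:.0%}")
```

Six 18 TB disks in three mirrored pairs give 54 TB, which is about 49.1 TiB, and 14.3 TiB used out of that is about 29%.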

ZFS

  • Zettabyte File System with volume management features
  • Originally developed by Sun Microsystems, for Solaris
  • Open-source implementation by the OpenZFS community

ZFS

  • Separate logical and physical layers
    • Datasets (subvolumes) and ZVOLs
    • Striped, Mirrored, RAIDZ, RAIDZ2, RAIDZ3 vdevs
  • Log-structured filesystem design
    • Automatically consistent
    • Instant snapshots & restoration
  • Data integrity
    • Hierarchical checksum
    • Self-healing (in mirrored and RAID modes)
    • No fsck required
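
The checksum and self-healing idea can be shown with a toy model: the checksum is stored apart from the data, every read is verified against it, and a bad copy is rewritten from a good one. This is only a conceptual sketch. The class name is made up, SHA-256 is picked for convenience, and the real on-disk format of ZFS is far more involved:

```python
import hashlib


class MirroredBlock:
    """Toy model of one block stored on a two-way mirror with an external checksum."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        # The checksum is kept apart from the data, as a parent block pointer would.
        self.checksum = hashlib.sha256(data).digest()

    def read(self) -> bytes:
        """Return verified data, repairing any copy that fails its checksum."""
        good = next(
            (bytes(c) for c in self.copies if hashlib.sha256(c).digest() == self.checksum),
            None,
        )
        if good is None:
            raise IOError("all copies are corrupt")
        for i, copy in enumerate(self.copies):
            if bytes(copy) != good:
                self.copies[i] = bytearray(good)  # self-healing rewrite
        return good


block = MirroredBlock(b"hello zfs")
block.copies[0][0] ^= 0xFF          # simulate silent corruption on one disk
assert block.read() == b"hello zfs"  # served from the healthy copy
assert bytes(block.copies[0]) == b"hello zfs"  # and the bad copy was repaired
```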

ZFS

  • Separate logical and physical layers
  • Log-structured filesystem design
  • Data integrity
  • Efficient RAID rebuilding
  • Intelligent caching
    • Separate read and write caching strategies
    • Multi-layered caching (Tiered storage)
    • High cache hit rate (typ. >90%)
  • Tunable
  • Hidden perks?

ZFS

  • High CPU and memory usage
    • Native transparent compression
    • Native data deduplication
    • Native encryption
  • Fragmentation after long-term use
    • Inherent to log-structured filesystems (LFS)
    • Mitigated by large cache

NFS server

  • 2× Xeon Silver 4208 CPU
  • 128 GB RAM
  • Dedicated to storage: Good for ZFS
  • Daily snapshots: ls ~/.zfs

Performance

  • Test file span: 4 GiB
  • 1 MiB Sequential
    • 830 MiB/s Read (1.15±0.75 ms)
    • 493 MiB/s Write (0.89±1.84 ms)
  • 512K Random
    • 327 MiB/s Read (1.47±0.6 ms)
    • 480 MiB/s Write (0.62±5.15 ms)
  • 4K Random QD32
    • 36.1 MiB/s Read (9200 IOPS, 0.1±0.06 ms)
    • 213 MiB/s Write (55k IOPS, 8±480 μs)
  • 4K Random QD1
    • 26.5 MiB/s Read (6800 IOPS, 0.14±0.1 ms)
    • 443 MiB/s Write (113k IOPS, 3.8±1.5 μs)
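
The throughput and IOPS figures for the 4K tests are two views of the same measurement, related by the block size. A short Python check of that relation (the function and dictionary names are illustrative):

```python
def iops(throughput_mib_s: float, block_kib: float) -> float:
    """Operations per second implied by a throughput at a fixed block size."""
    return throughput_mib_s * 1024 / block_kib


RESULTS_MIB_S = {
    "4K random QD32 read": 36.1,
    "4K random QD32 write": 213,
    "4K random QD1 read": 26.5,
    "4K random QD1 write": 443,
}

for name, mib_s in RESULTS_MIB_S.items():
    print(f"{name}: {iops(mib_s, 4):,.0f} IOPS")
```

The results agree with the rounded IOPS values listed above, for instance 213 MiB/s at 4 KiB per operation is roughly 55k IOPS.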

Proxmox VE

  • Proxmox Virtual Environment (Proxmox VE or PVE) is an open-source server platform for virtualization management.
  • Based on Debian GNU/Linux, featuring kernel support for virtualization, containers and networking
  • Provides ZFS kmod out-of-the-box (thanks to Ubuntu)

Server authentication

Problem: Server access is very inconsistent.

  • Access to any server, by anyone, must be provided by the administrator.
  • UID and GID assignment is manually done, and sometimes inconsistent.
  • Revoking access for departed members is another messy job.
  • Synology?
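
One way to see how inconsistent manual UID assignment has become is to compare the passwd files of several servers. A hedged sketch in Python: it assumes the contents of /etc/passwd from each host have already been collected as text, and the user names in the example are hypothetical:

```python
from collections import defaultdict


def uid_conflicts(passwd_by_host: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map each username whose UID differs between hosts to its per-host UIDs."""
    seen: dict[str, dict[str, int]] = defaultdict(dict)
    for host, passwd_text in passwd_by_host.items():
        for line in passwd_text.splitlines():
            fields = line.split(":")
            # passwd format: name:password:UID:GID:GECOS:home:shell
            if line.startswith("#") or len(fields) < 3:
                continue
            seen[fields[0]][host] = int(fields[2])
    return {user: uids for user, uids in seen.items() if len(set(uids.values())) > 1}


example = {
    "snode6": "alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1002:1002::/home/bob:/bin/bash\n",
    "rosemary": "alice:x:1001:1001::/home/alice:/bin/bash\nbob:x:1005:1005::/home/bob:/bin/bash\n",
}
print(uid_conflicts(example))
```

With the example data only bob is reported, because his UID differs between the two hosts. On NFS such a mismatch means files owned by one account appear to belong to another.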

LDAP

The Lightweight Directory Access Protocol is an industry-standard protocol for accessing and maintaining directory information services.

  • Originally designed as a lightweight alternative to the X.500 Directory Access Protocol
  • Also stores people, groups, and other objects
  • Centralized management
  • Client-server protocol

PAM and NSS

  • Pluggable Authentication Modules: configurable user authentication
  • Name Service Switch: a libc mechanism for integrating various information providers
    • Hosts, users (passwd), groups, and other identifiers
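
Because NSS sits inside libc, any program that uses the standard lookup functions sees users from every configured provider without changes. Python's pwd and grp modules wrap those libc functions, so the sketch below would return LDAP users as well once an LDAP provider is configured for NSS. The function name and the returned fields are illustrative choices:

```python
import grp
import pwd


def describe_user(name: str):
    """Look up a user through libc (and therefore NSS); return None if unknown."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    # Supplementary groups that list the user explicitly as a member.
    groups = sorted(g.gr_name for g in grp.getgrall() if name in g.gr_mem)
    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "home": entry.pw_dir, "groups": groups}


print(describe_user("root"))
```

The same information is available on the command line through getent passwd and getent group, which also go through NSS.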

LDAP setup

  • Server: OpenLDAP slapd server
  • Client: libpam-ldapd + libnss-ldapd

Where to install server software?

Policies

Sudo rule: Trust-based, granted on request.

  • Admin sudo is granted from LDAP (i.e. on all servers).
  • Normal users' sudo is granted on a per-node basis (via usermod -aG).

The famous "sudo warning":

We trust you have received the usual lecture from the local System Administrator. It usually boils down to these three things:

#1) Respect the privacy of others.

#2) Think before you type.

#3) With great power comes great responsibility.

Miscellaneous

Thank you!