Update - ARCCA Hawk (University Supercomputer Service) - Maintenance Update

Please be advised that the planned maintenance to the ARCCA Hawk supercomputer (OCF Hawk software stack deployment - OCF Steel) is currently behind the advertised schedule, so the service will not restart by close of play on Friday, 27th January as planned.

OCF encountered issues while setting up the base configuration, which delayed the subsequent installation tasks and set back the project’s timetable. We are issuing this update following our daily end-of-day meetings with OCF.

We envisage the full service being restored in the three steps summarised below:

1. Pilot service on “core” partitions (Expected: close of play, Tuesday 31st January):

We anticipate that OCF will have the core partition nodes and Slurm (the workload manager) configured by close of play on Friday, 27th January, with a pilot service for users commencing by close of play on Tuesday, 31st January.
The pilot service will:

a. Provide users with full access to their data.
b. Enable users to run jobs on the “core” compute partitions.
c. Enable any issues to be identified before the service moves to full production status.
d. Save time by running alongside the Software Stack Acceptance tests. As these tests take priority, all users need to be aware, and accept, that any job may be killed at short notice.

2. Researcher expansion systems (Expected: week commencing 30th January)

Researcher expansion systems will be configured throughout the week commencing 30th January. These include the LIGO and Ser Cymru systems as well as the additional dedicated research expansion Slurm partitions.

3. Production service (Expected: week commencing 6th February)

We are confident that all works will be completed by the end of next week, i.e. by close of play on Friday, 3rd February. The service will then return to full production mode on Monday, 6th February.

Should any further issues emerge in the days ahead, we will notify the user community through the usual channels.

In the meantime, we apologise for the inconvenience caused by this delay.

Christopher Dickson – Service Manager

Jan 27, 2023 - 16:41 GMT
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jan 16, 2023 - 08:01 GMT
Scheduled -

Hawk (University Supercomputer Service)


Planned Maintenance - Monday, 16th January 2023 to Friday, 27th January 2023



What is being done?
ARCCA and OCF, the new software supplier for Hawk, are undertaking planned maintenance on the Hawk supercomputer and its supporting infrastructure.

When is this being done?
Following discussions with OCF and taking account of community feedback, we have agreed the timetable for installing the new software environment on Hawk.

This work will require a full outage from 08:00 on Monday, 16th January until 17:00 on Friday, 27th January 2023.

Should the work be completed early, we will of course endeavour to bring Hawk back into service as soon as possible.

Who will be affected?
Users of Research Computing services provided by ARCCA or Supercomputing Wales (SCW).

How will this affect service users?
Hawk and its supporting infrastructure will be unavailable during the specified window.

Why is this being done?
This is a contractual obligation to upgrade the High-Performance Computing (HPC) software environment to OCF's supported deployment.

Communication & Collaboration Operational
Desktop Telephony Operational
Email & Calendar (Outlook) Operational
Microsoft Teams Operational
Netcall (Call Centre functionality) Operational
Skype for Business (Telephony & Instant Messaging) Operational
Yammer Operational
Zoom Operational
Computers, Software, & Printing Operational
Desktop, Laptops, & Mobile devices Operational
Printing, Copying, & Scanning Operational
Published Applications (Horizon) Operational
Published Desktops (Horizon) Operational
File Storage Operational
Home & Shared drive (H: & S:) Operational
OneDrive for Business Operational
Finance Systems Operational
Financial and Procurement System (Oracle-EBS) Operational
Help & Support Operational
Service Desk Support (IT single point of contact) Operational
IT Portal (Self Service ticket logging) Operational
Student Enquiry (GECKO) Operational
LinkedIn Learning Operational
HR & Student Record Systems Operational
PeopleXD (Core HR) Operational
SIMS (Admin) Operational
Library Systems Operational
Electronic Resources Operational
Library Management System (ALMA) Operational
LibrarySearch (Primo) Operational
Reading Lists (Leganto) Operational
Network Operational
Wired Network Connectivity Operational
Wireless Network Connectivity Operational
Virtual Private Networks (VPN) Operational
Public Website & Intranet Operational
Public Website (Squiz) Operational
Intranet (Squiz) Operational
Research Under Maintenance
ARCCA Processing and Analysis (Hawk) Under Maintenance
ORCA & Manage My Publications Operational
Research Data Store Operational
Research Portal (Converis) Operational
Room & Equipment Bookings Operational
Resource Booker Operational
Room Bookings (Syllabus Plus / Scientia) Operational
Student Systems Operational
Blackboard Collaborate Operational
Learn Plus (Panopto) Operational
Mahara Operational
Residences Student Portal (Accommodation) Operational
Student App Operational
Student Records (SIMS Online) Operational
Student Timetable Operational
Turnitin Operational
VLE (Learning Central) Operational
Past Incidents
Jan 30, 2023

No incidents reported today.

Jan 29, 2023

No incidents reported.

Jan 28, 2023

No incidents reported.

Jan 27, 2023

Unresolved incident: Hawk (University Supercomputer Service) - Planned Maintenance on Monday, 16th January 2023 - extended to Friday, 3rd February 2023.

Jan 26, 2023

No incidents reported.

Jan 25, 2023
Resolved - Microsoft have informed us that the Microsoft 365 services have been restored.
Jan 25, 14:45 GMT
Monitoring - Microsoft have discovered the cause of the issue affecting their Microsoft 365 services and have implemented a fix. They are monitoring the service as it recovers.
Jan 25, 10:23 GMT
Investigating - University IT have received reports from Microsoft that customers may be unable to access multiple Microsoft 365 services, including the following:

- Microsoft Teams
- Exchange Online
- Outlook
- SharePoint Online
- OneDrive for Business
- Microsoft Graph

We are awaiting further information from Microsoft and will provide updates when we have them.

Jan 25, 08:23 GMT
Jan 24, 2023
Resolved - The issue with the Web Print and Follow-Me printing services has been resolved.

Apologies for any disruption this has caused.

Jan 24, 09:20 GMT
Identified - Problems with the Web Print and Follow-Me printing services have been identified. Engineers are working on the issue. Print jobs are currently processing, albeit slowly.

We appreciate your patience whilst we resolve this issue, and we ask that you check whether your submitted print job is in the queue before attempting to submit it again.

Jan 23, 14:49 GMT
Investigating - University IT are aware of potential issues with Web Print and Follow-Me printing, and technical staff are currently investigating.

An update will be provided as soon as more information is available.

Jan 23, 14:11 GMT
Jan 23, 2023
Jan 22, 2023

No incidents reported.

Jan 21, 2023

No incidents reported.

Jan 20, 2023

No incidents reported.

Jan 19, 2023

No incidents reported.

Jan 18, 2023

No incidents reported.

Jan 17, 2023

No incidents reported.

Jan 16, 2023
Resolved - Our suppliers have indicated that the issue with the Turnitin service has now been resolved.

Apologies for any disruption this has caused.

Jan 16, 15:57 GMT
Investigating - University IT are aware of reports that the Turnitin service is running very slowly. We are awaiting updates from the suppliers on this issue.

An update will be provided as soon as more information is available.

Jan 16, 10:51 GMT