Node Group
- Implemented a historical job list view for individual nodes, enabling users to review the jobs executed on each node.
- Added a timeline view with a detailed operation history of a node group, showing what operations were performed, who executed them, and their durations.
- Enhanced node operations with the following improvements:
  - Users can now add comments to operations (e.g., release, replace, reboot, drain, or scale) to explain the reasoning behind each action.
  - For reserved nodes, users can now enable node release protection, which prevents nodes from being accidentally released through the dashboard or the API.
Endpoint
- Added support for loading models from the file system with vLLM and SGLang engines, offering users greater flexibility in model management.
- Endpoint deployments that have been scaled to zero now show the reason for the scale-down and its timestamp.
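As an illustration of loading a model from the file system, the sketch below builds a launch command for either engine from a local model directory. It relies only on the engines' own command-line entry points (`vllm serve` and `python -m sglang.launch_server --model-path`); the helper name `build_launch_command` and the `config.json` check are assumptions made for this example and do not describe how the platform configures endpoints.

```python
from pathlib import Path


def build_launch_command(engine: str, model_dir: str) -> list[str]:
    """Return an engine launch command for a model stored on the local file system."""
    path = Path(model_dir)
    # Assumption: the directory holds a Hugging Face-style checkpoint,
    # which is expected to contain a config.json file.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {path}")
    if engine == "vllm":
        # vLLM accepts a local directory in place of a model hub name.
        return ["vllm", "serve", str(path)]
    if engine == "sglang":
        # SGLang takes the local directory through --model-path.
        return ["python", "-m", "sglang.launch_server", "--model-path", str(path)]
    raise ValueError(f"unsupported engine: {engine}")
```

Checking the directory before launching surfaces a wrong mount path early, rather than as an engine startup failure.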
Batch Jobs
- Introduced a new metric chart for InfiniBand usage, allowing users to inspect detailed usage for each replica of a job.
Billing
- Introduced a cumulative display mode for usage charts, enabling users to view data in both periodic and cumulative formats.
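The relationship between the two display modes can be sketched as a running total: each cumulative point is the sum of all periodic values up to and including that period. The usage figures and the function name `to_cumulative` below are hypothetical and are not taken from the billing service.

```python
from itertools import accumulate


def to_cumulative(periodic: list[float]) -> list[float]:
    """Convert per-period usage values into a running total."""
    return list(accumulate(periodic))


# Hypothetical per-day usage values, used only for illustration.
daily_usage = [12.5, 0.0, 7.5, 20.0]
cumulative_usage = to_cumulative(daily_usage)
print(cumulative_usage)  # [12.5, 12.5, 20.0, 40.0]
```

The periodic view makes spikes in a single period easy to spot, while the cumulative view shows total spend to date at a glance.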