Node Feature Discovery in Practice

Posted Nov 21, 2024

Node Feature Discovery (NFD) is a Kubernetes add-on that detects the hardware and software features available on each node and advertises them as node labels. With those labels in place, you can steer workloads to the nodes that actually have the hardware they need.

In practice, NFD inspects feature sources such as the CPU, the kernel, and attached PCI and USB devices, and publishes a label for each feature it finds. For instance, a node with a whitelisted PCI device receives a label that identifies the device's class and vendor.

By knowing which features each node offers, you can make informed decisions about where pods should run instead of maintaining node labels by hand.

The key takeaway is that NFD gives you an accurate, automatically maintained picture of your cluster's hardware. The rest of this article covers installing it, configuring it, and using its labels for scheduling.

Installing the Operator

Installing the Node Feature Discovery Operator is a straightforward process that can be done using the OpenShift Container Platform CLI or the web console.

As a cluster administrator, you have the flexibility to choose the method that suits you best.

Installing with the CLI

To install the NFD Operator using the CLI, you'll need an OpenShift Container Platform cluster and the OpenShift CLI (oc) installed on your machine.

As a cluster administrator, you'll want to log in as a user with cluster-admin privileges. This will give you the necessary permissions to install and manage the NFD Operator.

To get started, create a namespace for the NFD Operator. This will be the home for the Operator's resources and pods.

Here's a quick checklist to ensure you have everything you need:

  • An OpenShift Container Platform cluster
  • The OpenShift CLI (oc), installed on your machine
  • A login as a user with cluster-admin privileges

Configuring

The path to the kernel config file is specified by sources.kernel.kconfigFile. If this path is empty, NFD searches the well-known standard locations, so it can still find the kernel config file on most systems.

You can install the NFD Operator using the CLI. This requires the OpenShift CLI (oc), a login as a user with cluster-admin privileges, and a namespace for the Operator.

You then install the NFD Operator in the namespace you created by creating the Operator's objects there; on OpenShift these are typically an OperatorGroup and a Subscription.

To verify a successful Operator deployment, run the command oc get pods in that namespace. A successful deployment shows the pods with a Running status.

Customized Labels

You can configure NFD to detect custom features, which extends its functionality to cover your own hardware or software features. This is particularly useful for organizations with unique hardware or software requirements.

NFD can be limited to detecting specific devices with the configuration parameter deviceClassWhitelist, which lists the device classes that NFD should scan for on a node.

Custom sources can be created to map discovered Class, Vendor, and Device codes to simpler human-readable labels. These mappings can be stored in the values.yaml file specified during installation.

A values.yaml file can be used to limit NFD to just PCI and USB devices, for example. Each PCI and USB source can have a defined deviceClassWhitelist to filter on.

By applying custom labels, the node labels start to make more sense, making it easier for the Kubernetes Scheduler to determine which nodes have specific devices.

sources.pci.deviceClassWhitelist

The sources.pci.deviceClassWhitelist option is a list of PCI device class IDs for which to publish a label.

You can specify it as a main class only, like "03", or a full class-subclass combination, like "0300". The former implies that all subclasses are accepted.

By default, the list includes ["03", "0b40", "12"].

You can customize the format of the labels with deviceLabelFields.
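The matching rule described above (a two-digit main class accepts every subclass, a four-digit entry accepts only that class-subclass pair) can be sketched as a simple prefix check. This is an illustration of the rule rather than NFD's own code; the function name class_is_whitelisted is an assumption made for the example.

```python
# Sketch of the whitelist rule only; NFD's actual implementation may differ.
DEFAULT_PCI_WHITELIST = ["03", "0b40", "12"]  # defaults quoted in the text


def class_is_whitelisted(device_class: str, whitelist: list) -> bool:
    """Return True if a PCI class code matches any whitelist entry.

    device_class is the four-hex-digit class plus subclass code, e.g. "0300".
    An entry such as "03" matches every subclass of class 03, while "0300"
    matches only that exact class-subclass combination.
    """
    code = device_class.lower()
    return any(code.startswith(entry.lower()) for entry in whitelist)


# 0300 is a VGA-compatible display controller, 0200 an Ethernet controller.
print(class_is_whitelisted("0300", DEFAULT_PCI_WHITELIST))
print(class_is_whitelisted("0200", DEFAULT_PCI_WHITELIST))
```

With the default list, the display controller matches through the "03" entry and the Ethernet controller matches nothing, so only the former would get a label.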

core.klog.addDirHeader

The core.klog.addDirHeader setting determines whether the file directory is added to the header of log messages. Set it to true to add the directory; the default value is false.

You can change core.klog.addDirHeader at run-time, which gives you flexibility in how you manage NFD's logs.

sources.pci.deviceLabelFields

The sources.pci.deviceLabelFields parameter controls which device attributes make up the label name for each PCI device that passes the deviceClassWhitelist filter. Fields such as class, vendor, and device are joined to form the label.

By limiting which devices NFD detects and which fields it publishes, you can make the node labels more meaningful and useful for scheduling pods.

A custom source can also map discovered Class, Vendor, and Device codes to simpler human-readable labels. These mappings can be stored in the values.yaml file specified during installation.

By defining custom device labels, you make it easier for the Kubernetes Scheduler to determine which nodes have specific features, such as Intel vs AMD GPUs or Coral TPU USB devices.
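To make the effect of deviceLabelFields concrete, here is a small sketch of how a label name could be assembled from the selected fields. It assumes the usual NFD label shape of feature.node.kubernetes.io/<bus>-<fields joined by underscores>.present; the helper device_label and the device attribute values are hypothetical and used only for illustration.

```python
# Illustrative only: device_label is a made-up helper, not an NFD function.
LABEL_PREFIX = "feature.node.kubernetes.io"


def device_label(bus: str, attrs: dict, fields: list) -> str:
    """Join the chosen device attributes with "_" to form the label name."""
    parts = [attrs[field] for field in fields]
    return f"{LABEL_PREFIX}/{bus}-{'_'.join(parts)}.present"


# Hypothetical attribute values for a display controller.
gpu = {"class": "0300", "vendor": "8086", "device": "3e92"}

print(device_label("pci", gpu, ["class", "vendor"]))
print(device_label("pci", gpu, ["class", "vendor", "device"]))
```

Adding more fields makes the label more specific, which helps when two devices of the same class and vendor need to be told apart.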

Klog Settings

You can dynamically adjust most logger configuration options at run-time.

The core.klog.addDirHeader option adds the file directory to the header of log messages if set to true, and defaults to false.

Command line flags take precedence over any corresponding config file options.

To log to standard error as well as files, set the core.klog.alsologtostderr option to true; it defaults to false.

If you want to write log files in a specific directory, you can set the core.klog.logDir option, but this option is not run-time configurable.

The core.klog.logFileMaxSize option defines the maximum size a log file can grow to, and defaults to 1800 megabytes.

To log to standard error instead of files, set the core.klog.logtostderr option to true, which is the default.

You can avoid header prefixes in log messages by setting the core.klog.skipHeaders option to true.

The core.klog.v option sets the log level verbosity, and defaults to 0.

Logs at or above the threshold set by the core.klog.stderrthreshold option go to stderr; the threshold defaults to 2.
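The precedence rule for logger settings (defaults first, then the config file, then command line flags) can be pictured as a layered merge. The sketch below is a simplified model, not how NFD parses its options; effective_klog is a hypothetical helper, and only a few of the options mentioned above are included.

```python
# Simplified model of option precedence; not NFD's actual option handling.
KLOG_DEFAULTS = {"addDirHeader": False, "logFileMaxSize": 1800, "v": 0}


def effective_klog(config_file: dict, cli_flags: dict) -> dict:
    """Layer the settings: defaults, then config file, then command line flags.

    Later layers win, so a flag overrides the same option in the config file.
    """
    return {**KLOG_DEFAULTS, **config_file, **cli_flags}


settings = effective_klog({"v": 2, "skipHeaders": True}, {"v": 4})
print(settings)
```

Here the verbosity from the flag replaces the value from the config file, while skipHeaders from the config file and the untouched defaults carry through.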

Sources

The sources section of the NFD configuration is where you control which feature sources are enabled and how each one behaves.

The core.sources value specifies the list of enabled feature sources; the special value all enables every feature source. This value is overridden by the deprecated --sources command line flag, if specified.

The sources section contains feature source specific configuration parameters, including the ability to prevent publishing certain cpuid features. By default, common cpuid features such as BMI1, BMI2, and MMX are listed in sources.cpu.cpuid.attributeBlacklist and are therefore not published; you can change that list to suit your needs.

You can also configure the kernel config file and kernel configuration options to publish as feature labels. The default kernel configuration options are NO_HZ, NO_HZ_IDLE, NO_HZ_FULL, and PREEMPT.

In addition to kernel config options, you can also specify PCI device class IDs for which to publish a label. The default PCI device class IDs are 03, 0b40, and 12.

For PCI devices, you can configure the format of the labels with sources.pci.deviceLabelFields, which can include the fields class, vendor, device, subsystem_vendor, and subsystem_device.

Similarly, you can specify USB device class IDs for which to publish a feature label. The default USB device class IDs are 0e, ef, fe, and ff.

The format of USB labels is likewise configured with sources.usb.deviceLabelFields, which can include the fields class, vendor, and device.

The custom feature source allows you to create user-specific labels with a list of rules to process.
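As a rough sketch of how the list of enabled sources resolves, the function below models core.sources, the special value all, and the override by the deprecated --sources flag. The name enabled_sources and the KNOWN_SOURCES list (limited to the sources mentioned in this article) are assumptions for the example, not part of NFD.

```python
# Sources named in this article; a real NFD build knows more than these.
KNOWN_SOURCES = ["cpu", "custom", "kernel", "pci", "usb"]


def enabled_sources(core_sources, flag_sources=None):
    """Resolve which feature sources run.

    flag_sources models the deprecated --sources flag, which wins if given.
    The special value "all" expands to every known source.
    """
    chosen = flag_sources if flag_sources is not None else core_sources
    if "all" in chosen:
        return list(KNOWN_SOURCES)
    return [source for source in chosen if source in KNOWN_SOURCES]


print(enabled_sources(["all"]))
print(enabled_sources(["all"], flag_sources=["pci", "usb"]))
```

The second call shows the override: even though the config file asks for everything, the flag narrows discovery to PCI and USB.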

Topology and Scheduling

The Node Feature Discovery (NFD) Topology Updater is a daemon that examines allocated resources on a worker node, accounting for resources available to be allocated to new pods on a per-zone basis, where a zone can be a Non-Uniform Memory Access (NUMA) node.

One instance of the NFD Topology Updater runs on each node of the cluster, and it communicates information to nfd-master, which creates a NodeResourceTopology custom resource (CR) corresponding to all of the worker nodes in the cluster.

To enable the Topology Updater workers in NFD, set the topologyupdater variable to true in the NodeFeatureDiscovery CR.

Topology Updater

The Topology Updater runs as one instance per node and reports its per-zone accounting of allocated and available resources to nfd-master, which maintains the NodeResourceTopology custom resources (CRs) for the worker nodes in the cluster.

The Topology Updater is responsible for supplying the data behind the custom resource instances that describe each node's hardware topology. Each NodeResourceTopology instance reports, per zone, resources such as CPU and memory.

The Topology Updater plays a crucial role in optimizing resource allocation and scheduling in a cluster. By examining allocated resources and communicating with nfd-master, it helps ensure that resources are utilized efficiently and effectively.
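The per-zone accounting can be pictured as subtracting what is already allocated from what each zone can offer. The sketch below is a conceptual model only; the zone names, resource names, and the helper available_per_zone are hypothetical and do not reflect the actual NodeResourceTopology schema.

```python
def available_per_zone(allocatable: dict, allocated: dict) -> dict:
    """Subtract allocated from allocatable for every resource in every zone.

    Zones or resources with nothing allocated keep their full amount.
    """
    return {
        zone: {
            resource: total - allocated.get(zone, {}).get(resource, 0)
            for resource, total in resources.items()
        }
        for zone, resources in allocatable.items()
    }


# Hypothetical two-zone (NUMA node) worker.
allocatable = {
    "node-0": {"cpu": 16, "memory_gib": 64},
    "node-1": {"cpu": 16, "memory_gib": 64},
}
allocated = {"node-0": {"cpu": 6, "memory_gib": 20}}

print(available_per_zone(allocatable, allocated))
```

A topology-aware scheduler can use this kind of per-zone view to avoid placing a pod on a node whose total capacity looks sufficient but whose individual zones are too fragmented.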

sources.cpu.cpuid.attributeBlacklist

The sources.cpu.cpuid.attributeBlacklist option prevents NFD from publishing the cpuid features listed in it.

By default, the blacklist includes a wide range of common CPU features: BMI1, BMI2, CLMUL, CMOV, CX16, ERMS, F16C, HTT, LZCNT, MMX, MMXEXT, NX, POPCNT, RDRAND, RDSEED, RDTSCP, SGX, SGXLC, SSE, SSE2, SSE3, SSE4.1, SSE4.2, and SSSE3.

You can customize this list to suit your needs, but keep in mind that it is overridden by sources.cpu.cpuid.attributeWhitelist if that option is specified.
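The interaction between the blacklist and the whitelist can be sketched as a small filter. This is a simplified model of the behavior described above, assuming a non-empty whitelist replaces the blacklist entirely; published_cpuid is a hypothetical helper, and the detected flags are example values.

```python
def published_cpuid(detected, blacklist, whitelist=None):
    """Pick which detected cpuid flags would become labels.

    A non-empty whitelist takes over completely; otherwise the blacklist
    removes entries from the detected set.
    """
    if whitelist:
        return sorted(flag for flag in detected if flag in whitelist)
    return sorted(flag for flag in detected if flag not in blacklist)


detected = {"ADX", "AVX2", "MMX", "SSE4.2"}  # example flags on one node

print(published_cpuid(detected, blacklist=["MMX", "SSE4.2"]))
print(published_cpuid(detected, blacklist=["MMX", "SSE4.2"], whitelist=["MMX"]))
```

The first call drops the blacklisted flags; the second shows the whitelist overriding the blacklist, so MMX is published even though it is also blacklisted.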

sources.usb.deviceClassWhitelist

The sources.usb.deviceClassWhitelist option lists the USB device class IDs for which NFD publishes a feature label. The default list includes four classes: "0e", "ef", "fe", and "ff".

These correspond to the video (0e), miscellaneous (ef), application-specific (fe), and vendor-specific (ff) USB classes. Everyday classes such as HID (03) and mass storage (08) are not in the default list. Publishing labels for the whitelisted classes lets the scheduler place pods on nodes that have such devices attached.

The format of the labels can be further configured with deviceLabelFields.

You can modify the whitelist to include or exclude other classes as needed.

core.klog.skipLogHeaders

The core.klog.skipLogHeaders option controls whether headers are written when log files are opened. If you set it to true, NFD omits those headers.

Scheduling Pods Based on Dynamic Labels

In Kubernetes, you can schedule pods based on dynamic labels created by Node Feature Discovery (NFD). This is a powerful feature that allows you to make informed decisions about where to deploy your pods.

Node Feature Discovery creates dynamic labels that can be used in a nodeSelector within a pod's deployment configuration. A Frigate NVR deployment, for example, can define a nodeSelector on the NFD label for a Coral TPU so that it only runs on nodes with that device attached.

This allows for more efficient resource allocation and utilization, as pods can be scheduled on nodes that meet specific criteria. By leveraging dynamic labels, you can create a more flexible and scalable deployment strategy.
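A minimal sketch of such a nodeSelector follows, built as a plain Python dictionary and printed as JSON, which kubectl and oc accept alongside YAML. The pod name, image, and the specific label key are placeholders chosen for illustration; substitute the label that NFD actually publishes on your nodes.

```python
import json


def pod_with_node_selector(name: str, image: str, label_key: str) -> dict:
    """Build a minimal Pod manifest that only schedules onto labelled nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # NFD publishes presence labels with the string value "true".
            "nodeSelector": {label_key: "true"},
            "containers": [{"name": name, "image": image}],
        },
    }


# Placeholder name, image, and label key; replace with your own values.
pod = pod_with_node_selector(
    "gpu-demo",
    "registry.example.com/gpu-demo:latest",
    "feature.node.kubernetes.io/pci-0300_8086.present",
)
print(json.dumps(pod, indent=2))
```

Saving the printed manifest to a file and applying it would ask the scheduler to place the pod only on nodes carrying that label.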

Conclusion

Topology and scheduling in Kubernetes are crucial for efficient workload placement. With the right topology information in place, you can ensure that your workloads run on the most suitable nodes, which can lead to significant performance improvements.

Node Feature Discovery is a valuable tool for any Kubernetes user looking to optimize their workloads. By automatically detecting hardware and software features, NFD saves you from labeling nodes by hand and helps you make the most of your resources.

So if you haven't already, give NFD a try and see how it can help you streamline your Kubernetes setup.

Keith Marchal

Senior Writer

Keith Marchal is a passionate writer who has been sharing his thoughts and experiences on his personal blog for more than a decade. He is known for his engaging storytelling style and insightful commentary on a wide range of topics, including travel, food, technology, and culture. With a keen eye for detail and a deep appreciation for the power of words, Keith's writing has captivated readers all around the world.
