This page explains what a feature store is, the benefits it provides, and the specific advantages of Databricks Feature Store.
The Databricks Feature Store library is available only on Databricks Runtime for Machine Learning and is accessible through Databricks notebooks and workflows.
Databricks Feature Store requires Databricks Runtime 9.1 LTS ML or above.
At this time, Databricks Feature Store does not support writing to a Unity Catalog metastore. In Unity Catalog-enabled workspaces, you can write feature tables to the default Hive metastore.
A feature store is a centralized repository that enables data scientists to find and share features and also ensures that the same code used to compute the feature values is used for model training and inference.
Machine learning uses existing data to build a model that predicts future outcomes. In almost all cases, the raw data requires preprocessing and transformation before it can be used to build a model. This process is called featurization or feature engineering, and its outputs are called features: the building blocks of the model.
Developing features is complex and time-consuming. An additional complication is that for machine learning, the featurization calculations need to be done for model training, and then again when the model is used to make predictions. These implementations may not be done by the same team or using the same code environment, which can lead to delays and errors. Also, different teams in an organization will often have similar feature needs but may not be aware of work that other teams have done. A feature store is designed to address these problems.
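The core idea, one featurization implementation shared by training and inference, can be illustrated with a minimal Python sketch. This is not the Databricks Feature Store API. The registry, the `register_feature` decorator, the `spend_per_visit` feature, and the input field names are all hypothetical and exist only to show the pattern.

```python
from typing import Callable, Dict

# Hypothetical in-memory registry: feature name -> function that computes it.
FEATURE_REGISTRY: Dict[str, Callable[[dict], float]] = {}


def register_feature(name: str):
    """Decorator that publishes a feature computation under a shared name."""
    def decorator(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        FEATURE_REGISTRY[name] = fn
        return fn
    return decorator


@register_feature("spend_per_visit")
def spend_per_visit(raw: dict) -> float:
    # Guard against division by zero for customers with no visits.
    visits = raw["visits"]
    return raw["total_spend"] / visits if visits else 0.0


def featurize(raw: dict) -> dict:
    """Compute every registered feature for one raw record."""
    return {name: fn(raw) for name, fn in FEATURE_REGISTRY.items()}


# The training path and the inference path call the same function,
# so the two cannot drift apart.
training_row = featurize({"total_spend": 120.0, "visits": 4})
inference_row = featurize({"total_spend": 120.0, "visits": 4})
```

Because both paths go through `featurize`, a change to a feature definition is made once and picked up everywhere, and other teams can discover existing features by inspecting the registry instead of reimplementing them.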
Databricks Feature Store is fully integrated with other components of Databricks.
Lineage. When you create a feature table with Databricks Feature Store, the data sources used to create the feature table are saved and accessible. For each feature in a feature table, you can also access the models, notebooks, jobs, and endpoints that use the feature.
Discoverability. The Databricks Feature Store UI, accessible from the Databricks workspace, lets you browse and search for existing features.
Integration with model scoring and serving. When you use features from Databricks Feature Store to train a model, the model is packaged with feature metadata. When you use the model for batch scoring or online inference, it automatically retrieves features from Feature Store. The caller does not need to know about them or include logic to look up or join features to score new data. This makes model deployment and updates much easier.
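The scoring behavior described above, where a model carries its feature metadata and looks up features on behalf of the caller, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the Databricks Feature Store API. The `PackagedModel` class, the `FEATURE_TABLE` dictionary, the feature names, and the fixed weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical feature table keyed by customer_id (stand-in for a stored feature table).
FEATURE_TABLE: Dict[int, Dict[str, float]] = {
    1: {"spend_per_visit": 30.0, "days_since_signup": 10.0},
    2: {"spend_per_visit": 5.0, "days_since_signup": 400.0},
}


@dataclass
class PackagedModel:
    """A model bundled with the metadata needed to fetch its own features."""
    predict_fn: Callable[[List[float]], float]
    feature_names: List[str]  # recorded at training time
    lookup_key: str           # column used to join requests to the feature table

    def score(self, request: dict) -> float:
        # Look up the stored feature row, then order values as the model expects.
        row = FEATURE_TABLE[request[self.lookup_key]]
        return self.predict_fn([row[name] for name in self.feature_names])


model = PackagedModel(
    predict_fn=lambda x: 2.0 * x[0] + 0.5 * x[1],  # hypothetical fixed weights
    feature_names=["spend_per_visit", "days_since_signup"],
    lookup_key="customer_id",
)

# The caller supplies only the key; it never sees or joins the features.
scores = [model.score({"customer_id": cid}) for cid in (1, 2)]
```

The caller passes only a lookup key, so the set of features behind a model can change in a new model version without any change to the calling code.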