How a Feature Dictionary Can Uplift the Modern ML Architecture
Feature engineering is the most critical component of a machine learning pipeline. The quality of the data features determines the quality of the ML output: garbage in, garbage out. The strategy used to maintain data features also affects how pluggable and flexible the ML platform is. Small-scale ML architectures usually have an independent pipeline for each ML model. Modern enterprise ML architectures, on the other hand, separate ETL and feature engineering jobs from the ML jobs.
In enterprise ML architectures, it’s wise to store the outputs of the feature jobs in a shareable format, without encoding. These features can later be cherry-picked, encoded, and fed into any ML model that needs them. This approach has several advantages.
- In big enterprises, multiple ML models often solve problems within the same business domain, which means they share many features. If these common features don’t have to be regenerated for each model, it saves time, cost, and hardware.
- Plugging in new models and experimenting becomes easier because the data features are readily available.
- Data scientists can focus more on modelling and less on engineering.
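To make the idea of storing features unencoded and encoding them per model more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `FEATURE_STORE` dictionary, the customer keys, the feature names, and the helpers `select_features` and `one_hot_encode` are hypothetical and stand in for whatever storage and encoding tools a real platform would use.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical shared feature table: one row per customer, stored unencoded.
# In a real platform this would be the output of the feature jobs.
FEATURE_STORE: Dict[str, Row] = {
    "c1": {"country": "DE", "plan": "pro", "monthly_spend": 120.0, "tenure_months": 14},
    "c2": {"country": "US", "plan": "free", "monthly_spend": 0.0, "tenure_months": 3},
    "c3": {"country": "US", "plan": "pro", "monthly_spend": 80.0, "tenure_months": 27},
}


def select_features(store: Dict[str, Row], names: List[str]) -> Dict[str, Row]:
    """Cherry-pick a subset of raw features for one model."""
    return {key: {name: row[name] for name in names} for key, row in store.items()}


def one_hot_encode(rows: Dict[str, Row], column: str) -> Dict[str, Row]:
    """One-hot encode one categorical column; other columns pass through unchanged."""
    categories = sorted({row[column] for row in rows.values()})
    encoded: Dict[str, Row] = {}
    for key, row in rows.items():
        new_row = {name: value for name, value in row.items() if name != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded[key] = new_row
    return encoded


# Two hypothetical models reuse the same stored features,
# each picking and encoding only what it needs.
churn_inputs = one_hot_encode(
    select_features(FEATURE_STORE, ["plan", "tenure_months"]), "plan"
)
spend_inputs = one_hot_encode(
    select_features(FEATURE_STORE, ["country", "tenure_months"]), "country"
)
```

Because the stored rows stay unencoded, each model is free to choose its own encoding, and a new model can be plugged in by adding another `select_features` call rather than another feature job.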