abacusai.project
Module Contents
Classes
A project is a container which holds datasets, models and deployments
- class abacusai.project.Project(client, projectId=None, name=None, useCase=None, problemType=None, createdAt=None, featureGroupsEnabled=None)
Bases: abacusai.return_class.AbstractApiClass
A project is a container which holds datasets, models and deployments
- Parameters
client (ApiClient) – An authenticated API Client instance
projectId (str) – The ID of the project.
name (str) – The name of the project.
useCase (str) – The Use Case associated with the project.
problemType (str) –
createdAt (str) – The date and time when the project was created.
featureGroupsEnabled (bool) – Project uses feature groups instead of datasets.
- __repr__()
Return repr(self).
- to_dict()
Get a dict representation of the parameters in this class
- Returns
The dict value representation of the class parameters
- Return type
- refresh()
Calls describe and refreshes the current object’s fields
- Returns
The current object
- Return type
- describe()
Returns a description of a project.
- list_datasets()
Retrieves all dataset(s) attached to a specified project. This API returns all attributes of each dataset, such as its name, type, and ID.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array representing all of the datasets attached to the project.
- Return type
- get_schema(dataset_id)
[DEPRECATED] Returns the schema of a specific dataset in a project. The schema lists each column in the dataset along with its data type and column mapping.
- rename(name)
This method renames a project after it is created.
- Parameters
name (str) – The new name for the project.
- delete()
Deletes a specified project from your organization.
This method deletes the project, trained models and deployments in the specified project. The datasets attached to the specified project remain available for use with other projects in the organization.
This method will not delete a project that contains active deployments. Be sure to stop all active deployments before you use the delete option.
Note: Projects, models, and deployments cannot be recovered once they are deleted.
- Parameters
project_id (str) – The unique ID of the project to delete.
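Because delete() will not remove a project that contains active deployments, and deletion cannot be undone, it may help to guard the call. The sketch below is one possible pattern and not part of the library: delete_project_if_no_deployments is a hypothetical helper, and it conservatively treats any deployment returned by list_deployments() as blocking, since this section does not describe how to tell active deployments from stopped ones.

```python
def delete_project_if_no_deployments(project):
    """Delete `project` only when it has no deployments at all.

    `project` is assumed to be an abacusai.project.Project (or any object
    exposing list_deployments() and delete()). Returns True if delete()
    was called, False otherwise.
    """
    deployments = project.list_deployments()
    if deployments:
        # Conservative assumption: any deployment blocks deletion.
        return False
    project.delete()  # irreversible
    return True
```

A caller would stop or remove the remaining deployments first and then call the helper again.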
- set_feature_mapping(feature_group_id, feature_name, feature_mapping, nested_column_name=None)
Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.
- Parameters
- Returns
A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.
- Return type
- validate(feature_group_ids=None)
Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.
- Parameters
feature_group_ids (list) – The feature group IDs to validate.
- Returns
The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.
- Return type
- set_column_data_type(dataset_id, column, data_type)
Set a dataset’s column type.
- Parameters
dataset_id (str) – The unique ID associated with the dataset.
column (str) – The name of the column.
data_type (str) – The type of the data in the column: CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE. Refer to the guide on feature types (https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.
- Returns
A list of objects that describes the resulting dataset’s schema after the column’s dataType is set.
- Return type
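Since data_type accepts only the values listed above, a client-side check can catch typos before any request is made. The following is a minimal sketch under stated assumptions: normalize_data_type and set_column_types are hypothetical helpers (not library functions), and upper-casing the input is a local convenience rather than documented API behavior.

```python
# Data types listed in the data_type parameter description above.
ALLOWED_DATA_TYPES = frozenset({
    "CATEGORICAL", "CATEGORICAL_LIST", "NUMERICAL", "TIMESTAMP", "TEXT",
    "EMAIL", "LABEL_LIST", "JSON", "OBJECT_REFERENCE",
})


def normalize_data_type(data_type):
    """Return the upper-cased data type, or raise ValueError if unlisted."""
    candidate = data_type.strip().upper()
    if candidate not in ALLOWED_DATA_TYPES:
        raise ValueError(f"Unsupported data type: {data_type!r}")
    return candidate


def set_column_types(project, dataset_id, column_types):
    """Apply {column name: data type} pairs; return the last schema returned.

    `project` is assumed to be an abacusai.project.Project.
    """
    schema = None
    for column, data_type in column_types.items():
        schema = project.set_column_data_type(
            dataset_id, column, normalize_data_type(data_type)
        )
    return schema
```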
- set_column_mapping(dataset_id, column, column_mapping)
Set a dataset’s column mapping. If the column mapping is single-use and already set in another column in this dataset, this call will first remove the other column’s mapping and move it to this column.
- Parameters
- Returns
A list of columns that describes the resulting dataset’s schema after the column’s columnMapping is set.
- Return type
- remove_column_mapping(dataset_id, column)
Removes a column mapping from a column in the dataset. Returns a list of all columns with their mappings once the change is made.
- list_feature_groups(filter_feature_group_use=None)
List all the feature groups associated with a project
- Parameters
filter_feature_group_use (str) – Optional filter on feature group use. When given, only feature groups in this project with the given use are returned. Values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT, BATCH_PREDICTION_OUTPUT.
- Returns
All the Feature Groups in the Organization
- Return type
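To see how a project's feature groups are split across the four uses, one option is to call list_feature_groups once per use value. This is only a sketch: feature_groups_by_use is a hypothetical helper, and it assumes the method returns an iterable (or None) for each filter value.

```python
# Use values listed in the filter_feature_group_use description above.
FEATURE_GROUP_USES = (
    "DATA_WRANGLING",
    "TRAINING_INPUT",
    "BATCH_PREDICTION_INPUT",
    "BATCH_PREDICTION_OUTPUT",
)


def feature_groups_by_use(project):
    """Return {use: feature groups}, making one list_feature_groups call per use.

    `project` is assumed to be an abacusai.project.Project.
    """
    return {
        use: list(project.list_feature_groups(filter_feature_group_use=use) or [])
        for use in FEATURE_GROUP_USES
    }
```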
- list_feature_group_templates(limit=100, start_after_id=None)
List feature group templates for feature groups associated with the project.
- Parameters
- Returns
All the feature groups in the organization, optionally limited by the feature group that created the template(s).
- Return type
- get_training_config_options(feature_group_ids=None, for_retrain=False)
Retrieves the full description of the model training configuration options available for the specified project.
The configuration options available are determined by the use case associated with the specified project. Refer to the Use Case Documentation (https://api.abacus.ai/app/help/useCases) for more information on use cases and use case specific configuration options.
- Parameters
- Returns
An array of options that can be specified when training a model in this project.
- Return type
- train_model(name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, cpu_size=None, memory=None)
Trains a model for the specified project.
Use this method to train a model in this project. This method supports user-specified training configurations defined in the getTrainingConfigOptions method.
- Parameters
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
training_config (dict) – The training config key/value pairs used to train this model.
feature_group_ids (list) – List of feature group ids provided by the user to train the model on.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.
custom_algorithms (list) – List of user-defined algorithms to train.
custom_algorithms_only (bool) – Whether to run only custom algorithms.
custom_algorithm_configs (dict) – Configs for each user-defined algorithm. Each key is an algorithm name and each value is that algorithm's config serialized to JSON.
cpu_size (str) – Size of the CPU for the user-defined algorithms during training.
memory (int) – Memory (in GB) for the user-defined algorithms during training.
- Returns
The new model which is being trained.
- Return type
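The parameters above can be assembled ahead of the call, which makes the JSON serialization required by custom_algorithm_configs explicit. The sketch below is illustrative only: build_train_model_kwargs is a hypothetical helper, the model name, feature group ID and algorithm name are made up, custom_algorithms is assumed to take algorithm names, and the refresh schedule assumes a standard five-field cron expression.

```python
import json


def build_train_model_kwargs(name, feature_group_ids, algorithm_configs=None,
                             refresh_schedule=None, custom_algorithms_only=False):
    """Assemble keyword arguments for Project.train_model.

    `algorithm_configs` maps algorithm name -> plain dict; each dict is
    serialized to a JSON string, as the parameter description requires.
    """
    kwargs = {
        "name": name,
        "feature_group_ids": list(feature_group_ids),
        "custom_algorithms_only": custom_algorithms_only,
    }
    if refresh_schedule is not None:
        kwargs["refresh_schedule"] = refresh_schedule
    if algorithm_configs:
        # Assumption: custom_algorithms is a list of algorithm names.
        kwargs["custom_algorithms"] = list(algorithm_configs)
        kwargs["custom_algorithm_configs"] = {
            algo: json.dumps(config, sort_keys=True)
            for algo, config in algorithm_configs.items()
        }
    return kwargs


kwargs = build_train_model_kwargs(
    name="Churn Model",                           # hypothetical model name
    feature_group_ids=["fg_123"],                 # hypothetical feature group ID
    algorithm_configs={"my_algo": {"depth": 4}},  # hypothetical algorithm
    refresh_schedule="0 3 * * 1",                 # 03:00 UTC every Monday
)
# With an authenticated Project instance: model = project.train_model(**kwargs)
```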
- create_model_from_python(function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None)
Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, the materialized feature groups are passed as arguments to the train and predict functions.
This method expects functionSourceCode to be a valid source file that contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that results from training the model. predictFunctionName has no well-defined return type: it returns the prediction, which can be anything.
- Parameters
function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.
train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.
initialize_function_name (str) – Name of the function found in the source code that initializes the trained model before it is used to make predictions.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
training_config (dict) – Training configuration
exclusive_run (bool) – Whether this model runs exclusively or alongside other Abacus.ai algorithms.
package_requirements (dict) – JSON with key-value pairs of the form package: version, one for each dependency.
- Returns
The new model, which has not been trained.
- Return type
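Because the train and predict functions are referenced by name and are not executed when the model is created, a misspelled name would only surface later. A local pre-check with the standard ast module is one way to catch that early. This is a sketch under assumptions: the source string, the function signatures, the helper names and the table name are all hypothetical.

```python
import ast

# Hypothetical user code; the bodies and signatures are placeholders.
FUNCTION_SOURCE_CODE = '''
def train(training_df):
    return {"rows": len(training_df)}


def predict(model, query):
    return {"rows_seen": model["rows"]}
'''


def top_level_functions(source):
    """Return the names of functions defined at module level in `source`."""
    tree = ast.parse(source)
    return {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}


def check_function_names(source, *names):
    """Raise ValueError if any of `names` is not defined in `source`."""
    defined = top_level_functions(source)
    missing = [name for name in names if name not in defined]
    if missing:
        raise ValueError(f"Functions not found in source: {missing}")


check_function_names(FUNCTION_SOURCE_CODE, "train", "predict")
# With an authenticated Project instance:
# model = project.create_model_from_python(
#     function_source_code=FUNCTION_SOURCE_CODE,
#     train_function_name="train",
#     training_input_tables=["my_training_table"],  # hypothetical table name
#     predict_function_name="predict",
# )
```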
- create_model_from_zip(train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, name=None, cpu_size=None, memory=None, package_requirements=None)
Initializes a new Model from a user-provided zip file containing Python code. If a list of input feature groups is supplied, the materialized feature groups are passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid source files that contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model. predictFunctionName has no well-defined return type: it returns the prediction, which can be anything.
- Parameters
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the zip.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – JSON with key-value pairs of the form package: version, one for each dependency.
- Returns
None
- Return type
- create_model_from_git(application_connector_id, branch_name, train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, python_root=None, name=None, cpu_size=None, memory=None, package_requirements=None)
Initializes a new Model from a user-provided git repository containing Python code. If a list of input feature groups is supplied, the materialized feature groups are passed as arguments to the train and predict functions.
This method expects trainModuleName and predictModuleName to be valid source files that contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that results from training the model. predictFunctionName has no well-defined return type: it returns the prediction, which can be anything.
- Parameters
application_connector_id (str) – The unique ID associated with the git application connector.
branch_name (str) – Name of the branch in the git repository to be used for training.
train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.
train_module_name (str) – Full path of the module that contains the train function from the root of the zip.
predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.
training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each parameter is a materialized DataFrame (the same type as the function's return value).
predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.
predict_many_function_name (str) –
python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.
name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.
cpu_size (str) – Size of the cpu for the model training function
memory (int) – Memory (in GB) for the model training function
package_requirements (dict) – JSON with key-value pairs of the form package: version, one for each dependency.
- Returns
None
- Return type
- list_models()
Retrieves the list of models in the specified project.
- get_custom_train_function_info(feature_group_names_for_training=None, training_data_parameter_name_override=None)
Returns the information about how to call the custom train function.
- Parameters
- Returns
Information about how to call the customer provided train function.
- Return type
- create_model_monitor(training_feature_group_id, prediction_feature_group_id, name=None, refresh_schedule=None, target_value=None, feature_mappings=None, model_id=None, training_feature_mappings=None)
Runs a model monitor for the specified project.
- Parameters
training_feature_group_id (str) – The unique ID of the training data feature group
prediction_feature_group_id (str) – The unique ID of the prediction data feature group
name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.
refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor
target_value (str) – A target positive value for the label to compute bias for
feature_mappings (dict) – A json map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.
model_id (str) – The unique ID of the model.
training_feature_mappings (dict) – A JSON map to override features for training_feature_group, where keys are column names and the values are feature data use types.
- Returns
The new model monitor that was created.
- Return type
- list_model_monitors()
Retrieves the list of model monitors in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of model monitors.
- Return type
- create_deployment_token()
Creates a deployment token for the specified project.
Deployment tokens are used to authenticate requests to the prediction APIs and are scoped on the project level.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
The deployment token.
- Return type
- list_deployments()
Retrieves a list of all deployments in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployments.
- Return type
- list_deployment_tokens()
Retrieves a list of all deployment tokens in the specified project.
- Parameters
project_id (str) – The unique ID associated with the project.
- Returns
An array of deployment tokens.
- Return type
- list_refresh_policies(dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], prediction_metric_ids=[])
List the refresh policies for the organization
- Parameters
dataset_ids (list) – Comma separated list of Dataset IDs
model_ids (list) – Comma separated list of Model IDs
deployment_ids (list) – Comma separated list of Deployment IDs
batch_prediction_ids (list) – Comma separated list of Batch Prediction IDs
model_monitor_ids (list) – Comma separated list of Model Monitor IDs.
prediction_metric_ids (list) – Comma separated list of Prediction Metric IDs.
- Returns
List of all refresh policies in the organization
- Return type
- list_batch_predictions()
Retrieves a list of the batch predictions in the project.
- Parameters
project_id (str) – The unique identifier of the project
- Returns
A list of batch prediction jobs.
- Return type
- attach_dataset(dataset_id, project_dataset_type)
Attaches a dataset to the project.
- Parameters
dataset_id (unique string identifier) – A unique identifier for the dataset.
project_dataset_type (enum of type string) – The unique use case specific dataset type that might be required or recommended for the specific use case.
- Returns
The schema of the attached dataset.
- Return type
- remove_dataset(dataset_id)
Removes a dataset from the project.
- Parameters
dataset_id (unique string identifier) – A unique identifier for the dataset.
- create_model_from_functions(train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False)
Creates a model using python.
- Parameters
train_function (callable) – The function used to train the model.
predict_function (callable) – The function used to make predictions with the trained model.
training_input_tables (list, optional) – The input tables to be used for training the model. Defaults to None.
predict_many_function (callable) – Prediction function for batch input
cpu_size (str) – Size of the cpu for the feature group function
memory (int) – Memory (in GB) for the feature group function
initialize_function (callable) –
training_config (dict) –
exclusive_run (bool) –
- Returns
The model object.
- Return type
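Since this method takes the callables directly, they can be exercised locally before being handed to the project. The sketch below is an illustration under assumptions, not the library's required interface: the column name "y", the predict(model, query) signature, the returned dict shapes and the table name are all hypothetical. The train function only indexes a column and iterates over it, so a plain dict of lists can stand in for a DataFrame in a quick local check.

```python
def train(training_df):
    """Fit a trivial baseline: the mean of column "y" (hypothetical column)."""
    values = list(training_df["y"])
    return {"mean": sum(values) / len(values)}


def predict(model, query):
    """Return the stored mean for any query (signature is an assumption)."""
    return {"prediction": model["mean"]}


# Local smoke test, with a dict of lists standing in for a DataFrame.
model = train({"y": [1.0, 2.0, 3.0]})
result = predict(model, {"x": 10})

# With an authenticated Project instance:
# project.create_model_from_functions(
#     train_function=train,
#     predict_function=predict,
#     training_input_tables=["my_training_table"],  # hypothetical table name
# )
```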