abacusai

Package Contents

Classes

ApiClient

Abacus.AI API Client

ClientOptions

Options for configuring the ApiClient

ReadOnlyClient

Abacus.AI Read Only API Client. Only contains GET methods

PredictionClient

Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods

Attributes

__version__

class abacusai.ApiClient(api_key=None, server=None, client_options=None, skip_version_check=False)

Bases: ReadOnlyClient

Abacus.AI API Client

Parameters
  • api_key (str) – The api key to use as authentication to the server

  • server (str) – The base server URL to use to send API requests to

  • client_options (ClientOptions) – Optional API client configurations

  • skip_version_check (bool) – If true, the client will skip checking the server’s current API version when initializing
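For example, a minimal sketch of constructing a client (the API key shown is a placeholder; server and client_options can usually be omitted):

    from abacusai import ApiClient

    # Placeholder credentials; generate a real API key from your Abacus.AI account.
    client = ApiClient(api_key='YOUR_API_KEY')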

create_dataset_from_pandas(feature_group_table_name, df, name=None)

[Deprecated] Creates a Dataset from a pandas dataframe

Parameters
  • feature_group_table_name (str) – The table name to assign to the feature group created by this call

  • df (pandas.DataFrame) – The dataframe to upload

  • name (str) – The name to give to the dataset

Returns

The dataset object created

Return type

Dataset

create_dataset_version_from_pandas(table_name_or_id, df)

[Deprecated] Updates an existing dataset from a pandas dataframe

Parameters
  • table_name_or_id (str) – The table name of the feature group or the ID of the dataset to update

  • df (pandas.DataFrame) – The dataframe to upload

Returns

The dataset updated

Return type

Dataset

create_feature_group_from_pandas_df(table_name, df)

Create a Feature Group from a local Pandas DataFrame.

Parameters
  • table_name (str) – The table name to assign to the feature group created by this call

  • df (pandas.DataFrame) – The dataframe to upload

Return type

abacusai.feature_group.FeatureGroup
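A minimal usage sketch, assuming an already-constructed client and a hypothetical table name:

    import pandas as pd

    # Upload a small local dataframe as a new feature group.
    df = pd.DataFrame({'user_id': [1, 2, 3], 'score': [0.1, 0.5, 0.9]})
    feature_group = client.create_feature_group_from_pandas_df(
        table_name='demo_user_scores',  # hypothetical table name
        df=df,
    )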

update_feature_group_from_pandas_df(table_name, df)

Updates a DATASET Feature Group from a local Pandas DataFrame.

Parameters
  • table_name (str) – The table name of the feature group to update

  • df (pandas.DataFrame) – The dataframe to upload

Return type

abacusai.feature_group.FeatureGroup

create_feature_group_from_spark_df(table_name, df)

Create a Feature Group from a local Spark DataFrame.

Parameters
  • df (pyspark.sql.DataFrame) – The dataframe to upload

  • table_name (str) – The table name to assign to the feature group created by this call

Return type

abacusai.feature_group.FeatureGroup

update_feature_group_from_spark_df(table_name, df)

Updates a Feature Group from a local Spark DataFrame.

Parameters
  • df (pyspark.sql.DataFrame) – The dataframe to upload

  • table_name (str) – The table name of the feature group to update

  • should_wait_for_upload (bool) – Wait for dataframe to upload before returning. Some FeatureGroup methods, like materialization, may not work until upload is complete.

  • timeout (int, optional) – If waiting for upload, time out after this limit.

Return type

abacusai.feature_group.FeatureGroup

create_spark_df_from_feature_group_version(session, feature_group_version)

Create a Spark Dataframe in the provided Spark Session context, for a materialized Abacus Feature Group Version.

Parameters
  • session (pyspark.sql.SparkSession) – Spark session

  • feature_group_version (str) – Feature group version to load from

Returns

pyspark.sql.DataFrame
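A usage sketch, assuming a local or cluster Spark session and a hypothetical feature group version ID:

    from pyspark.sql import SparkSession

    session = SparkSession.builder.getOrCreate()
    # 'fgv_example_id' stands in for a real, materialized feature group version ID.
    spark_df = client.create_spark_df_from_feature_group_version(
        session=session,
        feature_group_version='fgv_example_id',
    )
    spark_df.show()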

create_model_from_functions(project_id, train_function, predict_function=None, training_input_tables=None, predict_many_function=None, initialize_function=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False)

Creates a model from a python function

Parameters
  • project_id (str) – The project to create the model in

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_input_tables (list) – The input table names of the feature groups to pass to the train function

  • cpu_size (str) – Size of the cpu for the training function

  • memory (int) – Memory (in GB) for the training function

  • training_config (dict) – A dictionary of training parameters for the algorithm

  • exclusive_run (bool) –
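A minimal sketch, assuming the train function receives one DataFrame per training input table; the IDs and table names shown are hypothetical:

    def train(demo_user_scores):
        # Return any picklable object representing the trained model state.
        return {'mean_score': demo_user_scores['score'].mean()}

    def predict(model, query):
        # 'model' is the object returned by the train function.
        return {'prediction': model['mean_score']}

    model = client.create_model_from_functions(
        project_id='example_project_id',  # hypothetical project ID
        train_function=train,
        predict_function=predict,
        training_input_tables=['demo_user_scores'],
    )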

create_feature_group_from_python_function(function, table_name, input_tables, cpu_size=None, memory=None)

Creates a feature group from a python function

Parameters
  • function (callable) – The function callable for the feature group

  • table_name (str) – The table name to give the feature group

  • input_tables (list) – The input table names of the feature groups as input to the feature group function

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

create_algorithm_from_function(name, problem_type, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=False, project_id=None)

Creates a new algorithm, or updates an existing algorithm if the name already exists.

Parameters
  • name (String) – The name to identify the algorithm; only uppercase letters, numbers, and underscores are allowed

  • problem_type (Enum string) – The type of the problem this algorithm will work on

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (string) – The train config parameter name in the train function

  • config_options (Dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

  • project_id (str) – The unique ID of the project

update_algorithm_from_function(algorithm, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function=None, predict_function=None, predict_many_function=None, initialize_function=None, config_options=None, is_default_enabled=None)

Updates an existing algorithm.

Parameters
  • algorithm (String) – The name that identifies the algorithm; only uppercase letters, numbers, and underscores are allowed

  • train_function (callable) – The training function callable to serialize and upload

  • predict_function (callable) – The predict function callable to serialize and upload

  • predict_many_function (callable) – The predict many function callable to serialize and upload

  • initialize_function (callable) – The initialize function callable to serialize and upload

  • training_data_parameter_names_mapping (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (string) – The train config parameter name in the train function

  • config_options (Dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

get_train_function_input(project_id, training_table_names=None, training_data_parameter_name_override=None, training_config_parameter_name_override=None, training_config=None)

Get the input data for the train function to test locally.

Parameters
  • project_id (String) – The id of the project

  • training_table_names (List) – A list of feature group tables used for training

  • training_data_parameter_name_override (Dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name_override (String) – The train config parameter name in the train function

  • training_config (Dict) – A dictionary for training parameters for the algorithm

train_model_with_algorithms(project_id, model_name, user_defined_algorithms, training_table_names, cpu_size='SMALL', memory=3, user_defined_algorithms_only=False, training_config=None, user_defined_algorithm_configs=None)

Train a model with provided user-defined algorithms.

Parameters
  • project_id (String) – The id of the project

  • model_name (String) – The name of the model to train

  • user_defined_algorithms (List) – A list of user-defined algorithm names

  • training_table_names (List) – A list of feature group tables used for training

  • cpu_size (Enum) – How much cpu is needed for the user-defined algorithms during training

  • memory (Int) – How much memory in GB is needed for the user-defined algorithms during training

  • user_defined_algorithms_only (Boolean) – Whether to train only with user-defined algorithms, or to also include Abacus.AI algorithms

  • training_config (Dict) – A dictionary for model training parameters

  • user_defined_algorithm_configs (Dict) – Configs for each user-defined algorithm, key is algorithm name, value is the config serialized to json
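A usage sketch with hypothetical IDs and names, assuming 'MY_CUSTOM_ALGO' was previously registered via create_algorithm_from_function:

    client.train_model_with_algorithms(
        project_id='example_project_id',
        model_name='demo_model',
        user_defined_algorithms=['MY_CUSTOM_ALGO'],
        training_table_names=['demo_user_scores'],
        cpu_size='SMALL',
        memory=8,
        user_defined_algorithms_only=True,
    )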

add_user_to_organization(email)

Invites a user to your organization. This method will send the specified email address an invitation link to join your organization.

Parameters

email (str) – The email address to invite to your Organization.

create_organization_group(group_name, permissions, default_group=False)

Creates a new Organization Group.

Parameters
  • group_name (str) – The name of the group

  • permissions (list) – The list of permissions to initialize the group with

  • default_group (bool) – If true, this group will replace the current default group

Returns

Information about the created Organization Group

Return type

OrganizationGroup

add_organization_group_permission(organization_group_id, permission)

Adds a permission to the specified Organization Group

Parameters
  • organization_group_id (str) – The ID of the Organization Group

  • permission (str) – The permission to add to the Organization Group

remove_organization_group_permission(organization_group_id, permission)

Removes a permission from the specified Organization Group

Parameters
  • organization_group_id (str) – The ID of the Organization Group

  • permission (str) – The permission to remove from the Organization Group

delete_organization_group(organization_group_id)

Deletes the specified Organization Group from the organization.

Parameters

organization_group_id (str) – The ID of the Organization Group

add_user_to_organization_group(organization_group_id, email)

Adds a user to the specified Organization Group

Parameters
  • organization_group_id (str) – The ID of the Organization Group

  • email (str) – The email of the user that is added to the group

remove_user_from_organization_group(organization_group_id, email)

Removes a user from an Organization Group

Parameters
  • organization_group_id (str) – The ID of the Organization Group

  • email (str) – The email of the user to remove

set_default_organization_group(organization_group_id)

Sets the default Organization Group that all new users that join an organization are automatically added to

Parameters

organization_group_id (str) – The ID of the Organization Group

delete_api_key(api_key_id)

Delete a specified API Key. You can use the “listApiKeys” method to find the list of all API Key IDs.

Parameters

api_key_id (str) – The ID of the API key to delete.

remove_user_from_organization(email)

Removes the specified user from the Organization. You can remove yourself; otherwise, you must be an Organization Administrator to use this method to remove other users from the organization.

Parameters

email (str) – The email address of the user to remove from the Organization.

create_project(name, use_case)

Creates a project with your specified project name and use case. A project is a container for all of the datasets and models associated with a particular problem you would like to work on. For example, to create a model that detects fraud, you first create a project, upload datasets, create feature groups, and then create one or more models to get predictions for your use case.

Parameters
  • name (str) – The project’s name

  • use_case (str) – The use case that the project solves. You can refer to our [guide on use cases](https://api.abacus.ai/app/help/useCases) for further details of each use case. The following enums are currently available for you to choose from: LANGUAGE_DETECTION, NLP_SENTIMENT, NLP_QA, NLP_SEARCH, NLP_SENTENCE_BOUNDARY_DETECTION, NLP_CLASSIFICATION, NLP_SUMMARIZATION, NLP_DOCUMENT_VISUALIZATION, EMBEDDINGS_ONLY, MODEL_WITH_EMBEDDINGS, TORCH_MODEL_WITH_EMBEDDINGS, PYTHON_MODEL, NOTEBOOK_PYTHON_MODEL, DOCKER_MODEL, DOCKER_MODEL_WITH_EMBEDDINGS, CUSTOMER_CHURN, ENERGY, FINANCIAL_METRICS, CUMULATIVE_FORECASTING, FRAUD_ACCOUNT, FRAUD_THREAT, FRAUD_TRANSACTIONS, OPERATIONS_CLOUD, CLOUD_SPEND, TIMESERIES_ANOMALY_DETECTION, OPERATIONS_MAINTENANCE, OPERATIONS_INCIDENT, PERS_PROMOTIONS, PREDICTING, FEATURE_STORE, RETAIL, SALES_FORECASTING, SALES_SCORING, FEED_RECOMMEND, USER_RANKINGS, NAMED_ENTITY_RECOGNITION, USER_ITEM_AFFINITY, USER_RECOMMENDATIONS, USER_RELATED, VISION_SEGMENTATION, VISION, FEATURE_DRIFT, SCHEDULING, GENERIC_FORECASTING.

Returns

This object represents the newly created project.

Return type

Project
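For example (the use case value must be one of the enums listed above):

    project = client.create_project(
        name='Fraud Detection Demo',
        use_case='FRAUD_TRANSACTIONS',
    )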

rename_project(project_id, name)

This method renames a project after it is created.

Parameters
  • project_id (str) – The unique ID for the project.

  • name (str) – The new name for the project.

delete_project(project_id)

Deletes a specified project from your organization.

This method deletes the project, trained models and deployments in the specified project. The datasets attached to the specified project remain available for use with other projects in the organization.

This method will not delete a project that contains active deployments. Be sure to stop all active deployments before you use the delete option.

Note: Projects, models, and deployments cannot be recovered once they are deleted.

Parameters

project_id (str) – The unique ID of the project to delete.

add_feature_group_to_project(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE', feature_group_use=None)

Adds a feature group to a project.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type of the feature group. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.

  • feature_group_use (str) – The user-assigned feature group use, which allows for organizing project feature groups. Allowed values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT

remove_feature_group_from_project(feature_group_id, project_id)

Removes a feature group from a project.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

set_feature_group_type(feature_group_id, project_id, feature_group_type='CUSTOM_TABLE')

Update the feature group type in a project. The feature group must already be added to the project.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • feature_group_type (str) – The feature group type to set the feature group as. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under personalized recommendation use case.

use_feature_group_for_training(feature_group_id, project_id, use_for_training=True)

Use the feature group for model training input

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

  • use_for_training (bool) – Boolean variable to include or exclude a feature group from a model’s training. Only one feature group per type can be used for training

set_feature_mapping(project_id, feature_group_id, feature_name, feature_mapping, nested_column_name=None)

Set a column’s feature mapping. If the column mapping is single-use and already set in another column in this feature group, this call will first remove the other column’s mapping and move it to this column.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature.

  • feature_mapping (str) – The mapping of the feature in the feature group.

  • nested_column_name (str) – The name of the nested column.

Returns

A list of objects that describes the resulting feature group’s schema after the feature’s featureMapping is set.

Return type

Feature
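A usage sketch with hypothetical IDs; the available featureMapping values depend on the project’s use case:

    client.set_feature_mapping(
        project_id='example_project_id',
        feature_group_id='example_feature_group_id',
        feature_name='event_time',
        feature_mapping='TIMESTAMP',  # illustrative mapping value
    )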

set_column_data_type(project_id, dataset_id, column, data_type)

Set a dataset’s column type.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • data_type (str) – The type of the data in the column. Allowed values: CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE. Refer to the [guide on feature types](https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some ColumnMappings will restrict the options or explicitly set the DataType.

Returns

A list of objects that describes the resulting dataset’s schema after the column’s dataType is set.

Return type

Schema

set_column_mapping(project_id, dataset_id, column, column_mapping)

Set a dataset’s column mapping. If the column mapping is single-use and already set in another column in this dataset, this call will first remove the other column’s mapping and move it to this column.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • column_mapping (str) – The mapping of the column in the dataset. See a list of column mapping enums here.

Returns

A list of columns that describes the resulting dataset’s schema after the column’s columnMapping is set.

Return type

Schema

remove_column_mapping(project_id, dataset_id, column)

Removes a column mapping from a column in the dataset. Returns a list of all columns with their mappings once the change is made.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

Returns

A list of objects that describes the resulting dataset’s schema after the column’s columnMapping is removed.

Return type

Schema

create_feature_group(table_name, sql, description=None)

Creates a new feature group from a SQL statement.

Parameters
  • table_name (str) – The unique name to be given to the feature group.

  • sql (str) – Input SQL statement for forming the feature group.

  • description (str) – The description of the feature group.

Returns

The created feature group

Return type

FeatureGroup
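For example, a SQL-defined feature group built on top of a hypothetical existing table:

    feature_group = client.create_feature_group(
        table_name='active_users',
        sql='SELECT user_id, score FROM demo_user_scores WHERE score > 0.5',
        description='Users with above-threshold scores',
    )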

create_feature_group_from_template(table_name, feature_group_template_id, template_bindings=None, should_attach_feature_group_to_template=True, description=None)

Creates a new feature group from a feature group template.

Parameters
  • table_name (str) – The unique name to be given to the feature group.

  • feature_group_template_id (str) – The unique ID associated with the template that will be used to create this feature group.

  • template_bindings (list) – Variable bindings that override the template’s variable values.

  • should_attach_feature_group_to_template (bool) – Set to False to create a feature group but not leave it attached to the template that created it.

  • description (str) – A user-friendly description of this feature group.

Returns

The created feature group

Return type

FeatureGroup

create_feature_group_from_function(table_name, function_source_code, function_name, input_feature_groups=[], description=None, cpu_size=None, memory=None, package_requirements=None)

Creates a new feature group from user-provided code. The only code language currently supported is Python.

If a list of input feature groups is supplied, the function will be given DataFrames (pandas, in the case of Python) containing the materialized data of those input feature groups as arguments.

This method expects function_source_code to be a valid source file in a supported language which contains a function named function_name. This function must return a DataFrame when executed, and that DataFrame will be used as the materialized version of this feature group table.

Parameters
  • table_name (str) – The unique name to be given to the feature group.

  • function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • description (str) – The description for this feature group.

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The created feature group

Return type

FeatureGroup
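A minimal sketch, assuming Python source and a hypothetical input table; the function name in the source must match function_name:

    source = '''
    def build_features(demo_user_scores):
        df = demo_user_scores.copy()
        df['score_squared'] = df['score'] ** 2
        return df
    '''

    feature_group = client.create_feature_group_from_function(
        table_name='user_scores_enriched',
        function_source_code=source,
        function_name='build_features',
        input_feature_groups=['demo_user_scores'],
    )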

create_feature_group_from_zip(table_name, function_name, module_name, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None)

Creates a new feature group from a ZIP file.

Parameters
  • table_name (str) – The unique name to be given to the feature group.

  • function_name (str) – Name of the function found in the module that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • description (str) – The description of the feature group.

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The Upload object to which the ZIP file should be uploaded

Return type

Upload

create_feature_group_from_git(application_connector_id, branch_name, table_name, function_name, module_name, python_root=None, input_feature_groups=None, description=None, cpu_size=None, memory=None, package_requirements=None)

Creates a new feature group from a git repository.

Parameters
  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • table_name (str) – The unique name to be given to the feature group.

  • function_name (str) – Name of the function found in the module that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • description (str) – The description of the feature group.

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The created feature group

Return type

FeatureGroup

create_sampling_feature_group(feature_group_id, table_name, sampling_config, description=None)

Creates a new feature group defined as a sample of rows from another feature group.

For efficiency, sampling is approximate unless otherwise specified (e.g., the number of rows may vary slightly from what was requested).

Parameters
  • feature_group_id (str) – The unique ID associated with the pre-existing feature group that will be sampled by this new feature group. I.e. the input for sampling.

  • table_name (str) – The unique name to be given to this sampling feature group.

  • sampling_config (dict) – JSON object (aka map) defining the sampling method and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns

The created feature group.

Return type

FeatureGroup
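A usage sketch; the sampling_config keys shown are illustrative assumptions, so consult the sampling documentation for the supported methods and parameters:

    feature_group = client.create_sampling_feature_group(
        feature_group_id='example_feature_group_id',  # hypothetical ID
        table_name='user_scores_sample',
        sampling_config={'samplingMethod': 'N_SAMPLING', 'sampleCount': 1000},  # assumed keys
        description='Approximate 1000-row sample',
    )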

create_merge_feature_group(source_feature_group_id, table_name, merge_config, description=None)

Creates a new feature group defined as the union of other feature group versions.

Parameters
  • source_feature_group_id (str) – ID corresponding to the dataset feature group that will have its versions merged into this feature group.

  • table_name (str) – The unique name to be given to this merge feature group.

  • merge_config (dict) – JSON object (aka map) defining the merging method and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns

The created feature group.

Return type

FeatureGroup

create_transform_feature_group(source_feature_group_id, table_name, transform_config, description=None)

Creates a new feature group defined as a pre-defined transform on another feature group.

Parameters
  • source_feature_group_id (str) – ID corresponding to the feature group that will have the transformation applied.

  • table_name (str) – The unique name to be given to this transform feature group.

  • transform_config (dict) – JSON object (aka map) defining the transform and its parameters.

  • description (str) – A human-readable description of this feature group.

Returns

The created feature group.

Return type

FeatureGroup

create_snapshot_feature_group(feature_group_version, table_name)

Creates a Snapshot Feature Group corresponding to a specific feature group version.

Parameters
  • feature_group_version (str) – The unique ID associated with the feature group version being snapshotted.

  • table_name (str) – The name for the newly created Snapshot Feature Group table.

Returns

Feature Group corresponding to the newly created Snapshot.

Return type

FeatureGroup

set_feature_group_sampling_config(feature_group_id, sampling_config)

Set a FeatureGroup’s sampling to the config values provided, so that the rows the FeatureGroup returns will be a sample of those it would otherwise have returned.

Currently, sampling is only for Sampling FeatureGroups, so this API only allows calling on that kind of FeatureGroup.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • sampling_config (dict) – A json object string specifying the sampling method and parameters specific to that sampling method. Empty sampling_config means no sampling.

Returns

The updated feature group.

Return type

FeatureGroup

set_feature_group_merge_config(feature_group_id, merge_config)

Set a MergeFeatureGroup’s merge config to the values provided, so that the feature group only returns a bounded range of an incremental dataset.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • merge_config (dict) – A json object string specifying the merge rule. An empty mergeConfig will default to only including the latest Dataset Version.

Return type

None

set_feature_group_transform_config(feature_group_id, transform_config)

Set a TransformFeatureGroup’s transform config to the values provided.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • transform_config (dict) – A json object string specifying the pre-defined transformation.

Return type

None

set_feature_group_schema(feature_group_id, schema)

Creates a new schema and points the feature group to the new feature group schema id.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • schema (list) – An array of json objects with ‘name’ and ‘dataType’ properties.

create_feature(feature_group_id, name, select_expression)

Creates a new feature in a Feature Group from a SQL select statement

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to add

  • select_expression (str) – SQL select expression to create the feature

Returns

A feature group object with the newly added feature.

Return type

FeatureGroup

add_feature_group_tag(feature_group_id, tag)

Adds a tag to the feature group

Parameters
  • feature_group_id (str) – The feature group

  • tag (str) – The tag to add to the feature group

remove_feature_group_tag(feature_group_id, tag)

Removes a tag from the feature group

Parameters
  • feature_group_id (str) – The feature group

  • tag (str) – The tag to remove from the feature group

add_feature_tag(feature_group_id, feature, tag)

Adds a tag to a feature in a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature (str) – The name of the feature.

  • tag (str) – The tag to add to the feature.

remove_feature_tag(feature_group_id, feature, tag)

Removes a tag from a feature in a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature (str) – The name of the feature.

  • tag (str) – The tag to remove from the feature.

create_nested_feature(feature_group_id, nested_feature_name, table_name, using_clause, where_clause=None, order_clause=None)

Creates a new nested feature in a feature group from a SQL statement.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature.

  • table_name (str) – The table name of the feature group to nest

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

Returns

A feature group object with the newly added nested feature.

Return type

FeatureGroup

update_nested_feature(feature_group_id, nested_feature_name, table_name=None, using_clause=None, where_clause=None, order_clause=None, new_nested_feature_name=None)

Updates a previously existing nested feature in a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature to be updated.

  • table_name (str) – The name of the table.

  • using_clause (str) – The SQL join column or logic to join the nested table with the parent

  • where_clause (str) – A SQL where statement to filter the nested rows

  • order_clause (str) – A SQL clause to order the nested rows

  • new_nested_feature_name (str) – New name for the nested feature.

Returns

A feature group object with the updated nested feature.

Return type

FeatureGroup

delete_nested_feature(feature_group_id, nested_feature_name)

Delete a nested feature.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • nested_feature_name (str) – The name of the feature to be deleted.

Returns

A feature group object without the deleted nested feature.

Return type

FeatureGroup

create_point_in_time_feature(feature_group_id, feature_name, history_table_name, aggregation_keys, timestamp_key, historical_timestamp_key, expression, lookback_window_seconds=None, lookback_window_lag_seconds=0, lookback_count=None, lookback_until_position=0)

Creates a new point in time feature in a feature group using another historical feature group, window spec and aggregate expression.

We use the aggregation keys and either the lookbackWindowSeconds or the lookbackCount values to perform the window aggregation for every row in the current feature group. If the window is specified in seconds, then all rows in the history table which match the aggregation keys and whose historicalTimeFeature is >= lookbackStartCount and < the value of the current row’s timeFeature are considered. An optional lookbackWindowLagSeconds (positive or negative) can be used to offset the current value of the timeFeature. If this value is negative, we will look at future rows in the history table, so care must be taken to ensure that these rows are available in the online context when a lookup is performed on this feature group. If the window is specified in counts, then we order the historical table rows, aligning by time, and consider rows from the window where the rank order is >= lookbackCount, including the row just prior to the current one. The lag is specified in terms of positions using lookbackUntilPosition.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature to create

  • history_table_name (str) – The table name of the history table.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.

Returns

A feature group object with the newly added point in time feature.

Return type

FeatureGroup
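A sketch that computes, for each row, a 7-day rolling sum over a hypothetical history table (all IDs and column names are placeholders):

    feature_group = client.create_point_in_time_feature(
        feature_group_id='example_feature_group_id',
        feature_name='amount_7d_sum',
        history_table_name='transactions_history',
        aggregation_keys=['user_id'],
        timestamp_key='event_time',
        historical_timestamp_key='transaction_time',
        expression='SUM(amount)',               # SQL aggregate over the window rows
        lookback_window_seconds=7 * 24 * 3600,  # 7 days
    )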

update_point_in_time_feature(feature_group_id, feature_name, history_table_name=None, aggregation_keys=None, timestamp_key=None, historical_timestamp_key=None, expression=None, lookback_window_seconds=None, lookback_window_lag_seconds=None, lookback_count=None, lookback_until_position=None, new_feature_name=None)

Updates an existing point in time feature in a feature group. See createPointInTimeFeature for detailed semantics.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_name (str) – The name of the feature.

  • history_table_name (str) – The table name of the history table. If not specified, we use the current table to do a self join.

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • timestamp_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • historical_timestamp_key (str) – Name of feature which contains the historical timestamp.

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

  • lookback_window_seconds (float) – If the window is specified in terms of time, the number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag_seconds (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.

  • new_feature_name (str) – New name for the point in time feature.

Returns

A feature group object with the updated point in time feature.

Return type

FeatureGroup

create_point_in_time_group(feature_group_id, group_name, window_key, aggregation_keys, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=0, lookback_count=None, lookback_until_position=0)

Create point in time group

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group to add the point in time group to.

  • group_name (str) – The name of the point in time group

  • window_key (str) – Name of feature to use for ordering the rows on the source table

  • aggregation_keys (list) – List of keys to use on the source table for the window aggregation.

  • history_table_name (str) – The table to use for aggregating; if not provided, the source table will be used

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window. If 0, the lookback will include all rows.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.

Returns

The feature group after the point in time group has been created

Return type

FeatureGroup

update_point_in_time_group(feature_group_id, group_name, window_key=None, aggregation_keys=None, history_table_name=None, history_window_key=None, history_aggregation_keys=None, lookback_window=None, lookback_window_lag=None, lookback_count=None, lookback_until_position=None)

Update point in time group

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • window_key (str) – Name of feature which contains the timestamp value for the point in time feature

  • aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation.

  • history_table_name (str) – The table to use for aggregating; if not provided, the source table will be used

  • history_window_key (str) – Name of feature to use for ordering the rows on the history table. If not provided, the windowKey from the source table will be used

  • history_aggregation_keys (list) – List of keys to use for joining the historical table and performing the window aggregation. If not provided, the aggregationKeys from the source table will be used. Must be the same length and order as the source table’s aggregationKeys

  • lookback_window (float) – Number of seconds in the past from the current time for the start of the window.

  • lookback_window_lag (float) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window. If it is negative, we are looking at the “future” rows in the history table.

  • lookback_count (int) – If the window is specified in terms of count, the start position of the window (0 is the current row).

  • lookback_until_position (int) – Optional lag to offset the closest point for the window. If it is positive, we delay the start of the window by that many rows. If it is negative, we are looking at that many “future” rows in the history table.

Returns

The feature group after the update has been applied

Return type

FeatureGroup

delete_point_in_time_group(feature_group_id, group_name)

Delete point in time group

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

Returns

The feature group after the point in time group has been deleted

Return type

FeatureGroup

create_point_in_time_group_feature(feature_group_id, group_name, name, expression)

Create point in time group feature

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • name (str) – The name of the feature to add to the point in time group

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

Returns

The feature group after the update has been applied

Return type

FeatureGroup

update_point_in_time_group_feature(feature_group_id, group_name, name, expression)

Update a feature’s SQL expression in a point in time group

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • group_name (str) – The name of the point in time group

  • name (str) – The name of the feature in the point in time group to update

  • expression (str) – SQL Aggregate expression which can convert a sequence of rows into a scalar value.

Returns

The feature group after the update has been applied

Return type

FeatureGroup

set_feature_type(feature_group_id, feature, feature_type)

Set a feature’s type in a feature group. Specify the feature group ID, feature name, and feature type, and the method will return the new column with the resulting changes reflected.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature (str) – The name of the feature.

  • feature_type (str) – The machine learning type of the data in the feature. Allowed values: CATEGORICAL, CATEGORICAL_LIST, NUMERICAL, TIMESTAMP, TEXT, EMAIL, LABEL_LIST, JSON, OBJECT_REFERENCE. Refer to the [guide on feature types](https://api.abacus.ai/app/help/class/FeatureType) for more information. Note: Some FeatureMappings will restrict the options or explicitly set the FeatureType.

Returns

The feature group after the feature_type is applied

Return type

Schema

invalidate_streaming_feature_group_data(feature_group_id, invalid_before_timestamp)

Invalidates all streaming data with timestamp before invalidBeforeTimestamp

Parameters
  • feature_group_id (str) – The streaming feature group to invalidate data in

  • invalid_before_timestamp (int) – The Unix timestamp; any data with a timestamp before this time will be deleted

concatenate_feature_group_data(feature_group_id, source_feature_group_id, merge_type='UNION', replace_until_timestamp=None, skip_materialize=False)

Concatenates data from one feature group to another. Feature groups can be merged if their schemas are compatible, they have the special updateTimestampKey column, and, if set, the primaryKey column. The second operand in the concatenate operation will be appended to the first operand (the merge target).

Parameters
  • feature_group_id (str) – The destination feature group.

  • source_feature_group_id (str) – The feature group to concatenate with the destination feature group.

  • merge_type (str) – UNION or INTERSECTION

  • replace_until_timestamp (int) – The Unix timestamp up to which data from the source feature group will be replaced.

  • skip_materialize (bool) – If true, will not materialize the concatenated feature group
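A usage sketch with hypothetical IDs, appending a streaming feature group’s rows into a batch destination:

    client.concatenate_feature_group_data(
        feature_group_id='example_destination_fg_id',
        source_feature_group_id='example_streaming_fg_id',
        merge_type='UNION',
    )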

remove_concatenation_config(feature_group_id)

Removes the concatenation config on a destination feature group.

Parameters

feature_group_id (str) – The unique ID of the destination feature group whose concatenation configuration will be removed

set_feature_group_indexing_config(feature_group_id, primary_key=None, update_timestamp_key=None, lookup_keys=None)

Sets various attributes of the feature group used for deployment lookups and streaming updates.

Parameters
  • feature_group_id (str) – The feature group

  • primary_key (str) – Name of feature which defines the primary key of the feature group.

  • update_timestamp_key (str) – Name of feature which defines the update timestamp of the feature group - used in concatenation and primary key deduplication.

  • lookup_keys (list) – List of feature names which can be used in the lookup api to restrict the computation to a set of dataset rows. These feature names have to correspond to underlying dataset columns.
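A usage sketch with hypothetical names:

    client.set_feature_group_indexing_config(
        feature_group_id='example_feature_group_id',
        primary_key='user_id',
        update_timestamp_key='updated_at',
        lookup_keys=['user_id'],
    )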

update_feature_group(feature_group_id, description=None)

Modifies an existing feature group

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • description (str) – The description of the feature group.

Returns

The updated feature group object.

Return type

FeatureGroup

detach_feature_group_from_template(feature_group_id)

Update a feature group to detach it from a template.

Currently, this converts the feature group into a SQL feature group rather than a template feature group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group.

Returns

The updated feature group

Return type

FeatureGroup

update_feature_group_template_bindings(feature_group_id, template_bindings=None)

Update the feature group template bindings for a template feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • template_bindings (list) – Values in these bindings override values set in the template.

Returns

The updated feature group

Return type

FeatureGroup

update_feature_group_sql_definition(feature_group_id, sql)

Updates the SQL statement for a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • sql (str) – Input SQL statement for the feature group.

Returns

The updated feature group

Return type

FeatureGroup

update_dataset_feature_group_feature_expression(feature_group_id, feature_expression)

Updates the SQL feature expression for a dataset feature group’s custom features

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • feature_expression (str) – Input SQL statement for the feature group.

Returns

The updated feature group

Return type

FeatureGroup

update_feature_group_function_definition(feature_group_id, function_source_code=None, function_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)

Updates the function definition for a feature group created using createFeatureGroupFromFunction

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • function_source_code (str) – Contents of a valid source code file in a supported Feature Group specification language (currently only Python). The source code should contain a function called function_name. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The updated feature group

Return type

FeatureGroup

update_feature_group_zip(feature_group_id, function_name, module_name, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)

Updates the zip for a feature group created using createFeatureGroupFromZip

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The Upload object to which the ZIP file should be uploaded

Return type

Upload

update_feature_group_git(feature_group_id, application_connector_id=None, branch_name=None, python_root=None, function_name=None, module_name=None, input_feature_groups=None, cpu_size=None, memory=None, package_requirements=None)

Updates a feature group created using createFeatureGroupFromGit

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • function_name (str) – Name of the function found in the source code that will be executed (on the optional inputs) to materialize this feature group.

  • module_name (str) – Path to the file with the feature group function.

  • input_feature_groups (list) – List of feature groups that are supplied to the function as parameters. Each parameter is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the feature group function

  • memory (int) – Memory (in GB) for the feature group function

  • package_requirements (dict) – JSON with key/value pairs corresponding to package: version for each dependency

Returns

The updated FeatureGroup

Return type

FeatureGroup

update_feature(feature_group_id, name, select_expression=None, new_name=None)

Modifies an existing feature in a feature group. A user needs to specify the name and feature group ID, and either a SQL statement or a new name to update the feature.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to be updated.

  • select_expression (str) – Input SQL statement for modifying the feature.

  • new_name (str) – The new name of the feature.

Returns

The updated feature group object.

Return type

FeatureGroup

export_feature_group_version_to_file_connector(feature_group_version, location, export_file_format, overwrite=False)

Export Feature group to File Connector.

Parameters
  • feature_group_version (str) – The Feature Group instance to export.

  • location (str) – Cloud file location to export to.

  • export_file_format (str) – File format to export to.

  • overwrite (bool) – If true and a file exists at this location, this process will overwrite the file.

Returns

The FeatureGroupExport instance

Return type

FeatureGroupExport

export_feature_group_version_to_database_connector(feature_group_version, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Export Feature group to Database Connector.

Parameters
  • feature_group_version (str) – The Feature Group instance id to export.

  • database_connector_id (str) – Database connector to export to.

  • object_name (str) – The database object to write to

  • write_mode (str) – Either INSERT or UPSERT

  • database_feature_mapping (dict) – A key/value pair JSON Object of “database connector column” -> “feature name” pairs.

  • id_column (str) – Required if mode is UPSERT. Indicates which database column should be used as the lookup key for UPSERT

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting

Returns

The FeatureGroupExport instance

Return type

FeatureGroupExport
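A usage sketch with hypothetical IDs; UPSERT mode requires id_column, and the mapping keys are database connector columns:

    export = client.export_feature_group_version_to_database_connector(
        feature_group_version='fgv_example_id',
        database_connector_id='example_connector_id',
        object_name='analytics.user_scores',  # hypothetical database object
        write_mode='UPSERT',
        database_feature_mapping={'USER_ID': 'user_id', 'SCORE': 'score'},
        id_column='USER_ID',
    )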

export_feature_group_version_to_console(feature_group_version, export_file_format)

Export Feature group to console.

Parameters
  • feature_group_version (str) – The Feature Group instance to export.

  • export_file_format (str) – File format to export to.

Returns

The FeatureGroupExport instance

Return type

FeatureGroupExport

set_feature_group_modifier_lock(feature_group_id, locked=True)

Locks a feature group to prevent it from being modified, or unlocks it.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • locked (bool) – Set to True to disable feature group modification, or False to enable it.

add_user_to_feature_group_modifiers(feature_group_id, email)

Adds a user as a modifier of a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • email (str) – The email address of the user to be added.

add_organization_group_to_feature_group_modifiers(feature_group_id, organization_group_id)

Adds an organization group as a modifier of a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • organization_group_id (str) – The unique ID associated with the organization group.

remove_user_from_feature_group_modifiers(feature_group_id, email)

Removes a user as a modifier of a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • email (str) – The email address of the user to be removed.

remove_organization_group_from_feature_group_modifiers(feature_group_id, organization_group_id)

Removes an organization group as a modifier of a feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • organization_group_id (str) – The unique ID associated with the organization group.

delete_feature(feature_group_id, name)

Removes an existing feature from a feature group. A user needs to specify the name of the feature to be deleted and the feature group ID.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • name (str) – The name of the feature to be deleted.

Returns

The updated feature group object.

Return type

FeatureGroup

delete_feature_group(feature_group_id)

Removes an existing feature group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group.

create_feature_group_version(feature_group_id, variable_bindings=None)

Creates a snapshot for a specified feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • variable_bindings (dict) – JSON object (aka map) defining variable bindings that override parent feature group values.

Returns

A feature group version.

Return type

FeatureGroupVersion

create_feature_group_template(feature_group_id, name, template_sql, template_variables, description=None, template_bindings=None, should_attach_feature_group_to_template=False)

Create a feature group template.

Parameters
  • feature_group_id (str) – Identifier of the feature group this template was created from.

  • name (str) – The user-friendly name for this feature group template.

  • template_sql (str) – The template sql that will be resolved by applying values from the template variables to generate sql for a feature group.

  • template_variables (list) – The template variables for resolving the template.

  • description (str) – A description of this feature group template

  • template_bindings (list) – If the feature group will be attached to the newly created template, set these variable bindings on that feature group.

  • should_attach_feature_group_to_template (bool) – Set to True to convert the feature group to a template feature group and attach it to the newly created template.

Returns

The created feature group template

Return type

FeatureGroupTemplate
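
A minimal sketch of creating a template; the SQL placeholder syntax and the shape of the template variable entries are assumptions, and all IDs are hypothetical:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # The template SQL is parameterized by a single variable, event_type.
    template = client.create_feature_group_template(
        feature_group_id='feature_group_id',
        name='filtered_events',
        template_sql='SELECT * FROM events WHERE event_type = {event_type}',
        template_variables=[{'name': 'event_type', 'value': 'click'}],
        description='Events filtered by a configurable event type',
    )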

delete_feature_group_template(feature_group_template_id)

Delete an existing feature group template.

Parameters

feature_group_template_id (str) – The unique ID associated with the feature group template.

update_feature_group_template(feature_group_template_id, template_sql=None, template_variables=None)

Update a feature group template.

Parameters
  • feature_group_template_id (str) – Identifier of the feature group template to update.

  • template_sql (str) – If provided, the new value to use for the template sql.

  • template_variables (list) – If provided, the new value to use for the template variables.

Returns

The updated feature group template.

Return type

FeatureGroupTemplate

preview_feature_group_template_resolution(feature_group_template_id=None, template_bindings=None, template_sql=None, template_variables=None, should_validate=True)

Resolve template sql using template variables and template bindings.

Parameters
  • feature_group_template_id (str) – If specified, use this template, otherwise assume an empty template.

  • template_bindings (list) – Values that override the template variable values specified by the template.

  • template_sql (str) – If specified, use this as the template sql instead of the feature group template’s sql.

  • template_variables (list) – Template variables to use. If a template is provided, this overrides the template’s template variables.

  • should_validate (bool) – Whether to validate the resolved SQL.

Returns

The resolved feature group template.

Return type

ResolvedFeatureGroupTemplate

cancel_upload(upload_id)

Cancels an upload

Parameters

upload_id (str) – The Upload ID

upload_part(upload_id, part_number, part_data)

Uploads a part of a large dataset file from your bucket to our system. Our system currently supports a size of up to 5GB for a part of a full file and a size of up to 5TB for the full file. Note that each part must be >=5MB in size, unless it is the last part in the sequence of parts for the full file.

Parameters
  • upload_id (str) – A unique identifier for this upload

  • part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.

  • part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.

Returns

The object ‘UploadPart’ which encapsulates the hash and the etag for the part that got uploaded.

Return type

UploadPart

mark_upload_complete(upload_id)

Marks an upload process as complete.

Parameters

upload_id (str) – A unique identifier for this upload

Returns

The upload object associated with the upload process for the full file.

Return type

Upload
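
Put together, a chunked multipart upload might look like the following sketch. It assumes upload_id came from a method that returns an Upload, such as create_dataset_from_upload (documented further below), and the file path is hypothetical:

    import io
    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    CHUNK_SIZE = 5 * 1024 * 1024  # each part must be >= 5MB, except the last

    with open('data.csv', 'rb') as f:
        part_number = 1  # part numbers are 1-indexed
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            # Wrap the raw bytes in a file-like object for the part payload.
            client.upload_part('upload_id', part_number, io.BytesIO(chunk))
            part_number += 1

    upload = client.mark_upload_complete('upload_id')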

create_dataset_from_file_connector(name, table_name, location, file_format=None, refresh_schedule=None, csv_delimiter=None, filename_column=None, start_prefix=None, until_prefix=None, location_date_format=None, date_format_lookback_days=None, incremental=False)

Creates a dataset from a file located in cloud storage, such as Amazon S3, using the specified dataset name and location.

Parameters
  • name (str) – The name for the dataset.

  • table_name (str) – Organization-unique table name or the name of the feature group table to create using the source table.

  • location (str) – The URI location of the dataset source. When location_date_format is specified, the location must match it, e.g. location = s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/*. When both start_prefix and until_prefix are specified, the location must include both, e.g. location = s3://bucket1/dir1/* includes both s3://bucket1/dir1/dir2/event_date=2021-08-02/* and s3://bucket1/dir1/dir2/event_date=2021-08-08/*.

  • file_format (str) – The file format of the dataset.

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

  • filename_column (str) – Adds a new column to the dataset with the external URI path.

  • start_prefix (str) – The start prefix (inclusive) for a range based search on a cloud storage location URI.

  • until_prefix (str) – The end prefix (exclusive) for a range based search on a cloud storage location URI.

  • location_date_format (str) – The date format in which the data is partitioned in the cloud storage location. E.g., if the data is partitioned as s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/dir4/filename.parquet, then the location_date_format is YYYY-MM-DD. This format needs to be consistent across all files within the specified location.

  • date_format_lookback_days (int) – The number of days to look back from the current day for import locations that are date partitioned. E.g., import date, 2021-06-04, with date_format_lookback_days = 3 will retrieve data for all the dates in the range [2021-06-02, 2021-06-04].

  • incremental (bool) – Signifies if the dataset is an incremental dataset.

Returns

The dataset created.

Return type

Dataset
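
For instance, a date-partitioned S3 import with a daily refresh might look like this sketch (the bucket, paths, and schedule are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    dataset = client.create_dataset_from_file_connector(
        name='User Events',
        table_name='user_events',
        location='s3://bucket1/dir1/dir2/event_date=YYYY-MM-DD/*',
        file_format='PARQUET',
        location_date_format='YYYY-MM-DD',
        date_format_lookback_days=3,   # import the last three daily partitions
        refresh_schedule='0 6 * * *',  # re-import daily at 06:00 UTC
    )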

create_dataset_version_from_file_connector(dataset_id, location=None, file_format=None, csv_delimiter=None)

Creates a new version of the specified dataset.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • location (str) – A new external URI to import the dataset from. If not specified, the last location will be used.

  • file_format (str) – The fileFormat to be used. If not specified, the service will try to detect the file format.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

Returns

The new Dataset Version created.

Return type

DatasetVersion

create_dataset_from_database_connector(name, table_name, database_connector_id, object_name=None, columns=None, query_arguments=None, refresh_schedule=None, sql_query=None, incremental=False, timestamp_column=None)

Creates a dataset from a Database Connector

Parameters
  • name (str) – The name for the dataset to be attached.

  • table_name (str) – Organization-unique table name

  • database_connector_id (str) – The Database Connector to import the dataset from

  • object_name (str) – If applicable, the name/id of the object in the service to query.

  • columns (str) – The columns to query from the external service object.

  • query_arguments (str) – Additional query arguments to filter the data

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

  • sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, timestampColumn, and queryArguments

  • incremental (bool) – Signifies if the dataset is an incremental dataset.

  • timestamp_column (str) – If dataset is incremental, this is the column name of the required column in the dataset. This column must contain timestamps in descending order which are used to determine the increments of the incremental dataset.

Returns

The created dataset.

Return type

Dataset
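
A minimal sketch of an incremental import (the connector ID, object name, and columns are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    dataset = client.create_dataset_from_database_connector(
        name='Orders',
        table_name='orders',
        database_connector_id='database_connector_id',
        object_name='SALES.ORDERS',
        columns='order_id, user_id, amount, created_at',
        incremental=True,
        timestamp_column='created_at',  # drives the incremental reads
    )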

create_dataset_from_application_connector(name, table_name, application_connector_id, object_id=None, start_timestamp=None, end_timestamp=None, refresh_schedule=None)

Creates a dataset from an Application Connector

Parameters
  • name (str) – The name for the dataset

  • table_name (str) – Organization-unique table name

  • application_connector_id (str) – The unique application connector to download data from

  • object_id (str) – If applicable, the id of the object in the service to query.

  • start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.

  • end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

Returns

The created dataset.

Return type

Dataset

create_dataset_version_from_database_connector(dataset_id, object_name=None, columns=None, query_arguments=None, sql_query=None)

Creates a new version of the specified dataset

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • object_name (str) – If applicable, the name/id of the object in the service to query. If not specified, the last name will be used.

  • columns (str) – The columns to query from the external service object. If not specified, the last columns will be used.

  • query_arguments (str) – Additional query arguments to filter the data. If not specified, the last arguments will be used.

  • sql_query (str) – The full SQL query to use when fetching data. If present, this parameter will override objectName, columns, and queryArguments

Returns

The new Dataset Version created.

Return type

DatasetVersion

create_dataset_version_from_application_connector(dataset_id, object_id=None, start_timestamp=None, end_timestamp=None)

Creates a new version of the specified dataset

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • object_id (str) – If applicable, the id of the object in the service to query. If not specified, the last object id will be used.

  • start_timestamp (int) – The Unix timestamp of the start of the period that will be queried.

  • end_timestamp (int) – The Unix timestamp of the end of the period that will be queried.

Returns

The new Dataset Version created.

Return type

DatasetVersion

create_dataset_from_upload(name, table_name, file_format=None, csv_delimiter=None)

Creates a dataset and returns an upload ID that can be used to upload a file.

Parameters
  • name (str) – The name for the dataset.

  • table_name (str) – Organization-unique table name for this dataset.

  • file_format (str) – The file format of the dataset.

  • csv_delimiter (str) – If the file format is CSV, use a specific csv delimiter.

Returns

A reference to be used when uploading file parts.

Return type

Upload

create_dataset_version_from_upload(dataset_id, file_format=None)

Creates a new version of the specified dataset using a local file upload.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • file_format (str) – The file_format to be used. If not specified, the service will try to detect the file format.

Returns

A token to be used when uploading file parts.

Return type

Upload

create_streaming_dataset(name, table_name, project_id=None, dataset_type=None)

Creates a streaming dataset. Use a streaming dataset if your dataset is receiving information from multiple sources over an extended period of time.

Parameters
  • name (str) – The name for the dataset.

  • table_name (str) – The feature group table name to create for this dataset

  • project_id (str) – The project to create the streaming dataset for.

  • dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for the datasetTypes that are supported per use case.

Returns

The streaming dataset created.

Return type

Dataset

snapshot_streaming_data(dataset_id)

Snapshots the current data in the streaming dataset for training.

Parameters

dataset_id (str) – The unique ID associated with the dataset.

Returns

The new Dataset Version created.

Return type

DatasetVersion

set_dataset_column_data_type(dataset_id, column, data_type)

Set a column’s type in a specified dataset.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • column (str) – The name of the column.

  • data_type (str) – The type of the data in the column: INTEGER, FLOAT, STRING, DATE, DATETIME, BOOLEAN, LIST, or STRUCT. Refer to the [guide on data types](https://api.abacus.ai/app/help/class/DataType) for more information. Note: some ColumnMappings will restrict the options or explicitly set the DataType.

Returns

The dataset and schema after the data_type has been set

Return type

Dataset

create_dataset_from_streaming_connector(name, table_name, streaming_connector_id, streaming_args=None, refresh_schedule=None)

Creates a dataset from a Streaming Connector

Parameters
  • name (str) – The name for the dataset to be attached.

  • table_name (str) – Organization-unique table name

  • streaming_connector_id (str) – The Streaming Connector to import the dataset from

  • streaming_args (dict) – Dict of arguments to read data from the streaming connector

  • refresh_schedule (str) – The Cron time string format that describes a schedule to retrieve the latest version of the imported dataset. The time is specified in UTC.

Returns

The created dataset.

Return type

Dataset

set_streaming_retention_policy(dataset_id, retention_hours=None, retention_row_count=None)

Sets the streaming retention policy

Parameters
  • dataset_id (str) – The Streaming dataset

  • retention_hours (int) – The number of hours to retain streamed data in memory

  • retention_row_count (int) – The number of rows to retain streamed data in memory

rename_database_connector(database_connector_id, name)

Renames a Database Connector

Parameters
  • database_connector_id (str) – The unique identifier for the database connector.

  • name (str) – The new name for the Database Connector

rename_application_connector(application_connector_id, name)

Renames an Application Connector

Parameters
  • application_connector_id (str) – The unique identifier for the application connector.

  • name (str) – A new name for the application connector

verify_database_connector(database_connector_id)

Checks to see if Abacus.AI can access the database.

Parameters

database_connector_id (str) – The unique identifier for the database connector.

verify_file_connector(bucket)

Checks to see if Abacus.AI can access the bucket.

Parameters

bucket (str) – The bucket to test.

Returns

The Result of the verification.

Return type

FileConnectorVerification

delete_database_connector(database_connector_id)

Delete a database connector.

Parameters

database_connector_id (str) – The unique identifier for the database connector.

delete_application_connector(application_connector_id)

Delete an application connector.

Parameters

application_connector_id (str) – The unique identifier for the application connector.

delete_file_connector(bucket)

Removes a connected service from the specified organization.

Parameters

bucket (str) – The fully qualified URI of the bucket to remove.

verify_application_connector(application_connector_id)

Checks to see if Abacus.AI can access the Application.

Parameters

application_connector_id (str) – The unique identifier for the application connector.

set_azure_blob_connection_string(bucket, connection_string)

Authenticates the specified Azure Blob Storage bucket using an authenticated Connection String.

Parameters
  • bucket (str) – The fully qualified Azure Blob Storage Bucket URI

  • connection_string (str) – The Connection String Abacus.AI should use to authenticate when accessing this bucket

Returns

An object with the roleArn and verification status for the specified bucket.

Return type

FileConnectorVerification

verify_streaming_connector(streaming_connector_id)

Checks to see if Abacus.AI can access the streaming connector.

Parameters

streaming_connector_id (str) – The unique identifier for the streaming connector.

rename_streaming_connector(streaming_connector_id, name)

Renames a Streaming Connector

Parameters
  • streaming_connector_id (str) – The unique identifier for the streaming connector.

  • name (str) – A new name for the streaming connector

delete_streaming_connector(streaming_connector_id)

Delete a streaming connector.

Parameters

streaming_connector_id (str) – The unique identifier for the streaming connector.

create_streaming_token()

Creates a streaming token. Streaming tokens are used to authenticate requests to append data to streaming datasets.

Returns

The streaming token.

Return type

StreamingAuthToken

delete_streaming_token(streaming_token)

Deletes the specified streaming token.

Parameters

streaming_token (str) – The streaming token to delete.

attach_dataset_to_project(dataset_id, project_id, dataset_type)

[DEPRECATED] Attaches the dataset to the project.

Use this method to attach a dataset that is already in the organization to another project. The dataset type is required to let the AI engine know what type of schema should be used.

Parameters
  • dataset_id (str) – The dataset to attach.

  • project_id (str) – The project to attach the dataset to.

  • dataset_type (str) – The dataset has to be a type that is associated with the use case of your project. Please see the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for the datasetTypes that are supported per use case.

Returns

An array of column descriptions.

Return type

Schema

remove_dataset_from_project(dataset_id, project_id)

[DEPRECATED] Removes a dataset from a project.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • project_id (str) – The unique ID associated with the project.

rename_dataset(dataset_id, name)

Rename a dataset.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • name (str) – The new name for the dataset.

delete_dataset(dataset_id)

Deletes the specified dataset from the organization.

The dataset cannot be deleted if it is currently attached to a project.

Parameters

dataset_id (str) – The dataset to delete.

train_model(project_id, name=None, training_config=None, feature_group_ids=None, refresh_schedule=None, custom_algorithms=None, custom_algorithms_only=False, custom_algorithm_configs=None, cpu_size=None, memory=None)

Trains a model for the specified project.

Use this method to train a model in this project. This method supports user-specified training configurations defined in the getTrainingConfigOptions method.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.

  • training_config (dict) – The training config key/value pairs used to train this model.

  • feature_group_ids (list) – List of feature group ids provided by the user to train the model on.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model.

  • custom_algorithms (list) – List of user-defined algorithms to train.

  • custom_algorithms_only (bool) – Whether to run only custom algorithms.

  • custom_algorithm_configs (dict) – Configs for each user-defined algorithm, key is algorithm name, value is the config serialized to json

  • cpu_size (str) – Size of the cpu for the user-defined algorithms during train.

  • memory (int) – Memory (in GB) for the user-defined algorithms during train.

Returns

The new model which is being trained.

Return type

Model
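
A minimal sketch (IDs are hypothetical; the training_config key shown is a placeholder, as valid keys come from getTrainingConfigOptions):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    model = client.train_model(
        project_id='project_id',
        name='Churn Model',
        feature_group_ids=['feature_group_id'],
        training_config={'TEST_SPLIT': 10},  # placeholder config key
        refresh_schedule='0 0 * * 0',        # retrain weekly, Sundays 00:00 UTC
    )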

create_model_from_python(project_id, function_source_code, train_function_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, name=None, cpu_size=None, memory=None, training_config=None, exclusive_run=False, package_requirements=None)

Initializes a new Model from user-provided Python code. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects functionSourceCode to be a valid Python source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • training_config (dict) – Training configuration

  • exclusive_run (bool) – Decides if this model will be run exclusively OR along with other Abacus.ai algorithms

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns

The new model, which has not been trained.

Return type

Model
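
A minimal sketch; the train and predict functions below are deliberately trivial, and the project ID and table name are hypothetical:

    import textwrap
    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    source = textwrap.dedent('''
        def train(training_table):
            # training_table arrives as a materialized DataFrame
            return training_table.mean(numeric_only=True).to_dict()

        def predict(model, query):
            # model is whatever train() returned; the return type is up to you
            return model
    ''')

    model = client.create_model_from_python(
        project_id='project_id',
        function_source_code=source,
        train_function_name='train',
        training_input_tables=['training_table'],
        predict_function_name='predict',
    )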

create_model_from_zip(project_id, train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, name=None, cpu_size=None, memory=None, package_requirements=None)

Initializes a new Model from a user-provided zip file containing Python code. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns

An upload object to which the zip file containing the model code should be uploaded.

Return type

Upload

create_model_from_git(project_id, application_connector_id, branch_name, train_function_name, train_module_name, predict_module_name, training_input_tables, predict_function_name=None, predict_many_function_name=None, python_root=None, name=None, cpu_size=None, memory=None, package_requirements=None)

Initializes a new Model from a user-provided git repository containing Python code. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the repository.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the repository.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • name (str) – The name you want your model to have. Defaults to “<Project Name> Model”.

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns

The new model, which has not been trained.

Return type

Model

rename_model(model_id, name)

Renames a model

Parameters
  • model_id (str) – The ID of the model to rename

  • name (str) – The name to apply to the model

update_python_model(model_id, function_source_code=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)

Updates an existing Python Model using user-provided Python code. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects functionSourceCode to be a valid Python source file which contains the functions named trainFunctionName and predictFunctionName. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • function_source_code (str) – Contents of a valid python source code file. The source code should contain the functions named trainFunctionName and predictFunctionName. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed to run batch predictions through model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code to initialize the trained model before using it to make predictions using the model

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns

The updated model

Return type

Model

update_python_model_zip(model_id, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None, package_requirements=None)

Updates an existing Python Model using a provided zip file. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the zip.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the zip.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

  • package_requirements (dict) – Json with key value pairs corresponding to package: version for each dependency

Returns

An upload object to which the new zip file containing the model code should be uploaded.

Return type

Upload

update_python_model_git(model_id, application_connector_id=None, branch_name=None, python_root=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, train_module_name=None, predict_module_name=None, training_input_tables=None, cpu_size=None, memory=None)

Updates an existing Python Model using an existing git application connector. If a list of input feature groups is supplied, we will pass the materialized feature groups for those inputs as arguments to the train and predict functions.

This method expects trainModuleName and predictModuleName to be valid Python source files which contain the functions named trainFunctionName and predictFunctionName, respectively. trainFunctionName returns the ModelVersion that is the result of training the model, while predictFunctionName has no well-defined return type, as it returns the prediction, which can be anything.

Parameters
  • model_id (str) – The unique ID associated with the Python model to be changed.

  • application_connector_id (str) – The unique ID associated with the git application connector.

  • branch_name (str) – Name of the branch in the git repository to be used for training.

  • python_root (str) – Path from the top level of the git repository to the directory containing the Python source code. If not provided, the default is the root of the git repository.

  • train_function_name (str) – Name of the function found in train module that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the predict module that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the predict module that will be executed to run batch predictions through the model. It is not executed when this function is run.

  • train_module_name (str) – Full path of the module that contains the train function from the root of the repository.

  • predict_module_name (str) – Full path of the module that contains the predict function from the root of the repository.

  • training_input_tables (list) – List of feature groups that are supplied to the train function as parameters. Each of the parameters is a materialized DataFrame (the same type as the function’s return value).

  • cpu_size (str) – Size of the cpu for the model training function

  • memory (int) – Memory (in GB) for the model training function

Returns

The updated model

Return type

Model

set_model_training_config(model_id, training_config)

Edits the default model training config

Parameters
  • model_id (str) – The unique ID of the model to update

  • training_config (dict) – The training config key/value pairs used to train this model.

Returns

The corresponding model object after the training config is applied

Return type

Model

set_model_prediction_params(model_id, prediction_config)

Sets the model prediction config for the model

Parameters
  • model_id (str) – The unique ID of the model to update

  • prediction_config (dict) – The prediction config for the model

Returns

The corresponding model object after the prediction config is applied

Return type

Model

retrain_model(model_id, deployment_ids=[], feature_group_ids=None, custom_algorithm_configs=None, cpu_size=None, memory=None)

Retrains the specified model. Gives you an option to choose the deployments you want the retraining to be deployed to.

Parameters
  • model_id (str) – The model to retrain.

  • deployment_ids (list) – List of deployments to automatically deploy to.

  • feature_group_ids (list) – List of feature group ids provided by the user to train the model on.

  • custom_algorithm_configs (dict) – The user-defined training configs for each custom algorithm.

  • cpu_size (str) – Size of the cpu for the user-defined algorithms during train.

  • memory (int) – Memory (in GB) for the user-defined algorithms during train.

Returns

The model that is being retrained.

Return type

Model

delete_model(model_id)

Deletes the specified model and all its versions. Models which are currently used in deployments cannot be deleted.

Parameters

model_id (str) – The ID of the model to delete.

delete_model_version(model_version)

Deletes the specified model version. Model Versions which are currently used in deployments cannot be deleted.

Parameters

model_version (str) – The ID of the model version to delete.

export_model_artifact_as_feature_group(model_version, table_name, artifact_type)

Exports metric artifact data for a model as a feature group.

Parameters
  • model_version (str) – The version of the model.

  • table_name (str) – The name of the feature group table to create.

  • artifact_type (str) – An EvalArtifact enum of which artifact to export.

Returns

The created feature group.

Return type

FeatureGroup

get_custom_train_function_info(project_id, feature_group_names_for_training=None, training_data_parameter_name_override=None)

Returns the information about how to call the custom train function.

Parameters
  • project_id (str) – The unique ID of the project

  • feature_group_names_for_training (list) – A list of feature group table names that will be used for training

  • training_data_parameter_name_override (dict) – Override from feature group type to parameter name in train function.

Returns

Information about how to call the customer-provided train function.

Return type

CustomTrainFunctionInfo

create_model_monitor(project_id, training_feature_group_id, prediction_feature_group_id, name=None, refresh_schedule=None, target_value=None, feature_mappings=None, model_id=None, training_feature_mappings=None)

Runs a model monitor for the specified project.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • training_feature_group_id (str) – The unique ID of the training data feature group

  • prediction_feature_group_id (str) – The unique ID of the prediction data feature group

  • name (str) – The name you want your model monitor to have. Defaults to “<Project Name> Model Monitor”.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically retrain the created model monitor

  • target_value (str) – A target positive value for the label to compute bias for

  • feature_mappings (dict) – A json map to override features for prediction_feature_group, where keys are column names and the values are feature data use types.

  • model_id (str) – The Unique ID of the Model

  • training_feature_mappings (dict) – A json map to override features for training_feature_group, where keys are column names and the values are feature data use types.

Returns

The new model monitor that was created.

Return type

ModelMonitor
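
A minimal sketch (the IDs and schedule are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    monitor = client.create_model_monitor(
        project_id='project_id',
        training_feature_group_id='training_feature_group_id',
        prediction_feature_group_id='prediction_feature_group_id',
        refresh_schedule='0 0 * * *',  # re-run daily at 00:00 UTC
    )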

rerun_model_monitor(model_monitor_id)

Reruns the specified model monitor.

Parameters

model_monitor_id (str) – The model monitor to rerun.

Returns

The model monitor that is being rerun.

Return type

ModelMonitor

rename_model_monitor(model_monitor_id, name)

Renames a model monitor

Parameters
  • model_monitor_id (str) – The ID of the model monitor to rename

  • name (str) – The name to apply to the model monitor

delete_model_monitor(model_monitor_id)

Deletes the specified model monitor and all its versions.

Parameters

model_monitor_id (str) – The ID of the model monitor to delete.

delete_model_monitor_version(model_monitor_version)

Deletes the specified model monitor version.

Parameters

model_monitor_version (str) – The ID of the model monitor version to delete.

create_deployment(name=None, model_id=None, feature_group_id=None, project_id=None, description=None, calls_per_second=None, auto_deploy=True, start=True, enable_batch_streaming_updates=False)

Creates a deployment with the specified name and description for the specified model or feature group.

A Deployment makes the trained model or feature group available for prediction requests.

Parameters
  • name (str) – The name of the deployment.

  • model_id (str) – The unique ID associated with the model.

  • feature_group_id (str) – The unique ID associated with a feature group.

  • project_id (str) – The unique ID associated with a project.

  • description (str) – The description for the deployment.

  • calls_per_second (int) – The number of calls per second the deployment could handle.

  • auto_deploy (bool) – Flag to enable the automatic deployment when a new Model Version finishes training.

  • start (bool) – If true, the deployment will be started once it is created.

  • enable_batch_streaming_updates (bool) – Flag to enable a background process on the feature group deployment that caches streamed-in rows for quicker lookup

Returns

The new model or feature group deployment.

Return type

Deployment

create_deployment_token(project_id)

Creates a deployment token for the specified project.

Deployment tokens are used to authenticate requests to the prediction APIs and are scoped on the project level.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

The deployment token.

Return type

DeploymentAuthToken
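
Together with create_deployment above, a typical flow might look like this sketch (IDs are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    deployment = client.create_deployment(
        name='Churn Model Deployment',
        model_id='model_id',
        project_id='project_id',
        auto_deploy=True,  # promote comparable-or-better retrained versions
    )

    # Deployment tokens are project-scoped and safe to embed client-side.
    token = client.create_deployment_token(project_id='project_id')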

update_deployment(deployment_id, description=None)

Updates a deployment’s description.

Parameters
  • deployment_id (str) – The deployment to update.

  • description (str) – The new deployment description.

rename_deployment(deployment_id, name)

Updates a deployment’s name.

Parameters
  • deployment_id (str) – The deployment to update.

  • name (str) – The new deployment name.

set_auto_deployment(deployment_id, enable=None)

Enable/Disable auto deployment for the specified deployment.

When a model is scheduled to retrain, deployments with this enabled will be marked to automatically promote the new model version. After the newly trained model completes, a check on its metrics in comparison to the currently deployed model version will be performed. If the metrics are comparable or better, the newly trained model version is automatically promoted. If not, it will be marked as a failed model version promotion with an error indicating poor metrics performance.

Parameters
  • deployment_id (str) – The unique ID associated with the deployment

  • enable (bool) – Enable/disable the autoDeploy property of the Deployment.

set_deployment_model_version(deployment_id, model_version)

Promotes a Model Version to be served in the Deployment

Parameters
  • deployment_id (str) – The unique ID for the Deployment

  • model_version (str) – The unique ID for the Model Version

set_deployment_feature_group_version(deployment_id, feature_group_version)

Promotes a Feature Group Version to be served in the Deployment

Parameters
  • deployment_id (str) – The unique ID for the Deployment

  • feature_group_version (str) – The unique ID for the Feature Group Version

start_deployment(deployment_id)

Restarts the specified deployment that was previously suspended.

Parameters

deployment_id (str) – The unique ID associated with the deployment.

stop_deployment(deployment_id)

Stops the specified deployment.

Parameters

deployment_id (str) – The Deployment ID

delete_deployment(deployment_id)

Deletes the specified deployment. The deployment’s models will not be affected. Note that the deployments are not recoverable after they are deleted.

Parameters

deployment_id (str) – The ID of the deployment to delete.

delete_deployment_token(deployment_token)

Deletes the specified deployment token.

Parameters

deployment_token (str) – The deployment token to delete.

set_deployment_feature_group_export_file_connector_output(deployment_id, file_format=None, output_location=None)

Sets the export output for the Feature Group Deployment to be a file connector.

Parameters
  • deployment_id (str) – The deployment for which the export type is set

  • file_format (str) – The file format to use for the export.

  • output_location (str) – The file connector (cloud) location to export to.

set_deployment_feature_group_export_database_connector_output(deployment_id, database_connector_id, object_name, write_mode, database_feature_mapping, id_column=None, additional_id_columns=None)

Sets the export output for the Feature Group Deployment to be a Database connector.

Parameters
  • deployment_id (str) – The deployment for which the export type is set

  • database_connector_id (str) – The database connector ID used

  • object_name (str) – The database connector’s object to write to

  • write_mode (str) – UPSERT or INSERT for writing to the database connector

  • database_feature_mapping (dict) – The column/feature pairs mapping the features to the database columns

  • id_column (str) – The id column to use as the upsert key

  • additional_id_columns (list) – For database connectors which support it, additional ID columns to use as a complex key for upserting

remove_deployment_feature_group_export_output(deployment_id)

Removes the export type that is set for the Feature Group Deployment

Parameters

deployment_id (str) – The deployment for which the export type is set

create_refresh_policy(name, cron, refresh_type, project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], prediction_metric_ids=[])

Creates a refresh policy with a particular cron pattern and refresh type.

A refresh policy allows for the scheduling of a particular set of actions at regular intervals. This can be useful for periodically updated data which needs to be re-imported into the project for re-training.

Parameters
  • name (str) – The name for the refresh policy

  • cron (str) – A cron-like string specifying the frequency of a refresh policy

  • refresh_type (str) – The refresh type is used to determine what is being refreshed, whether it’s a single dataset, a dataset and a model, or more.

  • project_id (str) – Optionally, a Project ID can be specified so that all datasets, models and deployments are captured at the instant this policy was created

  • dataset_ids (list) – Comma separated list of Dataset IDs

  • model_ids (list) – Comma separated list of Model IDs

  • deployment_ids (list) – Comma separated list of Deployment IDs

  • batch_prediction_ids (list) – Comma separated list of Batch Predictions

  • prediction_metric_ids (list) – Comma separated list of Prediction Metrics

Returns

The refresh policy created

Return type

RefreshPolicy
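
A minimal sketch; the refresh_type value shown is a placeholder assumption, and the IDs are hypothetical:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # Re-import the dataset and retrain the model every day at 02:00 UTC.
    policy = client.create_refresh_policy(
        name='Nightly refresh',
        cron='0 2 * * *',
        refresh_type='MODEL',  # placeholder; consult the RefreshPolicy docs
        dataset_ids=['dataset_id'],
        model_ids=['model_id'],
    )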

delete_refresh_policy(refresh_policy_id)

Delete a refresh policy

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

pause_refresh_policy(refresh_policy_id)

Pauses a refresh policy

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

resume_refresh_policy(refresh_policy_id)

Resumes a refresh policy

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

run_refresh_policy(refresh_policy_id)

Force a run of the refresh policy.

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

update_refresh_policy(refresh_policy_id, name=None, cron=None)

Update the name or cron string of a refresh policy

Parameters
  • refresh_policy_id (str) – The unique ID associated with this refresh policy

  • name (str) – Optional, specify to update the name of the refresh policy

  • cron (str) – Optional, specify to update the cron string describing the refresh policy’s schedule

Returns

The updated refresh policy

Return type

RefreshPolicy

lookup_features(deployment_token, deployment_id, query_data={})

Returns the feature group deployed in the feature store project.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict(deployment_token, deployment_id, query_data={})

Returns a prediction for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict
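
A minimal sketch (the token, deployment ID, and query column are hypothetical):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')

    # 'user_id' stands in for whichever column carries the USER_ID mapping.
    prediction = client.predict(
        deployment_token='deployment_token',
        deployment_id='deployment_id',
        query_data={'user_id': '12345'},
    )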

predict_multiple(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict_from_datasets(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows

Return type

Dict

predict_lead(deployment_token, deployment_id, query_data)

Returns the probability of a user being a lead based on their interactions with the service/product and their own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing user attributes and/or the user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).

Return type

Dict

predict_churn(deployment_token, deployment_id, query_data)

Returns the probability of a user churning in response to their interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict_takeover(deployment_token, deployment_id, query_data)

Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).

Return type

Dict

predict_fraud(deployment_token, deployment_id, query_data)

Returns the probability that a transaction performed under a specific account is fraudulent. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).

Return type

Dict

predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a classification prediction

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • threshold (float) – A float value that is applied as a threshold on the popular class label.

  • threshold_class (str) – The label to which the threshold is applied (binary labels only).

  • thresholds (list) – Maps labels to thresholds (multi-label classification only). Defaults to the F1-optimal threshold if computed for the given class, else uses 0.5.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type

Dict
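
A hedged sketch of a thresholded classification call; all identifiers and the 'user_id' column name are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    result = client.predict_class(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_id': 'u_00123'},
        threshold=0.7,               # applied to the popular class label
        explain_predictions=True,    # include SHAP explanations in the response
    )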

predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a prediction from a classification or regression model. Optionally, includes explanations.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type

Dict

get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)

Returns a list of anomalies from the training dataset

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.

  • histogram (bool) – If True, will return a histogram of the distribution of all points

Return type

io.BytesIO

is_anomaly(deployment_token, deployment_id, query_data=None)

Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – The input data for the prediction.

Return type

Dict

get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None)

Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.

  • future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.

  • num_predictions (int) – The number of timestamps to predict in the future.

  • prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).

Return type

Dict
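
A minimal sketch using the documented future_data and prediction_start formats; the identifiers and the 'store_id' column name are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    forecast = client.get_forecast(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'store_id': 'STORE_42'},            # entity to forecast for
        future_data={'Holiday': 'No', 'Promo': 'Yes'},  # values known ahead of time
        num_predictions=14,                             # forecast 14 timestamps ahead
        prediction_start='2015-08-01T00:00:00',
    )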

get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)

Returns the k nearest neighbors for the provided embedding vector.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • vector (list) – Input vector to perform the k nearest neighbors with.

  • k (int) – The number of nearest neighbors to return; overrides the default if specified

  • distance (str) – Specify the distance function to use when finding nearest neighbors

  • include_score (bool) – If True, will return the score alongside the resulting embedding value

Return type

Dict
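
A minimal sketch; the identifiers and the 4-dimensional vector are placeholders (the vector length must match the deployed embeddings):

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    neighbors = client.get_k_nearest(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        vector=[0.12, -0.45, 0.33, 0.08],  # query embedding vector
        k=5,
        include_score=True,
    )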

get_multiple_k_nearest(deployment_token, deployment_id, queries)

Returns the k nearest neighbors for the queries provided

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters

get_labels(deployment_token, deployment_id, query_data, threshold=None)

Returns a list of scored labels extracted from the provided text.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

  • threshold (None) – Deprecated

Return type

Dict
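
A minimal sketch using the documented 'Content' key; the identifiers and sample text are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    labels = client.get_labels(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'Content': 'Jane Doe joined Abacus.AI in San Francisco.'},
    )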

get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)

Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same user. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • exclude_item_ids (list) – [DEPRECATED]

  • score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the model recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.

  • restrict_items (list) – Allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1", "value3", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1", "value3", …]) to which the recommendations should be restricted. For example, if the input to restrict_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) to exclude from the recommendations. For example, if the input to exclude_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items.

  • explore_fraction (float) – The fraction of recommendations that is to be new items.

Return type

Dict
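
A minimal sketch combining the documented scaling_factors and exclude_items formats; the identifiers and column values are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    recs = client.get_recommendations(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'user_name': 'John Doe'},
        num_items=10,
        page=1,
        # Boost SUVs and Sedans by 1.4x, and drop Trucks entirely.
        scaling_factors=[{'column': 'VehicleType', 'values': ['SUV', 'Sedan'], 'factor': 1.4}],
        exclude_items=[{'column': 'VehicleType', 'values': ['Truck']}],
    )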

get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair’s ‘Key’ is the column name (e.g. a column named ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom the prediction is made, and its ‘Value’ is the identifier of that user. The second pair’s ‘Key’ is the column name (e.g. movie_name) mapped to ITEM_ID (the unique item identifier), and its ‘Value’ is a list of identifiers that uniquely identify those items.

  • preserve_ranks (list) – List of dictionaries of format {"column": "col0", "values": ["value0", "value1"]}, where the ranks of items in query_data are preserved for all the items in "col0" with values "value0" and "value1". This option is useful when the desired items are already recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model, will not be reranked, and the original position in the query will be preserved.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the model recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.

Return type

Dict
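
A minimal sketch of the two-key query_data contract described above; the identifiers and column names are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    ranked = client.get_personalized_ranking(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        # One user key plus one item-list key, per the query_data contract above.
        query_data={'user_id': 'u_00123', 'movie_name': ['Movie A', 'Movie B', 'Movie C']},
        preserve_unknown_items=True,
    )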

get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair’s ‘Key’ is the column name (e.g. a column named ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user for whom the prediction is made, and its ‘Value’ is the identifier of that user. The second pair’s ‘Key’ is the column name (e.g. movie_name) mapped to ITEM_ID (the unique item identifier), and its ‘Value’ is a list of identifiers that uniquely identify those items.

  • preserve_ranks (list) – List of dictionaries of format {"column": "col0", "values": ["value0", "value1"]}, where the ranks of items in query_data are preserved for all the items in "col0" with values "value0" and "value1". This option is useful when the desired items are already recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model, will not be reranked, and the original position in the query will be preserved.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the model recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.

Return type

Dict

get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])

Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘item_code’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the item against which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘item_code’ mapped to the column mapping ‘ITEM_ID’, then the query must have the exact same column name (item_code) as key and the identifier of the item as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • scaling_factors (list) – Allows you to bias the model towards certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1"], "factor": 1.1}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) towards which the model recommendations should be biased; and the key "factor" takes the factor by which the item scores are adjusted. For example, if the input to scaling_factors is [{"column": "VehicleType", "values": ["SUV", "Sedan"], "factor": 1.4}], then after we apply the model to get item probabilities, for every SUV and Sedan in the list we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it, or an item that always comes up and you want to demote it.

  • restrict_items (list) – Allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1", "value3", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1", "value3", …]) to which the recommendations should be restricted. For example, if the input to restrict_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the recommendations are restricted to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – Allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries, each of the format {"column": "col0", "values": ["value0", "value1", …]}. The key "column" takes the name of the column ("col0"); the key "values" takes the list of items (["value0", "value1"]) to exclude from the recommendations. For example, if the input to exclude_items is [{"column": "VehicleType", "values": ["SUV", "Sedan"]}], the resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items.

Return type

Dict

get_feature_group_rows(deployment_token, deployment_id, query_data)

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – The input data for the row lookup.

get_search_results(deployment_token, deployment_id, query_data)

Returns search results for the given query.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary containing the search query.

Return type

Dict

get_sentiment(deployment_token, deployment_id, document)

Returns a sentiment prediction for the given document.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to analyze.

Return type

Dict

get_entailment(deployment_token, deployment_id, document)

Returns an entailment prediction for the given document.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to analyze.

Return type

Dict

get_classification(deployment_token, deployment_id, document)

Returns a classification prediction for the given document.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – The document to classify.

Return type

Dict

get_summary(deployment_token, deployment_id, query_data)

Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Raw data dictionary containing the required document data; it must have a key ‘document’ whose value is the DOCUMENT-type text to summarize.

Return type

Dict
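
A minimal sketch using the documented ‘document’ key; the identifiers and the sample text are placeholders:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    summary = client.get_summary(
        deployment_token='YOUR_DEPLOYMENT_TOKEN',
        deployment_id='YOUR_DEPLOYMENT_ID',
        query_data={'document': 'Full text of the article to summarize ...'},
    )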

predict_language(deployment_token, deployment_id, query_data)

Predicts the language of the given text.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (str) – The input text whose language is to be predicted.

Return type

Dict

get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)

Get all positive assignments that match a query.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – specifies the set of assignments being requested.

  • forced_assignments (dict) – set of assignments to force and resolve before returning query results.

Return type

Dict

check_constraints(deployment_token, deployment_id, query_data)

Check for any constraints violated by the overrides.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – assignment overrides to the solution.

Return type

Dict

create_prediction_metric(feature_group_id, prediction_metric_config, project_id=None)

Create a prediction metric job description for the given prediction and actual-labels data.

Parameters
  • feature_group_id (str) – The feature group to use as input to the prediction metrics.

  • prediction_metric_config (dict) – Specification for prediction metric to run in this job.

  • project_id (str) – Project to use for the prediction metrics. Defaults to the project for the input feature_group, if the feature_group has exactly one project.

Returns

The Prediction Metric job description.

Return type

PredictionMetric

describe_prediction_metric(prediction_metric_id, should_include_latest_version_description=True)

Describe a Prediction Metric.

Parameters
  • prediction_metric_id (str) – The unique ID associated with the prediction metric.

  • should_include_latest_version_description (bool) – include the description of the latest prediction metric version

Returns

The prediction metric object.

Return type

PredictionMetric

delete_prediction_metric(prediction_metric_id)

Removes an existing PredictionMetric.

Parameters

prediction_metric_id (str) – The unique ID associated with the prediction metric.

run_prediction_metric(prediction_metric_id)

Creates a new prediction metrics job run for the given prediction metric job description, and starts that job.

Configures and starts the computations running to compute the prediction metric.

Parameters

prediction_metric_id (str) – The prediction metric job description to apply for configuring a prediction metric job.

Returns

A prediction metric version. For more information, please refer to the details on the object (below).

Return type

PredictionMetricVersion

delete_prediction_metric_version(prediction_metric_version)

Removes an existing prediction metric version.

Parameters

prediction_metric_version (str) –

list_prediction_metric_versions(prediction_metric_id, limit=100, start_after_id=None)

List the prediction metric versions for a prediction metric.

Parameters
  • prediction_metric_id (str) – The prediction metric whose instances will be listed.

  • limit (int) – The number of prediction metric instances to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all prediction metric versions till the specified prediction metric ID.

Returns

The prediction metric instances for this prediction metric.

Return type

PredictionMetricVersion

create_batch_prediction(deployment_id, table_name=None, name=None, global_prediction_args=None, explanations=False, output_format=None, output_location=None, database_connector_id=None, database_output_config=None, refresh_schedule=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)

Creates a batch prediction job description for the given deployment.

Parameters
  • deployment_id (str) – The unique identifier to a deployment.

  • table_name (str) – If specified, the name of the feature group table to write the results of the batch prediction. Can only be specified if outputLocation and databaseConnectorId are not specified. If tableName is specified, the outputType will be enforced as CSV

  • name (str) – The name of the batch prediction job.

  • global_prediction_args (dict) – Argument(s) to pass on every prediction call.

  • explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON)

  • output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.

  • database_connector_id (str) – The unique identifier of a database connector to write predictions to. Cannot be specified in conjunction with outputLocation.

  • database_output_config (dict) – A key-value pair of columns/values to write to the database connector. Only available if databaseConnectorId is specified.

  • refresh_schedule (str) – A cron-style string that describes a schedule in UTC to automatically run the batch prediction.

  • csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV

  • csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV

  • csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV

  • output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version

  • result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list

Returns

The batch prediction description.

Return type

BatchPrediction
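
A minimal sketch that creates a nightly batch prediction writing to a feature group table and then starts a run; the identifiers are placeholders, and the batch_prediction_id attribute on the returned object is an assumption:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    batch = client.create_batch_prediction(
        deployment_id='YOUR_DEPLOYMENT_ID',
        table_name='nightly_prediction_results',  # write results to a feature group table
        name='nightly_predictions',
        refresh_schedule='0 8 * * *',              # cron-style schedule, in UTC
    )
    # Assumes the returned BatchPrediction exposes a batch_prediction_id attribute.
    client.start_batch_prediction(batch.batch_prediction_id)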

start_batch_prediction(batch_prediction_id)

Creates a new batch prediction version job for a given batch prediction job description

Parameters

batch_prediction_id (str) – The unique identifier of the batch prediction to create a new version of

Returns

The batch prediction version started by this method call.

Return type

BatchPredictionVersion

update_batch_prediction(batch_prediction_id, deployment_id=None, global_prediction_args=None, explanations=None, output_format=None, csv_input_prefix=None, csv_prediction_prefix=None, csv_explanations_prefix=None, output_includes_metadata=None, result_input_columns=None)

Updates a batch prediction job description

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction.

  • deployment_id (str) – The unique identifier to a deployment.

  • global_prediction_args (dict) – Argument(s) to pass on every prediction call.

  • explanations (bool) – If true, will provide SHAP Explanations for each prediction, if supported by the use case.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).

  • csv_input_prefix (str) – A prefix to prepend to the input columns, only applies when output format is CSV

  • csv_prediction_prefix (str) – A prefix to prepend to the prediction columns, only applies when output format is CSV

  • csv_explanations_prefix (str) – A prefix to prepend to the explanation columns, only applies when output format is CSV

  • output_includes_metadata (bool) – If true, output will contain columns including prediction start time, batch prediction version, and model version

  • result_input_columns (list) – If present, will limit result files or feature groups to only include columns present in this list

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_file_connector_output(batch_prediction_id, output_format=None, output_location=None)

Updates the file connector output configuration of the batch prediction

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction.

  • output_format (str) – If specified, sets the format of the batch prediction output (CSV or JSON).

  • output_location (str) – If specified, the location to write the prediction results. Otherwise, results will be stored in Abacus.AI.

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_database_connector_output(batch_prediction_id, database_connector_id=None, database_output_config=None)

Updates the database connector output configuration of the batch prediction

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • database_connector_id (str) – The unique identifier of a database connector to write predictions to.

  • database_output_config (dict) – A key-value pair of columns/values to write to the database connector

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_feature_group_output(batch_prediction_id, table_name)

Creates a feature group and sets it to be the batch prediction output

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • table_name (str) – The name of the feature group table to create

Returns

The batch prediction after the output has been applied

Return type

BatchPrediction

set_batch_prediction_output_to_console(batch_prediction_id)

Sets the batch prediction output to the console, clearing both the file connector and database connector config

Parameters

batch_prediction_id (str) – The unique identifier of the batch prediction

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_dataset(batch_prediction_id, dataset_type, dataset_id=None)

[Deprecated] Sets the batch prediction input dataset. Only applicable for legacy dataset-based projects

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • dataset_type (str) – The dataset type to set

  • dataset_id (str) – The dataset to set

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_feature_group(batch_prediction_id, feature_group_type, feature_group_id=None)

Sets the batch prediction input feature group.

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • feature_group_type (str) – The feature group type to set. The type is based on the use case under which the feature group is being created. For example, Catalog Attributes can be a feature group type under the personalized recommendation use case.

  • feature_group_id (str) – The feature group to set as input to the batch prediction

Returns

The batch prediction description.

Return type

BatchPrediction

set_batch_prediction_dataset_remap(batch_prediction_id, dataset_id_remap)

For the purpose of this batch prediction, will swap out datasets in the input feature groups

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • dataset_id_remap (dict) – Key/value pairs of dataset_ids to replace during batch predictions

Returns

Batch Prediction object

Return type

BatchPrediction

delete_batch_prediction(batch_prediction_id)

Deletes a batch prediction

Parameters

batch_prediction_id (str) – The unique identifier of the batch prediction

add_user_item_interaction(streaming_token, dataset_id, timestamp, user_id, item_id, event_type, additional_attributes)

Adds a user-item interaction record (data row) to a streaming dataset.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • timestamp (int) – The unix timestamp of the event.

  • user_id (str) – The unique identifier for the user.

  • item_id (list) – The unique identifiers for the items.

  • event_type (str) – The event type.

  • additional_attributes (dict) – Attributes of the user interaction.
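
A minimal sketch of streaming one interaction; the tokens, IDs, and attribute names are placeholders:

    import time

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    client.add_user_item_interaction(
        streaming_token='YOUR_STREAMING_TOKEN',
        dataset_id='YOUR_STREAMING_DATASET_ID',
        timestamp=int(time.time()),     # unix timestamp of the event
        user_id='u_00123',
        item_id=['item_456'],           # note: a list of item identifiers
        event_type='click',
        additional_attributes={'device': 'mobile'},
    )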

upsert_user_attributes(streaming_token, dataset_id, user_id, user_attributes)

Adds a user attributes record (data row) to a streaming dataset.

Either the streaming dataset ID or the project ID is required.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • user_id (str) – The unique identifier for the user.

  • user_attributes (dict) – Attributes of the user interaction.

upsert_item_attributes(streaming_token, dataset_id, item_id, item_attributes)

Adds an item attributes record (data row) to a streaming dataset.

Either the streaming dataset ID or the project ID is required.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • item_id (str) – The unique identifier for the item.

  • item_attributes (dict) – Attributes of the item interaction.

add_multiple_user_item_interactions(streaming_token, dataset_id, interactions)

Adds multiple user-item interaction records (data rows) to a streaming dataset.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • interactions (list) – List of interactions, each interaction of format {‘userId’: userId, ‘timestamp’: timestamp, ‘itemId’: itemId, ‘eventType’: eventType, ‘additionalAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}

upsert_multiple_user_attributes(streaming_token, dataset_id, upserts)

Adds multiple user attributes records (data rows) to a streaming dataset.

The streaming dataset ID is required.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • upserts (list) – List of upserts, each upsert of format {‘userId’: userId, ‘userAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.

upsert_multiple_item_attributes(streaming_token, dataset_id, upserts)

Adds multiple item attributes records (data rows) to a streaming dataset.

The streaming dataset ID is required.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the dataset.

  • dataset_id (str) – The streaming dataset to record data to.

  • upserts (list) – List of upserts, each upsert of format {‘itemId’: itemId, ‘itemAttributes’: {‘attribute1’: ‘abc’, ‘attribute2’: 123}}.

upsert_item_embeddings(streaming_token, model_id, item_id, vector, catalog_id=None)

Upserts an embedding vector for an item id for a model_id.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to upsert item embeddings to.

  • item_id (str) – The item id for which its embeddings will be upserted.

  • vector (list) – The embedding vector.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

delete_item_embeddings(streaming_token, model_id, item_ids, catalog_id=None)

Deletes knn embeddings for a list of item ids for a model_id.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to delete item embeddings from.

  • item_ids (list) – A list of item ids for which its embeddings will be deleted.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

upsert_multiple_item_embeddings(streaming_token, model_id, upserts, catalog_id=None)

Upserts a knn embedding for multiple item ids for a model_id.

Parameters
  • streaming_token (str) – The streaming token for authenticating requests to the model.

  • model_id (str) – The model id to upsert item embeddings to.

  • upserts (list) – A list of {‘itemId’: …, ‘vector’: […]} dicts for each upsert.

  • catalog_id (str) – Optional name to specify which catalog in a model to update.

upsert_data(feature_group_id, streaming_token, data)

Updates the data in the feature group for the given lookup key recordId if the record is found; otherwise, inserts new data into the feature group.

Parameters
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record
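
A minimal sketch; the tokens and IDs are placeholders, and ‘record_id’ stands in for whatever column serves as the lookup key in your feature group:

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    client.upsert_data(
        feature_group_id='YOUR_STREAMING_FEATURE_GROUP_ID',
        streaming_token='YOUR_STREAMING_TOKEN',
        # 'record_id' is a placeholder for your feature group's lookup key column.
        data={'record_id': 'r_001', 'clicks': 12, 'last_seen': '2022-01-01T00:00:00'},
    )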

append_data(feature_group_id, streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (dict) – The data to record

upsert_multiple_data(feature_group_id, streaming_token, data)

Updates the data in the feature group for the given lookup key recordId if the record is found; otherwise, inserts new data into the feature group.

Parameters
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (list) – The data to record, as an array of JSON objects

append_multiple_data(feature_group_id, streaming_token, data)

Appends new data into the feature group for a given lookup key recordId.

Parameters
  • feature_group_id (str) – The Streaming feature group to record data to

  • streaming_token (str) – The streaming token for authenticating requests

  • data (list) – The data to record, as an array of JSON objects

create_algorithm(name, problem_type, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=False, project_id=None)

Creates a custom algorithm that is reusable for model training.

Parameters
  • name (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • problem_type (str) – The type of the problem this algorithm will work on

  • source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (str) – The train config parameter name in the train function

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code that initializes the trained model before it is used to make predictions.

  • config_options (dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

  • project_id (str) – The unique ID of the project

Returns

The new custom algorithm that can be used for training

Return type

Algorithm
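
A hedged sketch; the algorithm name, problem type, and function bodies are illustrative placeholders, and real source code must follow the user functions documentation:

    import textwrap

    from abacusai import ApiClient

    client = ApiClient(api_key='YOUR_API_KEY')
    # Illustrative source only; real train/predict bodies must follow the
    # user functions documentation (allowed imports, expected signatures).
    source = textwrap.dedent('''
        def train(training_data, training_config):
            return {"mean": 0.0}  # placeholder trained "model"

        def predict(model, query):
            return {"prediction": model["mean"]}
    ''')
    algo = client.create_algorithm(
        name='MY_CUSTOM_ALGORITHM',    # uppercase letters, numbers, underscores only
        problem_type='REGRESSION',     # placeholder problem type
        source_code=source,
        train_function_name='train',
        predict_function_name='predict',
    )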

delete_algorithm(algorithm)

Deletes the specified custom algorithm.

Parameters

algorithm (str) – The name of the algorithm to delete.

update_algorithm(algorithm, source_code=None, training_data_parameter_names_mapping=None, training_config_parameter_name=None, train_function_name=None, predict_function_name=None, predict_many_function_name=None, initialize_function_name=None, config_options=None, is_default_enabled=None)

Updates the custom algorithm with the given name. If source_code is provided, all the function names in the source code must also be provided.

Parameters
  • algorithm (str) – The name to identify the algorithm, only uppercase letters, numbers and underscore allowed

  • source_code (str) – Contents of a valid python source code file. The source code should contain the train/predict/predict_many/initialize functions. A list of allowed import and system libraries for each language is specified in the user functions documentation section.

  • training_data_parameter_names_mapping (dict) – The mapping from feature group types to training data parameter names in the train function

  • training_config_parameter_name (str) – The train config parameter name in the train function

  • train_function_name (str) – Name of the function found in the source code that will be executed to train the model. It is not executed when this function is run.

  • predict_function_name (str) – Name of the function found in the source code that will be executed to run predictions through the model. It is not executed when this function is run.

  • predict_many_function_name (str) – Name of the function found in the source code that will be executed for batch prediction of the model. It is not executed when this function is run.

  • initialize_function_name (str) – Name of the function found in the source code that initializes the trained model before it is used to make predictions.

  • config_options (dict) – Map dataset types and configs to train function parameter names

  • is_default_enabled (bool) – Whether to train with the algorithm by default

Returns

The new custom algorithm that can be used for training

Return type

Algorithm

exception abacusai.ApiException(message, http_status, exception=None)

Bases: Exception

Default ApiException raised by APIs

Parameters
  • message (str) – The error message

  • http_status (int) – The HTTP status code raised by the server

  • exception (str) – The exception class raised by the server

__str__()

Return str(self).

class abacusai.ClientOptions(exception_on_404=True, server=DEFAULT_SERVER)

Options for configuring the ApiClient

Parameters
  • exception_on_404 (bool) – If true, will raise an exception on a 404 from the server, else will return None.

  • server (str) – The default server endpoint to use for API requests
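
A minimal sketch of configuring the client to return None on 404s instead of raising:

    from abacusai import ApiClient, ClientOptions

    # Return None on 404 responses rather than raising an exception.
    options = ClientOptions(exception_on_404=False)
    client = ApiClient(api_key='YOUR_API_KEY', client_options=options)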

class abacusai.ReadOnlyClient(api_key=None, server=None, client_options=None, skip_version_check=False)

Bases: BaseApiClient

Abacus.AI Read Only API Client. Only contains GET methods

Parameters
  • api_key (str) – The api key to use as authentication to the server

  • server (str) – The base server url to use to send API requests to

  • client_options (ClientOptions) – Optional API client configurations

  • skip_version_check (bool) – If true, will skip checking the server’s current API version on initializing the client

list_api_keys()

Lists all of the user’s API keys in the user’s organization.

Returns

List of API Keys for this user.

Return type

ApiKey

list_organization_users()

Retrieves a list of all users in the organization.

This method will retrieve a list containing all the users in the organization. The list includes pending users who have been invited to the organization.

Returns

Array of all of the users in the Organization

Return type

User

describe_user()

Get the current user’s information, such as their name, email, admin status, etc.

Returns

Information about the current User

Return type

User

list_organization_groups()

Lists all Organizations Groups within this Organization

Returns

List of Groups in this Organization

Return type

OrganizationGroup

describe_organization_group(organization_group_id)

Returns the specific organization group passed in by the user.

Parameters

organization_group_id (str) – The unique ID of the organization group that needs to be described.

Returns

Information about a specific Organization Group

Return type

OrganizationGroup

list_use_cases()

Retrieves a list of all use cases with descriptions. Use the given mappings to specify a use case when needed.

Returns

A list of UseCase objects describing all the use cases addressed by the platform.

Return type

UseCase

describe_problem_type(problem_type)
Parameters

problem_type (str) –

Returns

None

Return type

ProblemType

describe_use_case_requirements(use_case)

This API call returns the feature requirements for a specified use case

Parameters

use_case (str) – This will contain the Enum String for the use case whose dataset requirements are needed.

Returns

The feature requirements of the use case are returned. This includes all the feature groups required for the use case along with their descriptions and feature mapping details.

Return type

UseCaseRequirements

describe_project(project_id)

Returns a description of a project.

Parameters

project_id (str) – The unique project ID

Returns

The project description is returned.

Return type

Project

list_projects(limit=100, start_after_id=None)

Retrieves a list of all projects in the current organization.

Parameters
  • limit (int) – The max length of the list of projects.

  • start_after_id (str) – The ID of the project after which the list starts.

Returns

An array of all projects in the Organization the user is currently logged in to.

Return type

Project
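
A minimal pagination sketch; it assumes each returned Project exposes a project_id attribute:

    from abacusai import ReadOnlyClient

    client = ReadOnlyClient(api_key='YOUR_API_KEY')
    # Page through all projects in the organization, 100 at a time.
    page = client.list_projects(limit=100)
    while page:
        for project in page:
            print(project.project_id)
        page = client.list_projects(limit=100, start_after_id=page[-1].project_id)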

list_project_datasets(project_id)

Retrieves all dataset(s) attached to a specified project. This API returns all attributes of each dataset, such as its name, type, and ID.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array representing all of the datasets attached to the project.

Return type

ProjectDataset

get_schema(project_id, dataset_id)

[DEPRECATED] Returns a schema given a specific dataset in a project. The schema of the dataset consists of the columns in the dataset, the data type of the column, and the column’s column mapping.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • dataset_id (str) – The unique ID associated with the dataset.

Returns

An array of objects for each column in the specified dataset.

Return type

Schema

validate_project(project_id, feature_group_ids=None)

Validates that the specified project has all required feature group types for its use case and that all required feature columns are set.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • feature_group_ids (list) – The feature group IDs to validate

Returns

The project validation. If the specified project is missing required columns or feature groups, the response includes an array of objects for each missing required feature group and the missing required features in each feature group.

Return type

ProjectValidation

get_feature_group_schema(feature_group_id, project_id=None)

Returns a schema given a specific FeatureGroup in a project.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • project_id (str) – The unique ID associated with the project.

Returns

An array of objects for each column in the specified feature group.

Return type

Feature

describe_feature_group(feature_group_id)

Describe a Feature Group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group.

Returns

The feature group object.

Return type

FeatureGroup

describe_feature_group_by_table_name(table_name)

Describe a Feature Group by the feature group’s table name

Parameters

table_name (str) – The unique table name of the Feature Group to lookup

Returns

The Feature Group

Return type

FeatureGroup

list_feature_groups(limit=100, start_after_id=None, feature_group_template_id=None, is_including_detached_from_template=False)

Lists all the feature groups in the organization.

Parameters
  • limit (int) – The number of feature groups to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all feature groups till a specified ID.

  • feature_group_template_id (str) – If specified, limit results to feature groups attached to this template id.

  • is_including_detached_from_template (bool) – When feature_group_template_id is specified, include feature groups that were detached from that template id.

Returns

All the feature groups in the organization

Return type

FeatureGroup

list_project_feature_groups(project_id, filter_feature_group_use=None)

List all the feature groups associated with a project

Parameters
  • project_id (str) – The unique ID associated with the project.

  • filter_feature_group_use (str) – When given, only feature groups in this project with the given use are returned. Valid values: DATA_WRANGLING, TRAINING_INPUT, BATCH_PREDICTION_INPUT, BATCH_PREDICTION_OUTPUT

Returns

All the Feature Groups in the Organization

Return type

FeatureGroup

get_feature_group_version_export_download_url(feature_group_export_id)

Get a link to download the feature group version.

Parameters

feature_group_export_id (str) – The feature group export to get a signed URL for.

Returns

The FeatureGroupExportDownloadUrl instance, which contains the download URL and expiration time.

Return type

FeatureGroupExportDownloadUrl

describe_feature_group_export(feature_group_export_id)

Describes a feature group export.

Parameters

feature_group_export_id (str) – The ID of the feature group export.

Returns

The feature group export

Return type

FeatureGroupExport

list_feature_group_exports(feature_group_id)

Lists all of the feature group exports for a given feature group

Parameters

feature_group_id (str) – The ID of the feature group

Returns

The feature group exports

Return type

FeatureGroupExport

list_feature_group_modifiers(feature_group_id)

Lists the users who can modify a feature group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group.

Returns

Modification lock status and groups and organizations added to the feature group.

Return type

ModificationLockInfo

get_materialization_logs(feature_group_version, stdout=False, stderr=False)

Returns logs for materialized feature group version.

Parameters
  • feature_group_version (str) – The feature group version to retrieve logs for

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns

The function logs.

Return type

FunctionLogs

list_feature_group_versions(feature_group_id, limit=100, start_after_version=None)

Retrieves a list of all feature group versions for the specified feature group.

Parameters
  • feature_group_id (str) – The unique ID associated with the feature group.

  • limit (int) – The max length of the returned versions

  • start_after_version (str) – Results will start after this version

Returns

An array of feature group versions.

Return type

FeatureGroupVersion

describe_feature_group_version(feature_group_version)

Get a specific feature group version.

Parameters

feature_group_version (str) – The unique ID associated with the feature group version.

Returns

A feature group version.

Return type

FeatureGroupVersion

describe_feature_group_template(feature_group_template_id)

Describe a Feature Group Template.

Parameters

feature_group_template_id (str) – The unique ID of a feature group template.

Returns

The feature group template object.

Return type

FeatureGroupTemplate

list_feature_group_templates(limit=100, start_after_id=None, feature_group_id=None)

List feature group templates, optionally scoped by the feature group that created the templates.

Parameters
  • limit (int) – The maximum number of templates to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all templates up to the specified feature group template ID.

  • feature_group_id (str) – If specified, limit to templates created from this feature group.

Returns

All the feature group templates in the organization, optionally limited by the feature group that created the template(s).

Return type

FeatureGroupTemplate

list_project_feature_group_templates(project_id, limit=100, start_after_id=None)

List feature group templates for feature groups associated with the project.

Parameters
  • project_id (str) – Limit to templates associated with this project, e.g. templates associated with feature groups in this project.

  • limit (int) – The maximum number of templates to be retrieved.

  • start_after_id (str) – An offset parameter to exclude all templates up to the specified feature group template ID.

Returns

All the feature group templates for feature groups associated with the project.

Return type

FeatureGroupTemplate

suggest_feature_group_template_for_feature_group(feature_group_id)

Suggest values for a feature group template, based on a feature group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group to use for suggesting values to use for the template.

Returns

The suggested feature group template.

Return type

FeatureGroupTemplate

get_dataset_schema(dataset_id)

Retrieves the column schema of a dataset

Parameters

dataset_id (str) – The unique ID of the dataset whose schema is to be looked up.

Returns

List of Column schema definitions

Return type

DatasetColumn
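
A quick sketch printing the schema (placeholder dataset ID; the name and data_type attributes on DatasetColumn are assumptions):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key
    for column in client.get_dataset_schema('dataset_placeholder'):
        print(column.name, column.data_type)  # attribute names assumed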

get_file_connector_instructions(bucket, write_permission=False)

Retrieves verification information to create a data connector to a cloud storage bucket.

Parameters
  • bucket (str) – The fully qualified URI of the storage bucket to verify.

  • write_permission (bool) – If true, instructions will include steps for allowing Abacus.AI to write to this service.

Returns

An object with full description of the cloud storage bucket authentication options and bucket policy. Returns an error message if the parameters are invalid.

Return type

FileConnectorInstructions

list_database_connectors()

Retrieves a list of all of the database connectors along with all their attributes.

Returns

A list of database connectors.

Return type

DatabaseConnector

list_file_connectors()

Retrieves a list of all connected services in the organization and their current verification status.

Returns

An array of cloud storage buckets connected to the organization.

Return type

FileConnector

list_database_connector_objects(database_connector_id)

Lists queryable objects in the database connector.

Parameters

database_connector_id (str) – The unique identifier for the database connector.

Return type

List[str]

get_database_connector_object_schema(database_connector_id, object_name=None)

Get the schema of an object in a database connector.

Parameters
  • database_connector_id (str) – The unique identifier for the database connector.

  • object_name (str) – The unique identifier for the object in the external system.

Return type

List[str]

list_application_connectors()

Retrieves a list of all of the application connectors along with all their attributes.

Returns

A list of application connectors.

Return type

ApplicationConnector

list_application_connector_objects(application_connector_id)

Lists queryable objects in the application connector.

Parameters

application_connector_id (str) – The unique identifier for the application connector.

Return type

List[str]

list_streaming_connectors()

Retrieves a list of all of the streaming connectors along with all their attributes.

Returns

A list of streaming connectors.

Return type

StreamingConnector

list_streaming_tokens()

Retrieves a list of all streaming tokens along with their attributes.

Returns

An array of streaming tokens.

Return type

StreamingAuthToken

get_recent_feature_group_streamed_data(feature_group_id)

Returns the data recently streamed to a streaming feature group.

Parameters

feature_group_id (str) – The unique ID associated with the feature group.

list_uploads()

Lists all ongoing uploads in the organization

Returns

An array of uploads.

Return type

Upload

describe_upload(upload_id)

Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.

Parameters

upload_id (str) – The unique ID associated with the file uploaded or being uploaded in parts.

Returns

The details associated with the large dataset file uploaded in parts.

Return type

Upload

list_datasets(limit=100, start_after_id=None, exclude_streaming=False)

Retrieves a list of all of the datasets in the organization.

Parameters
  • limit (int) – The max length of the list of datasets.

  • start_after_id (str) – The ID of the dataset after which the list starts.

  • exclude_streaming (bool) – Exclude streaming datasets from the result.

Returns

A list of datasets.

Return type

Dataset

describe_dataset(dataset_id)

Retrieves a full description of the specified dataset, with attributes such as its ID, name, source type, etc.

Parameters

dataset_id (str) – The unique ID associated with the dataset.

Returns

The dataset.

Return type

Dataset

describe_dataset_version(dataset_version)

Retrieves a full description of the specified dataset version, with attributes such as its ID, name, source type, etc.

Parameters

dataset_version (str) – The unique ID associated with the dataset version.

Returns

The dataset version.

Return type

DatasetVersion

list_dataset_versions(dataset_id, limit=100, start_after_version=None)

Retrieves a list of all dataset versions for the specified dataset.

Parameters
  • dataset_id (str) – The unique ID associated with the dataset.

  • limit (int) – The max length of the list of all dataset versions.

  • start_after_version (str) – The ID of the version after which the list starts.

Returns

A list of dataset versions.

Return type

DatasetVersion

get_training_config_options(project_id, feature_group_ids=None, for_retrain=False)

Retrieves the full description of the model training configuration options available for the specified project.

The configuration options available are determined by the use case associated with the specified project. Refer to the [Use Case Documentation](https://api.abacus.ai/app/help/useCases) for more information on use cases and use case specific configuration options.

Parameters
  • project_id (str) – The unique ID associated with the project.

  • feature_group_ids (list) – The feature group IDs to be used for training

  • for_retrain (bool) – Whether the training config options are being retrieved for retraining.

Returns

An array of options that can be specified when training a model in this project.

Return type

TrainingConfigOptions

list_models(project_id)

Retrieves the list of models in the specified project.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array of models.

Return type

Model

describe_model(model_id)

Retrieves a full description of the specified model.

Parameters

model_id (str) – The unique ID associated with the model.

Returns

The description of the model.

Return type

Model

get_model_metrics(model_id, model_version=None, baseline_metrics=False)

Retrieves a full list of the metrics for the specified model.

If only the model’s unique identifier (modelId) is specified, the latest trained version of the model (modelVersion) is used.

Parameters
  • model_id (str) – The unique ID associated with the model.

  • model_version (str) – The version of the model.

  • baseline_metrics (bool) – If true, will also return the baseline model metrics for comparison.

Returns

An object to show the model metrics and explanations for what each metric means.

Return type

ModelMetrics
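
For example, fetching metrics alongside the baseline for comparison (placeholder model ID; the metrics attribute on ModelMetrics is an assumption):

    from abacusai import ApiClient

    client = ApiClient('YOUR_API_KEY')  # placeholder API key
    metrics = client.get_model_metrics('model_placeholder', baseline_metrics=True)
    print(metrics.metrics)  # attribute name assumed for illustration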

list_model_versions(model_id, limit=100, start_after_version=None)

Retrieves a list of the versions for a given model.

Parameters
  • model_id (str) – The unique ID associated with the model.

  • limit (int) – The max length of the list of model versions.

  • start_after_version (str) – The ID of the version after which the list starts.

Returns

An array of model versions.

Return type

ModelVersion

describe_model_version(model_version)

Retrieves a full description of the specified model version

Parameters

model_version (str) – The unique version ID of the model version

Returns

A model version.

Return type

ModelVersion

get_training_data_logs(model_version)

Retrieves the data preparation logs during model training.

Parameters

model_version (str) – The unique version ID of the model version

Returns

A list of logs.

Return type

DataPrepLogs

set_default_model_algorithm(model_id=None, algorithm=None)

Sets the model’s algorithm to default for all new deployments

Parameters
  • model_id (str) – The unique ID of the model to set.

  • algorithm (str) – The algorithm to pin as the default in the model.

get_training_logs(model_version, stdout=False, stderr=False)

Returns training logs for the model.

Parameters
  • model_version (str) – The unique version ID of the model version

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns

The function logs.

Return type

FunctionLogs

list_model_monitors(project_id)

Retrieves the list of model monitors in the specified project.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array of model monitors.

Return type

ModelMonitor

describe_model_monitor(model_monitor_id)

Retrieves a full description of the specified model monitor.

Parameters

model_monitor_id (str) – The unique ID associated with the model monitor.

Returns

The description of the model monitor.

Return type

ModelMonitor

get_prediction_drift(model_monitor_version)

Gets the label and prediction drifts for a model monitor.

Parameters

model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

Returns

An object describing training and prediction output label and prediction distributions.

Return type

DriftDistributions

get_model_monitor_summary(model_monitor_id)

Gets a summary of the specified model monitor.

Parameters

model_monitor_id (str) – The unique ID associated with the model monitor.

Returns

The model monitor summary.

Return type

ModelMonitorSummary

list_model_monitor_versions(model_monitor_id, limit=100, start_after_version=None)

Retrieves a list of the versions for a given model monitor.

Parameters
  • model_monitor_id (str) – The unique ID associated with the model monitor.

  • limit (int) – The max length of the list of all model monitor versions.

  • start_after_version (str) – The ID of the version after which the list starts.

Returns

An array of model monitor versions.

Return type

ModelMonitorVersion

describe_model_monitor_version(model_monitor_version)

Retrieves a full description of the specified model monitor version

Parameters

model_monitor_version (str) – The unique version ID of the model monitor version

Returns

A model monitor version.

Return type

ModelMonitorVersion

model_monitor_version_metric_data(model_monitor_version, metric_type)

Provides the data needed for decile metrics associated with the model monitor.

Parameters
  • model_monitor_version (str) – Model monitor version id.

  • metric_type (str) – The metric type to get data for.

Returns

Data associated with the metric.

Return type

ModelMonitorVersionMetricData

get_model_monitoring_logs(model_monitor_version, stdout=False, stderr=False)

Returns monitoring logs for the model.

Parameters
  • model_monitor_version (str) – The unique version ID of the model monitor version

  • stdout (bool) – Set True to get info logs

  • stderr (bool) – Set True to get error logs

Returns

The function logs.

Return type

FunctionLogs

get_drift_for_feature(model_monitor_version, feature_name)

Gets the feature drift associated with a single feature in an output feature group from a prediction.

Parameters
  • model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type

Dict

get_outliers_for_feature(model_monitor_version, feature_name=None)

Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.

Parameters
  • model_monitor_version (str) – The unique identifier to a model monitor version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type

Dict

get_outliers_for_batch_prediction_feature(batch_prediction_version, feature_name=None)

Gets a list of outliers measured by a single feature (or overall) in an output feature group from a prediction.

Parameters
  • batch_prediction_version (str) – The unique identifier to a batch prediction version created under the project.

  • feature_name (str) – Name of the feature to view the distribution of.

Return type

Dict

describe_deployment(deployment_id)

Retrieves a full description of the specified deployment.

Parameters

deployment_id (str) – The unique ID associated with the deployment.

Returns

The description of the deployment.

Return type

Deployment

list_deployments(project_id)

Retrieves a list of all deployments in the specified project.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array of deployments.

Return type

Deployment

list_deployment_tokens(project_id)

Retrieves a list of all deployment tokens in the specified project.

Parameters

project_id (str) – The unique ID associated with the project.

Returns

An array of deployment tokens.

Return type

DeploymentAuthToken

describe_refresh_policy(refresh_policy_id)

Retrieve a single refresh policy

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

Returns

A refresh policy object

Return type

RefreshPolicy

describe_refresh_pipeline_run(refresh_pipeline_run_id)

Retrieve a single refresh pipeline run

Parameters

refresh_pipeline_run_id (str) – The unique ID associated with this refresh pipeline run

Returns

A refresh pipeline run object

Return type

RefreshPipelineRun

list_refresh_policies(project_id=None, dataset_ids=[], model_ids=[], deployment_ids=[], batch_prediction_ids=[], model_monitor_ids=[], prediction_metric_ids=[])

List the refresh policies for the organization

Parameters
  • project_id (str) – Optionally, a Project ID can be specified so that only the refresh policies for the datasets, models, and deployments in that project are returned

  • dataset_ids (list) – Comma separated list of Dataset IDs

  • model_ids (list) – Comma separated list of Model IDs

  • deployment_ids (list) – Comma separated list of Deployment IDs

  • batch_prediction_ids (list) – Comma separated list of Batch Prediction IDs

  • model_monitor_ids (list) – Comma separated list of Model Monitor IDs.

  • prediction_metric_ids (list) – Comma separated list of Prediction Metric IDs.

Returns

List of all refresh policies in the organization

Return type

RefreshPolicy

list_refresh_pipeline_runs(refresh_policy_id)

List the times that the refresh policy has been run

Parameters

refresh_policy_id (str) – The unique ID associated with this refresh policy

Returns

A list of refresh pipeline runs for the given refresh policy id

Return type

RefreshPipelineRun

list_prediction_metrics(feature_group_id, limit=100, should_include_latest_version_description=True, start_after_id=None)

List the prediction metrics for a feature group.

Parameters
  • feature_group_id (str) – The feature group used as input to this prediction metric.

  • limit (int) – The number of prediction metrics to be retrieved.

  • should_include_latest_version_description (bool) – Include the description of the latest prediction metric version for each prediction metric.

  • start_after_id (str) – An offset parameter to exclude all prediction metrics up to the specified prediction metric ID.

Returns

The prediction metrics for this feature group.

Return type

PredictionMetric

query_prediction_metrics(feature_group_id=None, project_id=None, limit=100, should_include_latest_version_description=True, start_after_id=None)

Query and return prediction metrics and extra data needed by the UI, constrained by the parameters provided.

Parameters
  • feature_group_id (str) – Optional. The feature group used as input to the prediction metrics.

  • project_id (str) – Optional. The project of the prediction metrics.

  • limit (int) – The number of prediction metrics to be retrieved.

  • should_include_latest_version_description (bool) – Include the description of the latest prediction metric version for each prediction metric.

  • start_after_id (str) – An offset parameter to exclude all prediction metrics up to the specified prediction metric ID.

Returns

The prediction metrics for this feature group.

Return type

PredictionMetric

describe_prediction_metric_version(prediction_metric_version)

Retrieves a full description of the specified prediction metric version

Parameters

prediction_metric_version (str) – The unique version ID of the prediction metric version

Returns

A prediction metric version. For more information, please refer to the details on the object (below).

Return type

PredictionMetricVersion

download_batch_prediction_result_chunk(batch_prediction_version, offset=0, chunk_size=10485760)

Returns a stream containing the batch prediction results

Parameters
  • batch_prediction_version (str) – The unique identifier of the batch prediction version to get the results from

  • offset (int) – The offset to read from

  • chunk_size (int) – The max amount of data to read

Return type

io.BytesIO
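
Since results come back in chunks, a complete download is a loop over offsets. A sketch with a placeholder version ID:

    from abacusai import ApiClient

    CHUNK_SIZE = 10 * 1024 * 1024  # matches the default chunk_size

    client = ApiClient('YOUR_API_KEY')  # placeholder API key
    offset = 0
    with open('batch_results.csv', 'wb') as out:
        while True:
            stream = client.download_batch_prediction_result_chunk(
                'bp_version_placeholder', offset=offset, chunk_size=CHUNK_SIZE)
            data = stream.read()
            if not data:  # an empty chunk means the download is complete
                break
            out.write(data)
            offset += len(data)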

get_batch_prediction_connector_errors(batch_prediction_version)

Returns a stream containing the batch prediction database connection write errors, if any writes failed to the database connector

Parameters

batch_prediction_version (str) – The unique identifier of the batch prediction job to get the errors for

Return type

io.BytesIO

list_batch_predictions(project_id)

Retrieves a list of the batch predictions in the project

Parameters

project_id (str) – The unique identifier of the project

Returns

A list of batch prediction jobs.

Return type

BatchPrediction

describe_batch_prediction(batch_prediction_id)

Describes the batch prediction

Parameters

batch_prediction_id (str) – The unique ID associated with the batch prediction.

Returns

The batch prediction description.

Return type

BatchPrediction

list_batch_prediction_versions(batch_prediction_id, limit=100, start_after_version=None)

Retrieves a list of versions of a given batch prediction

Parameters
  • batch_prediction_id (str) – The unique identifier of the batch prediction

  • limit (int) – The number of versions to list

  • start_after_version (str) – The version to start after

Returns

A list of batch prediction versions.

Return type

BatchPredictionVersion

describe_batch_prediction_version(batch_prediction_version)

Describes a batch prediction version

Parameters

batch_prediction_version (str) – The unique identifier of the batch prediction version

Returns

The batch prediction version.

Return type

BatchPredictionVersion

describe_algorithm(algorithm)

Retrieves a full description of the specified algorithm.

Parameters

algorithm (str) – The name of the algorithm.

Returns

The description of the Algorithm.

Return type

Algorithm

class abacusai.PredictionClient(client_options=None)

Bases: abacusai.client.BaseApiClient

Abacus.AI Prediction API Client. Does not utilize authentication and only contains public prediction methods

Parameters

client_options (ClientOptions) – Optional API client configurations

predict_raw(deployment_token, deployment_id, **kwargs)

Raw interface for returning predictions from Plug and Play deployments.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • **kwargs (dict) – Arbitrary key/value pairs may be passed in and are sent as part of the request body.

lookup_features(deployment_token, deployment_id, query_data={})

Returns the feature group deployed in the feature store project.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict(deployment_token, deployment_id, query_data={})

Returns a prediction for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict
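
A minimal sketch with placeholder token and IDs, where 'user_id' stands in for whatever column is mapped to USER_ID in your dataset:

    from abacusai import PredictionClient

    client = PredictionClient()  # authenticated per call via the deployment token
    result = client.predict(
        'DEPLOYMENT_TOKEN', 'deployment_placeholder',
        query_data={'user_id': '12345'})  # placeholder column name and value
    print(result)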

predict_multiple(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (list) – This will be a list of dictionaries where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict_from_datasets(deployment_token, deployment_id, query_data={})

Returns a list of predictions for Predictive Modeling

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the source dataset name and ‘Value’ will be a list of records corresponding to the dataset rows

Return type

Dict

predict_lead(deployment_token, deployment_id, query_data)

Returns the probability of a user to be a lead on the basis of his/her interaction with the service/product and user’s own attributes (e.g. income, assets, credit score, etc.). Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘user_id’ mapped to mapping ‘LEAD_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing user attributes and/or user’s interaction data with the product/service (e.g. number of clicks, items in cart, etc.).

Return type

Dict

predict_churn(deployment_token, deployment_id, query_data)

Returns a probability of a user to churn out in response to his/her interactions with the item/product/service. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘churn_result’ mapped to mapping ‘CHURNED_YN’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

Return type

Dict

predict_takeover(deployment_token, deployment_id, query_data)

Returns a probability for each class label associated with the types of fraud or a ‘yes’ or ‘no’ type label for the possibility of fraud. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing account activity characteristics (e.g. login id, login duration, login type, ip address, etc.).

Return type

Dict

predict_fraud(deployment_token, deployment_id, query_data)

Returns a probability of a transaction performed under a specific account as being a fraud or not. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_number’ mapped to the mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary containing transaction attributes (e.g. credit card type, transaction location, transaction amount, etc.).

Return type

Dict

predict_class(deployment_token, deployment_id, query_data={}, threshold=None, threshold_class=None, thresholds=None, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a classification prediction

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • threshold (float) – Float value that is applied to the popular class label.

  • threshold_class (str) – Label upon which the threshold is applied (binary labels only).

  • thresholds (list) – Maps labels to thresholds (multi-label classification only). Defaults to the F1-optimal threshold if computed for the given class, else uses 0.5.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type

Dict
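
For example, a binary classification call with a custom threshold and SHAP explanations (token, deployment ID, and column name are placeholders):

    from abacusai import PredictionClient

    client = PredictionClient()
    result = client.predict_class(
        'DEPLOYMENT_TOKEN', 'deployment_placeholder',
        query_data={'user_id': '12345'},
        threshold=0.6,              # applied to the popular class label
        explain_predictions=True)   # include SHAP explanations
    print(result)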

predict_target(deployment_token, deployment_id, query_data={}, explain_predictions=False, fixed_features=None, nested=None, explainer_type=None)

Returns a prediction from a classification or regression model. Optionally, includes explanations.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the entity against which a prediction is performed and ‘Value’ will be the unique value of the same entity.

  • explain_predictions (bool) – If true, returns the SHAP explanations for all input features.

  • fixed_features (list) – Set of input features to treat as constant for explanations.

  • nested (str) – If specified generates prediction delta for each index of the specified nested feature.

  • explainer_type (str) –

Return type

Dict

get_anomalies(deployment_token, deployment_id, threshold=None, histogram=False)

Returns a list of anomalies from the training dataset

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • threshold (float) – The threshold score of what is an anomaly. Valid values are between 0.8 and 0.99.

  • histogram (bool) – If True, will return a histogram of the distribution of all points

Return type

io.BytesIO

is_anomaly(deployment_token, deployment_id, query_data=None)

Returns a list of anomaly attributes based on login information for a specified account. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘account_name’ mapped to mapping ‘ACCOUNT_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – The input data for the prediction.

Return type

Dict

get_forecast(deployment_token, deployment_id, query_data, future_data=None, num_predictions=None, prediction_start=None)

Returns a list of forecasts for a given entity under the specified project deployment. Note that the inputs to the deployed model will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘holiday_yn’ mapped to mapping ‘FUTURE’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘store_id’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the entity against which forecasting is performed and ‘Value’ will be the unique value of the same entity.

  • future_data (dict) – This will be a dictionary of values known ahead of time that are relevant for forecasting (e.g. State Holidays, National Holidays, etc.). The key and the value both will be of type ‘String’. For example future data entered for a Store may be {“Holiday”:”No”, “Promo”:”Yes”}.

  • num_predictions (int) – The number of timestamps to predict in the future.

  • prediction_start (str) – The start date for predictions (e.g., “2015-08-01T00:00:00” as input for mid-night of 2015-08-01).

Return type

Dict
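
A forecasting sketch built from the documented arguments (token, deployment ID, and the column/values are placeholders; future_data mirrors the example above):

    from abacusai import PredictionClient

    client = PredictionClient()
    forecast = client.get_forecast(
        'DEPLOYMENT_TOKEN', 'deployment_placeholder',
        query_data={'store_id': 'store_42'},            # column mapped to ITEM_ID
        future_data={'Holiday': 'No', 'Promo': 'Yes'},  # values known ahead of time
        num_predictions=14,
        prediction_start='2015-08-01T00:00:00')
    print(forecast)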

get_k_nearest(deployment_token, deployment_id, vector, k=None, distance=None, include_score=False)

Returns the k nearest neighbors for the provided embedding vector.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • vector (list) – Input vector to perform the k nearest neighbors with.

  • k (int) – Overridable number of items to return.

  • distance (str) – Specify the distance function to use when finding nearest neighbors

  • include_score (bool) – If True, will return the score alongside the resulting embedding value

Return type

Dict
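
A sketch of a nearest-neighbor lookup (placeholder token, deployment ID, and a toy vector; 'euclidean' matches the distance name shown in get_multiple_k_nearest below):

    from abacusai import PredictionClient

    client = PredictionClient()
    neighbors = client.get_k_nearest(
        'DEPLOYMENT_TOKEN', 'deployment_placeholder',
        vector=[0.12, -0.03, 0.88],  # toy embedding vector
        k=10, distance='euclidean', include_score=True)
    print(neighbors)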

get_multiple_k_nearest(deployment_token, deployment_id, queries)

Returns the k nearest neighbors for the queries provided

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • queries (list) – List of Mappings of format {“catalogId”: “cat0”, “vectors”: […], “k”: 20, “distance”: “euclidean”}. See getKNearest for additional information about the supported parameters

get_labels(deployment_token, deployment_id, query_data, threshold=None)

Returns a list of scored labels extracted from the provided content.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

  • threshold (None) – Deprecated

Return type

Dict

get_recommendations(deployment_token, deployment_id, query_data, num_items=50, page=1, exclude_item_ids=[], score_field='', scaling_factors=[], restrict_items=[], exclude_items=[], explore_fraction=0.0)

Returns a list of recommendations for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘time’ mapped to mapping ‘TIMESTAMP’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘user_name’ in your dataset) mapped to the column mapping USER_ID that uniquely identifies the user against which recommendations are made and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘user_name’ mapped to the column mapping ‘USER_ID’, then the query must have the exact same column name (user_name) as key and the name of the user (John Doe) as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • exclude_item_ids (list) – [DEPRECATED]

  • score_field (str) – The relative item scores are returned in a separate field named with the same name as the key (score_field) for this argument.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.

  • restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which to restrict the recommendations. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.

  • explore_fraction (float) – The fraction of recommendations that should be new items.

Return type

Dict
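
Putting several of these arguments together: a sketch that boosts SUVs and Sedans and excludes Trucks (token, deployment ID, and all column names/values are placeholders drawn from the examples above):

    from abacusai import PredictionClient

    client = PredictionClient()
    recommendations = client.get_recommendations(
        'DEPLOYMENT_TOKEN', 'deployment_placeholder',
        query_data={'user_name': 'John Doe'},  # column mapped to USER_ID
        num_items=10, page=1,
        scaling_factors=[{'column': 'VehicleType',
                          'values': ['SUV', 'Sedan'], 'factor': 1.4}],
        exclude_items=[{'column': 'VehicleType', 'values': ['Truck']}])
    print(recommendations)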

get_personalized_ranking(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of items with personalized promotions on them for a given user under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.

  • preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model, will not be reranked, and the original position in the query will be preserved.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.

Return type

Dict

get_ranked_items(deployment_token, deployment_id, query_data, preserve_ranks=[], preserve_unknown_items=False, scaling_factors=[])

Returns a list of re-ranked items for a selected user when a list of items is required to be reranked according to the user’s preferences. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary with two key-value pairs. The first pair represents a ‘Key’ where the column name (e.g. a column with name ‘user_id’ in your dataset) mapped to the column mapping USER_ID uniquely identifies the user against whom a prediction is made and a ‘Value’ which is the identifier value for that user. The second pair will have a ‘Key’ which will be the name of the column name (e.g. movie_name) mapped to ITEM_ID (unique item identifier) and a ‘Value’ which will be a list of identifiers that uniquely identifies those items.

  • preserve_ranks (list) – List of dictionaries of format {“column”: “col0”, “values”: [“value0”, “value1”]}, where the ranks of items in query_data are preserved for all the items in “col0” with values, “value0” and “value1”. This option is useful when the desired items are being recommended in the desired order and the ranks for those items need to be kept unchanged during recommendation generation.

  • preserve_unknown_items (bool) – If true, any items that are unknown to the model, will not be reranked, and the original position in the query will be preserved.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.

Return type

Dict

get_related_items(deployment_token, deployment_id, query_data, num_items=50, page=1, scaling_factors=[], restrict_items=[], exclude_items=[])

Returns a list of related items for a given item under the specified project deployment. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘item_code’ mapped to mapping ‘ITEM_ID’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – This will be a dictionary where ‘Key’ will be the column name (e.g. a column with name ‘item_code’ in your dataset) mapped to the column mapping ITEM_ID that uniquely identifies the item for which related items are determined and ‘Value’ will be the unique value of the same item. For example, if you have the column name ‘item_code’ mapped to the column mapping ‘ITEM_ID’, then the query must have the exact same column name (item_code) as key and the identifier of the item as value.

  • num_items (int) – The number of items to recommend on one page. By default, it is set to 50 items per page.

  • page (int) – The page number to be displayed. For example, let’s say that the num_items is set to 10 with the total recommendations list size of 50 recommended items, then an input value of 2 in the ‘page’ variable will display a list of items that rank from 11th to 20th.

  • scaling_factors (list) – It allows you to bias the model towards certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”], “factor”: 1.1}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” in reference to which the model recommendations need to be biased; and the key, “factor” takes the factor by which the item scores are adjusted. Let’s take an example where the input to scaling_factors is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”], “factor”: 1.4}]. After we apply the model to get item probabilities, for every SUV and Sedan in the list, we will multiply the respective probability by 1.4 before sorting. This is particularly useful if there’s a type of item that might be less popular but you want to promote it or there’s an item that always comes up and you want to demote it.

  • restrict_items (list) – It allows you to restrict the recommendations to certain items. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, “value3”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”, “value3”, …]” to which to restrict the recommendations. Let’s take an example where the input to restrict_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. This input will restrict the recommendations to SUVs and Sedans. This type of restriction is particularly useful if there’s a list of items that you know is of use in some particular scenario and you want to restrict the recommendations only to that list.

  • exclude_items (list) – It allows you to exclude certain items from the list of recommendations. The input to this argument is a list of dictionaries where the format of each dictionary is as follows: {“column”: “col0”, “values”: [“value0”, “value1”, …]}. The key, “column” takes the name of the column, “col0”; the key, “values” takes the list of items, “[“value0”, “value1”]” to exclude from the recommendations. Let’s take an example where the input to exclude_items is [{“column”: “VehicleType”, “values”: [“SUV”, “Sedan”]}]. The resulting recommendation list will exclude all SUVs and Sedans. This is particularly useful if there’s a list of items that you know is of no use in some particular scenario and you don’t want to show those items present in that list.

Return type

Dict

get_feature_group_rows(deployment_token, deployment_id, query_data)
Parameters
  • deployment_token (str) –

  • deployment_id (str) –

  • query_data (dict) –

get_search_results(deployment_token, deployment_id, query_data)

TODO

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Dictionary where key is “Content” and value is the text from which entities are to be extracted.

Return type

Dict

get_sentiment(deployment_token, deployment_id, document)

TODO

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – # TODO

Return type

Dict

get_entailment(deployment_token, deployment_id, document)

TODO

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – # TODO

Return type

Dict

get_classification(deployment_token, deployment_id, document)

TODO

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • document (str) – # TODO

Return type

Dict

get_summary(deployment_token, deployment_id, query_data)

Returns a JSON of the predicted summary for the given document. Note that the inputs to this method, wherever applicable, will be the column names in your dataset mapped to the column mappings in our system (e.g. column ‘text’ mapped to mapping ‘DOCUMENT’ in our system).

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – Raw data dictionary containing the required document data; it must have a key ‘document’ whose value is the DOCUMENT-type text to summarize.

Return type

Dict

predict_language(deployment_token, deployment_id, query_data)

TODO

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (str) – # TODO

Return type

Dict

get_assignments(deployment_token, deployment_id, query_data, forced_assignments=None)

Get all positive assignments that match a query.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – specifies the set of assignments being requested.

  • forced_assignments (dict) – set of assignments to force and resolve before returning query results.

Return type

Dict

check_constraints(deployment_token, deployment_id, query_data)

Check for any constraints violated by the overrides.

Parameters
  • deployment_token (str) – The deployment token to authenticate access to created deployments. This token is only authorized to predict on deployments in this project, so it is safe to embed this model inside of an application or website.

  • deployment_id (str) – The unique identifier to a deployment created under the project.

  • query_data (dict) – assignment overrides to the solution.

Return type

Dict

abacusai.__version__ = 0.36.17