abacusai.upload

Module Contents

Classes

Upload

An upload reference for uploading file parts

class abacusai.upload.Upload(client, uploadId=None, datasetUploadId=None, status=None, datasetId=None, datasetVersion=None, modelId=None, modelVersion=None, batchPredictionId=None, parts=None, createdAt=None)

Bases: abacusai.return_class.AbstractApiClass

An upload reference for uploading file parts

Parameters
  • client (ApiClient) – An authenticated API Client instance

  • uploadId (str) – The unique ID generated when the upload process of the full large file in smaller parts is initiated.

  • datasetUploadId (str) – Same as uploadId. It is kept for backwards compatibility.

  • status (str) – The current status of the upload.

  • datasetId (str) – A reference to the dataset this upload is adding data to.

  • datasetVersion (str) – A reference to the dataset version the upload is adding data to.

  • modelId (str) – A reference to the model the upload is creating a version for.

  • modelVersion (str) – A reference to the model version the upload is creating.

  • batchPredictionId (str) – A reference to the batch prediction the upload is creating.

  • parts (list of json objects) – A list containing the order of the file parts that have been uploaded.

  • createdAt (str) – The timestamp at which the upload was created.

__repr__()

Return repr(self).

to_dict()

Get a dict representation of the parameters in this class

Returns

The dict value representation of the class parameters

Return type

dict

cancel()

Cancels an upload

Parameters

upload_id (str) – The Upload ID

part(part_number, part_data)

Uploads one part of a large dataset file from your bucket to Abacus.AI. A part may be up to 5 GB in size and the full file up to 5 TB. Each part must be at least 5 MB in size, unless it is the last part in the sequence of parts for the full file.

Parameters
  • part_number (int) – The 1-indexed number denoting the position of the file part in the sequence of parts for the full file.

  • part_data (io.TextIOBase) – The multipart/form-data for the current part of the full file.

Returns

The object ‘UploadPart’ which encapsulates the hash and the etag for the part that got uploaded.

Return type

UploadPart
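The size limits above can be checked before any part is sent. The following is a minimal sketch, not part of the abacusai package: plan_parts and the three constants are hypothetical names, and it assumes that MB, GB, and TB are binary multiples, which the 1024 * 1024 * 10 default of upload_file suggests but the documentation does not state.

```python
# Hypothetical helper, not part of abacusai. Units are assumed to be binary.
MIN_PART = 5 * 1024**2   # 5 MB minimum for every part except the last
MAX_PART = 5 * 1024**3   # 5 GB maximum for a single part
MAX_FILE = 5 * 1024**4   # 5 TB maximum for the full file


def plan_parts(total_size, chunksize):
    """Return (part_number, size) pairs with 1-indexed part numbers."""
    if not 0 < total_size <= MAX_FILE:
        raise ValueError("file size must be between 1 byte and 5 TB")
    if not MIN_PART <= chunksize <= MAX_PART:
        raise ValueError("chunksize must be between 5 MB and 5 GB")
    parts = []
    offset = 0
    part_number = 1
    while offset < total_size:
        # Only the final part may be smaller than chunksize.
        size = min(chunksize, total_size - offset)
        parts.append((part_number, size))
        offset += size
        part_number += 1
    return parts
```

Because every part except the last has exactly chunksize bytes, validating chunksize once is enough to satisfy the per-part limits.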

mark_complete()

Marks an upload process as complete.

Parameters

upload_id (str) – A unique identifier for this upload

Returns

The upload object associated with the upload process for the full file.

Return type

Upload

refresh()

Calls describe and refreshes the current object’s fields

Returns

The current object

Return type

Upload

describe()

Retrieves the current upload status (complete or inspecting) and the list of file parts uploaded for a specified dataset upload.

Parameters

upload_id (str) – The unique ID associated with the file uploaded or being uploaded in parts.

Returns

The details associated with the large dataset file uploaded in parts.

Return type

Upload

upload_part(upload_args)

Uploads a file part. If the upload fails, it will retry up to 3 times with a short backoff before raising an exception.

Returns

The object ‘UploadPart’ that encapsulates the hash and the etag for the part that got uploaded.

Return type

UploadPart
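The retry behaviour described for upload_part can be pictured with a small generic helper. This is only a sketch under stated assumptions: call_with_retries is a hypothetical name, "up to 3 times" is read as three retries after the first attempt, and the linear backoff schedule is invented for illustration. The library's actual schedule is not documented here.

```python
import time


def call_with_retries(func, retries=3, backoff=0.5, sleep=time.sleep):
    """Call func(); on failure retry up to `retries` more times, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                # Out of retries: surface the last error to the caller.
                raise
            # Assumed linear backoff: 0.5 s, 1.0 s, 1.5 s, ...
            sleep(backoff * (attempt + 1))
```

The sleep parameter is injectable so the helper can be exercised without real delays.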

upload_file(file, threads=10, chunksize=1024 * 1024 * 10, wait_timeout=600)

Uploads the file in the specified chunk size using the specified number of workers.

Parameters
  • file (IOBase) – A BytesIO or StringIO object to upload to Abacus.AI

  • threads (int, optional) – The max number of workers to use while uploading the file

  • chunksize (int, optional) – The number of bytes to use for each chunk while uploading the file. Defaults to 10 MB (1024 * 1024 * 10 bytes).

  • wait_timeout (int, optional) – The max number of seconds to wait for the file parts to be joined on Abacus.AI. Defaults to 600.

Returns

The upload file object.

Return type

Upload
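The chunk-and-dispatch pattern that upload_file describes can be sketched as follows. This is an illustration, not the library's implementation: iter_parts, upload_in_parallel, and send_part are hypothetical names, and send_part stands in for whatever callable actually transmits a part (for example, something that wraps Upload.part).

```python
import io
from concurrent.futures import ThreadPoolExecutor


def iter_parts(file, chunksize):
    """Yield (part_number, chunk) pairs with 1-indexed part numbers."""
    part_number = 0
    while True:
        chunk = file.read(chunksize)
        if not chunk:
            break
        part_number += 1
        yield part_number, chunk


def upload_in_parallel(file, send_part, threads=10, chunksize=10 * 1024 * 1024):
    """Send each chunk through send_part(part_number, chunk) on a thread pool."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(send_part, number, chunk)
            for number, chunk in iter_parts(file, chunksize)
        ]
        # Results are returned in part order, regardless of completion order.
        return [future.result() for future in futures]


# Example with a stand-in sender that only reports what it received.
if __name__ == "__main__":
    report = upload_in_parallel(
        io.BytesIO(b"abcdefghij"), lambda n, c: (n, len(c)), threads=2, chunksize=4
    )
    print(report)
```

Note that this sketch reads every chunk before the pool drains, so the whole file may sit in memory at once. A production version would bound the number of pending chunks.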

_yield_upload_part(file, chunksize)

wait_for_join(timeout=600)

Blocks until the uploaded parts have been joined.

Parameters

timeout (int, optional) – The maximum number of seconds to wait for the call to finish. If it has not finished within that time, the call times out. Defaults to 600.
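A waiting call of this kind is typically a polling loop with a deadline. The sketch below shows that general pattern only. wait_for_status is a hypothetical helper, the "COMPLETE" status string and the 5-second interval are assumptions, and the library's own polling logic may differ.

```python
import time


def wait_for_status(get_status, done=("COMPLETE",), timeout=600, interval=5,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a value in `done` or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)
```

In practice get_status could be a bound method such as the get_status() of an Upload object, provided the terminal status strings are adjusted to whatever the service actually returns.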

get_status()

Gets the status of the upload.

Returns

A string describing the status of the upload (pending, complete, etc.).

Return type

str