azure.ai.ml.data_transfer package — Azure SDK for Python 2.0.0 documentation
Christopher Ramos
Published Feb 16, 2026
- class
azure.ai.ml.data_transfer.DataTransferCopy(*, component: Union[str,azure.ai.ml.entities._component.datatransfer_component.DataTransferCopyComponent], compute: Optional[str] = None, inputs: Optional[Dict[str,Union[azure.ai.ml.entities._job.pipeline._io.base.NodeOutput,azure.ai.ml.entities._inputs_outputs.input.Input,str]]] = None, outputs: Optional[Dict[str,Union[str,azure.ai.ml.entities._inputs_outputs.output.Output]]] = None, data_copy_mode: Optional[str] = None, **kwargs)[source]¶ Base class for data transfer copy node.
You should not instantiate this class directly. Instead, create one with the builder function copy_data.
- Parameters
component (DataTransferCopyComponent) – The ID or instance of the data transfer component/job to be run during the step.
inputs (Dict[str, Union[NodeOutput, Input, str]]) – Inputs to the data transfer.
outputs (Dict[str, Union[str, Output, dict]]) – Mapping of output data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default is the current directory name.
compute (str) – The compute target the job runs on.
data_copy_mode (str) – The data copy mode used in the copy task. Possible values are “merge_with_overwrite” and “fail_if_conflict”.
- Raises
ValidationException – Raised if DataTransferCopy cannot be successfully validated. Details will be provided in the error message.
clear() → None. Remove all items from D.¶
copy() → a shallow copy of D¶
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the job content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Keyword Arguments
kwargs (dict) – Additional arguments to pass to the YAML serializer.
- Raises
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
fromkeys(iterable, value=None, /)¶Create a new dictionary with keys from iterable and values set to value.
get(key, default=None, /)¶Return the value for key if key is in the dictionary, else default.
items() → a set-like object providing a view on D’s items¶
keys() → a set-like object providing a view on D’s keys¶
pop(k[, d]) → v, remove specified key and return the corresponding value.¶If key is not found, default is returned if given, otherwise KeyError is raised
popitem()¶Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
setdefault(key, default=None, /)¶Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
update([E, ]**F) → None. Update D from dict/iterable E and F.¶If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
values() → an object providing a view on D’s values¶
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
component¶
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
inputs¶ Get the inputs for the object.
- property
log_files¶ Job output files.
- property
outputs¶ Get the outputs of the object.
- property
status¶ The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - A temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages:
  - Docker image build
  - conda environment setup
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns
Status of the job.
- Return type
Optional[str]
- class
azure.ai.ml.data_transfer.DataTransferCopyComponent(*, data_copy_mode: Optional[str] = None, inputs: Optional[Dict] = None, outputs: Optional[Dict] = None, **kwargs)[source]¶ DataTransfer copy component version, used to define a data transfer copy component.
- Parameters
data_copy_mode (str) – Data copy mode in the copy task. Possible values are “merge_with_overwrite” and “fail_if_conflict”.
inputs (dict) – Mapping of input data bindings used in the job.
outputs (dict) – Mapping of output data bindings used in the job.
kwargs – Additional parameters for the data transfer copy component.
- Raises
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
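For reference, data transfer components are typically defined in YAML and then loaded (for example with load_component). The fragment below is an illustrative sketch only: the field names follow the pattern of other component YAMLs and have not been verified against the exact schema.

```yaml
# Illustrative sketch of a data transfer copy component definition;
# field names and values here are assumptions, not a verified schema.
name: copy_files
version: "1"
type: data_transfer
task: copy_data
inputs:
  folder1:
    type: uri_folder
outputs:
  output_folder:
    type: uri_folder
data_copy_mode: merge_with_overwrite
```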
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the component content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
data_copy_mode¶ Data copy mode of the component.
- Returns
Data copy mode of the component.
- Return type
str
- property
display_name¶ Display name of the component.
- Returns
Display name of the component.
- Return type
str
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
is_deterministic¶ Whether the component is deterministic.
- Returns
Whether the component is deterministic
- Return type
bool
- property
type¶ Type of the component, default is ‘command’.
- Returns
Type of the component.
- Return type
str
- class
azure.ai.ml.data_transfer.DataTransferExport(*, component: Union[str,azure.ai.ml.entities._component.datatransfer_component.DataTransferExportComponent], compute: Optional[str] = None, sink: Optional[Union[Dict,azure.ai.ml.entities._inputs_outputs.external_data.Database,azure.ai.ml.entities._inputs_outputs.external_data.FileSystem]] = None, inputs: Optional[Dict[str,Union[azure.ai.ml.entities._job.pipeline._io.base.NodeOutput,azure.ai.ml.entities._inputs_outputs.input.Input,str]]] = None, **kwargs)[source]¶ Base class for data transfer export node.
You should not instantiate this class directly. Instead, create one with the builder function export_data.
- Parameters
component (str) – ID of the built-in data transfer component to be run during the step.
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (Dict[str, Union[NodeOutput, Input, str]]) – Mapping of input data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default is the current directory name.
compute (str) – The compute target the job runs on.
- Raises
ValidationException – Raised if DataTransferExport cannot be successfully validated. Details will be provided in the error message.
clear() → None. Remove all items from D.¶
copy() → a shallow copy of D¶
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the job content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Keyword Arguments
kwargs (dict) – Additional arguments to pass to the YAML serializer.
- Raises
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
fromkeys(iterable, value=None, /)¶Create a new dictionary with keys from iterable and values set to value.
get(key, default=None, /)¶Return the value for key if key is in the dictionary, else default.
items() → a set-like object providing a view on D’s items¶
keys() → a set-like object providing a view on D’s keys¶
pop(k[, d]) → v, remove specified key and return the corresponding value.¶If key is not found, default is returned if given, otherwise KeyError is raised
popitem()¶Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
setdefault(key, default=None, /)¶Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
update([E, ]**F) → None. Update D from dict/iterable E and F.¶If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
values() → an object providing a view on D’s values¶
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
component¶
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
inputs¶ Get the inputs for the object.
- property
log_files¶ Job output files.
- property
outputs¶ Get the outputs of the object.
- property
sink¶ The sink of external data and databases.
- Returns
The sink of external data and databases.
- Return type
Union[None, Database, FileSystem]
- property
status¶ The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - A temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages:
  - Docker image build
  - conda environment setup
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns
Status of the job.
- Return type
Optional[str]
- class
azure.ai.ml.data_transfer.DataTransferExportComponent(*, inputs: Optional[Dict] = None, sink: Optional[Dict] = None, **kwargs)[source]¶ DataTransfer export component version, used to define a data transfer export component.
- Parameters
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (dict) – Mapping of input data bindings used in the job.
kwargs – Additional parameters for the data transfer export component.
- Raises
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the component content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
display_name¶ Display name of the component.
- Returns
Display name of the component.
- Return type
str
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
is_deterministic¶ Whether the component is deterministic.
- Returns
Whether the component is deterministic
- Return type
bool
- property
type¶ Type of the component, default is ‘command’.
- Returns
Type of the component.
- Return type
str
- class
azure.ai.ml.data_transfer.DataTransferImport(*, component: Union[str,azure.ai.ml.entities._component.datatransfer_component.DataTransferImportComponent], compute: Optional[str] = None, source: Optional[Union[Dict,azure.ai.ml.entities._inputs_outputs.external_data.Database,azure.ai.ml.entities._inputs_outputs.external_data.FileSystem]] = None, outputs: Optional[Dict[str,Union[str,azure.ai.ml.entities._inputs_outputs.output.Output]]] = None, **kwargs)[source]¶ Base class for data transfer import node.
You should not instantiate this class directly. Instead, create one with the builder function import_data.
- Parameters
component (str) – ID of the built-in data transfer component to be run during the step.
source (Union[Dict, Database, FileSystem]) – The data source: a file system or database.
outputs (Dict[str, Union[str, Output, dict]]) – Mapping of output data bindings used in the job.
name (str) – Name of the data transfer.
description (str) – Description of the data transfer.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under. If None is provided, the default is the current directory name.
compute (str) – The compute target the job runs on.
- Raises
ValidationException – Raised if DataTransferImport cannot be successfully validated. Details will be provided in the error message.
clear() → None. Remove all items from D.¶
copy() → a shallow copy of D¶
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the job content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The local path or file stream to write the YAML content to. If dest is a file path, a new file will be created. If dest is an open file, the file will be written to directly.
- Keyword Arguments
kwargs (dict) – Additional arguments to pass to the YAML serializer.
- Raises
FileExistsError – Raised if dest is a file path and the file already exists.
IOError – Raised if dest is an open file and the file is not writable.
fromkeys(iterable, value=None, /)¶Create a new dictionary with keys from iterable and values set to value.
get(key, default=None, /)¶Return the value for key if key is in the dictionary, else default.
items() → a set-like object providing a view on D’s items¶
keys() → a set-like object providing a view on D’s keys¶
pop(k[, d]) → v, remove specified key and return the corresponding value.¶If key is not found, default is returned if given, otherwise KeyError is raised
popitem()¶Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
setdefault(key, default=None, /)¶Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
update([E, ]**F) → None. Update D from dict/iterable E and F.¶If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].
values() → an object providing a view on D’s values¶
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
component¶
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
inputs¶ Get the inputs for the object.
- property
log_files¶ Job output files.
- property
outputs¶ Get the outputs of the object.
- property
status¶ The status of the job.
Common values returned include “Running”, “Completed”, and “Failed”. All possible values are:
- NotStarted - A temporary state that client-side Run objects are in before cloud submission.
- Starting - The Run has started being processed in the cloud. The caller has a run ID at this point.
- Provisioning - On-demand compute is being created for a given job submission.
- Preparing - The run environment is being prepared and is in one of two stages:
  - Docker image build
  - conda environment setup
- Queued - The job is queued on the compute target. For example, in BatchAI, the job is in a queued state while waiting for all the requested nodes to be ready.
- Running - The job has started to run on the compute target.
- Finalizing - User code execution has completed, and the run is in post-processing stages.
- CancelRequested - Cancellation has been requested for the job.
- Completed - The run has completed successfully. This includes both the user code execution and run post-processing stages.
- Failed - The run failed. Usually the Error property on a run will provide details as to why.
- Canceled - Follows a cancellation request and indicates that the run is now successfully cancelled.
- NotResponding - For runs that have Heartbeats enabled, no heartbeat has been recently sent.
- Returns
Status of the job.
- Return type
Optional[str]
- class
azure.ai.ml.data_transfer.DataTransferImportComponent(*, source: Optional[Dict] = None, outputs: Optional[Dict] = None, **kwargs)[source]¶ DataTransfer import component version, used to define a data transfer import component.
- Parameters
source (dict) – The data source of a file system or database.
outputs (dict) – Mapping of output data bindings used in the job.
kwargs – Additional parameters for the data transfer import component.
- Raises
ValidationException – Raised if the component cannot be successfully validated. Details will be provided in the error message.
dump(dest: Union[str,os.PathLike,IO], **kwargs) → None¶Dumps the component content into a file in YAML format.
- Parameters
dest (Union[PathLike, str, IO[AnyStr]]) – The destination to receive this component’s content. Must be either a path to a local file, or an already-open file stream. If dest is a file path, a new file will be created, and an exception is raised if the file exists. If dest is an open file, the file will be written to directly, and an exception will be raised if the file is not writable.
- property
base_path¶ The base path of the resource.
- Returns
The base path of the resource.
- Return type
str
- property
creation_context¶ The creation context of the resource.
- Returns
The creation metadata for the resource.
- Return type
Optional[SystemData]
- property
display_name¶ Display name of the component.
- Returns
Display name of the component.
- Return type
str
- property
id¶ The resource ID.
- Returns
The global ID of the resource, an Azure Resource Manager (ARM) ID.
- Return type
Optional[str]
- property
is_deterministic¶ Whether the component is deterministic.
- Returns
Whether the component is deterministic
- Return type
bool
- property
type¶ Type of the component, default is ‘command’.
- Returns
Type of the component.
- Return type
str
- class
azure.ai.ml.data_transfer.Database(*, query: Optional[str] = None, table_name: Optional[str] = None, stored_procedure: Optional[str] = None, stored_procedure_params: Optional[List[dict]] = None, connection: Optional[str] = None)[source]¶ Define a database class for a DataTransfer Component or Job.
- Keyword Arguments
query (str) – The SQL query to retrieve data from the database.
table_name (str) – The name of the database table.
stored_procedure (str) – The name of the stored procedure.
stored_procedure_params (List[Union[dict, StoredProcedureParameter]]) – The parameters for the stored procedure.
connection (str) – The connection string for the database. The credential information should be stored in the workspace connection.
- Raises
ValidationException – Raised if the Database object cannot be successfully validated. Details will be provided in the error message.
get(key: Any, default: Optional[Any] = None) → Any¶
- property
stored_procedure_params¶ Get or set the parameters for the stored procedure.
- Returns
The parameters for the stored procedure.
- Return type
List[StoredProcedureParameter]
- class
azure.ai.ml.data_transfer.FileSystem(*, path: Optional[str] = None, connection: Optional[str] = None)[source]¶ Define a file system class for a DataTransfer Component or Job.
e.g. source_s3 = FileSystem(path='s3://my_bucket/my_folder', connection='azureml:my_s3_connection')
- Parameters
path (str) – The path of the file system folder.
connection (str) – The connection string for the file system. The credential information should be stored in the workspace connection.
- Raises
ValidationException – Raised if the FileSystem object cannot be successfully validated. Details will be provided in the error message.
get(key: Any, default: Optional[Any] = None) → Any¶
azure.ai.ml.data_transfer.copy_data(*, name: Optional[str] = None, description: Optional[str] = None, tags: Optional[Dict] = None, display_name: Optional[str] = None, experiment_name: Optional[str] = None, compute: Optional[str] = None, inputs: Optional[Dict] = None, outputs: Optional[Dict] = None, is_deterministic: bool = True, data_copy_mode: Optional[str] = None, **kwargs) → azure.ai.ml.entities._builders.data_transfer.DataTransferCopy[source]¶Create a DataTransferCopy object which can be used inside dsl.pipeline as a function.
- Keyword Arguments
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
inputs (dict) – Mapping of input data bindings used in the job.
outputs (dict) – Mapping of output data bindings used in the job.
is_deterministic (bool) – Specifies whether the command will return the same output given the same input. If a component is deterministic, then when it is used as a node/step in a pipeline, it will reuse results from a previously submitted job in the current workspace that has the same inputs and settings. In that case, the step will not use any compute resources. Defaults to True; specify is_deterministic=False if you would like to avoid such reuse behavior.
data_copy_mode (str) – The data copy mode used in the copy task. Possible values are “merge_with_overwrite” and “fail_if_conflict”.
- Returns
A DataTransferCopy object.
- Return type
DataTransferCopy
azure.ai.ml.data_transfer.export_data(*, name: Optional[str] = None, description: Optional[str] = None, tags: Optional[Dict] = None, display_name: Optional[str] = None, experiment_name: Optional[str] = None, compute: Optional[str] = None, sink: Optional[Union[Dict,azure.ai.ml.entities._inputs_outputs.external_data.Database,azure.ai.ml.entities._inputs_outputs.external_data.FileSystem]] = None, inputs: Optional[Dict] = None, **kwargs) → azure.ai.ml.entities._builders.data_transfer.DataTransferExport[source]¶Create a DataTransferExport object which can be used inside dsl.pipeline.
- Keyword Arguments
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
sink (Union[Dict, Database, FileSystem]) – The sink of external data and databases.
inputs (dict) – Mapping of input data bindings used in the job.
- Returns
A DataTransferExport object.
- Return type
DataTransferExport
- Raises
ValidationException – Raised if a sink is not provided, or if the sink is a file system (exporting to a file system is not supported).
azure.ai.ml.data_transfer.import_data(*, name: Optional[str] = None, description: Optional[str] = None, tags: Optional[Dict] = None, display_name: Optional[str] = None, experiment_name: Optional[str] = None, compute: Optional[str] = None, source: Optional[Union[Dict,azure.ai.ml.entities._inputs_outputs.external_data.Database,azure.ai.ml.entities._inputs_outputs.external_data.FileSystem]] = None, outputs: Optional[Dict] = None, **kwargs) → azure.ai.ml.entities._builders.data_transfer.DataTransferImport[source]¶Create a DataTransferImport object which can be used inside dsl.pipeline.
- Keyword Arguments
name (str) – The name of the job.
description (str) – Description of the job.
tags (dict[str, str]) – Tag dictionary. Tags can be added, removed, and updated.
display_name (str) – Display name of the job.
experiment_name (str) – Name of the experiment the job will be created under.
compute (str) – The compute resource the job runs on.
source (Union[Dict, Database, FileSystem]) – The data source: a file system or database.
outputs (dict) – Mapping of output data bindings used in the job. The default is an output port with the key “sink” and type “mltable”.
- Returns
A DataTransferImport object.
- Return type
DataTransferImport