# parlai.core.build_data

Utilities for downloading and building data. These can be replaced if your particular file system does not support them.

class parlai.core.build_data.DownloadableFile(url, file_name, hashcode, zipped=True, from_google=False)

Bases: object

A class used to abstract any file that has to be downloaded online.

Any task that needs to download a file must define a list named RESOURCES whose elements are objects of this class.

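This class provides the following functionality:

• Download a file from a URL or Google Drive

• Untar the file if zipped

• Checksum the downloaded file

• Send a HEAD request to validate the URL or Google Drive ID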

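An object of this class needs to be created with:

• url <string> : URL or Google Drive ID to download from

• file_name <string> : Name under which the downloaded file is saved

• hashcode <string> : Expected hash of the downloaded file, used by checksum

• zipped <boolean> : False if the file is not compressed

• from_google <boolean> : True if the file is from Google Drive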
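As an illustration of the RESOURCES convention, a task's build script might declare its files roughly as follows. `FileSpec` is a hypothetical stand-in that only mirrors the constructor signature shown above, so the snippet runs without ParlAI installed; the URLs, file names, and hashes are placeholders, not real resources.

```python
from dataclasses import dataclass


@dataclass
class FileSpec:
    """Hypothetical stand-in mirroring DownloadableFile's constructor arguments."""

    url: str
    file_name: str
    hashcode: str
    zipped: bool = True
    from_google: bool = False


# Placeholder values for illustration only.
RESOURCES = [
    FileSpec(
        url="https://example.com/data/train.tar.gz",
        file_name="train.tar.gz",
        hashcode="0" * 64,
    ),
    FileSpec(
        url="https://example.com/data/vocab.txt",
        file_name="vocab.txt",
        hashcode="0" * 64,
        zipped=False,  # plain file, nothing to unpack
    ),
]

# Files flagged as zipped would be unpacked after download.
to_unpack = [r.file_name for r in RESOURCES if r.zipped]
```

In a real task, the list elements would be `DownloadableFile` objects rather than this stand-in.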

__init__(url, file_name, hashcode, zipped=True, from_google=False)

Initialize the object with the constructor arguments described above.

checksum(dpath)

Compute the checksum of the downloaded file and compare it with hashcode.

Parameters
• dpath – The path for the downloaded file.
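A checksum comparison of this kind can be sketched with the standard library. This is not the library's own code: it assumes a SHA-256 digest and assumes that the file sits at `dpath` joined with its file name. The function name `sha256_matches` and the `block_size` parameter are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_matches(dpath, file_name, expected_hash, block_size=65536):
    """Return True if the file's SHA-256 digest equals expected_hash."""
    digest = hashlib.sha256()
    with open(os.path.join(dpath, file_name), "rb") as f:
        # Read in blocks so large downloads do not have to fit in memory.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_hash


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "sample.txt"), "wb") as f:
        f.write(b"hello")
    good = hashlib.sha256(b"hello").hexdigest()
    ok = sha256_matches(tmp, "sample.txt", good)
    bad = sha256_matches(tmp, "sample.txt", "0" * 64)
```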

check_header()

Performs a HEAD request to check if the URL / Google Drive ID is live.
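For a plain URL, a liveness check based on a HEAD request might look like the sketch below. It uses `urllib` from the standard library and treats only status 200 as live; both choices are assumptions, and Google Drive IDs are not handled. The `opener` parameter is a hypothetical hook that lets the function be exercised without network access.

```python
import urllib.request


def is_live(url, opener=urllib.request.urlopen, timeout=10):
    """Send a HEAD request and report whether the server answered 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError.
        return False
```

A HEAD request fetches only the response headers, so the check stays cheap even for large files.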

parlai.core.build_data.built(path, version_string=None)

Check whether the ‘.built’ flag has been set for the task at path.

If a version_string is provided, it must match the version recorded in the flag; otherwise the task is regarded as not built.

parlai.core.build_data.mark_done(path, version_string=None)

Mark this path as prebuilt.

Marks the path as done by adding a ‘.built’ file with the current timestamp plus a version description string if specified.

Parameters
• path (str) – The file path to mark as built.

• version_string (str) – The version of this dataset.
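The flag-file convention shared by built and mark_done can be sketched as follows. This is a simplified reimplementation for illustration, not the library's code: the exact file layout (timestamp on the first line, version on the second) is an assumption, and the `_sketch` function names are hypothetical.

```python
import datetime
import os
import tempfile

FLAG = ".built"


def mark_done_sketch(path, version_string=None):
    """Write a '.built' file holding a timestamp and an optional version line."""
    with open(os.path.join(path, FLAG), "w") as f:
        f.write(str(datetime.datetime.today()))
        if version_string:
            f.write("\n" + version_string)


def built_sketch(path, version_string=None):
    """Return True if the flag exists and, when given, the version matches."""
    flag_path = os.path.join(path, FLAG)
    if not os.path.isfile(flag_path):
        return False
    if version_string is None:
        return True
    with open(flag_path) as f:
        lines = f.read().split("\n")
    return len(lines) > 1 and lines[1] == version_string


with tempfile.TemporaryDirectory() as tmp:
    before = built_sketch(tmp)
    mark_done_sketch(tmp, version_string="v1.0")
    same_version = built_sketch(tmp, version_string="v1.0")
    other_version = built_sketch(tmp, version_string="v2.0")
    no_version = built_sketch(tmp)
```

Bumping the version string in a build script is then enough to make an older download count as not built.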

parlai.core.build_data.download(url, path, fname, redownload=False, num_retries=5)

Download the file at url and save it as fname in path. If redownload is False (the default), the file is not downloaded again when it is already present.
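The skip-if-present and retry behaviour can be sketched without touching the network. In this sketch the `fetch` callable is a hypothetical stand-in for an HTTP client, and the backoff schedule is an assumption rather than the library's actual policy.

```python
import os
import tempfile
import time


def download_sketch(url, path, fname, fetch, redownload=False, num_retries=5):
    """Save fetch(url) to path/fname, skipping files that already exist."""
    outfile = os.path.join(path, fname)
    if os.path.isfile(outfile) and not redownload:
        return outfile  # already present, nothing to do

    last_error = None
    for attempt in range(num_retries):
        try:
            data = fetch(url)
            break
        except OSError as err:
            last_error = err
            if attempt < num_retries - 1:
                time.sleep(0.01 * 2 ** attempt)  # short exponential backoff
    else:
        # The loop ended without a successful fetch.
        raise RuntimeError(f"giving up on {url}") from last_error

    with open(outfile, "wb") as f:
        f.write(data)
    return outfile


calls = []


def flaky_fetch(url):
    """Fail twice, then succeed (stands in for a real HTTP client)."""
    calls.append(url)
    if len(calls) < 3:
        raise OSError("temporary failure")
    return b"payload"


with tempfile.TemporaryDirectory() as tmp:
    download_sketch("https://example.com/f.tar", tmp, "f.tar", flaky_fetch)
    attempts_first = len(calls)
    # The file now exists, so the second call returns without fetching.
    download_sketch("https://example.com/f.tar", tmp, "f.tar", flaky_fetch)
    attempts_second = len(calls)
```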

parlai.core.build_data.make_dir(path)

Make the directory and any nonexistent parent directories (mkdir -p).

parlai.core.build_data.remove_dir(path)

Remove the given directory, if it exists.
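For readers who need to replace these two helpers on an unusual file system, the behaviour described for make_dir and remove_dir corresponds closely to the following standard-library calls. This is a sketch of equivalent behaviour, not the library's source, and the `_sketch` names are hypothetical.

```python
import os
import shutil
import tempfile


def make_dir_sketch(path):
    """Create path and any missing parents; no error if it already exists."""
    os.makedirs(path, exist_ok=True)


def remove_dir_sketch(path):
    """Delete path recursively; errors, including a missing path, are ignored."""
    shutil.rmtree(path, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b", "c")
    make_dir_sketch(nested)
    make_dir_sketch(nested)  # second call is a no-op
    created = os.path.isdir(nested)
    remove_dir_sketch(os.path.join(tmp, "a"))
    remove_dir_sketch(os.path.join(tmp, "a"))  # already gone, still fine
    removed = not os.path.exists(os.path.join(tmp, "a"))
```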

parlai.core.build_data.untar(path, fname, delete=True, flatten_tar=False)

Unpack the given archive file to the same directory.

Parameters
• path (str) – The folder containing the archive. The contents are extracted into this folder.

• fname (str) – The filename of the archive file.

• delete (bool) – If true, the archive will be deleted after extraction.
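The extract-then-delete behaviour of the three parameters above can be sketched with `shutil.unpack_archive`. This is an illustration under assumptions, not the library's code: the flatten_tar option is left out, and `untar_sketch` is a hypothetical name.

```python
import os
import shutil
import tarfile
import tempfile


def untar_sketch(path, fname, delete=True):
    """Extract path/fname into path, optionally removing the archive."""
    fullpath = os.path.join(path, fname)
    shutil.unpack_archive(fullpath, path)  # format inferred from the extension
    if delete:
        os.remove(fullpath)


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive to extract.
    src = os.path.join(tmp, "hello.txt")
    with open(src, "w") as f:
        f.write("hi")
    with tarfile.open(os.path.join(tmp, "data.tar.gz"), "w:gz") as tar:
        tar.add(src, arcname="inner/hello.txt")
    os.remove(src)

    untar_sketch(tmp, "data.tar.gz")
    extracted = os.path.isfile(os.path.join(tmp, "inner", "hello.txt"))
    archive_left = os.path.exists(os.path.join(tmp, "data.tar.gz"))
```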

parlai.core.build_data.ungzip(path, fname, deleteGZip=True)

Decompress the given gzip-compressed file into the same directory.

Parameters
• path (str) – The folder containing the archive. The decompressed file is written to this folder.

• fname (str) – The filename of the archive file.

• deleteGZip (bool) – If true, the compressed file will be deleted after extraction.
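Decompressing a single gzip file in place might be sketched as below. The rule for naming the output file (drop a trailing `.gz`, otherwise append `_unzip`) is an assumption made for this example, and `ungzip_sketch` is a hypothetical name.

```python
import gzip
import os
import shutil
import tempfile


def ungzip_sketch(path, fname, delete_gzip=True):
    """Decompress path/fname next to the original and return the new name."""
    # Assumed naming rule: drop a trailing '.gz', else append '_unzip'.
    out_name = fname[:-3] if fname.endswith(".gz") else fname + "_unzip"
    in_path = os.path.join(path, fname)
    out_path = os.path.join(path, out_name)
    with gzip.open(in_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams without loading the whole file
    if delete_gzip:
        os.remove(in_path)
    return out_name


with tempfile.TemporaryDirectory() as tmp:
    with gzip.open(os.path.join(tmp, "notes.txt.gz"), "wb") as f:
        f.write(b"some text")
    out_name = ungzip_sketch(tmp, "notes.txt.gz")
    with open(os.path.join(tmp, out_name), "rb") as f:
        content = f.read()
    gz_left = os.path.exists(os.path.join(tmp, "notes.txt.gz"))
```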

parlai.core.build_data.download_from_google_drive(gd_id, destination)

Download the file identified by the Google Drive ID gd_id and save it to destination.

parlai.core.build_data.download_models(opt, fnames, model_folder, version='v1.0', path='aws', use_model_type=False, flatten_tar=False)

Download models into the ParlAI model zoo.

Parameters

• use_model_type – whether models are categorized by type in AWS

parlai.core.build_data.modelzoo_path(datapath, path)

Map pretrained model filenames to their paths on disk.

If path starts with ‘models:’, it is remapped to the model zoo path within the data directory (default is ParlAI/data/models). Models are downloaded from the model zoo if they are not yet present.
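The prefix remapping alone can be sketched in a few lines. This sketch covers only the path rewriting; the download step is omitted, and `remap_zoo_path` and the example model name are hypothetical.

```python
import os


def remap_zoo_path(datapath, path):
    """Rewrite 'models:...' to a location under <datapath>/models."""
    prefix = "models:"
    if path is None or not path.startswith(prefix):
        return path  # ordinary paths are returned unchanged
    return os.path.join(datapath, "models", path[len(prefix):])


zoo_file = remap_zoo_path("ParlAI/data", "models:some_task/some_model/model")
plain_file = remap_zoo_path("ParlAI/data", "/tmp/my_model")
```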

parlai.core.build_data.download_multiprocess(urls, path, num_processes=32, chunk_size=100, dest_filenames=None, error_path=None)

Download items in parallel.

WARNING: may have issues with OS X.

Parameters

• urls – array of URLs to download

• path – directory to save items in

• num_processes – number of processes to use

• chunk_size – chunk size to use

• dest_filenames – optional array of filenames, the same length as urls. Images will be saved as path + dest_filename

• error_path – where to save error logs

Returns

array of tuples of (destination filename, HTTP status code, error message if any). Note that upon failure, the file may not actually be created.
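The chunking and the shape of the returned tuples can be illustrated with a small sketch. Note the substitution: it uses a thread pool (`multiprocessing.pool.ThreadPool`) instead of separate processes to keep the example portable. The `fetch` callable, the function names, and the use of 500 as the status for a failed item are all assumptions made for this example.

```python
from multiprocessing.pool import ThreadPool


def fetch_chunk(job):
    """Fetch one chunk of (url, dest) pairs and report a tuple per item."""
    items, fetch = job
    results = []
    for url, dest in items:
        try:
            fetch(url, dest)
            results.append((dest, 200, None))
        except OSError as err:
            # Assumed convention: 500 marks a failed item.
            results.append((dest, 500, str(err)))
    return results


def download_parallel_sketch(urls, dest_filenames, fetch, num_workers=4, chunk_size=2):
    """Split the work into chunks and process the chunks concurrently."""
    pairs = list(zip(urls, dest_filenames))
    chunks = [pairs[i:i + chunk_size] for i in range(0, len(pairs), chunk_size)]
    with ThreadPool(num_workers) as pool:
        per_chunk = pool.map(fetch_chunk, [(chunk, fetch) for chunk in chunks])
    # Flatten the per-chunk lists back into one list, preserving input order.
    return [result for chunk in per_chunk for result in chunk]


def fake_fetch(url, dest):
    """Stand-in for a real download; fails for URLs containing 'missing'."""
    if "missing" in url:
        raise OSError("not found")


urls = [f"https://example.com/img{i}.jpg" for i in range(4)]
urls.append("https://example.com/missing.jpg")
names = [f"img{i}.jpg" for i in range(4)] + ["missing.jpg"]
results = download_parallel_sketch(urls, names, fake_fetch)
```

Because failures are reported in the result tuples rather than raised, one bad URL does not stop the remaining downloads.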