PetalData Python Library
The PetalData Python library makes it easy to interact with your cloud app datasets.
The petaldata package downloads CSV files and schema information from the PetalData server and generates pandas DataFrames with the proper datatypes. The library also includes storage functionality for saving datasets locally and to Amazon S3.
The package can be installed via
The Python library can save your dataset to your local computer, Amazon S3, or both. Local Storage is the default storage location.
When a dataset is downloaded, it is initially written to a csv file in local storage. When you call save() on a Dataset, the dataset is saved in a Pickle file, which preserves each column's datatype.
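The difference between the two file formats can be illustrated with a minimal sketch. It uses only Python's standard csv and pickle modules, not the PetalData API or pandas, and the sample rows are made up for illustration. The point is that a csv round trip returns every value as a string, while a Pickle round trip returns the original types.

```python
import csv
import io
import pickle
from datetime import date

# Hypothetical sample rows for illustration; not real PetalData output.
rows = [
    {"id": 1, "amount": 9.5, "created": date(2020, 1, 15)},
    {"id": 2, "amount": 12.0, "created": date(2020, 2, 1)},
]

# csv round trip: every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "amount", "created"])
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

# Pickle round trip: the original Python types survive.
from_pickle = pickle.loads(pickle.dumps(rows))

print(type(from_csv[0]["id"]).__name__)          # str
print(type(from_pickle[0]["id"]).__name__)       # int
print(type(from_pickle[0]["created"]).__name__)  # date
```

With csv alone, the types would have to be re-applied from the schema on every load. A Pickle file skips that step.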
By default, PetalData writes all files to os.getcwd() + "/petaldata_cache/". You can specify a different directory:
Disabling Local Storage
Local storage is always used to download csv files, but it can be disabled for storing Pickle files:
Saving your datasets to Amazon S3 is a good option for:
- Remote scripts - rather than download an entire dataset when running a remote script, load the previous version, upsert() the dataset, and save again. This can dramatically speed up the execution time of the script.
- Teamwide access - write a script that regularly updates and saves a dataset to S3, then give teams access to that dataset. They won't need to download their own full copy.
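The load, upsert, and save cycle described in the first item can be sketched roughly as follows. This is not PetalData's upsert() or its S3 storage: load_rows, upsert_rows, and save_rows are hypothetical helpers, the rows are plain dictionaries keyed by an assumed id field, and a temporary local Pickle file stands in for the remote store.

```python
import os
import pickle
import tempfile


def load_rows(path):
    """Return previously saved rows keyed by id, or an empty dict if none exist."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def upsert_rows(existing, new_rows, key="id"):
    """Insert new rows and overwrite rows whose key already exists."""
    merged = dict(existing)
    for row in new_rows:
        merged[row[key]] = row
    return merged


def save_rows(path, rows):
    """Write the rows to a Pickle file."""
    with open(path, "wb") as f:
        pickle.dump(rows, f)


# Stand-in for a previously saved dataset (hypothetical data).
cache_path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
save_rows(cache_path, {1: {"id": 1, "status": "open"}})

# A later run: load the previous version, apply only the changes, save again.
previous = load_rows(cache_path)
changed = [{"id": 1, "status": "closed"}, {"id": 2, "status": "open"}]
current = upsert_rows(previous, changed)
save_rows(cache_path, current)

print(sorted(current))       # [1, 2]
print(current[1]["status"])  # closed
```

Only the changed rows are fetched and merged on each run, which is where the time saving comes from.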
S3 storage must be explicitly enabled:
Once enabled, S3 acts just like local storage.