HuggingFace datasets
A short usage guide for Hugging Face 🤗 Datasets.
Install
Start by installing 🤗 Datasets:
    pip install datasets
🤗 Datasets also supports audio and image data formats, which are installed as optional extras:
    pip install datasets[audio]
    pip install datasets[vision]
Load Dataset
Import load_dataset:
    from datasets import load_dataset
Load a remote dataset from the Hugging Face Hub:
    dataset = load_dataset("glue", "mrpc", split="train")
Load local CSV files:
    dataset = load_dataset('csv', data_files='data.csv')
Load local JSON files:
    dataset = load_dataset('json', data_files='data.json')
Load local TXT files:
    dataset = load_dataset('text', data_files='data.txt')
Load from disk
A dataset that has been saved to the local machine with save_to_disk() can be reloaded with the load_from_disk() function. It reads the saved files directly from the given directory, so nothing needs to be fetched again from the 🤗 Hub.
    from datasets import load_from_disk