Datasets
Provide NEO with access to your data using multiple methods and formats
NEO works with real-world data as it is, through several access methods, file formats, and data sources.
How to Provide Data to NEO
Platform mode
| Method | Description | Limit |
|---|---|---|
| File upload | Drag and drop directly in chat | Max 50MB per file |
| Public URLs | Reference public dataset URLs | Any size |
| Cloud storage | S3, Google Cloud Storage, Azure Blob Storage via Secrets Manager | No size limit |
| GitHub | Access repository datasets | Public and private repos |
| Kaggle | Competition datasets | Via API |
VS Code extension
- Local files - Place datasets in the workspace folder; size is limited only by local disk and system resources
Integrated providers:
| Provider | Use case |
|---|---|
| Amazon S3 | Datasets and model checkpoints |
| Weights and Biases | Experiment tracking and artifacts |
| Hugging Face | Model hub access |
| Kaggle | Competition data |
| GitHub | Repository datasets |
Quick Setup Guide
| Step | Action | Details |
|---|---|---|
| Step 1 | Choose access method | Upload, URL, cloud, or local files |
| Step 2 | Prepare your data | Use supported formats (CSV, Parquet, JSON, etc.) |
| Step 3 | Reference in task | Include file path or URL in your task description |
Supported Formats and Media Types
| Format | Type | Description | Platform | VS Code |
|---|---|---|---|---|
| CSV | Tabular | Standard format for tabular data and time series | ✅ | ✅ |
| Parquet | Tabular | Recommended for datasets >100MB | ✅ | ✅ |
| JSON | Structured | Ideal for logs and nested data structures | ✅ | ✅ |
| Excel | Tabular | Business data and reports | ✅ Upload only (converted internally) | ✅ |
| Images | Visual | JPG, PNG, TIFF formats | ✅ As ZIP archive | ✅ |
| Audio | Audio | WAV, MP3, FLAC formats | ✅ As ZIP archive | ✅ |
Note: Images and audio must be uploaded as ZIP archives on the platform to preserve directory structure.
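The ZIP requirement in the note above can be met with Python's standard library. The following is a minimal sketch, not an official NEO utility: the function name `zip_dataset` and the example paths are hypothetical. It stores every file under its path relative to the dataset folder, so the directory layout (for example one subfolder per class) survives inside the archive.

```python
import zipfile
from pathlib import Path


def zip_dataset(source_dir: str, archive_path: str) -> list[str]:
    """Zip every file under source_dir, keeping paths relative to it.

    Returns the archive member names in the order they were written.
    """
    root = Path(source_dir)
    archive = Path(archive_path).resolve()
    written = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it is
            # being written inside the folder that is being zipped.
            if path.is_file() and path.resolve() != archive:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                written.append(arcname)
    return written
```

Calling `zip_dataset("images", "images.zip")` on a hypothetical folder containing `cats/a.jpg` and `dogs/b.jpg` would produce an archive with those two relative paths as members.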
Example Tasks
CSV dataset example
Parquet dataset example
Cloud storage example
Multi-source example
Best Practices
- Start small - Test with samples under 10MB before scaling
- Clean column names - Use descriptive, consistent naming (e.g., customer_id, not "Customer ID")
- Standardize dates - Use ISO 8601 format for consistent parsing
- Use Parquet for large data - Faster processing, smaller storage footprint
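Two of the practices above, clean column names and ISO 8601 dates, can be applied before handing data to NEO. The sketch below uses only the standard library; the helper names `clean_column_name` and `to_iso_date` are hypothetical, and the default input date format (`%m/%d/%Y`) is an assumption that should be changed to match the actual data.

```python
import re
from datetime import datetime


def clean_column_name(name: str) -> str:
    """Convert a header such as 'Customer ID' to snake_case ('customer_id')."""
    # Replace every run of non-alphanumeric characters with one underscore.
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    # Drop underscores left at either end, then lowercase.
    return name.strip("_").lower()


def to_iso_date(value: str, input_format: str = "%m/%d/%Y") -> str:
    """Reparse a date string in a known format as ISO 8601 (YYYY-MM-DD)."""
    # Assumption: the caller knows the source format; strptime raises
    # ValueError if the value does not match it.
    return datetime.strptime(value, input_format).date().isoformat()
```

For tabular data already loaded with pandas, the same renaming function can be passed to `DataFrame.rename(columns=clean_column_name)`, and `DataFrame.to_parquet` can write the result as Parquet if a Parquet engine such as pyarrow is installed.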
Troubleshooting
| Issue | Solution |
|---|---|
| File not found | Verify file path, check spelling, ensure file exists |
| Format not supported | Convert to CSV, Parquet, or JSON |
| File too large | Use cloud storage or VS Code extension |
| Access denied | Verify credentials and permissions |
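Several of the issues in the table can be caught locally before a file is uploaded. The following is a rough pre-flight sketch rather than part of NEO: the function name `preflight` is hypothetical, the 50MB upload limit is assumed to mean 50 × 1024 × 1024 bytes, and the extension list (`.xlsx` for Excel, `.zip` for image and audio archives) is an assumption based on the formats listed in this page.

```python
from pathlib import Path

# Assumption: the platform's 50MB per-file limit, read as mebibytes.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024

# Assumption: file extensions corresponding to the supported formats.
SUPPORTED_SUFFIXES = {".csv", ".parquet", ".json", ".xlsx", ".zip"}


def preflight(path: str) -> list[str]:
    """Return a list of likely upload problems; an empty list means none found."""
    p = Path(path)
    if not p.is_file():
        # Nothing else can be checked if the file is missing.
        return ["File not found: verify the path and spelling"]
    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append("Format not supported: convert to CSV, Parquet, or JSON")
    if p.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("File too large: use cloud storage or the VS Code extension")
    return problems
```

An empty result only means these local checks passed; credential and permission problems ("Access denied") still have to be resolved with the storage provider.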