Datasets

Provide NEO with access to your data using multiple methods and formats

NEO works with real-world data as-is and supports multiple access methods, file formats, and data sources.

How to Provide Data to NEO

Platform mode

Method	Description	Limit / notes
File upload	Drag and drop directly in chat	Max 50MB per file
Public URLs	Reference public dataset URLs	Any size
Cloud storage	S3, Google Cloud Storage, Azure Blob Storage via Secrets Manager	No size limit
GitHub	Access repository datasets	Public and private repos
Kaggle	Competition datasets	Via API

VS Code extension

Local files - Place datasets in the workspace folder. Size is limited only by local disk and system resources.

Integrated providers:

Provider	Use case
Amazon S3	Datasets and model checkpoints
Weights and Biases	Experiment tracking and artifacts
Hugging Face	Model hub access
Kaggle	Competition data
GitHub	Repository datasets

Quick Setup Guide

Step	Action	Details
Step 1	Choose access method	Upload, public URL, cloud storage, GitHub, Kaggle, or local files
Step 2	Prepare your data	Use supported formats (CSV, Parquet, JSON, etc.)
Step 3	Reference in task	Include file path or URL in your task description

Supported Formats and Media Types

Format	Type	Description	Platform	VS Code
CSV	Tabular	Standard format for tabular data and time series	✅	✅
Parquet	Tabular	Recommended for datasets >100MB	✅	✅
JSON	Structured	Ideal for logs and nested data structures	✅	✅
Excel	Tabular	Business data and reports	✅ Upload only (converted internally)	✅
Images	Visual	JPG, PNG, TIFF formats	✅ ZIP archive only	✅
Audio	Audio	WAV, MP3, FLAC formats	✅ ZIP archive only	✅

Note: Images and audio must be uploaded as ZIP archives on the platform to preserve directory structure.
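
As an illustration of the note above, here is a minimal sketch of packing an image or audio folder into a ZIP while keeping its subdirectories. It uses only the Python standard library. The function name `zip_media_folder` and the extension list are assumptions made for this example (`.jpeg` and `.tif` are added as common variants of the formats listed in the table); nothing here is part of NEO itself.

```python
import zipfile
from pathlib import Path

# Extensions based on the formats table; .jpeg and .tif are assumed variants.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".wav", ".mp3", ".flac"}


def zip_media_folder(source_dir: str, archive_path: str) -> list[str]:
    """Write every media file under source_dir into a ZIP, keeping relative paths.

    Returns the archive member names in sorted order.
    """
    source = Path(source_dir)
    members = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in source.rglob("*"):
            if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
                # A name relative to the source folder preserves subdirectories,
                # for example class-label folders such as cats/001.jpg.
                arcname = path.relative_to(source).as_posix()
                archive.write(path, arcname=arcname)
                members.append(arcname)
    return sorted(members)
```

Calling `zip_media_folder("product_images", "product_images.zip")` would produce an archive whose internal paths mirror the folder layout, which is what lets label-per-folder datasets survive the upload.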

Example Tasks

CSV dataset example


Analyze the retail sales data in sales_data.csv (columns: date, product_id, 
quantity, price, store_id) and forecast demand for each product category. 
Include confidence intervals.

Parquet dataset example



Use the large transaction dataset in transactions.parquet (10M+ records) 
to detect fraudulent transactions. Optimize for precision to minimize 
false positives.

Cloud storage example



Analyze customer feedback from s3://company-data/feedback/2024/ and 
perform sentiment analysis. Generate monthly sentiment trends.

Multi-source example




Combine customer_data.parquet, transactions.json, and product_images.zip 
to build a personalized recommendation engine.

Best Practices

Start small - Test with samples under 10MB before scaling
Clean column names - Use descriptive, consistent naming (e.g., customer_id, not “Customer ID”)
Standardize dates - Use ISO 8601 format (e.g., 2024-03-15) for consistent parsing
Use Parquet for large data - Faster processing, smaller storage footprint
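
The column-name and date practices above can be sketched with the Python standard library before handing a file to NEO. This is an illustrative preprocessing step, not something NEO requires or provides. The helper names and the list of input date layouts are assumptions; in particular, slash-separated dates are assumed to be month/day/year.

```python
import re
from datetime import datetime

# Assumed input date layouts; extend this tuple for your own data.
# Slash-separated dates are treated as month/day/year.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_snake_case(name: str) -> str:
    """Turn a header such as 'Customer ID' into 'customer_id'."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    return cleaned.strip("_").lower()


def to_iso_date(value: str) -> str:
    """Return value as an ISO 8601 date (YYYY-MM-DD), trying each known layout."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")


def clean_rows(rows, date_columns):
    """Yield dict rows with snake_case keys and ISO dates in date_columns.

    date_columns must use the cleaned (snake_case) column names.
    """
    for row in rows:
        cleaned = {to_snake_case(key): value for key, value in row.items()}
        for column in date_columns:
            cleaned[column] = to_iso_date(cleaned[column])
        yield cleaned
```

The rows can come from `csv.DictReader` and be written back with `csv.DictWriter`, so the same pass also works for producing a small test sample before scaling up.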

Troubleshooting

Issue	Solution
File not found	Verify file path, check spelling, ensure file exists
Format not supported	Convert to CSV, Parquet, or JSON
File too large	Platform uploads are capped at 50MB per file; use cloud storage or the VS Code extension
Access denied	Verify credentials and permissions
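
Several of the issues in the table can be caught locally before a task is submitted. The following is a minimal sketch of such a check, assuming a local file path. The function name `preflight`, the extension list, and the reading of 50MB as 50 × 1024 × 1024 bytes are assumptions for illustration, not documented NEO behavior.

```python
from pathlib import Path

# "Max 50MB per file" from the platform table; the binary interpretation is an assumption.
PLATFORM_UPLOAD_LIMIT_BYTES = 50 * 1024 * 1024

# Assumed file extensions for the supported formats (ZIP covers images and audio).
SUPPORTED_SUFFIXES = {".csv", ".parquet", ".json", ".xlsx", ".zip"}


def preflight(path_str: str) -> list[str]:
    """Return a list of likely problems for a local file; empty means none found."""
    path = Path(path_str)
    if not path.is_file():
        return [f"File not found: {path}"]
    problems = []
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"Format not supported: {path.suffix or '(no extension)'}")
    if path.stat().st_size > PLATFORM_UPLOAD_LIMIT_BYTES:
        problems.append("File too large for platform upload")
    return problems
```

An empty list from `preflight("sales_data.csv")` means none of these three checks failed; it says nothing about credentials or permissions, which still need to be verified separately.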

Learn More

VS Code Extension

Install NEO and work directly with local code and data.

Platform Features

Understand NEO’s capabilities across web and IDE environments.

FAQ

Review security, privacy, limits, and troubleshooting information.