Running With Local Files
Let’s verify your job works with simple local files before adding cloud storage and observability. This approach lets you iterate quickly and debug any issues.
Generate Test Data
Create a simple tool to generate test data locally. Create tool/main.go:
package main
import (
"flag"
"fmt"
"log"
"math/rand"
"os"
)
var cities = []string{
"Tokyo", "Jakarta", "Delhi", "Manila", "Shanghai",
"Sao Paulo", "Mumbai", "Beijing", "Cairo", "Mexico City",
"New York", "London", "Paris", "Moscow", "Sydney",
}
func main() {
count := flag.Int("count", 10000, "number of measurements to generate")
output := flag.String("output", "measurements.txt", "output file path")
flag.Parse()
log.Printf("Generating %d measurements...\n", *count)
f, err := os.Create(*output)
if err != nil {
log.Fatal(err)
}
defer f.Close()
for i := 0; i < *count; i++ {
city := cities[rand.Intn(len(cities))]
temp := -20.0 + rand.Float64()*70.0 // -20 to 50°C
fmt.Fprintf(f, "%s;%.1f\n", city, temp)
}
log.Printf("Generated %s successfully\n", *output)
}
Build and Generate Data
# Generate test data (10K measurements for quick testing)
go run tool/main.go -count 10000 -output measurements.txt
You should see:
Generating 10000 measurements...
Generated measurements.txt successfully
Verify Test Data
Check the generated file:
head -5 measurements.txt
You should see lines like:
Tokyo;35.6
Jakarta;-6.2
Delhi;28.7
Manila;32.1
Shanghai;18.9
Build and Run Your Job
# Ensure dependencies are installed
go mod tidy
# Run the job
go run .
You should see simple log output:
{"time":"2024-11-23T22:50:00Z","level":"INFO","msg":"starting 1BRC processing","input_file":"measurements.txt","output_file":"results.txt"}
{"time":"2024-11-23T22:50:00Z","level":"INFO","msg":"1BRC processing completed successfully","cities_processed":15}
That’s it! Your job runs in less than a second and processes the data.
Verify Results
Check the output file:
cat results.txt
Expected format:
Beijing=-19.5/16.3/49.8
Cairo=-18.2/17.9/48.5
Delhi=-17.9/15.8/47.3
Jakarta=-19.8/14.2/49.1
London=-19.3/15.4/48.9
Mexico City=-18.7/16.1/49.5
Moscow=-19.1/14.8/49.2
Mumbai=-19.6/15.7/48.8
New York=-18.9/15.3/49.1
Paris=-19.4/16.2/49.3
Sao Paulo=-18.5/15.6/48.7
Shanghai=-19.2/15.9/49.4
Sydney=-18.8/16.5/49.6
Tokyo=-19.7/15.1/48.6
Each line shows: city=min/mean/max
Test With Larger Dataset
Try a larger dataset to see performance:
# Generate 1 million measurements
go run tool/main.go -count 1000000 -output measurements.txt
# Run the job and measure execution time
time go run .
This will take a bit longer but you can verify the job handles larger files efficiently.
What If It Doesn’t Work?
Job crashes or errors:
- Check that
measurements.txtexists in the project root - Verify the file has valid data in
city;temperatureformat - Check config.yaml has correct file paths
No results.txt created:
- Check for error messages in stdout
- Try with a smaller dataset first (1000 rows)
- Verify write permissions in the directory
Empty results.txt:
- Check that measurements.txt has valid data
- Verify format is
city;temperature(semicolon separator) - Check for parsing errors in the logs
Why Start With Local Files?
Benefits:
- Zero infrastructure setup required
- Instant feedback loop
- Easy to debug with local files
- Proves the algorithm works correctly
- Can add cloud storage later without changing the core logic
What’s Next
Your job works with local files! Now let’s refactor it to use MinIO (S3-compatible storage) so you can process files in the cloud.