Open Data Schema for Energy
Implementation Guide

Unifying Three Inverter Feeds in a Week: A Day-by-Day Walkthrough

A worked example showing how to normalize Huawei, Enphase, and Solarman telemetry into one ODSE pipeline in a single sprint — what works, what doesn't, and what to do differently.

Consider a small portfolio: five sites across three OEMs — Huawei string inverters at two rooftop installations, Enphase microinverters at a commercial carport, and Solarman-connected panels at two ground-mount sites in the Eastern Cape. The portfolio is small enough that everyone assumes the data is fine. Then the quarterly energy report comes back with a 12% discrepancy between what the sites are producing and what the analytics show, and nobody can explain where the gap comes from.

The root cause turns out to be embarrassingly mundane: Huawei exports use UTC, Enphase uses local time with no offset marker, and Solarman includes a timezone field that's wrong half the time. The pipeline is double-counting some intervals and missing others entirely. The fix: block out a week, use ODSE as the normalization layer, and get it right. Here's how that week plays out.

Day 1: Figure Out What You Actually Have

Start by inventorying every data source actually being ingested — not what the vendor docs say is available, but what's landing in the pipeline. This takes longer than expected because the Solarman integration was set up by someone who's since left, and the export format has drifted from what's documented in the wiki.

For each source, write down: transport type (SFTP drop vs API pull), reporting interval, timestamp convention, and known gaps. The Huawei feed is the cleanest — 5-minute intervals, proper UTC timestamps, reliable delivery. Enphase is 15-minute intervals via API with local timestamps and no timezone indicator. Solarman is the wildcard: daily CSV dumps with a mix of interval lengths depending on the logger firmware version.

The output is a simple source manifest — a shared doc the whole team can reference. Nothing fancy, but having all three sources described in one place for the first time is surprisingly clarifying.
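The manifest doesn't need tooling — a checked-in dict (or YAML file) is enough. A minimal sketch, with field names and gap notes that are illustrative rather than part of any ODSE spec:

```python
# Illustrative source manifest. Field names and values are examples
# drawn from this portfolio, not an ODSE requirement.
SOURCE_MANIFEST = {
    "huawei": {
        "transport": "sftp",
        "interval_minutes": 5,
        "timestamp_convention": "utc",
        "known_gaps": "none observed",
    },
    "enphase": {
        "transport": "api",
        "interval_minutes": 15,
        "timestamp_convention": "local time, no offset marker",
        "known_gaps": "occasional overnight 429s",
    },
    "solarman": {
        "transport": "sftp",
        "interval_minutes": "varies by logger firmware",
        "timestamp_convention": "tz field present but unreliable",
        "known_gaps": "daily CSV dump, sometimes delivered late",
    },
}
```

Even this much structure pays off on Day 4, when the transforms need to know each source's interval and timestamp convention without anyone re-deriving them from raw files.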

Day 2: Get One OEM Through End-to-End

Pick the highest-stakes OEM first — in this case, Huawei, which covers two of the five sites and is the most business-critical (and, per the Day 1 manifest, has the cleanest feed to learn on). The goal is narrow: timestamp normalization, energy field mapping, and error code classification. Nothing else.

from odse import transform

# Normalize a raw Huawei export into ODSE records
rows = transform("huawei_export.csv", source="huawei")
print(rows[0])

This goes faster than expected — maybe three hours to get a clean transform running against a week of historical data. The tricky part is the error codes. Huawei uses numeric fault codes that don't map neatly to anything generic, so you'll need to make some judgment calls about which codes map to which ODSE error_type values. Document the mapping decisions in comments rather than trying to get them perfect on day one.
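The mapping itself can live in a plain dict with the judgment calls recorded inline. A sketch — the numeric codes and error_type strings below are placeholders, not Huawei's actual fault table or a fixed ODSE vocabulary:

```python
# Hypothetical Huawei fault-code -> ODSE error_type mapping.
# Codes and categories are placeholders, not Huawei's real fault table.
HUAWEI_ERROR_MAP = {
    2001: "grid_fault",       # judgment call: over/under-voltage family
    2011: "grid_fault",       # judgment call: grid frequency excursion
    3001: "inverter_fault",   # internal hardware faults
    6001: "comm_timeout",     # logger lost contact with the inverter
}

def classify_error(code):
    """Fall back to 'unknown' rather than guessing on unmapped codes."""
    return HUAWEI_ERROR_MAP.get(code, "unknown")
```

The explicit `"unknown"` fallback matters: it keeps unmapped codes visible in the Day 5 fault rollup instead of silently lumping them into the wrong category.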

Day 3: Wire In Validation (and Actually Block on It)

This is the step that changes the most about how the pipeline operates. In many setups, validation runs as a weekly QA check — someone eyeballs a report on Fridays and flags anything weird. That means the team is routinely making operational decisions Monday through Thursday on data that hasn't been validated at all.

Move validation to a gate: if records don't pass schema validation, they don't enter the analytics pipeline. Period.

from odse import validate

# Gate: only records that pass schema validation enter analytics
result = validate("site_a_odse.json")
print(result.is_valid)
print(result.errors)

The first run against the Huawei transform output flags about 4% of records — mostly null energy fields during overnight hours that had been treated as zero-generation instead of no-data. Small distinction, big impact on completeness calculations. Start tracking validation failures by class and frequency — this immediately becomes the backlog for the rest of the week.
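Tracking failures by class can be as simple as a `Counter` over the validation output. This sketch assumes each error carries a `"class"` label — the exact shape of `odse`'s error objects may differ, so treat the field names as illustrative:

```python
from collections import Counter

def failure_histogram(validation_errors):
    """Tally validation failures by class so the biggest offenders
    become the work queue. Assumes each error is a dict with a
    'class' key (e.g. 'null_energy', 'bad_timestamp')."""
    return Counter(err["class"] for err in validation_errors)

# Illustrative failures, like the overnight null-energy records above
errors = [
    {"class": "null_energy", "record_id": "a1"},
    {"class": "null_energy", "record_id": "a2"},
    {"class": "bad_timestamp", "record_id": "b1"},
]
# most_common() puts the largest failure class first
print(failure_histogram(errors).most_common())
```

Sorted by frequency, the histogram is the rest-of-week backlog in priority order.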

Day 4: Bring In Enphase and Solarman

Now for the hard part. The other two OEM feeds need to produce the same ODSE output, and the temptation to build per-site special cases is real. Resist it. The whole point is one schema, not three tidy silos that happen to live in the same database.

Enphase is straightforward once the timestamp issue is sorted — pin everything to UTC at the transform layer and store the original local timestamp as diagnostic context. Solarman is painful. The interval lengths vary by firmware version, and two of the CSV columns have been renamed in a firmware update that only one of the two Solarman sites has received. The fix: a small pre-processor that normalizes the column names before the ODSE transform touches them.
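A minimal sketch of both fixes, assuming the site timezone is known (Africa/Johannesburg for the Eastern Cape sites) and using made-up column names — the real Solarman headers will differ by firmware:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Legacy -> current column names. Both sets are illustrative,
# not Solarman's actual export headers.
COLUMN_RENAMES = {"Pac(W)": "power_w", "Eday(kWh)": "energy_day_kwh"}

def normalize_columns(row):
    """Rename legacy columns so both firmware versions look identical
    before the ODSE transform ever sees them."""
    return {COLUMN_RENAMES.get(k, k): v for k, v in row.items()}

def pin_to_utc(local_ts, site_tz="Africa/Johannesburg"):
    """Attach the site's known timezone to a naive local timestamp,
    then convert to UTC. Keep the original string separately as
    diagnostic context."""
    naive = datetime.fromisoformat(local_ts)
    return naive.replace(tzinfo=ZoneInfo(site_tz)).astimezone(timezone.utc)
```

The key design choice: both helpers run as pre-processors, so the ODSE transform itself stays identical across firmware versions and timestamp conventions.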

By end of day, the output from all three OEMs can be concatenated and queried without schema exceptions. That moment — running one query across the whole portfolio for the first time — is genuinely satisfying.

Day 5: Build the Views Nobody Had Before

With unified data, build two portfolio-level views that operations teams typically ask for: a fault rollup by error_type across all sites, and a completeness report by site and day. In this example, the fault rollup immediately surfaces that one of the ground-mount sites has been throwing communication timeout errors every afternoon for weeks — invisible before because the Solarman error codes didn't mean anything to the team reviewing the Huawei-centric dashboard.

The completeness report is even more revealing. Two sites that look fine in isolation are actually missing 8-10% of their daily intervals. The gaps are short enough that the daily energy totals look plausible, but the interval-level data is full of holes.
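The completeness report reduces to counting intervals per (site, day) against the expected count for that OEM's reporting interval. A sketch over simplified records — the dicts here stand in for full ODSE interval records, and the Solarman expectation assumes the post-upgrade 15-minute loggers:

```python
from collections import defaultdict

# Expected intervals per day: 5-min -> 288, 15-min -> 96.
# Solarman assumed at 15-min for post-upgrade logger firmware.
EXPECTED_PER_DAY = {"huawei": 288, "enphase": 96, "solarman": 96}

def completeness_by_site_day(records, expected=EXPECTED_PER_DAY):
    """Fraction of expected intervals actually present per (site, day).
    `records` are simplified stand-ins for ODSE interval records:
    dicts with 'site', 'oem', and 'date' keys."""
    seen = defaultdict(int)
    oem_for = {}
    for rec in records:
        key = (rec["site"], rec["date"])
        seen[key] += 1
        oem_for[key] = rec["oem"]
    return {key: count / expected[oem_for[key]]
            for key, count in seen.items()}
```

Run daily, this is exactly the view that exposes sites whose daily totals look plausible while the interval data underneath is full of holes.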

Day 6: Make It Survive Real Operations

Days 2 through 5 are all happy-path work. Day 6 is about what happens when the Solarman SFTP server is down for six hours, when the Enphase API returns a 429 at 2am and the retry logic doesn't handle it, or when someone's laptop timezone changes and a locally run transform shifts every offset.

Add replay-safe ingestion so re-processing a batch doesn't create duplicates. Set up alerting for transform and validation failure spikes — not email alerts that everyone ignores, but a Slack message to the on-call channel with the failure class and count. And write down the fallback behavior for delayed source data, because "figure it out when it happens" is how you end up debugging at 11pm.
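Replay safety comes down to an idempotent upsert keyed on whatever uniquely identifies an interval. A sketch with hypothetical key fields — use the portfolio's own natural key:

```python
def replay_safe_merge(existing, incoming):
    """Idempotent upsert keyed on (site_id, interval_start):
    re-running a batch overwrites matching intervals rather than
    duplicating them. Key fields are illustrative."""
    merged = {(r["site_id"], r["interval_start"]): r for r in existing}
    for rec in incoming:
        merged[(rec["site_id"], rec["interval_start"])] = rec
    return list(merged.values())
```

With this in place, the answer to "the Solarman SFTP drop was re-delivered, can we just re-run it?" is always yes.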

Day 7: Freeze It and Write It Down

Lock the ODSE contract version, document the error-code mappings, and write an internal runbook covering three things: who updates transforms when an OEM changes their export format, who triages validation failures when the alert fires, and who approves schema version updates when the contract needs to evolve.

That last point matters more than it sounds. Without clear ownership, schemas drift. Someone adds a field for one site, someone else changes a mapping, and six months later you're back where you started.

Common Mistakes Along the Way

A common mistake is treating ODSE as an analytics layer — running forecasting logic inside the transform pipeline instead of keeping it downstream where it belongs. ODSE is a data interchange layer. The moment you start embedding business logic in the normalization step, you've coupled things that should be independent.

Another pitfall: deferring the error taxonomy mapping until day 4, which is too late. If you don't map error codes early, you're just creating three tidy files that still can't be compared. The whole point of a shared schema is shared semantics, not shared syntax.

What This Sprint Actually Produces

The honest answer: this sprint won't solve every data problem. There will still be edge cases in the Solarman transform that need attention, and some error taxonomy mappings the team isn't fully confident in. But the quarterly report that kicked this off? Re-run it against the normalized pipeline and the 12% discrepancy drops to under 1%. The remaining gap is real — an inverter with degraded output that had been masked by the timestamp confusion.

More importantly, there's now one place to look. When something's wrong, the team doesn't have to check three different dashboards with three different conventions. One schema, one validation gate, one completeness view. Everything built from here can trust the foundation.

Next steps (OSS):

Install `odse`, run your first transform, validate one week of historical data, and open an issue for any unsupported OEM mappings.

Get Started | Schema Reference | GitHub
