The 2019 Brooklyn Blackout Was in the 311 Data Four Days Early
Everyone knows what 311 is for. You call when a streetlight goes out, when the traffic signal is stuck, when your block has been dark for three hours. The agency logs the complaint, dispatches someone, closes the ticket.
That mental model is wrong. Every customer-service queue at every utility, municipal government, large property owner, and retail chain is an unlabeled sensor network pretending to be a work queue.
NYC gets about 3 million 311 complaints a year. Every call is geocoded, timestamped, and tagged with a complaint type. Most people treat 311 as customer-service data. It is actually a citywide sensor network with human nodes.
The Brooklyn blackout of July 21, 2019 is the proof. Four days before Con Edison lost power to roughly 72,000 customers in Canarsie, Mill Basin, and Flatbush, the 311 data was already showing the cascade. Not in one cell. Not in one complaint. Across the grid, simultaneously, in a pattern the network had seen dozens of times before.
52,636 complaints, one grid, 18 months of history
NYC Open Data publishes the full 311 service request archive via Socrata (dataset 76ig-c548), updated daily. We pulled 52,636 electrical complaints from Brooklyn, January 2018 through June 2019. Complaint types: ELECTRIC, Electrical, Street Light Condition, Traffic Signal Condition.
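A minimal sketch of how a pull like this could be expressed against Socrata's SODA endpoint. The host name and the column names (borough, complaint_type, created_date) are assumptions based on how NYC 311 datasets are commonly published, not something this post confirms; check them against the dataset's schema before use. The sketch only builds the request URL and does not fetch anything.

```python
from urllib.parse import urlencode

DATASET_ID = "76ig-c548"  # dataset ID as cited in this post
# Assumed host for NYC Open Data's SODA endpoint.
BASE_URL = f"https://data.cityofnewyork.us/resource/{DATASET_ID}.json"

COMPLAINT_TYPES = [
    "ELECTRIC",
    "Electrical",
    "Street Light Condition",
    "Traffic Signal Condition",
]


def build_query(start: str, end: str, limit: int = 50000, offset: int = 0) -> str:
    """Return a SODA URL for Brooklyn electrical complaints in [start, end)."""
    types = ", ".join(f"'{t}'" for t in COMPLAINT_TYPES)
    # Column names below are assumptions, not confirmed by the post.
    where = (
        "borough = 'BROOKLYN' "
        f"AND complaint_type IN ({types}) "
        f"AND created_date >= '{start}' AND created_date < '{end}'"
    )
    params = {
        "$where": where,
        "$order": "created_date",
        "$limit": limit,
        "$offset": offset,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# January 2018 through June 2019, the discovery window described above.
url = build_query("2018-01-01T00:00:00", "2019-07-01T00:00:00")
```

Paging through the archive would mean calling build_query repeatedly with a growing offset until a page comes back short.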
Before any pattern work, we standardized the vocabulary. Over 90 distinct descriptor strings in the raw data — POWER OUTAGE, NO LIGHTING, Wiring Defective/Exposed, dozens more — inconsistently formatted, overlapping but not identical. We collapsed those into 11 stable complaint families before building fingerprints.
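To make the vocabulary step concrete, here is a small sketch of one way to collapse raw descriptor strings into families. The family names and keyword lists are hypothetical: the post does not list its 11 families, so this shows the shape of the step rather than the actual mapping.

```python
import re

# Hypothetical keyword rules; the post's 11 real families are not listed.
FAMILY_RULES = [
    ("power_outage", ("power outage", "no power")),
    ("wiring_defect", ("wiring", "exposed")),
    ("lighting_failure", ("no lighting", "light out", "lamp")),
    ("signal_failure", ("signal", "controller")),
]


def normalize(descriptor: str) -> str:
    """Lowercase, turn punctuation into spaces, collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", descriptor.lower())
    return cleaned.strip()


def to_family(descriptor: str) -> str:
    """Map a raw descriptor to the first family whose keyword appears in it."""
    text = normalize(descriptor)
    for family, keywords in FAMILY_RULES:
        if any(keyword in text for keyword in keywords):
            return family
    return "other"
```

Rule order matters in a first-match scheme like this one, so overlapping descriptors need the more specific family listed first.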
The fingerprint is the key mechanism. One row per grid cell per day. Brooklyn gridded at 0.01 degree resolution, roughly 1.1km by 0.85km cells. Each cell is a monitoring station. Each fingerprint captures the complaint types that occurred in that cell over the prior 48 hours and how intense they were relative to that cell's own historical baseline.
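The cell dimensions follow from the grid resolution and Brooklyn's latitude. The sketch below assigns a point to a 0.01 degree cell and estimates the cell size in kilometers. The function names and the 111.32 km per degree constant are illustrative choices, not taken from the pipeline itself.

```python
import math

CELL_DEG = 0.01          # grid resolution from the post
KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def cell_id(lat: float, lon: float) -> tuple[int, int]:
    """Integer (row, col) index of the 0.01-degree cell containing a point."""
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def cell_size_km(lat: float) -> tuple[float, float]:
    """Approximate (north-south, east-west) cell size in km at a latitude."""
    north_south = CELL_DEG * KM_PER_DEG_LAT
    # Longitude lines converge toward the poles, so east-west width shrinks.
    east_west = north_south * math.cos(math.radians(lat))
    return (north_south, east_west)
```

At a latitude of about 40.65 degrees this gives roughly 1.11 km north to south and about 0.84 km east to west, in line with the cell size quoted above.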
That last part matters more than anything else in this post. Every cell has its own baseline. A commercial strip in DUMBO that logs 4 electrical complaints in any 48 hours is normal. A residential block in Marine Park that logs 4 in 48 hours is a cascade. Severity is measured against the cell's own history, not against a citywide threshold.
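A sketch of what per-cell severity could look like. It scores a 48-hour count against the same cell's own history using a z-score. The z-score itself, the cutoffs of 2 and 4, and both example histories are assumptions made for illustration; the post does not say how the pipeline computes severity.

```python
from statistics import mean, pstdev


def severity(count: int, history: list[int]) -> str:
    """Label a 48-hour count relative to the same cell's past 48-hour counts.

    The z-score cutoffs (2 and 4) are illustrative assumptions.
    """
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # avoid dividing by zero for flat histories
    z = (count - mu) / sigma
    if z >= 4:
        return "cascade"
    if z >= 2:
        return "spike"
    return "normal"


busy_strip = [3, 4, 5, 4, 3, 5, 4, 4]   # hypothetical commercial cell
quiet_block = [0, 0, 1, 0, 0, 0, 1, 0]  # hypothetical residential cell
```

The same count of 4 is unremarkable for the busy cell and far outside the quiet cell's history, which is the point of measuring each cell against itself.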
254 patterns, zero rules written
We fed 25,235 discovery fingerprints into a pattern-discovery algorithm. No labels. No rules about which complaint types indicate grid stress. No threshold values entered by hand. One question: what complaint fingerprint patterns exist in 18 months of Brooklyn electrical data?
Four minutes. That is how long the algorithm needed to return 254 stable patterns, plus 3,945 noise fingerprints that did not cluster into anything repeatable enough to name.
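The post does not name the discovery algorithm. Density-based clustering is one family of methods that produces this kind of output: stable groups plus a residue of noise points. The sketch below is a minimal DBSCAN written for illustration only, with made-up two-dimensional points standing in for fingerprints. It is not the pipeline's method.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, NOISE (-1) for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nbrs)
    return labels


# Made-up points: two dense groups and one isolated point.
fingerprints = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (9.0, 0.0),
]
labels = dbscan(fingerprints, eps=0.2, min_pts=3)
```

The two dense groups come back as clusters 0 and 1, and the isolated point is labeled as noise, which mirrors the split between named patterns and unclustered fingerprints described above.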
Human analysts reviewed all 254. Each pattern got a name. Insight IDs 1545 through 1798. Some were labeled "Clustered signal failures." Some "Clustered outages." The most severe got multi_mode_cascade: all three failure modes (power outages, wiring defects, signal failures) co-occurring in the same 48-hour window. Analysts approved the naming scheme. The engine got the approved library. Nobody wrote a rule.
Full pipeline runtime from pull to approve: roughly 7 minutes.
One clarification before we go further. This is a retrospective validation against a known event. The pipeline did not predict the July 21 outage in real time — it demonstrated that the signal was present in data that was already available, and that the approved pattern library would have matched it. A production deployment would require live Socrata ingestion and live classification. What we built is a validation, not a deployment. What the validation shows still matters.
The July buildup — four days of distributed signal
Here is what the July 2019 fingerprints show before the matching step: the count of Brooklyn grid cells at spike or cascade severity per day.
Jul 01: 0
Jul 02: 4 ████
Jul 03: 3 ███
Jul 04–07: 0
Jul 08: 1 █
Jul 09: 5 █████
Jul 10: 3 ███
Jul 11: 0
Jul 12: 4 ████
Jul 13: 2 ██
Jul 14–15: 0
Jul 16: 1 █
Jul 17: 5 █████
Jul 18: 6 ██████ <- buildup
Jul 19: 2 ██
Jul 20: 4 ████
Jul 21: 6 ██████ <- outage day
Jul 22: 17 █████████████████ <- peak
Jul 23: 13 █████████████
Jul 24: 5 █████
The normal July baseline is 0 to 3 cells per day, and in the first half of the month the count never exceeds 5. Then July 17 arrives, and the chart stops looking like noise.
Five cells hit spike or cascade severity on July 17. Six on July 18, and the cells are not the same ones that fired the day before — they are distributed across southern Brooklyn zip codes. The network is stressed across multiple points simultaneously, which is a different signal than one bad block having a bad week. By outage day, July 21, the count is back to 6. Not a single concentrated spike. A diffuse, simultaneous elevation across the grid.
On July 22, the day after the outage, the count hits 17. Peak severity. The system is now measuring the aftermath — what happens when 72,000 customers call 311 within 24 hours of losing power, confirming in the data what the grid already knew.
Think about what the ops manager's dashboard would have shown on the morning of July 18. Not one bad block, not one complaint cluster, not one transformer acting up. Six different grid cells, distributed across southern Brooklyn, all flagging cascade-level electrical patterns at the same time. Against a baseline where 3 is normal, 6 simultaneous cells is not noise. That is the signal. Call the grid operator, pre-position crews, dispatch inspectors before anyone uses the word outage.
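The dashboard check described above can be sketched in a few lines. The daily counts are transcribed from the chart; the alert level of 6 is an assumption taken from the "6 simultaneous cells" reading, not a value the pipeline is known to use.

```python
# Daily counts transcribed from the chart above (July 2019).
cells_at_spike_or_cascade = {
    1: 0, 2: 4, 3: 3, 4: 0, 5: 0, 6: 0, 7: 0, 8: 1, 9: 5, 10: 3,
    11: 0, 12: 4, 13: 2, 14: 0, 15: 0, 16: 1, 17: 5, 18: 6, 19: 2,
    20: 4, 21: 6, 22: 17, 23: 13, 24: 5,
}

ALERT_AT = 6  # assumed alert level, from the "6 simultaneous cells" reading


def alert_days(counts: dict[int, int], threshold: int) -> list[int]:
    """Days of the month whose cell count meets or exceeds the threshold."""
    return [day for day, n in sorted(counts.items()) if n >= threshold]


flagged = alert_days(cells_at_spike_or_cascade, ALERT_AT)
```

With the alert level at 6, the first flagged day is July 18. The level is a tuning choice: set it to 5 and the same function also returns July 9, 17, and 24.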
16 complaints, 5 outages, a 90% pattern match
The 1,170 matching fingerprints from July 2019 were classified against the approved library from the discovery run. Four cells between July 18 and outage day matched at cascade-level severity.
- Jul 18, 11211 Williamsburg: 8 complaints, 2 power outages, clustered outages, matched Insight #1793 (signal + light + wiring cascade)
- Jul 20, 11214 Bensonhurst: 7 complaints, 1 power outage, clustered signal failures, matched Insight #1609 (signal cascade with spike trend)
- Jul 21, 11234 Marine Park: 7 complaints, 1 power outage, clustered signal failures, matched Insight #1609 (same pattern)
- Jul 21, 11212 Brownsville / East New York: 16 complaints, 5 power outages, all three cascade modes, matched Insight #1759 (multi-mode electrical cascade) at 90% similarity
The Brownsville row is the one that matters most. On the same day the grid failed, zip 11212 had 16 complaints in a 48-hour window, 5 confirmed power outages, and all three cascade modes active simultaneously — outages, wiring defects, signal failures. The pipeline matched it against Insight #1759, multi-mode electrical cascade, with a 90% similarity score.
What the 90% match means in practice: that Brownsville cell on July 21 was nearly identical to a fingerprint pattern analysts had already reviewed, named, and approved. Not a generic anomaly. Not "something looks off." A match to a specific, named, human-approved pattern that the library recognized because it had appeared repeatedly in 18 months of Brooklyn history.
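One way to picture a similarity score is as the cosine between a new fingerprint vector and each approved pattern's vector. The post does not say which metric the pipeline uses, so cosine similarity here is a stand-in, and the pattern names, three-feature vectors, and 0.8 cutoff are all invented for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(fingerprint, library, min_similarity=0.8):
    """Return (pattern_name, similarity) for the closest pattern, or None."""
    scored = [(cosine(fingerprint, vec), name) for name, vec in library.items()]
    score, name = max(scored)
    return (name, score) if score >= min_similarity else None


# Hypothetical vectors: (outage, wiring, signal) intensity. Values are invented.
library = {
    "signal_cascade_example": [0.1, 0.0, 1.0],
    "multi_mode_example": [1.0, 0.8, 0.9],
}
match = best_match([1.0, 0.5, 0.6], library)
```

A fingerprint that clears the cutoff comes back with the name of an approved pattern attached; one that does not returns None instead of a generic anomaly flag.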
This is where auditability becomes operational. When the infrastructure review asks "what did we know and when," the answer is not "we had 2,540 complaints in July." The answer is a timestamped list: every cell, every day, every pattern match, every human-approved insight it matched against, every similarity score. July 18 Williamsburg Insight #1793. July 20 Bensonhurst Insight #1609. July 21 Brownsville Insight #1759 at 90%. That is not a retrospective reconstruction. That is native architecture.
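A timestamped list like that maps naturally onto one record per match. The sketch below uses the three matches named above; the record layout and field names are assumptions, and similarity is left empty wherever the post does not report a score.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class MatchRecord:
    date: str
    zip_code: str
    neighborhood: str
    insight_id: int
    similarity: Optional[float]  # None where the post does not report a score


AUDIT_LOG = [
    MatchRecord("2019-07-18", "11211", "Williamsburg", 1793, None),
    MatchRecord("2019-07-20", "11214", "Bensonhurst", 1609, None),
    MatchRecord("2019-07-21", "11212", "Brownsville", 1759, 0.90),
]


def to_jsonl(records: list[MatchRecord]) -> str:
    """Serialize records as JSON Lines, one match per line, in date order."""
    ordered = sorted(records, key=lambda r: r.date)
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in ordered)
```

Writing one line per match keeps the log append-only and easy to filter by date, cell, or insight ID during a review.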
The usual AI vendor demo shows you dashboards. The interesting demo shows you the approval queue — every pattern named, every name signed, every match traceable.
The operator is not the recipient of the alert. The operator is the one whose pattern library the alert uses. Your senior analysts review 254 candidate patterns, approve the ones that map to real grid-stress modes, reject the ones that are just normal-day noise. The engine then applies that approved library to every new July. The operator's expertise is the training data. The machine is the scale.
Four datasets, four industries, same four steps
This is the fourth time the same architecture has run against a different problem. Environmental data — public NOAA weather sensor records produced eight days of lead signal before the onset conditions of the 2018 Woolsey Fire. Manufacturing — NASA's C-MAPSS aircraft engine benchmark, where unmatched patterns carried a 3.3x anomaly premium over matched ones. Healthcare fraud — Medicare billing records for medical equipment suppliers, cross-referenced against the LEIE excluded-providers list, with a 3x to 6.62x fraud detection uplift and a 335-facility triage completed in under one analyst-day. NYC 311 electrical complaints — four days of lead signal before a blackout that affected roughly 72,000 customers.
Pull records. Compose per-cell fingerprints. Discover patterns and let humans approve them. Classify new records against the approved library. The fingerprint composition adapts to the data shape. Everything else is identical.
The 311 vocabulary was messy: more than 90 distinct descriptor strings collapsed into 11 stable families. Your claims notes will be messier. The fingerprint composition step handles that.
When a utility's data director says "our complaint vocabulary is different from NYC 311," that is the right concern and the wrong objection. The pipeline learned the vocabulary from the data. You register the table. The system clusters what it finds. Your team approves the naming scheme. The vocabulary adapts because the patterns are discovered, not authored.
Sources
NYC Open Data, Socrata 311 Service Requests dataset 76ig-c548, accessed for Brooklyn electrical complaints January 2018 through June 2019.
July 21, 2019 Con Edison Brooklyn outage — publicly documented, approximately 72,000 customers affected across Canarsie, Mill Basin, and Flatbush.
Coherany demo notebook, Brooklyn electrical pipeline (internal, 2026-04). Full methodology available on request.