Prompting Sustainability

How does generative AI interpret and visualize "sustainable" urban streetscapes? A cross-city research project examining Jakarta, Melbourne, and Singapore.

About the project

A cross-city investigation of how generative AI interprets and visualizes "sustainable" urban streetscapes.

Motivation

A sustainable streetscape is commonly associated with greenery, improved pedestrian infrastructure, cycling facilities, and traffic calming — yet the concept itself remains ambiguous and contested. What counts as "sustainable" is deeply contingent on local urban morphology, governance priorities, and socio-cultural expectations. A streetscape considered sustainable in one context may be inappropriate or counterproductive in another.

Generative AI tools are now entering the workflows of architects, urban planners, and policy communicators. They can translate text prompts into vivid streetscape visualizations, letting design ideas be explored far faster and at a larger scale than was previously practical. But this comes with a risk: when sustainability is applied through generalized transformation rules, it can be reduced to a limited set of recurring features (vegetation, bike lanes, widened sidewalks) replicated across cities regardless of fit. Sustainability may quietly become a standardized aesthetic rather than a place-based strategy.

This project asks: do AI-generated outputs reflect context-sensitive improvements, or do they unintentionally standardize what "sustainable" streetscapes should look like across different urban settings?

Research questions

  1. What general visual characteristics are associated with AI-generated sustainable streetscapes?
  2. How does the semantic composition of streetscapes shift during transformation toward becoming "more sustainable"?
  3. To what extent do these transformations lead to homogenization of streetscape semantics and a reduction of cross-city variation?
  4. How does prompt specificity (context-rich vs. generic) shape what generative AI produces under the label "sustainable"?

Methodology

The project combines three methodological pillars: collection of street-view imagery from selected cities (Jakarta, Melbourne, Singapore), controlled image-to-image transformation using OpenAI multimodal models (GPT-4o, GPT-5), and semantic analysis of the outputs with a pre-trained segmentation model (Mask2Former with a Swin-Large backbone, trained on ADE20K).

Pixel-level segmentation outputs are aggregated into seven domain-relevant categories (sky, vegetation, built structure, road infrastructure, vehicle, water and natural, street furniture) and projected into a shared two-dimensional space using UMAP. Scene typologies are identified by K-Means clustering fitted on the raw images only; the resulting clusters then serve as a fixed baseline onto which the AI-generated images are projected. Because the reference geometry is defined by the raw images alone, any redistribution of AI images across clusters reflects a genuine structural shift rather than a change in the reference itself.
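A minimal sketch of the fixed-baseline design, in plain Python: K-Means (Lloyd's algorithm) is fitted on raw-image feature vectors only, and AI-generated images are then assigned to the frozen centroids without refitting. The feature space (for example, the seven category shares or the UMAP coordinates), the function names, and the toy data are illustrative assumptions rather than the project's own pipeline; in practice a library implementation such as scikit-learn's `KMeans` (`fit` on raw, `predict` on generated) would normally be used.

```python
import random
from math import dist

Vector = list[float]


def assign(p: Vector, centroids: list[Vector]) -> int:
    """Index of the nearest centroid by Euclidean distance."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))


def kmeans_fit(points: list[Vector], k: int, iters: int = 100, seed: int = 0) -> list[Vector]:
    """Lloyd's K-Means: returns k centroids fitted to `points`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: group points by their nearest centroid.
        groups: list[list[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[assign(p, centroids)].append(p)
        # Update step: move each centroid to its group mean; empty groups keep the old centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids


# Toy 2-D feature vectors (hypothetical): two well-separated raw scene types.
raw = [[0.10, 0.20], [0.12, 0.22], [0.90, 0.80], [0.88, 0.82]]
generated = [[0.15, 0.25], [0.85, 0.90]]

centroids = kmeans_fit(raw, k=2)                        # fitted on raw images only
raw_labels = [assign(p, centroids) for p in raw]
gen_labels = [assign(p, centroids) for p in generated]  # centroids stay frozen
```

Freezing the centroids is what makes redistribution interpretable: a generated image that lands in a different cluster from its raw source has moved relative to a fixed reference, not because the reference was refitted around it.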

Homogenization is assessed using four complementary metrics: mean pairwise distance, normalized cluster entropy, feature diversity, and convex hull area. Cross-city distinguishability is tested using chi-square independence tests with Cramér's V.
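Three of the four homogenization metrics have standard definitions and can be sketched in plain Python. This is an illustrative sketch, not the project's code: it assumes mean pairwise distance is Euclidean over per-image feature vectors, normalized cluster entropy is the Shannon entropy of cluster occupancy divided by log k, and convex hull area is computed on 2-D points such as UMAP coordinates. Feature diversity is omitted because its exact definition is not given here.

```python
from collections import Counter
from itertools import combinations
from math import dist, log

Point = tuple[float, float]


def mean_pairwise_distance(points: list[tuple[float, ...]]) -> float:
    """Average Euclidean distance over all unordered pairs (needs at least two points)."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def normalized_cluster_entropy(labels: list[int], k: int) -> float:
    """Shannon entropy of cluster occupancy divided by log(k): 0 = one cluster, 1 = even spread."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(labels)
    h = -sum((c / n) * log(c / n) for c in Counter(labels).values())
    return h / log(k)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_area(points: list[Point]) -> float:
    """Area of the 2-D convex hull via Andrew's monotone chain and the shoelace formula."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    lower: list[Point] = []
    upper: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]  # counter-clockwise, endpoints not repeated
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % len(hull)][1] - hull[(i + 1) % len(hull)][0] * hull[i][1]
        for i in range(len(hull))
    )
    return abs(twice_area) / 2.0
```

Under these definitions, homogenization shows up as lower values for the generated images than for their raw counterparts on all three measures.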

Image generation pipeline. Raw Google Street View imagery is transformed by GPT-5 into AI-generated "sustainable" versions at 1024 × 1024 resolution.
Examples from each of the four scene typology clusters (A–D), comparing raw Google Street View imagery against AI-generated counterparts across Jakarta, Melbourne, and Singapore.

Key findings (so far)

  • Vegetation inflation in every cluster and city, with road and sky areas shrinking in turn.
  • Within-city compositional convergence: feature diversity declines 23%–31% across the three cities studied.
  • Between-city distinguishability is preserved — Jakarta, Melbourne, and Singapore remain statistically separable after transformation.
  • Prompt specificity matters: context-rich prompts produce more balanced streetscapes (with sidewalks, stormwater features, public seating, lighting), while generic prompts default to culturally salient "sustainability markers" — cycle lanes and manicured greenery.
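The between-city separability result rests on the chi-square independence tests with Cramér's V named in the methodology. The sketch below computes both from a contingency table in plain Python; treating rows as cities and columns as scene clusters is an assumption about how the table is built, and the counts are toy numbers. A p-value can be obtained from the statistic and degrees of freedom with `scipy.stats.chi2.sf`, and `scipy.stats.chi2_contingency` runs the whole test (it applies Yates' continuity correction by default when there is one degree of freedom).

```python
from math import sqrt


def chi_square_cramers_v(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square statistic, degrees of freedom, and Cramér's V for an r x c table.

    Assumes every row and column total is non-zero; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    r, c = len(row_totals), len(col_totals)
    dof = (r - 1) * (c - 1)
    cramers_v = sqrt(chi2 / (n * (min(r, c) - 1)))
    return chi2, dof, cramers_v


# Toy counts (hypothetical): rows = cities, columns = four scene clusters.
counts = [
    [40, 10, 30, 20],  # city 1
    [15, 45, 20, 20],  # city 2
    [25, 25, 25, 25],  # city 3
]
chi2, dof, v = chi_square_cramers_v(counts)
```

Cramér's V rescales the statistic to the range 0 to 1, which makes the strength of the city/cluster association easier to compare across tables of different sizes than the raw chi-square value.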

Papers

Manuscripts produced under this project. Both papers are currently under journal consideration; preprints, full citations, and links will be added once available.

Datasets

Open-access datasets released under the project, hosted on Zenodo and Hugging Face.

Dataset v1 — Jakarta, prompt-treatment study

Our first dataset accompanies the prompt-specificity study. It contains 100 Jakarta Google Street View scenes, each accompanied by two GPT-4o-generated counterparts: one produced under a generic "non-context" sustainability prompt, and one under a "context-rich" prompt that explicitly enumerates ten sustainability categories (sidewalks, street furniture, trees and landscaping, lighting, bicycle infrastructure, stormwater management, public open spaces, building frontage, smart technologies, and mobility systems).
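The two treatments can be pictured as prompt templates. The wording below is hypothetical (the actual prompts are documented with the dataset, not reproduced here); only the ten enumerated categories come from the dataset description. The sketch shows the structural difference: the non-context prompt names no features, while the context-rich prompt lists every category explicitly.

```python
# The ten categories enumerated by the context-rich treatment (from the dataset description).
SUSTAINABILITY_CATEGORIES = (
    "sidewalks",
    "street furniture",
    "trees and landscaping",
    "lighting",
    "bicycle infrastructure",
    "stormwater management",
    "public open spaces",
    "building frontage",
    "smart technologies",
    "mobility systems",
)

# Hypothetical wording for the generic treatment: "sustainable" with no further guidance.
NON_CONTEXT_PROMPT = "Transform this street-view image into a more sustainable streetscape."


def context_rich_prompt(categories: tuple[str, ...] = SUSTAINABILITY_CATEGORIES) -> str:
    """Hypothetical context-rich prompt that enumerates each category as a numbered line."""
    numbered = "\n".join(f"{i}. {name}" for i, name in enumerate(categories, start=1))
    return (
        "Transform this street-view image into a more sustainable streetscape, "
        "addressing each of the following categories:\n" + numbered
    )


prompt = context_rich_prompt()
```

Generating both versions from the same raw scene keeps prompt specificity as the only factor that varies between the two counterparts.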

The dataset enables direct comparative study of how prompt specificity reshapes AI's interpretation of sustainability at the streetscape level, and includes the raw and generated images together with their segmented counterparts.
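The comparative study relies on turning each segmentation mask into category shares. A minimal sketch, assuming masks are 2-D grids of ADE20K class indices and that an index-to-name lookup is available: pixels are grouped into the seven categories from the methodology, and shares are differenced between treatments. The class-to-category mapping below is a hypothetical, partial illustration; the project's actual mapping is not listed here.

```python
from collections import Counter

# The seven aggregated categories used in the project.
CATEGORIES = (
    "sky", "vegetation", "built structure", "road infrastructure",
    "vehicle", "water and natural", "street furniture",
)

# Hypothetical, partial mapping from ADE20K class names to those categories.
CATEGORY_OF = {
    "sky": "sky",
    "tree": "vegetation", "grass": "vegetation", "plant": "vegetation",
    "building": "built structure", "wall": "built structure", "house": "built structure",
    "road": "road infrastructure", "sidewalk": "road infrastructure",
    "car": "vehicle", "bus": "vehicle", "truck": "vehicle",
    "water": "water and natural", "river": "water and natural",
    "streetlight": "street furniture", "bench": "street furniture", "pole": "street furniture",
}


def category_shares(mask: list[list[int]], id_to_name: dict[int, str]) -> dict[str, float]:
    """Fraction of all pixels in each category; unmapped classes count toward none."""
    counts: Counter[str] = Counter()
    total = 0
    for row in mask:
        for class_id in row:
            total += 1
            category = CATEGORY_OF.get(id_to_name[class_id])
            if category is not None:
                counts[category] += 1
    return {c: counts[c] / total for c in CATEGORIES}


def share_shift(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-category change in pixel share (positive means the category grew)."""
    return {c: after[c] - before[c] for c in CATEGORIES}


# Toy 2x2 masks with hypothetical class indices.
names = {0: "sky", 1: "tree", 2: "road", 3: "person"}
raw_mask = [[0, 0], [2, 2]]
generated_mask = [[0, 1], [1, 2]]
shift = share_shift(category_shares(raw_mask, names), category_shares(generated_mask, names))
```

Applied to the raw, context, and non-context masks of the same scene, the two shifts (context minus raw, non-context minus raw) show which categories each prompt treatment inflates.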

Example entries from Dataset v1. Each row pairs a raw Jakarta street-view scene with its context-rich and non-context AI-generated counterparts.
Each entry is paired with semantic segmentation outputs. Pixel-level masks support quantitative analysis of object appearance under different prompt treatments.
Zenodo · Open access

Sustainable streetscape, prompt-specificity dataset

100 Jakarta scenes × 3 treatments (raw, context, non-context), with paired segmentation masks. Full descriptions of fields, file structure, and generation parameters are documented on the Zenodo record page.

Dataset v2 — Cross-city corpus

The second dataset widens the scope from a single city to three: Jakarta, Melbourne, and Singapore. We sampled 1,000 random points per city along OpenStreetMap road networks, retrieved the nearest available Google Street View imagery, and filtered out indoor, blurred, and non-representative scenes. The final corpus contains 567 images from Jakarta, 832 from Melbourne, and 694 from Singapore.
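Sampling random points along a road network is commonly done by weighting each road segment by its length, so that every metre of road is equally likely to be chosen. The sketch below assumes that scheme, and assumes the road geometries have already been extracted from OpenStreetMap as straight segments in a projected coordinate system (metres); the exact sampling procedure used for the corpus is not specified here, and the Street View lookup and filtering steps are not shown.

```python
import bisect
import random
from itertools import accumulate
from math import dist

Point = tuple[float, float]
Segment = tuple[Point, Point]


def sample_points_on_network(segments: list[Segment], n: int, seed: int = 0) -> list[Point]:
    """Draw n points uniformly by length along straight road segments (projected coordinates)."""
    rng = random.Random(seed)
    lengths = [dist(a, b) for a, b in segments]
    cumulative = list(accumulate(lengths))  # running total of network length
    total = cumulative[-1]
    points: list[Point] = []
    for _ in range(n):
        # Pick a distance along the whole network, then find the segment that contains it.
        s = rng.uniform(0.0, total)
        i = min(bisect.bisect_left(cumulative, s), len(segments) - 1)
        start = cumulative[i] - lengths[i]
        t = (s - start) / lengths[i] if lengths[i] > 0 else 0.0
        (x0, y0), (x1, y1) = segments[i]
        # Linear interpolation between the segment's endpoints.
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# Toy network (hypothetical): an L-shaped street made of two 100 m segments.
roads = [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 100.0))]
sample = sample_points_on_network(roads, n=1000, seed=42)
```

Weighting by length avoids over-sampling places with many short segments, such as dense intersections, which picking segments uniformly would favour.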

Each raw scene is paired with a GPT-5-generated "more sustainable" counterpart (2,093 raw + 2,093 generated = 4,186 images total). All images are standardized to 640 × 640 pixels.

Hugging Face · Open access

genai_sustainablestreetscape

Cross-city corpus of paired raw and AI-generated streetscapes from Jakarta, Melbourne, and Singapore. Includes generation parameters, source coordinates, and metadata sufficient to reproduce the cross-city semantic analysis.

Both datasets are released under open-access terms — see the respective repository pages for citation information and licensing details.

Funding

Grants supporting the Prompting Sustainability research project.

This research is supported by the following grant.

Additional funding sources will be listed as the project progresses.

News & updates

Project announcements, talks, dataset releases, and publications.

No news yet. Updates will be posted here as the project progresses.

Contact

For project inquiries, dataset access questions, or collaboration proposals.

Mohammad Raditia Pradana

Universitas Indonesia