Methodology

Subsidized Housing Database

CoreData.nyc is host to the NYU Furman Center’s Subsidized Housing Database, a property-level database that links multiple sources of housing subsidy data to individual properties in New York City. The Subsidized Housing Database includes residential properties with 5 or more residential units and at least one active housing subsidy (based on available data). Therefore, properties with expired subsidies are not included in the Subsidized Housing Database. When data are updated, expired subsidies will be removed. Generally, records in the Subsidized Housing Database are for multi-family properties.

Program Descriptions: Descriptions of the subsidy programs included in CoreData.nyc, along with program-specific notes on methodology, can be found in the Directory of NYC Housing Programs.

Data Linking: There are dozens of housing subsidies in New York City from a variety of government sources (see the Directory of NYC Housing Programs for more information). We acquire these data from several sources, including the New York City Department of Finance (DOF), the New York City Department of Housing Preservation and Development (HPD), the New York City Housing Authority (NYCHA), the New York State Department of Housing and Community Renewal (HCR), and the United States Department of Housing and Urban Development (HUD).

Because data from each of these sources is formatted differently, we apply a number of data cleaning techniques to link the data at the property level:

Standardizing Addresses: Properties in the Subsidized Housing Database are tax parcels, identified by a unique borough-block-lot (BBL) number. Oftentimes housing subsidy data includes an address - but not the BBL - of a property. Therefore, we must convert addresses into the corresponding BBLs. To do this, we use Geosupport, a geocoding tool provided by New York City’s Department of City Planning that identifies the BBL for a given address. All address data in the Subsidized Housing Database is processed through Geosupport. See the Geocoding Application page for more information about Geosupport. In some cases, Geosupport fails to identify a BBL for an address. When this occurs, we assign the BBL based on a manual map search. If the manual search does not produce a BBL (due to incomplete information in the subsidy record) the record is dropped. For NYCHA properties, we use map files provided by the agency to spatially join developments to MapPLUTO.
Identifying Related Properties: An individual subsidy agreement may apply to more than one property. In some cases, the data we receive only lists a single address for a subsidy even though the agreement covers multiple properties. In order to identify all properties associated with a subsidy, we search New York City property records to identify the related properties. First, we convert the address listed in the subsidy agreement to its BBL using the process described above. We call this property the reference BBL. We then search New York City Department of Finance’s Automated City Register Information System (ACRIS) to identify related properties - properties that are referenced in documents associated with the reference BBL. We then cross-check information in the subsidy agreement with the property information from ACRIS to ensure the related properties are in fact associated (we match based on the number of residential units). We only allow matches to associated BBLs with a building class that falls under the following building classification codes from the Department of Finance: A, B, C, D, R, S, L, I, N, H3, H6, H7, H8, and K4.
Mapping Addresses: All BBL data is mapped based on MapPLUTO, a geographic dataset of tax lots, provided by the New York City Department of City Planning. In the rare case that a BBL in the subsidy data does not match a MapPLUTO record, we drop the record since it cannot be mapped in CoreData.nyc.
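
As an illustration of the identifiers used in the linking steps above, the following minimal sketch builds a BBL string and checks a building class against the allowed list. It is not the code used to build the database. The function names are hypothetical, and reading the single-letter entries (A, B, C, and so on) as "any building class beginning with that letter" is an assumption.

```python
# Single-letter entries from the allowed list, read here as prefixes (assumption).
ALLOWED_PREFIXES = ("A", "B", "C", "D", "R", "S", "L", "I", "N")
# Two-character entries from the allowed list, matched exactly.
ALLOWED_CODES = {"H3", "H6", "H7", "H8", "K4"}


def make_bbl(borough: int, block: int, lot: int) -> str:
    """Build a 10-digit BBL: 1 borough digit, 5 block digits, 4 lot digits."""
    if not 1 <= borough <= 5:
        raise ValueError("borough must be between 1 and 5")
    if not (0 <= block <= 99999 and 0 <= lot <= 9999):
        raise ValueError("block or lot out of range")
    return f"{borough}{block:05d}{lot:04d}"


def is_allowed_building_class(code: str) -> bool:
    """Return True if a building class code falls under the allowed list."""
    code = code.strip().upper()
    if code in ALLOWED_CODES:
        return True
    return code[:1] in ALLOWED_PREFIXES


print(make_bbl(3, 123, 45))
print(is_allowed_building_class("D4"), is_allowed_building_class("H1"))
```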

Subsidy Start and End Dates: For a given subsidy, the start date indicates when the subsidy was issued or when the property was placed in service. The end date indicates when the requirements associated with the subsidy end or expire. In cases where only the year is known, the start date is set to January 1st and the end date is set to December 31st.

If a subsidy is permanent or has no foreseeable end date (e.g. Public Housing, Inclusionary Zoning), the end date is listed as blank. Program notes are as follows:

Mitchell-Lama: The end date listed for Mitchell-Lama properties reflects the opt-out date. The Subsidized Housing Database includes Mitchell-Lama properties that have exceeded their opt-out date but (as best we can determine) have not opted out of the program.
Low-Income Housing Tax Credit (LIHTC): LIHTC data is from HUD’s LIHTC Database and only includes the subsidy start date. For these properties we estimate the end date of the LIHTC subsidy to be 30 years from the start date.
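
The date rules above can be sketched as follows. This is an illustration under stated assumptions, not the database's actual code: the function names are hypothetical, "30 years from the start date" is read as the same calendar day 30 years later, and a February 29 start date falls back to February 28 when the target year is not a leap year.

```python
from datetime import date


def start_date_from_year(year: int) -> date:
    """When only the year is known, the start date is January 1st."""
    return date(year, 1, 1)


def end_date_from_year(year: int) -> date:
    """When only the year is known, the end date is December 31st."""
    return date(year, 12, 31)


def estimate_lihtc_end(start: date, years: int = 30) -> date:
    """Estimate a LIHTC end date as `years` after the start date."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Feb 29 start with a non-leap target year: use Feb 28 (assumption).
        return start.replace(year=start.year + years, day=28)


print(estimate_lihtc_end(date(1995, 6, 15)))
```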

Neighborhood Indicators

Neighborhood Indicators in CoreData.nyc present information for the entire City of New York, for each of the five boroughs, and for the neighborhoods within each borough. The city defines neighborhoods by dividing the boroughs into 59 community districts (CDs); the US Census Bureau, however, divides the boroughs into 55 Public Use Microdata Areas (PUMAs), or sub-borough areas (SBAs). CoreData.nyc provides data for CDs where available, but otherwise employs data at the SBA level. The term neighborhood refers to both CDs and SBAs.

Borough: New York City consists of five boroughs: the Bronx, Brooklyn, Manhattan, Queens, and Staten Island. Each borough is represented by a borough president, an elected official who advises the mayor on issues related to his or her borough and, along with the borough board, makes recommendations concerning land use and the allocation of public services. Each borough is also a county. Counties are legal entities with boundaries defined by state law.

Community District (CD): Community districts are political units unique to New York City. Each of the 59 community districts has a community board. Half of the community board’s members are appointed by the borough president and half are nominated by the City Council members who represent the district. The community boards review applications for zoning changes and other land use proposals and make recommendations for budget priorities. Each community board is assigned a number within its borough. The borough and this number uniquely identify each of the 59 community districts. Therefore, we designate each community district with a two-letter borough code and a two-digit community board code. For example, BK 02 is the community district represented by Community Board 2 in Brooklyn.
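
The community district designation described above can be sketched in a few lines. The codes MN, BX, and BK appear in this methodology; QN and SI are assumed here for Queens and Staten Island, and the function name is hypothetical.

```python
# Two-letter borough codes. MN, BX, and BK appear in this methodology;
# QN and SI are assumptions made for this sketch.
BOROUGH_CODES = {
    "Manhattan": "MN",
    "Bronx": "BX",
    "Brooklyn": "BK",
    "Queens": "QN",
    "Staten Island": "SI",
}


def cd_code(borough: str, board_number: int) -> str:
    """Format a community district as a borough code plus a two-digit board number."""
    return f"{BOROUGH_CODES[borough]} {board_number:02d}"


print(cd_code("Brooklyn", 2))  # BK 02
```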

Sub-Borough Area (SBA): Sub-borough areas are geographic units created by the U.S. Census Bureau for the administration of the New York City Housing and Vacancy Survey and were designed to have similar boundaries to those of community districts. Sub-borough areas are coterminous with the U.S. Census Bureau’s Public Use Microdata Areas (PUMAs), so we use the two terms interchangeably. Sub-borough areas are referred to using a three-digit number, where the first digit signifies the borough. There are 59 community districts in New York City but only 55 sub-borough areas. The U.S. Census Bureau combined four pairs of community districts in creating the sub-borough areas to improve sampling and protect the confidentiality of respondents. These pairs are:

Sub-borough Area	Public Use Microdata Area	Community Districts
101	3810 Battery Park City, Greenwich Village & Soho	Financial District (MN 01); Greenwich Village/Soho (MN 02)
103	3807 Chelsea, Clinton & Midtown Business District	Clinton/Chelsea (MN 04); Midtown (MN 05)
201	3710 Hunts Point, Longwood & Melrose	Mott Haven/Melrose (BX 01); Hunts Point/Longwood (BX 02)
202	3705 Belmont, Crotona Park East & East Tremont	Morrisania/Crotona (BX 03); Belmont/East Tremont (BX 06)
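
For readers working with both geographies, the four combined pairs in the table above can be held in a small lookup. This is only a sketch: the dictionary covers just the eight community districts that share a sub-borough area, and the function name is hypothetical.

```python
from typing import Optional

# Community districts that share a sub-borough area, per the table above.
COMBINED_CDS = {
    "MN 01": 101, "MN 02": 101,
    "MN 04": 103, "MN 05": 103,
    "BX 01": 201, "BX 02": 201,
    "BX 03": 202, "BX 06": 202,
}


def shared_sba(cd: str) -> Optional[int]:
    """Return the shared SBA number for a combined CD, or None otherwise."""
    return COMBINED_CDS.get(cd)


# 59 community districts minus one district saved per combined pair gives 55 SBAs.
pairs = len(set(COMBINED_CDS.values()))
print(59 - pairs)  # 55
```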

United States Census Sources

Decennial Census (Census): From 1970 through 2000, the decennial census consisted of two parts: the “short form” that collected information from every person and about every housing unit in the country, and the “long form” of additional questions asked of a sample of people and households. The short form collected information on age, race, Hispanic or Latino origin, household relationship, sex, tenure, and vacancy status. The long form provided more in-depth information about personal and housing characteristics such as income, employment status, and housing costs. CoreData.nyc uses data from the decennial census short and long forms to derive demographic, economic, and housing measures for the years 2000 and 2010. To create most of these indicators, we use summary census data reported at the city, borough, and sub-borough area levels. In 2010, the decennial census only included the short form since most of the data that had previously been included in the long form were now reported in the American Community Survey (see below). While much of the decennial census short-form data is also found in the American Community Survey (such as the count of households), the two sources often report differing numbers for statistical and methodological reasons.

American Community Survey (ACS): The American Community Survey (ACS) is an annual survey that collects data similar to those formerly collected by the census long form described above. As with the long form, the ACS covers only a sample of individuals and housing units, approximately one in 40 housing units each year. Reliable annual estimates for indicators in geographic areas with a population of 65,000 or more became available from the ACS in 2005. For most city- and borough-level indicators, we report figures derived from one-year estimates from the ACS. Because of the COVID-19 pandemic, the Census Bureau did not release its standard 2020 ACS one-year estimates. In other years, for some indicators, due to the small sample size, one-year estimates can be prone to volatility and sampling error, which can make it difficult to reliably discern whether an indicator’s change from one year to the next represents a real change or a statistical anomaly. In order to reduce this uncertainty and draw valid conclusions from differences over both time and space, for select indicators we use five-year ACS estimates. Due to space constraints, multiyear estimates presented in CoreData.nyc are labeled using only the final year of the range that they cover (that is, an indicator from the 2010–2014 ACS is listed under the heading “2014”). Multiyear estimates, however, should be interpreted as a measure of the conditions during the whole range. The indicators and years that use multiyear estimates rather than one-year estimates are noted by indicator in the Data Dictionary.

Public Use Microdata Samples (PUMS): While most decennial census- and ACS-derived indicators use pre-tabulated summary data that are reported at a given geography, we calculate some indicators by aggregating person- and household-level data to the desired geographic level. The US Census Bureau makes individual-level data available in Public Use Microdata Samples (PUMS), which are anonymized extracts from the confidential microdata that the US Census Bureau uses in its own calculations for the decennial census and the ACS. Indicators that are calculated through PUMS are noted in the Data Dictionary.

Indicator Notes

US Department of Housing and Urban Development Income and Rent Limits: The US Department of Housing and Urban Development (HUD) defines income eligibility limits for its Section 8 and HOME programs based on the area median income (AMI) in a metropolitan area. HUD determines three general income limits at 30, 50, and 80 percent of AMI for various household sizes. We employ HUD’s general method to calculate 120 percent of the area median income for various household sizes.

HUD assigns category names to ranges of the area median income and uses “low-income” to describe households that have incomes above 50 and at or below 80 percent of AMI. However, CoreData.nyc uses “low-income” to describe any household earning at or below the 80 percent limit.

In order to calculate the share of rental units that are affordable to households of various income levels, we need to take household size into account, since income limits (and thus maximum affordable housing costs) vary by household size. For a rental unit with n bedrooms, we classify it as affordable at X percent of AMI if its gross rent is less than the maximum affordable rent specified by HUD for a household of size n+1; that is, a studio (i.e. a unit with zero bedrooms) is classified according to the maximum rent values for single-person households, a one-bedroom is classified according to the maximum rent values for two-person households, a two-bedroom is classified according to the maximum rent values for three-person households, and a unit with three or more bedrooms is classified according to the maximum rent values for four-person households. HUD does not publish income guidelines for households with more than eight members, although its methodology allows for their calculation. To ease computation, we apply the eight-person limits to these larger households. This method makes assumptions about the composition of the households that occupy each unit. Therefore, this indicator should be interpreted with some caution. For more information about HUD’s method, refer to its published guidelines for individual years.
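
The bedroom-to-household-size rule above can be sketched as follows. The function names are hypothetical, and the rent limits in the example are placeholder numbers, not HUD figures. Actual maximum rents come from HUD's published guidelines for the year and AMI level in question.

```python
from typing import Mapping


def household_size_for_unit(bedrooms: int) -> int:
    """Map bedrooms to household size: n + 1, capped at four persons."""
    if bedrooms < 0:
        raise ValueError("bedrooms cannot be negative")
    return min(bedrooms + 1, 4)


def limit_household_size(size: int) -> int:
    """Households with more than eight members use the eight-person limits."""
    return min(size, 8)


def is_affordable(gross_rent: float, bedrooms: int,
                  max_rent_by_size: Mapping[int, float]) -> bool:
    """Affordable if gross rent is strictly below the limit for the mapped size."""
    return gross_rent < max_rent_by_size[household_size_for_unit(bedrooms)]


# Placeholder limits at some AMI level, keyed by household size (not HUD data).
example_limits = {1: 1000.0, 2: 1150.0, 3: 1300.0, 4: 1450.0}
print(is_affordable(1100.0, 1, example_limits))
```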

Mortgage Lending Indicators: The Federal Home Mortgage Disclosure Act (HMDA) requires financial institutions with assets totaling at least $44 million as of 2015 to report information on loan applications and originations if they have originated or refinanced any first-lien home purchase loans on one- to four-family properties (including condominium and cooperative units) in the previous year. Thus, the HMDA data capture most, but not all, one- to four-family residential mortgage lending activity. All figures in our analysis are based on non-business-related loans on owner-occupied, one- to four-family properties (including condominiums). We exclude from our analysis any loans for manufactured or multi-family rental housing (with five or more units), loans on properties that are not owner-occupied, and any loans deemed to be business related (classified as those loans for which a lender reports an applicant’s ethnicity, race, and sex as “not applicable”). Since 2004, HMDA has required lenders to report when the spread between the annual percentage rate (APR) of a loan and the rate of Treasury securities of comparable maturity is greater than three percentage points for first-lien loans and five percentage points for junior-lien loans. For indicators included in CoreData.nyc, all loans with an APR above this threshold are referred to as higher-cost loans.
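
The higher-cost threshold as described above can be expressed as a short check. This is a sketch only: the function and parameter names are hypothetical, and rates are assumed to be given in percentage points.

```python
def is_higher_cost(apr: float, treasury_rate: float, first_lien: bool) -> bool:
    """Flag a loan whose rate spread exceeds the threshold for its lien status.

    Rates are in percentage points. The threshold is three points for
    first-lien loans and five points for junior-lien loans.
    """
    threshold = 3.0 if first_lien else 5.0
    return (apr - treasury_rate) > threshold


print(is_higher_cost(8.0, 4.5, first_lien=True))   # spread of 3.5 points
print(is_higher_cost(9.0, 4.5, first_lien=False))  # spread of 4.5 points
```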

Notices of Foreclosure (Lis Pendens): We receive data on lis pendens (LP) filings from a private vendor, Public Data Corporation. An LP may be filed for a host of reasons unrelated to a mortgage foreclosure, so we use a variety of screening techniques to identify only those LPs related to a mortgage. These techniques include searching for words within either of the party names and dropping any LPs that relate to a tax lien or a mechanic’s lien, or that are originated by a government agency. If the same property receives any additional LPs within 90 days of the initial LP, the additional LPs are not included in our rate to avoid counting the same foreclosure twice.
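
One way to apply the 90-day rule above is sketched below. It rests on assumptions that go beyond the text: "within 90 days" is read as 90 days or fewer after the most recent counted filing on the same property, and a filing after that window is treated as a new initial LP. The function name and input layout are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

Filing = Tuple[str, date]  # (BBL, filing date)


def count_initial_filings(filings: Iterable[Filing],
                          window_days: int = 90) -> List[Filing]:
    """Keep each property's initial LP and drop repeats inside the window."""
    counted: List[Filing] = []
    last_initial: Dict[str, date] = {}
    # Sorting by (BBL, date) puts each property's filings in date order.
    for bbl, filed in sorted(filings):
        initial = last_initial.get(bbl)
        if initial is not None and filed - initial <= timedelta(days=window_days):
            continue  # repeat filing for the same foreclosure; not counted
        last_initial[bbl] = filed
        counted.append((bbl, filed))
    return counted


example = [
    ("1000010001", date(2015, 1, 1)),
    ("1000010001", date(2015, 2, 15)),  # 45 days later: dropped
    ("1000010001", date(2015, 6, 1)),   # 151 days later: counted
    ("3001230045", date(2015, 1, 10)),
]
print(len(count_initial_filings(example)))
```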

Aggregating Student Performance: The New York State Education Department publishes school-level proficiency rates every year. We join the proficiency data with a school facilities shapefile provided by the New York City Department of City Planning’s Bytes of the Big Apple website, which also includes each school’s community district. We remove private and charter schools and then sum the number of fourth graders scoring “proficient” in math and English language arts, and the number of students who were tested in each subject. We use those aggregates to calculate proficiency rates at the community district level. Since students can attend schools outside of their community district (for example, if their school zone extends beyond the borders of their community district), the student performance indicators provide information about the performance of students who attend schools in that neighborhood, rather than the performance of students who live in that neighborhood. Because of the COVID-19 pandemic, these exams were not administered in 2020 and were optional for students to take in 2021. Therefore, the 2020 data points are unavailable, and the 2021 results may not accurately represent student performance.
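
The aggregation described above sums counts before dividing, rather than averaging school-level rates. A minimal sketch for a single subject follows. The record fields (`cd`, `type`, `proficient`, `tested`) and the function name are hypothetical and do not reflect the layout of the source files.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def proficiency_by_cd(schools: Iterable[Mapping]) -> Dict[str, float]:
    """Compute district proficiency rates from school-level counts."""
    totals = defaultdict(lambda: [0, 0])  # cd -> [proficient, tested]
    for school in schools:
        if school["type"] != "public":
            continue  # private and charter schools are removed
        totals[school["cd"]][0] += school["proficient"]
        totals[school["cd"]][1] += school["tested"]
    # Skip districts with no tested students to avoid dividing by zero.
    return {cd: p / t for cd, (p, t) in totals.items() if t > 0}


example = [
    {"cd": "BK 02", "type": "public", "proficient": 30, "tested": 60},
    {"cd": "BK 02", "type": "public", "proficient": 20, "tested": 40},
    {"cd": "BK 02", "type": "charter", "proficient": 50, "tested": 50},
]
print(proficiency_by_cd(example))
```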

Inflation Adjustments

Unless stated otherwise, when reporting dollar-denominated indicators, we adjust amounts to real dollars using the Consumer Price Index for All Urban Consumers (Current Series) without seasonal adjustments from the Bureau of Labor Statistics over all major expenditure classes for the New York City metropolitan area. This allows for more consistent comparisons across years for individual indicators. The annual consumer price index (CPI) we use to make these adjustments is released by the Bureau of Labor Statistics in February, and so our adjustments in CoreData.nyc will be delayed until this time each year.
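
The standard CPI adjustment behind this approach scales a nominal amount by the ratio of the index in the target year to the index in the year the amount was recorded. The sketch below uses placeholder index values, not actual Bureau of Labor Statistics figures, and the function name is hypothetical.

```python
def to_real_dollars(nominal: float, cpi_year: float, cpi_target: float) -> float:
    """Convert a nominal amount to target-year dollars using CPI values.

    cpi_year is the index for the year the amount was recorded;
    cpi_target is the index for the year whose dollars are reported.
    """
    if cpi_year <= 0:
        raise ValueError("cpi_year must be positive")
    return nominal * cpi_target / cpi_year


# Placeholder index values for illustration only.
print(to_real_dollars(1000.0, cpi_year=200.0, cpi_target=250.0))  # 1250.0
```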
