One defining feature of big data is the accumulation of massive amounts of information that are not well suited to traditional econometric and statistical techniques. I predict that this phenomenon will someday change the way real estate economics is done.
Models of house prices are used in many ways. They are how many cities and counties conduct “mass appraisals.” They are how house price indexes separate price changes from quality changes. When companies like Zillow generate Zestimates of what a house is worth, these are the models underlying them. They are how the Bureau of Labor Statistics adjusts housing rents for quality changes and depreciation when computing the CPI.
The general form of these models is to take the log of price as the dependent variable, and housing and neighborhood characteristics as the independent variables. But the list of independent variables that is usually available gives you a relatively limited description. While the exact list varies from data source to data source, it typically looks like: building square footage, lot square footage, number of stories, number of bedrooms, number of bathrooms, central air (yes/no), number of fireplaces, garage size, finished basement (yes/no), and of course the address. Using this last variable, you can pull in much more information about the neighborhood from other data sources.
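To make the general form concrete, here is a minimal sketch of such a hedonic regression in Python. The data are synthetic and the variable names, coefficients, and choice of characteristics are illustrative assumptions, not estimates from any real assessor or Zillow model: log price is regressed on a handful of the characteristics listed above via ordinary least squares.

```python
import numpy as np

# Synthetic data standing in for a parcel file; all values are made up.
rng = np.random.default_rng(0)
n = 500

sqft = rng.uniform(800, 4000, n)     # building square footage
bedrooms = rng.integers(1, 6, n)     # number of bedrooms
central_air = rng.integers(0, 2, n)  # central air (yes/no)

# Generate log prices from an assumed "true" model plus noise.
log_price = (11.0
             + 0.0004 * sqft
             + 0.05 * bedrooms
             + 0.10 * central_air
             + rng.normal(0, 0.1, n))

# Design matrix: intercept column plus the characteristics.
X = np.column_stack([np.ones(n), sqft, bedrooms, central_air])

# Ordinary least squares fit of log(price) on the characteristics.
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
print(beta)  # estimated [intercept, sqft, bedrooms, central_air] effects
```

Because the dependent variable is in logs, each coefficient is (approximately) the percentage change in price from a one-unit change in that characteristic, which is what makes this form convenient for quality adjustment.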