Summary: In a recently completed report for Congress, we evaluated how the U.S. Department of Agriculture's (USDA) Rural Housing Service (RHS) makes eligibility determinations for its rural housing programs. As part of that review, we used 2000 census data to determine the populations of the rural areas that received RHS housing program loans and grants. From databases maintained by USDA's Information Resource Management (IRM) in St. Louis, Missouri, we obtained information on the RHS loans and grants provided to communities from October 1998 through April 2004. As with any system, the accuracy of the data and the process used for entry affect the system's reliability and usefulness for management and reporting purposes. During our review, we identified several issues that raised concerns about the accuracy of the information in the IRM databases. For example, while we originally intended to geocode (that is, match) 5 years of the national RHS housing loan and grant portfolio to specific communities, the time needed to ensure the reliability of the data required us to limit much of our analysis to five states (Arizona, California, Maryland, Massachusetts, and Ohio). This report follows up on our report to Congress, and its purpose is to discuss the implications of the data issues for the management and reporting functions of the Administrator, Rural Housing Service. In this report, we describe (1) the types of inaccuracies we encountered with the RHS data and (2) what reviews and system controls, if any, are in place to detect or control database errors.
Our analysis of information in USDA's IRM loan and grant databases raised concerns about their accuracy. In reviewing 29,000 records for five states, we found incorrect, incomplete, and inconsistent entries. For example, over 8 percent of the community names or zip codes were incorrect. Additionally, inconsistent spellings of community names distorted the number of unique communities in the database. More than 400 entries lacked the information (i.e., street addresses, community names, and zip codes) needed to identify the community to which the loan or grant had been made. As a result, some communities served by RHS were double counted, others could not be counted, and the ability to analyze the characteristics of communities served was compromised. Because data from these systems are used to inform Congress, senior agency management, and the public about the reach and effectiveness of RHS programs, eliminating erroneous data will help ensure that key decisions and analyses are reliably supported. However, we found that RHS lacks appropriate reviews and database entry processes that could prevent or detect inaccurate or incomplete data in its normal course of business. For example, RHS does not have procedures for second-party review of the data in IRM systems. Moreover, while the databases have edit functions in place that are intended to prevent the entry of nonconforming data (such as the entry of a community name in a street address field), these functions are not preventing incorrect or incomplete entries.
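The kinds of checks described above (missing location fields, malformed zip codes, and spelling variants of one community name) can be illustrated with a minimal Python sketch. It is not a description of the IRM systems or of the method used in our review. The field names (`state`, `street_address`, `community`, `zip_code`), the function names, and the sample records are all hypothetical and were invented for illustration. The name normalization shown only collapses case, punctuation, and spacing, so it would not catch genuine misspellings.

```python
import re
from collections import defaultdict

# Hypothetical field names; the actual IRM database schema is not described here.
REQUIRED_FIELDS = ("street_address", "community", "zip_code")


def normalize_community(name):
    """Collapse case, punctuation, and spacing so simple spelling variants match."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def missing_fields(record):
    """Return the required fields that are absent or blank in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def is_valid_zip(zip_code):
    """Check the format only (5 digits, optional +4), not whether the code exists."""
    return re.fullmatch(r"\d{5}(-\d{4})?", zip_code.strip()) is not None


def audit(records):
    """Flag incomplete records, malformed zip codes, and inconsistent spellings."""
    incomplete, bad_zip = [], []
    spellings = defaultdict(set)
    for index, record in enumerate(records):
        gaps = missing_fields(record)
        if gaps:
            incomplete.append((index, gaps))
        if "zip_code" not in gaps and not is_valid_zip(str(record["zip_code"])):
            bad_zip.append(index)
        if "community" not in gaps:
            raw = str(record["community"]).strip()
            key = (record.get("state", ""), normalize_community(raw))
            spellings[key].add(raw)
    # A community is "inconsistent" when one normalized name has several raw spellings.
    inconsistent = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}
    return {"incomplete": incomplete, "bad_zip": bad_zip, "inconsistent": inconsistent}


# Invented sample records, for illustration only.
SAMPLE = [
    {"state": "OH", "street_address": "12 Main St", "community": "St. Marys", "zip_code": "45885"},
    {"state": "OH", "street_address": "40 Oak Ave", "community": "ST MARYS", "zip_code": "45885"},
    {"state": "AZ", "street_address": "", "community": "", "zip_code": ""},
    {"state": "MD", "street_address": "7 Elm Rd", "community": "Easton", "zip_code": "2160"},
]

if __name__ == "__main__":
    print(audit(SAMPLE))
```

In this sketch, the first two sample records would be counted as two communities unless their names are normalized, which mirrors the double counting noted above, while the third record cannot be matched to any community at all. Checks of this kind could run at data entry or as a periodic review, but they complement rather than replace second-party review.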