Blog
Earlier this year, the Chief Data Officers Council requested public feedback on how it can continue to improve the government’s management, use, protection, dissemination, and generation of data in decision-making and operations.
In response, the Data Coalition hosted a virtual public forum to create an opportunity for the data community to offer feedback, recommendations, and advice to the federal CDO Council. As a result of that forum, as well as research informed by the Data Foundation's CDO Insights Survey, we offered the following 12 recommendations to the CDO Council. Our full comments are available here.
The success of CDOs in the federal government hinges on their ability to perform expected and critical tasks. If they are successful, government data can be an asset, creating a robust data infrastructure that will serve a variety of purposes, including improving operational decision-making and evidence-based policymaking capabilities. While there are challenges, the progress of CDOs over the past year is commendable. We hope to continue a productive working relationship and dialogue with the Council going forward and are happy to respond to any questions you may have regarding these recommendations.
Author: Amanda Hejna, Data Foundation Fellow, and Senior Associate with Grant Thornton Public Sector
The Advisory Committee on Data for Evidence Building (ACDEB) was formed over a year ago to provide recommendations to the White House Office of Management and Budget (OMB) on how agencies can better use data to build evidence and improve decision making across the federal government and beyond. Composed of data experts from all levels of government and the public sector, the Committee was charged with forming a foundational understanding of the current state of and future needs for the use of data for evidence building and, in doing so, fulfilling the spirit and vision of the Evidence Act.
Throughout the first year, the Committee focused particularly on developing a vision and framework for a National Secure Data Service that would connect data users at all levels of government and the public and establish a unified evidence-building system across the federal government. At the culmination of Year 1, the Committee presented seven high-priority recommendations to the Director of OMB. These actionable and timely items will contribute directly to ongoing implementation of the Evidence Act and the establishment of a successful National Secure Data Service:
A number of subcommittees drilled down into specific focus areas and presented additional recommendations to the broader ACDEB. Focus areas included Legislation and Regulations; Governance, Transparency, and Accountability; Technical Infrastructure; Government Data for Evidence Building; and Other Services and Capacity-Building Opportunities. These preliminary recommendations will be integrated into the Committee’s Year 2 agenda as it looks to define the steps needed to fully operationalize the National Secure Data Service. In the next year, the Committee will continue to expand on its success to advance the use of data for evidence building and ultimately produce better results for the American people.
Last week the White House Office of Management and Budget (OMB) released the Federal Data Strategy (FDS) 2021 Action Plan, an interagency effort meant to coordinate and leverage data as a strategic asset across the Federal government. Building upon the FDS 2020 and stakeholder engagement, the newly released strategy places emphasis on workforce development and data leadership within agencies.
Part of the Executive Branch’s management agenda, the FDS is a 10-year plan to establish best practices for ethical data governance, management, and use. The FDS is an iterative process, with each Action Plan intended to incorporate lessons learned from agencies the prior year, public comments, and takeaways from conversations with data professionals in both government and non-government settings, such as the forum hosted last year by the Data Coalition.
OMB identified major successes from Year 1 regarding the formation of agencies’ planning, governance, and data infrastructure foundation. For example, OMB praised the establishment of the interagency Federal Chief Data Officer (CDO) Council, the creation of a data upskilling pilot, and improvements to data inventories within Data.gov.
Learning from Year 1’s successes and identified challenges (such as the need for more statutory requirements, published guidance on timelines, and additional interagency working groups), the 2021 FDS lays out 11 action categories comprising 40 practices for agencies to implement going forward. Year 2 seeks to offer agencies more flexibility in achieving the Action Plan milestones, in the hope of meeting agencies where they are in their foundational activities from FDS 2020.
Five out of 11 actions require specific interagency councils to identify pilot projects or government-wide services, highlighting the necessity of collaboration among data leadership. Some Year 2 practices include making non-classified AI use case inventories public, improving linkage and governance of wildfire fuel data, and creating a data-skills training playbook. The 2021 FDS also reiterates goals from 2020, such as continued assessment of data to answer agency questions as well as maturation of data governance and infrastructure.
Although the Data Coalition members appreciate the Year 2 strategy’s focus on workforce development and the role of data leadership within agencies, there are still many barriers to the next steps of implementing improved data practices across the Federal government. On November 9, the Data Coalition will host a public forum to discuss key takeaways from the Action Plan, seek feedback on the Federal CDO Council’s recent Request for Information, and gather additional information on how best to assist in a collaborative effort to realize the full benefits of evidence-informed policy in practice.
Each year, federal agencies provide Congress with funding requests that explain the resources needed to run programs and achieve their missions. These publicly available requests, called congressional budget justifications, are not collected into a structured central repository, which makes locating particular budget justifications challenging for congressional offices, federal agencies, White House staff, and the American taxpayer.
This bill seeks to provide open and transparent data about how agencies allocate resources, a pillar of accountable government. It will make it possible for Congress and the American public to better understand what their government is allocating resources to and to provide capabilities to analyze how budget proposals, appropriations, and budget execution have changed over time. Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost less than $1 million per year to implement.
WHAT IS THE CONGRESSIONAL BUDGET JUSTIFICATION TRANSPARENCY ACT (P.L. 117-40)?
The Congressional Budget Justification Transparency Act (P.L. 117-40) directs federal agencies to publish more information online about federal spending. Specifically, the bill would require:
WHAT ARE CONGRESSIONAL BUDGET JUSTIFICATIONS?
Congressional budget justifications (CJs) are documents submitted by Executive Branch agencies to support the annual President’s Budget Request, typically in February. The justifications are intended to be plain-language explanations of how agencies propose to spend the funding they request from congressional appropriators, along with their core priorities, performance goals, and a summary of past performance.
WHAT PROBLEM DOES THE CONGRESSIONAL BUDGET JUSTIFICATION TRANSPARENCY ACT SEEK TO SOLVE?
Agency budget justifications contain a wealth of information about agency performance and priorities but are published as large, unwieldy documents. Currently, agencies are only required to produce a machine-readable summary table for the budget submission, meaning many data elements and core features of the justification are not captured.
The absence of consistent, machine-readable data means the American public, congressional offices, third-party intermediaries, and even OMB staff must manually review and transpose information in the budgets for relevant analysis. Moreover, the lack of a structured database limits the accessibility of detailed budget proposals to those who know how to find them, which in turn limits transparency for the American public and clear opportunities for accountability and oversight.
WHAT ARE AGENCIES DOING NOW?
There is no publicly available, comprehensive list of agencies that must publish CJs. However, according to a 2019 survey of 456 agencies conducted by Demand Progress, over 20% did not publish any CJs publicly. Only 13 agencies of those surveyed (3%) published their CJs online in both FYs 2018 and 2019. While all 24 Chief Financial Officers Act agencies (i.e., large agencies) were among those who did publish their CJs online, the CJs of independent agencies were found to be especially difficult to locate, according to the survey. Demand Progress noted in their survey methodology that they found more than 40 alternative document titles for CJs. This lack of standards creates confusion, inhibits transparency, and causes roadblocks for those who need access to budget information to support decisions about resource allocation or to fulfill transparency and accountability goals.
WHAT ARE THE BENEFITS OF IMPROVING ACCESSIBILITY TO AGENCY BUDGET JUSTIFICATIONS?
Open and transparent data about how agencies allocate resources are a pillar that supports an accountable government. This bill will make it possible for Congress and the American public to better understand what their government is allocating resources to and to provide capabilities to analyze how budget proposals, appropriations, and budget execution have changed over time. Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost $500,000 per year to implement.
HOW WOULD CONGRESSIONAL APPROPRIATORS AND OMB STAFF BENEFIT FROM THIS BILL?
Staff across federal agencies, congressional offices, and even the White House budget office spend countless hours searching, collating, and repurposing content for budget formulation activities each year. Part of this exercise often requires agency staff to review old congressional justification materials to identify historical funding trends. By simply adjusting how information is published, staff supporting budget formulation and execution across agencies and branches of government will be able to more efficiently and accurately portray budgetary information to support decision-making on resource allocations. The same is true for reviewing and applying agency performance measures to promote effective performance management in the budget formulation and execution processes.
WHAT IS THE ROLE OF THE OFFICE OF MANAGEMENT AND BUDGET?
OMB coordinates the federal budget formulation and execution processes. After annual budgets are developed and proposed funding levels agreed to within the Executive Branch, agencies are required to submit congressional justification materials for review and clearance by OMB staff. This requirement, established in OMB Circular A-11, dictates that agency justification materials align with the formal President’s Budget Request published annually by OMB.
OMB also requires agencies to publish justifications at a vanity URL (agencyXYZ.gov/CJ) following transmittal to Congress, unless exempted for national security purposes. However, while OMB publishes top-line budgetary information in the President’s Budget Request volumes, OMB does not provide a consolidated database or repository for agency justifications. OMB already publishes many other budget documents on a central website, and adding the CJs to that site would create a useful resource for Congress, agency staff, journalists, watchdogs, and the general public.
WHAT IS THE STATUS OF THE BILL?
S. 272 passed the Senate in June 2021 and the House in August 2021. It is expected to be signed by the president in the coming days.
DOES THE LEGISLATION HAVE BIPARTISAN SUPPORT IN CONGRESS?
Both the House and Senate versions have a bipartisan set of sponsors: U.S. Representatives Mike Quigley (D-IL) and James Comer (R-KY) in the House, and Sens. Thomas Carper (D-DE) and Rob Portman (R-OH) in the Senate.
WHAT ORGANIZATIONS HAVE ENDORSED THE LEGISLATION?
Campaign for Accountability
Data Coalition
Demand Progress
FreedomWorks
Government Information Watch
National Taxpayers Union
Open The Government
Protect Democracy
R Street Institute
Senior Executives Association
Society of Professional Journalists
Taxpayers for Common Sense
Union of Concerned Scientists
It’s no secret that the government collects a trove of data from the American people – estimated to cost $140 billion each year. But the value of that information is much higher, if it can be successfully and securely applied to make decisions about policies that improve lives and the economy.
Four years ago, the 15-member U.S. Commission on Evidence-Based Policymaking issued its final report to Congress and the President. Since then, while the world has changed drastically, the vision from the Evidence Commission is more relevant than ever: to enable the use of data in our society to solve real problems for the American people.
The Evidence Commission accomplished its mission with just 18 months to learn about the nature of the country’s data challenges, study the contours of potential solutions, and reach a bipartisan agreement on salient, timely recommendations. It is already a major success story to be emulated by future government commissions, and the impact is still ongoing.
During a press conference on Sept. 7, 2017 releasing the final Evidence Commission recommendations, then-Speaker Paul Ryan and Senator Patty Murray stood side-by-side to applaud the realistic, practical solutions offered by the commission members. Speaker Ryan said: “it’s time to agree where we agree.” And in that spirit, days later, Speaker Ryan and Sen. Murray jointly filed the monumental Foundations for Evidence-Based Policymaking Act (Evidence Act).
Enacted 16 months after the commission’s report with overwhelmingly bipartisan support in Congress, the Evidence Act was the most significant government-wide reform to the national data infrastructure in a generation. The Evidence Act created chief data officers and evaluation officers in federal agencies, established processes for planning data priorities and research needs, required government data to be open by default, and enabled new data sharing capabilities within one of the world’s strongest privacy-protective frameworks. In short, the legal authority of the Evidence Act was a game changer for how our government responsibly manages and uses data. The work to implement that law is now ongoing across federal agencies.
The Evidence Act also has tremendous implications for state and local governments, federal grantees, researchers, and even allies on the international stage. The law positions the United States as a clear leader in the dialogue about producing useful evidence for decision-making, while also shifting the discourse about the role of data infrastructure in supporting basic program administration.
What’s possible today that was not four years ago? A lot. Take for example the recent efforts to improve talent in the federal government by aligning roles under the chief data officers and evaluation community. Agencies like the US Department of Agriculture are launching new enterprise data capabilities to understand what data they have and use it. Coordination across new data leaders is producing new innovations for the government, like the use of natural language processing to accelerate the review of comments on federal rules. Real dialogue is now underway to break down the barriers and silos of data within agencies, and promote more public access. A new portal for researchers to have a one-stop-shop for applying to access restricted data is under development. New pilot projects of privacy-preserving technologies are underway as public-private partnerships. All of these activities will lead to greater capacity to use data and, therefore, better information to solve the government’s most wicked problems.
While real progress is being made, there are other areas ripe for attention from leaders at the White House where implementation of the Evidence Act has lagged. Here are two examples:
The Evidence Act was a starting point, but there is still more work underway to implement the Evidence Commission’s recommendations. Earlier this year, Rep. Don Beyer filed the National Secure Data Service Act as a strategy to advance many of the commission’s remaining recommendations for a new infrastructure capable of securely combining data, creating a pathway for implementation. That bill quickly passed the U.S. House with strong bipartisan support and is now awaiting further action in the Senate. In parallel, the new Advisory Committee on Data for Evidence Building continues to study the challenges identified by the commission and is devising recommendations that will also further address the Evidence Commission’s work.
While much progress has been made based on the commission’s advice, there is still a long path ahead in the United States to implement the Evidence Act effectively and ensure the remaining recommendations come to fruition. Importantly, the Evidence Commission is itself an example of how to develop and use evidence in policymaking. Fortunately, because of the commission members’ diligent service to the country and the leadership from Speaker Ryan, Sen. Murray, Rep. Beyer, and others, the country is well on its way to realizing the promise of evidence-based policymaking.
On the first day of their Administration, the Biden-Harris team issued an Executive Order on Advancing Racial Equity and Support for Underserved Communities Through the Federal Government (Executive Order 13985). The executive order was issued to promote and protect equitable policies and data in the Federal Government. These efforts support the inclusion of marginalized groups in Federal research and analysis, the improvement of equitable policies, and the opportunity for each person to reach their full potential.
In order to ensure the implementation of the program, the White House Domestic Policy Council (DPC) is “directed to coordinate the efforts to embed equity principles, policies, and approaches across the Federal Government.” This includes efforts to remove systemic barriers, develop policies to advance equity, and encourage communication between the National Security Council and the National Economic Council. As noted in the EO, it is the responsibility of the Office of Management and Budget (OMB) to analyze and “assess whether agency policies create or exacerbate barriers to full and equal participation by all eligible individuals.” This responsibility is key to identifying and quantifying the challenges toward equity.
The Executive Order recognized the important role of disaggregated data, or data that has been broken down by detailed sub-categories, such as race, ethnicity, gender, disability, income, veteran status, and other key demographic variables, by creating the Equitable Data Working Group. The Working Group has been tasked with “identifying inadequacies in existing Federal data collection infrastructure and laying out a strategy for improving equitable data practices in the Federal government.” This is accomplished through the collection of new data or through the combination of multiple data sources in order to fill the data gaps that make assessments of equity difficult, which in turn supports evidence-based policies within the Federal government and state and local governments through vertical policy diffusion. “By exploring key policy questions dependent upon underutilized, inaccessible, or missing data, the Equitable Data Working Group explores ways to leverage government data in order to measure and promote equity.”
Despite the overwhelming positives of exposing data gaps, the Group recognizes there are possible unintended consequences when considering privacy and the vulnerability of underserved populations. With this in mind, aggregating data into summary data can help analysts understand broad trends within these communities without disseminating personal data. For example, the National Crime Victimization Survey (NCVS) collects data on self-reported accounts of criminal victimization. The NCVS produces reports that break down victimization data by race, ethnicity, gender, age, marital status, and income. Because the data can be separated by race, researchers and analysts can provide summary statistics and better insights into disparities without exposing personal identifiers. This protects the privacy of those who have been surveyed while still leveraging the data collected to help answer important policy questions about crime.
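To make the idea of summary data concrete, here is a minimal sketch in Python. The records, field names, group labels, and the `summarize_by_group` helper are all hypothetical and invented for illustration; they do not reflect the NCVS file layout or any agency's actual method. The sketch shows how person-level rows can be reduced to group-level counts and shares so that no respondent identifier appears in the output.

```python
from collections import Counter

# Hypothetical survey records; every field name and value is made up.
records = [
    {"respondent_id": "r001", "race": "Group A", "victimized": True},
    {"respondent_id": "r002", "race": "Group A", "victimized": False},
    {"respondent_id": "r003", "race": "Group B", "victimized": True},
    {"respondent_id": "r004", "race": "Group B", "victimized": True},
    {"respondent_id": "r005", "race": "Group B", "victimized": False},
]


def summarize_by_group(rows, group_key, outcome_key):
    """Return {group: (respondent count, share with outcome)}.

    Only the grouping field and the outcome are read, so respondent
    identifiers never reach the summary.
    """
    totals = Counter()
    positives = Counter()
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {group: (totals[group], positives[group] / totals[group])
            for group in totals}


summary = summarize_by_group(records, "race", "victimized")
for group, (count, share) in sorted(summary.items()):
    print(f"{group}: n={count}, share victimized={share:.2f}")
```

Dropping identifiers alone may not be enough when a group contains very few people, since a small cell can still point to individuals; handling that case is outside this sketch.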
The Data Coalition Initiative will be looking for how the Working Group is approaching these issues when its first report is provided to Ambassador Susan Rice, Assistant to the President for Domestic Policy, this fall, which will identify and discuss the barriers and gaps of equitable data identified through case studies, along with recommendations on how to address these problems.
The Working Group report will also include a plan to foster new partnerships among Federal agencies, academic and research partners, state, local, and tribal governments, community and advocacy groups, and other stakeholders, in order to leverage Federal data for new insights on the effects of structurally biased policies, and to advance capacity for multilayered, intersectional analysis of Federal datasets. The Data Coalition is looking forward to the chance to engage with the Working Group on its efforts, and will continue to provide updates as their important work progresses.
The nation is preparing to send its children back to school this fall, but there will be many questions about the ongoing impacts the pandemic has on our children, both in the short and long term. While there are a great many strengths of our country’s educational infrastructure, the data infrastructure applied to improving learning and the workforce continues to face substantial gaps. In order to understand, adapt to, and mitigate the impact of the pandemic, we must ensure that there is a robust data infrastructure. One way to ensure there is timely, useful data about our learners and workers is to provide significant and sustained funding for the Statewide Longitudinal Data Systems (SLDS).
SLDS is a Federal government program that provides access to historical data on public-school students and teachers starting from the 2006-2007 school year. The SLDS system was designed to improve data-driven decisions impacting student learning and education. It focuses on the connection among Pre-K, K-12, postsecondary, and workforce education data. School districts, public schools, and teachers can access the data system via their district’s Student Information Systems (SIS). It is accessible through a free application that is available to eligible state grant recipients, such as school districts, schools, and teachers. This data includes assessment scores, daily attendance, enrollment, courses, and grades. In its most advantageous state, it enables grantees to link individual-level data from Pre-K to the labor market.
The SLDS plays a significant role in creating data-driven policies. While the information is collected and stored, the grant program also provides more accessible data in order to get a better understanding of a policy’s impact on student learning. Moreover, it encourages policy efficiency and equity by quantifying educational measurements over time. Data-driven systems such as SLDS provide transparency about which policies affect students and the significance of their impact.
The SLDS has meaningful benefits, although there are also challenges when implementing a data-driven program. States have been able to put this data to work to better support students on pathways to the workforce. Currently, every state, the District of Columbia, and Puerto Rico has an SLDS that connects data between some data systems, but few can connect early education, K-12, postsecondary, and workforce data. This makes it challenging to study and evaluate programs intended to improve outcomes in college and the workforce. As states and federal programs strive to boost educational attainment and close the skill gaps in the workforce, it is vital that our country has the ability to produce rigorous analyses based on high-quality data.
New, sustained investment in SLDS data can provide the information needed to answer the critical questions of policymakers, educators, parents, and students. This will require a significant, multi-year investment of $1 billion. This funding should focus on modernizing SLDS data systems to build more interoperable and accessible data platforms with privacy-preserving technology, as well as on building capacity to use SLDS data through state research-practice partnerships that bring together real-time learning and longitudinal data and diversify the representation of practitioners. Finally, funding should be directed to putting robust governance and accountability structures in place so that these systems transparently address real priorities, needs, and community expectations.
Not only is this funding necessary to improve the data infrastructure to meet the needs of learners and workers, it is necessary to make this a sustained funding level, so that these systems have the resources to evolve to meet ever changing research needs and privacy protection.
Sustained and continued financial investment in the SLDS program would help ensure data-driven success and proper use of the data. An increase in funding will help provide the much-needed update to the data infrastructure necessary to advance evidence-based policymaking and modernize privacy protection. Providing this funding for SLDS is a smart investment that ensures we will have the evidence and data to provide the best outcomes for our students.
Crime data, which includes data on types of crime, demographics of victims and perpetrators, corrections, recidivism and reentry, and court information, is crucial evidence that is used to inform policy decisions in all jurisdictions. In the United States, national crime data is aggregated at the federal level by the Department of Justice’s Bureau of Justice Statistics (BJS) and Federal Bureau of Investigation (FBI). Reliable and up-to-date criminal justice statistics are imperative in order for policymakers to make evidence-based decisions. However, as questions around policing and criminal justice become ever more pressing, it is worth exploring the challenges and limitations of crime data so that we may identify opportunities to improve both the data and the policy decisions informed by the data.
The National Crime Victimization Survey (NCVS) is the nation’s leading source for statistical information on victims of crime. Victim, offender, and crime characteristics are reported along with reasons for reporting or not reporting the crime. However, there are serious limitations to the NCVS. The survey is self-reported by the victim; nothing is recorded at the time a crime is committed or a person is victimized. A sample of households in the United States is taken every 3.5 years, and households are interviewed 7 times within that span. Only 71% of households sampled responded to the survey in 2019. Since the survey is only a sample, it does not capture variation in victimization patterns at the local or state level. Therefore, victimization patterns at the city level would require additional research. Additionally, data on the effectiveness of policing practices when addressing crime and victimization is lacking from NCVS reports.
An example of this type of effective additional research is the Data Foundation’s Policing in America Project, a multi-pronged, open data effort to systematically improve evidence about how the American people view the criminal justice system and police forces. The project focuses on the value of building data capabilities to enable a more robust understanding of the relationship between perceptions of law enforcement agencies and the conditions in select cities, including disparate perceptions by sub-populations.
In addition to the NCVS, which relies on traditional surveys of victims, a good deal of crime data is reported to the Department of Justice by local law enforcement. The FBI’s Uniform Crime Report (UCR) has been used to provide crime statistics since 1930. The BJS, the primary statistical agency of the Department of Justice, uses UCR data in their publications and datasets. BJS has long been trusted to publish up-to-date and accurate information, utilized by academia and professionals for criminal justice reports and open access data. The data from BJS and the UCR includes local, state, and national level data on corrections, courts, crime, the Federal justice system, forensic science, law enforcement, recidivism and reentry, tribal crime, and victims of crime. The data is reported by local law enforcement agencies to form a national database of criminal justice statistics. In practice, this has led to incomplete and non-standard reporting to the FBI. Local jurisdictions may have different definitions of crime that can make uniform crime reporting difficult. There may be lags in reporting for local agencies, as well as incomplete data.
One challenge comes from inconsistent reporting by local law enforcement agencies (LEAs), which can make arrest rates difficult to calculate. Reporting data is voluntary, so LEAs may not always report the same data every year, yet the UCR uses only data from these voluntary reports. The procedure to calculate the aggregated national, county, and state arrest rates does not take into account the population covered by the UCR. Because the population covered by the UCR varies each year, this has a significant effect on the arrest rates and poses serious problems for analyzing national trends over time. Perhaps the main limitation of UCR data, however, is the difference between actual and reported crime.
In addition to inconsistent reporting, the data that is reported is not standardized. Some states may have differing definitions of crime, as well as wholly different crimes on the books. For example, some states, such as Minnesota, have a third-degree murder charge, whereas other states would classify the same conduct as manslaughter. Similar challenges exist with hate crime statutes, which may be vastly different, include different demographic information, or may not be a part of the penal code.
Timeliness of data is also incredibly important for informing policy, but there are significant lags in crime data. As of June 2021, aggregated arrest data was last reported in 2016, half a decade ago. Despite the availability of raw arrest data on the FBI’s Crime Data Explorer up to 2019, the Bureau of Justice Statistics has not reported the arrest figures. This means that any policy decisions based on crime data are missing the most timely insights.
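To see why ignoring coverage matters, consider a small arithmetic sketch in Python. Every figure below is hypothetical and chosen only to make the arithmetic easy to follow; none of it is UCR data, and the function name is illustrative, not part of any FBI or BJS tool. The sketch compares an arrest rate computed against the total population with one computed against only the population served by agencies that actually reported.

```python
def arrest_rate_per_100k(arrests: int, population: int) -> float:
    """Arrests per 100,000 residents for the given population base."""
    if population <= 0:
        raise ValueError("population must be positive")
    return arrests * 100_000 / population


# Hypothetical figures for illustration only (not actual UCR data).
total_population = 1_000_000    # everyone living in the area
covered_population = 800_000    # residents served by agencies that reported
reported_arrests = 4_000        # arrests from the reporting agencies only

# Dividing by the total population understates the rate, because arrests
# made by non-reporting agencies are missing from the numerator.
naive_rate = arrest_rate_per_100k(reported_arrests, total_population)

# Dividing by the covered population keeps numerator and denominator aligned.
coverage_adjusted_rate = arrest_rate_per_100k(reported_arrests, covered_population)

print(f"naive: {naive_rate:.1f} per 100k")
print(f"coverage-adjusted: {coverage_adjusted_rate:.1f} per 100k")
```

If coverage shifts from year to year, the unadjusted rate can rise or fall even when the underlying arrest rate in the covered areas has not changed, which is one way the trend problem described above can arise.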
And finally, data needs to be usable by the public, academic researchers, and policymakers. This means that it needs to be published in an accessible format. Crime data has some significant challenges in this respect. But there are tools to help. The BJS Arrest Data Analysis Tool allows researchers to find national estimates and/or agency-level counts of crime. This data is sourced from the FBI’s Uniform Crime Report. The tool is significant in that viewers are easily able to generate arrest figures at the national and local level without needing the data science background required for raw data processing. While some of these challenges are unique to the crime data, many of these challenges exist in the data infrastructure throughout the country. Many initiatives are being undertaken in order to help address these problems, and optimize data for evidence-based policymaking.
The Foundations for Evidence-Based Policymaking Act of 2018 became law on January 14th, 2019. The law requires agencies to make their data accessible and to produce reports that use statistical evidence to support policymaking. Annually, agencies must submit a learning agenda addressing policy concerns to the Office of Management and Budget (OMB). This is an opportunity for the Department of Justice to identify and address what needs to be improved and work with stakeholders to ensure that the necessary improvements can be made. This includes an Open Data Plan that must detail how each agency plans to make its data open to the public.
Investment in crime and policing data has been modest, preventing meaningful updates to data collection and modernization. Additional funding for criminal justice data collection and reporting is recommended. Increasing training for local law enforcement on correctly classifying all crimes and reporting them to the FBI can help increase accuracy and reliability. In addition, increased efforts for interagency collaboration between local law enforcement agencies and the FBI can provide more accurate aggregate data. With these common-sense improvements, crime data can be more effective in helping craft evidence-based policy.
The emerging need to securely share, link, and use information collected by different government agencies and entities is hindered today by administrative, legal, and operational hurdles. The National Secure Data Service Act (H.R. 3133), sponsored by Rep. Don Beyer (D-VA), seeks to implement a demonstration project for a data service that could rapidly address policy questions and reduce unintended burdens for data sharing, while aligning with design principles and concepts presented in recommendations from data and privacy experts. The proposal specifically cites an effort to support full implementation of recommendations made by the bipartisan U.S. Commission on Evidence-Based Policymaking for data linkage and access infrastructure.
The federal government’s data infrastructure is largely decentralized. Individual agencies and programs may collect data without sharing or using information already collected by other parts of government. This imposes undue burdens on the American public and businesses through repeated reporting of information the government already has. Creating a capacity to securely share information while protecting confidentiality and deploying other privacy safeguards offers tremendous potential for developing new insights and knowledge to support statistical analysis and summary-level information relevant for evidence-based policymaking and practice.
The National Secure Data Service Act builds on the bipartisan and unanimous recommendations from the U.S. Commission on Evidence-Based Policymaking from 2017, a consensus proposal from the National Academies of Sciences, Engineering and Medicine in 2017, and a suggested roadmap published by the Data Foundation in 2020. The proposed legislation creates an expectation for the National Science Foundation to make rapid progress in launching a data service and transparently supporting government-wide evidence-building activities.
Under the proposed legislation, the data service at NSF must adhere to federal privacy laws, including the Confidential Information Protection and Statistical Efficiency Act of 2018 (CIPSEA). This law was reauthorized by Congress with bipartisan approval in 2018, establishing one of the strongest government privacy laws in the world, including strong criminal and civil penalties for misuse. The proposed data service can only operate using the CIPSEA authority and in compliance with the Privacy Act of 1974. The data service will also provide information to Congress about specific policies and practices deployed for protecting data.
Yes. Consistent with principles about transparency specified by experts from the Evidence Commission, National Academies panel, and the Data Foundation, the proposed legislation specifically directs NSF to publish information about activities that are underway. In addition, Congress will receive a report on all projects, expected to include information about the costs and benefits of each.
The National Secure Data Service builds on existing capabilities and authorities established in the Evidence Act, while also providing a resource for federal agencies, researchers, and data analysts to responsibly produce insights that can address questions in agency evidence-building plans (learning agendas). When Congress approved the Evidence Act, the creation of an advisory committee was intended to signal Congress’ continued interest in establishing a data service and provide relevant information to support implementation of next steps within two years of enactment. Now, more than two years after enactment of the Evidence Act, the advisory committee continues to meet and consider technical implementation details. The proposed legislation sets up the formal authorization of a data service to continue this momentum.
No. The Federal Advisory Committee on Data for Evidence Building is a collection of nearly 30 experts considering a range of topics related to data linkage and use. Nothing in the proposed legislation restricts the ability of the advisory committee to offer OMB recommendations, as required by its charge in the Evidence Act. Instead, the legislation specifically encourages NSF to consider practices and recommendations from the advisory committee as part of its administrative implementation efforts. The advisory committee’s role is also likely to become increasingly influential in supporting tangible implementation of activities at NSF under the proposed legislation.
No. The data service is designed to supplement rather than displace any existing, successful, and sufficiently secure data linkage arrangements. Statistical agencies engaged in production-level data collection, sharing, and publication for the development of federal statistical indicators will receive additional capabilities from the National Secure Data Service but could retain existing practices.
In 2020, the Data Foundation published a white paper establishing a framework for considering where to operate a data service in government that can meet broad needs government-wide and from across the research and evaluation communities. After exploring the range of potential options, the authors recommended the National Science Foundation given its ability to deploy the strong privacy authorities under CIPSEA, existing expertise in social sciences and computer science, the presence of one of the existing federal statistical agencies with expertise in confidentiality protections and data linkage, and NSF’s close connections and existing relationships with the research community.
The text of the National Secure Data Service Act provides NSF flexibility to determine how to implement a data service, including the possibility of issuing a contract through a Federally-Funded Research and Development Center, as recommended in the Data Foundation white paper. This recommendation was presented to the Federal Advisory Committee on Data for Evidence Building in April 2021 and to the NSF Social, Behavioral and Economics Sciences Advisory Committee in May 2021, receiving favorable perspectives and comments from respective committee members.
Precise implementation costs will vary based on the level of services and activities applied at a data service. In 2021, the Data Coalition recommended that a National Secure Data Service receive an initial annual appropriation of $50 million to support development and launch of core linkage capabilities, privacy protective features, and necessary disclosure avoidance protocols, among other features.
In 2020, the Data Coalition called on Congress to authorize a data service to support pandemic response activities, then later reiterated its support following publication of the Data Foundation white paper, endorsing the white paper’s recommendations.
Representing the broad membership and interests from the data community, the Data Coalition endorsed the National Secure Data Service Act filed on May 13, 2021. The Data Coalition has also encouraged administrative actions to make progress on the establishment and launch of a data service, including NSF’s recent activities on America’s DataHub.
The Administration has not formally weighed in on the proposal with a Statement of Administration Policy; however, NSF did provide technical feedback on a draft of the legislative text.
Last edited May 13, 2021
This month, our RegTech 2021 series continued by examining government uses of artificial intelligence (AI). Just last year, Congress passed legislation encouraging the government to move from pilots and demonstration projects to scaled-up, applied solutions. The discussion featured two fireside chats with government leaders: Henry Kautz, Division Director, Information & Intelligent Systems, National Science Foundation (NSF) and Mike Willis, Associate Director in the Division of Economic and Risk Analysis, Securities and Exchange Commission (SEC).
First, Director Kautz discussed the work in AI at NSF, as the agency seeks to fund larger, more interdisciplinary projects. Lately, the agency has been focused on establishing AI Institutes, virtual centers organized around themes, connecting colleges and universities to partners in the private and public sector. Themes include AI-Augmented Learning, AI-Driven Innovation in Agriculture and the Food System, and Advanced Cyberinfrastructure. Director Kautz emphasized the importance of NSF’s role in supporting foundational, pre-competitive research and development in these private-public partnerships.
When asked what challenges the government is facing, he recommended that agencies improve coordination among themselves on how best to make use of AI internally. He pointed to the success of coordinating bodies like the Joint Artificial Intelligence Center at the Department of Defense, but encouraged the government to think more broadly about the big questions it faces. Additional suggestions to scale up AI included building up AI expertise within the government, especially at the managerial level, being sensitive to and aware of AI skepticism, and rethinking traditional procurement practices. He also emphasized the need for explainability and transparency in ensuring ethical uses of AI, and the importance of conceptualizing data as infrastructure.
In the next fireside chat, Preethy Prakash, Vice President of Business Development at eBrevia, spoke with Mike Willis from the SEC. Willis, speaking for himself, described the SEC’s steps to make registrant disclosures more accessible and usable, after noticing that well over 90% of EDGAR visitors are machines.
Even though these data sets are highly desired by the public and outside uses of AI, the role of AI within the SEC today is largely focused on the enhancement of the effectiveness of staff analytical procedures, including those related to risk assessments, identifying potential areas for further investigations for activities like insider trading, comment letter analysis, and entity mappings.
When asked how to think about creating quality data that is interoperable, Willis pointed directly to the Evidence Act, which defines the term “open government data asset” based upon an underlying open standard. “Leveraging industry and market standards, I think, are a very useful way to drive down compliance costs, while streamlining the validation and analysis of the data, including for AI and ML purposes,” Willis stated. He went on to note how these open standards are a great example of the public-private partnerships discussed previously.
As the SEC continues to implement AI, Willis outlined some change management considerations. His recommendations were to ensure that you have talented, qualified professionals, help people understand the problems and processes that AI can help supplement, provide use cases and examples, ensure that your AI solution stays within its scope, and finally, echoing Director Kautz’s call, consider data as infrastructure, meaning it should be standardized and structured.
The whole conversation is available here. To learn more about our RegTech Series, sponsored by Donnelley Financial Solutions (DFIN), visit our webpage.