

  • September 14, 2021 9:00 AM | Data Coalition Team (Administrator)

    Each year, federal agencies provide Congress with funding requests that explain the resources needed to run programs and achieve their missions. These publicly available requests, called congressional budget justifications, are not collected into a structured central repository, which makes locating particular budget justifications challenging for congressional offices, federal agencies, White House staff, and the American taxpayer.

    This bill seeks to provide open and transparent data about how agencies allocate resources, a pillar of accountable government. It will make it possible for Congress and the American public to better understand what their government is allocating resources to and to analyze how budget proposals, appropriations, and budget execution have changed over time. Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost less than $1 million per year to implement.


    The Congressional Budget Justification Transparency Act (P.L. 117-40) directs federal agencies to publish more information online about federal spending. Specifically, the bill would require:

    • Information on any funds made available to or expended by a federal agency be posted publicly.
    • Agencies to post their annual congressional budget justifications in a structured data format and in a manner that enables users to download the reports in bulk. 
    • The White House Office of Management and Budget (OMB) to coordinate a publicly available website with a list of each justification by agency and fiscal year.


    Congressional budget justifications (CJs) are documents submitted by Executive Branch agencies to support the annual President’s Budget Request, typically in February. The justifications are intended to be plain-language explanations of how agencies propose to spend the funding they request from congressional appropriators, along with their core priorities and performance goals and a summary of past performance.


    Agency budget justifications contain a wealth of information about agency performance and priorities but are published as large, unwieldy documents. Currently, agencies are only required to produce a machine-readable summary table for the budget submission, meaning many data elements and core features of the justification are not captured. 

    The absence of consistent, machine-readable data means the American public, congressional offices, third-party intermediaries, and even OMB staff must manually review and transpose information in the budgets for relevant analysis. Moreover, the lack of a structured database limits the accessibility of detailed budget proposals to those who know how to find them, which in turn limits transparency for the American public and clear opportunities for accountability and oversight. 


    There is no publicly available, comprehensive list of agencies that must publish CJs. However, according to a 2019 survey conducted by Demand Progress of 456 agencies, over 20% did not publish any CJs publicly. Only 13 agencies of those surveyed (3%) published their CJs online in both FYs 2018 and 2019. While all 24 Chief Financial Officers Act agencies (i.e., large agencies) were among those that did publish their CJs online, the CJs of independent agencies were found to be especially difficult to locate, according to the survey. Demand Progress noted in its survey methodology that it found more than 40 alternative document titles. This lack of standards creates confusion, inhibits transparency, and creates roadblocks for those who need access to budget information to support decisions about resource allocation or to fulfill transparency and accountability goals.


    Open and transparent data about how agencies allocate resources are a pillar that supports an accountable government. This bill will make it possible for Congress and the American public to better understand what their government is allocating resources to and to analyze how budget proposals, appropriations, and budget execution have changed over time. Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost $500,000 per year to implement.


    Staff across federal agencies, congressional offices, and even the White House budget office spend countless hours searching, collating, and repurposing content for budget formulation activities each year. Part of this exercise often requires agency staff to review old congressional justification materials to identify historical funding trends. By simply adjusting how information is published, staff supporting budget formulation and execution across agencies and branches of government will be able to more efficiently and accurately portray budgetary information to support decision-making on resource allocations. The same is true for reviewing and applying agency performance measures to promote effective performance management in the budget formulation and execution processes. 


    OMB coordinates the federal budget formulation and execution processes. After annual budgets are developed and proposed funding levels agreed to within the Executive Branch, agencies are required to submit congressional justification materials for review and clearance by OMB staff. This requirement, established in OMB Circular A-11, dictates that agency justification materials align with the formal President’s Budget Request published annually by OMB. 

    OMB also requires agencies to publish justifications at a vanity URL (agencyXYZ.gov/CJ) following transmittal to Congress, unless exempted for national security purposes. However, while OMB publishes top-line budgetary information in the President’s Budget Request volumes, OMB does not provide a consolidated database or repository for agency justifications. OMB already publishes many other budget documents on a central website, and adding the CJs to that site would be a useful resource for Congress, agency staff, journalists, watchdogs, and the general public.


    S. 272 passed the Senate in June 2021 and the House in August 2021. It is expected to be signed by the president in the coming days.


    Both the House and Senate versions have a bipartisan set of sponsors: U.S. Representatives Mike Quigley (D-IL) and James Comer (R-KY) in the House, and Sens. Thomas Carper (D-DE) and Rob Portman (R-OH) in the Senate.


    Campaign for Accountability

    Data Coalition 

    Demand Progress 


    Government Information Watch 

    National Taxpayers Union

    Open The Government 

    Protect Democracy 

    R Street Institute

    Senior Executives Association

    Society of Professional Journalists

    Taxpayers for Common Sense

    Union of Concerned Scientists

  • September 07, 2021 9:00 AM | Data Coalition Team (Administrator)

    It’s no secret that the government collects a trove of data from the American people – estimated to cost $140 billion each year. But the value of that information is much higher, if it can be successfully and securely applied to make decisions about policies that improve lives and the economy. 

    Four years ago, the 15-member U.S. Commission on Evidence-Based Policymaking issued its final report to Congress and the President. Since then, while the world has changed drastically, the vision from the Evidence Commission is more relevant than ever: to enable the use of data in our society to solve real problems for the American people.  

    The Evidence Commission accomplished its mission with just 18 months to learn about the nature of the country’s data challenges, study the contours of potential solutions, and reach a bipartisan agreement on salient, timely recommendations. It is already a major success story to be emulated by future government commissions, and the impact is still ongoing.

    During a press conference on Sept. 7, 2017 releasing the final Evidence Commission recommendations, then-Speaker Paul Ryan and Senator Patty Murray stood side-by-side to applaud the realistic, practical solutions offered by the commission members. Speaker Ryan said: “it’s time to agree where we agree.” And in that spirit, days later, Speaker Ryan and Sen. Murray jointly filed the monumental Foundations for Evidence-Based Policymaking Act (Evidence Act). 

    Enacted 16 months after the commission’s report with overwhelmingly bipartisan support in Congress, the Evidence Act was the most significant government-wide reform to the national data infrastructure in a generation. The Evidence Act created chief data officers and evaluation officers in federal agencies, established processes for planning data priorities and research needs, required government data to be open by default, and enabled new data sharing capabilities within one of the world’s strongest privacy-protective frameworks. In short, the legal authority of the Evidence Act was a game changer for how our government responsibly manages and uses data. The work to implement that law is now ongoing across federal agencies.

    The Evidence Act also has tremendous implications for state and local governments, federal grantees, researchers, and even allies on the international stage. The law positions the United States as a clear leader in the dialogue about producing useful evidence for decision-making, while also shifting the discourse about the role of data infrastructure in supporting basic program administration. 

    What’s possible today that was not four years ago? A lot. Take for example the recent efforts to improve talent in the federal government by aligning roles under the chief data officers and evaluation community. Agencies like the US Department of Agriculture are launching new enterprise data capabilities to understand what data they have and use it. Coordination across new data leaders is producing new innovations for the government, like the use of natural language processing to accelerate the review of comments on federal rules. Real dialogue is now underway to break down the barriers and silos of data within agencies, and promote more public access. A new portal for researchers to have a one-stop-shop for applying to access restricted data is under development. New pilot projects of privacy-preserving technologies are underway as public-private partnerships. All of these activities will lead to greater capacity to use data and, therefore, better information to solve the government’s most wicked problems. 

    While real progress is being made, there are other areas ripe for attention from leaders at the White House where implementation of the Evidence Act has lagged. Here are two examples:

    • Presumption of Accessibility Regulation – A key recommendation from the Evidence Commission included in the new law was to assume that data are sharable unless prohibited by law or regulation. This presumption of accessibility requires the White House Office of Management and Budget (OMB) to first take a regulatory action, which has disappointingly not yet even been published in draft form for public feedback.
    • Guidance on New Open Data Requirements – The Evidence Act’s requirement that agencies make more data accessible and open is paired with new transparency requirements for agencies to inventory their data and publish information about the key contents of datasets. These nuanced activities require OMB to issue guidance that promotes consistency across federal agencies and prioritizes which high-value data should be made available first.

    The Evidence Act was a starting point, but there is still more work underway to implement the Evidence Commission’s recommendations. Earlier this year, Rep. Don Beyer filed the National Secure Data Service Act as a strategy to advance many of the commission’s remaining recommendations for new infrastructure capable of securely combining data, creating a pathway for implementation. That bill quickly passed the U.S. House with strong bipartisan support and is now awaiting further action in the Senate. In parallel, the new Advisory Committee on Data for Evidence Building continues to study the challenges identified by the commission and is devising recommendations that will also further address the Evidence Commission’s work.

    While much progress has been made based on the commission’s advice, there is still a long path ahead in the United States to implement the Evidence Act effectively and ensure the remaining recommendations come to fruition. Importantly, the Evidence Commission is itself an example of how to develop and use evidence in policymaking. Fortunately, because of the commission members’ diligent service to the country and the leadership from Speaker Ryan, Sen. Murray, Rep. Beyer and others, the country is well on its way to realizing the promise of evidence-based policymaking.

  • August 24, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author Austin Hepburn, Research and Policy Intern, Data Foundation

    On the first day of their Administration, the Biden-Harris team issued an Executive Order on Advancing Racial Equity and Support for Underserved Communities Through the Federal Government (Executive Order 13985). The executive order was issued to promote and protect equitable policies and data in the Federal Government. These efforts support the inclusion of marginalized groups in Federal research and analysis, the improvement of equitable policies, and the opportunity for each person to reach their full potential.

    In order to ensure the implementation of the program, the White House Domestic Policy Council (DPC) is “directed to coordinate the efforts to embed equity principles, policies, and approaches across the Federal Government.” This includes efforts to remove systemic barriers, develop policies to advance equity, and encourage communication between the National Security Council and the National Economic Council. As noted in the EO, it is the responsibility of the Office of Management and Budget (OMB) to analyze and “assess whether agency policies create or exacerbate barriers to full and equal participation by all eligible individuals.” This responsibility is key to identifying and quantifying the challenges toward equity. 

    The Executive Order recognized the important role of disaggregated data, or data that has been broken down by detailed sub-categories, such as race, ethnicity, gender, disability, income, veteran status, and other key demographic variables, by creating the Equitable Data Working Group. The Working Group has been tasked with “identifying inadequacies in existing Federal data collection infrastructure and laying out a strategy for improving equitable data practices in the Federal government.” This is accomplished through the collection of new data or through the combination of multiple data sources in order to fill the data gaps that make assessments of equity difficult, which in turn supports evidence-based policies within the Federal government and state and local governments through vertical policy diffusion. “By exploring key policy questions dependent upon underutilized, inaccessible, or missing data, the Equitable Data Working Group explores ways to leverage government data in order to measure and promote equity.”

    Despite the overwhelming benefits of exposing data gaps, the Group recognizes that there are possible unintended consequences for privacy and for the vulnerability of underserved populations. With this in mind, aggregating data into summary data can help reveal broad trends within these communities without disseminating personal data. For example, the National Crime Victimization Survey (NCVS) collects data on self-reported accounts of criminal victimization. The NCVS produces reports that break down victimization data by race, ethnicity, gender, age, marital status, and income. Because the data can be separated by categories such as race, researchers and analysts can provide summary statistics and better insights into disparities without exposing personal identifiers. This protects the privacy of those who have been surveyed while still leveraging the data collected to help answer important policy questions about crime.
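
    The aggregation step described above can be illustrated with a minimal sketch. It assumes a handful of invented respondent-level records (the field names `respondent_id`, `group`, and `victimized` are hypothetical and do not reflect the actual NCVS file layout) and shows how group-level summaries can be published without carrying any individual identifier into the output.

```python
from collections import defaultdict

# Hypothetical respondent-level records: invented values, NOT real NCVS data.
RECORDS = [
    {"respondent_id": 101, "group": "A", "victimized": True},
    {"respondent_id": 102, "group": "A", "victimized": False},
    {"respondent_id": 103, "group": "A", "victimized": False},
    {"respondent_id": 104, "group": "A", "victimized": False},
    {"respondent_id": 105, "group": "B", "victimized": True},
    {"respondent_id": 106, "group": "B", "victimized": False},
]


def summarize_by_group(records, key="group"):
    """Collapse individual records into per-group counts and rates.

    Only the grouping value and aggregate figures are returned, so no
    respondent identifier appears in the summary.
    """
    totals = defaultdict(int)
    victims = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        if record["victimized"]:
            victims[record[key]] += 1
    return {
        group: {
            "respondents": totals[group],
            "victimization_rate": victims[group] / totals[group],
        }
        for group in totals
    }


if __name__ == "__main__":
    for group, stats in summarize_by_group(RECORDS).items():
        print(group, stats)
```

    In practice, summary tables built from very small groups can still reveal individuals, so a real release would likely need additional safeguards beyond simple aggregation; that step is outside this sketch.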

    The Data Coalition Initiative will be watching how the Working Group approaches these issues when its first report is provided to Ambassador Susan Rice, Assistant to the President for Domestic Policy, this fall. That report will identify and discuss the barriers and gaps in equitable data identified through case studies, along with recommendations on how to address these problems.

    The Working Group report will also include a plan to foster new partnerships among Federal agencies, academic and research partners, state, local, and tribal governments, community and advocacy groups, and other stakeholders, in order to leverage Federal data for new insights on the effects of structurally biased policies, and to advance capacity for multilayered, intersectional analysis of Federal datasets. The Data Coalition is looking forward to the chance to engage with the Working Group on its efforts, and will continue to provide updates as their important work progresses. 

  • July 28, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author Austin Hepburn, Research and Policy Intern, Data Foundation

    The nation is preparing to send its children back to school this fall, but there will be many questions about the ongoing impacts the pandemic has on our children, both in the short and long term. While there are a great many strengths of our country’s educational infrastructure, the data infrastructure applied to improving learning and the workforce continues to face substantial gaps. In order to understand, adapt to, and mitigate the impact of the pandemic, we must ensure that there is a robust data infrastructure. One way to ensure there is timely, useful data about our learners and workers is to provide significant and sustained funding for the Statewide Longitudinal Data Systems (SLDS).

    SLDS is a Federal government program that provides access to historical data on public-school enrolled students and teachers starting from the 2006-2007 school year. The SLDS system was designed to improve data-driven decisions impacting student learning and education. It focuses on the connection among PreK, K-12, postsecondary, and workforce education data. School districts, public schools, and teachers can access the data system via their district’s Student Information Systems (SIS). It is accessible through a free application that is available to eligible state grant recipients, such as school districts, schools, and teachers. This data includes assessment scores, daily attendance, enrollment, courses, and grades. In its most advantageous state, it enables grantees to link individual-level data from Pre-K to the labor market.

    The SLDS plays a significant role in creating data-driven policies. While the information is collected and stored, the grant program also provides more accessible data in order to get a better understanding of a policy’s impact on student learning. Moreover, it encourages policy efficiency and equity by quantifying educational measurements over time. Data-driven systems such as SLDS provide transparency about which policies affect students and the significance of their impact.

    The SLDS has meaningful benefits, although there are also challenges when implementing a data-driven program. States have been able to put this data to work to better support students on pathways to the workforce. Currently, every state, the District of Columbia and Puerto Rico has an SLDS that connects data between some data systems, but few can connect early education, K-12, postsecondary, and workforce. This makes it challenging to study and evaluate programs intended to improve outcomes in college and the workforce. As states and federal programs strive to boost education attainment and close the skill gaps in the workforce, it is vital that our country has the ability to produce rigorous analyses based on high-quality data.

    New, sustained investment in SLDS data can provide the information needed to answer the critical questions of policymakers, educators, parents, and students. This will require a significant, multi-year investment of $1 billion. This funding should focus on modernizing SLDS data systems to build more interoperable and accessible data platforms with privacy-preserving technology, as well as on building capacity to use SLDS data through state research-practice partnerships that bring together real-time learning and longitudinal data and diversify the representation of practitioners. Finally, funding should be directed to putting robust governance and accountability structures in place so that these systems transparently address real priorities, needs, and community expectations.

    Not only is this funding necessary to improve the data infrastructure to meet the needs of learners and workers, it is necessary to make this a sustained funding level, so that these systems have the resources to evolve to meet ever-changing research needs and privacy protection.

    Sustained and continued financial investment in the SLDS program would help ensure data-driven success and proper use of the data. An increase in funding will help provide the much-needed update to the data infrastructure necessary to advance evidence-based policymaking and modernize privacy protection. Providing this funding for SLDS is a smart investment that ensures we will have the evidence and data to provide the best outcomes for our students.

  • July 07, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author Austin Hepburn, Research and Policy Intern, Data Foundation 

    Crime data – which includes data on types of crime, demographics of victims and perpetrators, corrections, recidivism and reentry, and court information – is crucial evidence that is used to inform policy decisions in all jurisdictions. In the United States, national crime data is aggregated at the federal level by the Department of Justice’s Bureau of Justice Statistics (BJS) and Federal Bureau of Investigation (FBI). Reliable and up-to-date criminal justice statistics are imperative in order for policymakers to make evidence-based decisions. However, as questions around policing and criminal justice become ever more pressing, it is worth exploring the challenges and limitations of crime data so that we may identify opportunities to improve both the data and the policy decisions informed by the data.

    The National Crime Victimization Survey (NCVS) is the nation’s leading source for statistical information on victims of crime. Victim, offender and crime characteristics are reported along with reasons for reporting or not reporting the crime. However, there are serious limitations to the NCVS. The survey is self-reported by victims; it is not an official record made when a crime is committed. Sampled households in the United States remain in the survey for 3.5 years and are interviewed 7 times within that span. Only 71% of households sampled responded to the survey in 2019. Since the survey is only a sample, it does not capture variation in victimization patterns at the local or state level. Therefore, victimization patterns at the city level would require additional research. Additionally, data on the effectiveness of policing practices when addressing crime and victimization is lacking from NCVS reports.

    An example of this type of additional research is the Data Foundation’s Policing in America Project, a multi-pronged, open data effort to systematically improve evidence about how the American people view the criminal justice system and police forces. The project focuses on the value of building data capabilities to enable a more robust understanding of the relationship between perceptions of law enforcement agencies and the conditions in select cities, including disparate perceptions by sub-populations.

    In addition to the NCVS, which relies on traditional surveys of victims, a good deal of crime data is reported to the Department of Justice by local law enforcement. The FBI’s Uniform Crime Report (UCR) has been used to provide crime statistics since 1930. The BJS, the primary statistical agency of the Department of Justice, uses UCR data in their publications and datasets. BJS has long been trusted to publish up-to-date and accurate information, utilized by academia and professionals for criminal justice reports and open access data. The data from BJS and the UCR includes local, state, and national level data on corrections, courts, crime, the Federal justice system, forensic science, law enforcement, recidivism and reentry, tribal crime, and victims of crime. The data is reported by local law enforcement agencies to form a national database of criminal justice statistics. In practice, this has led to incomplete and non-standard reporting to the FBI. Local jurisdictions may have different definitions of crime that can make uniform crime reporting difficult. There may be lags in reporting for local agencies, as well as incomplete data.  

    One challenge comes from inconsistent reporting by local law enforcement agencies (LEAs), which can make arrest rates difficult to calculate. Reporting data is voluntary, so LEAs may not always report the same data every year, yet the UCR uses only data from these voluntary reports. The procedure used to calculate the aggregated national, county, and state arrest rates does not take into account the population covered by the UCR. Because the population covered by the UCR varies each year, this omission can have a significant effect on the arrest rates. This poses serious problems for analyzing national trends over time. Perhaps the main limitation of UCR data, however, is the difference between actual and reported crime.
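
    To see why coverage matters, consider a minimal sketch under stated assumptions. The agencies, populations, and arrest counts below are invented for illustration, and the `AgencyReport` type and both rate functions are hypothetical helpers, not the FBI's or BJS's actual procedure. The sketch compares a rate that divides reported arrests by the total population with one that divides by only the population served by agencies that actually reported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AgencyReport:
    agency: str
    population_served: int
    arrests: Optional[int]  # None means the agency did not report this year


# Invented figures for illustration only.
REPORTS = [
    AgencyReport("Agency A", 100_000, 500),
    AgencyReport("Agency B", 300_000, None),  # non-reporting agency
    AgencyReport("Agency C", 100_000, 300),
]


def naive_rate_per_100k(reports: List[AgencyReport]) -> float:
    """Reported arrests divided by the TOTAL population, ignoring coverage."""
    arrests = sum(r.arrests for r in reports if r.arrests is not None)
    population = sum(r.population_served for r in reports)
    return arrests * 100_000 / population


def covered_rate_per_100k(reports: List[AgencyReport]) -> float:
    """Reported arrests divided by the population of REPORTING agencies only."""
    reporting = [r for r in reports if r.arrests is not None]
    arrests = sum(r.arrests for r in reporting)
    population = sum(r.population_served for r in reporting)
    return arrests * 100_000 / population


def coverage_share(reports: List[AgencyReport]) -> float:
    """Fraction of the total population served by reporting agencies."""
    covered = sum(r.population_served for r in reports if r.arrests is not None)
    total = sum(r.population_served for r in reports)
    return covered / total


if __name__ == "__main__":
    print("coverage share:", coverage_share(REPORTS))
    print("naive rate per 100k:", naive_rate_per_100k(REPORTS))
    print("covered rate per 100k:", covered_rate_per_100k(REPORTS))
```

    With these invented numbers, only 40% of the population is covered, and the two rates differ by a factor of 2.5. If coverage shifts from year to year, a rate computed without regard to coverage will move even when underlying arrest behavior does not.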

    In addition to inconsistent reporting, the data that is reported is not standardized. Some states may have differing definitions of crime, as well as wholly different crimes on the books. For example, some states, such as Minnesota, have a third-degree murder charge, whereas other states would classify the same conduct as manslaughter. Similar challenges exist with hate crime statutes, which may be vastly different, include different demographic information, or may not be a part of the criminal code.

    Timeliness of data is also incredibly important for informing policy, but there are significant lags in crime data. As of June 2021, aggregated arrest data was last reported in 2016, half a decade ago. Despite the availability of raw arrest data on the FBI’s Crime Data Explorer up to 2019, the Bureau of Justice Statistics has not reported the arrest figures. This means that any policy decisions based on crime data are based on data that is missing the most timely insights.

    And finally, data needs to be usable by the public, academic researchers, and policymakers. This means that it needs to be published in an accessible format. Crime data has some significant challenges in this respect. But there are tools to help. The BJS Arrest Data Analysis Tool allows researchers to find national estimates and/or agency-level counts of crime. This data is sourced from the FBI’s Uniform Crime Report. The tool is significant in that viewers are easily able to generate arrest figures at the national and local level without needing the data science background required for raw data processing. While some of these challenges are unique to the crime data, many of these challenges exist in the data infrastructure throughout the country. Many initiatives are being undertaken in order to help address these problems, and optimize data for evidence-based policymaking. 

    The Foundations for Evidence-Based Policymaking Act of 2018 became law on January 14th, 2019. The law requires agencies to make their data accessible and to produce reports that use statistical evidence to support policymaking. Annually, agencies must craft a learning agenda that addresses policy concerns and submit it to the Office of Management and Budget (OMB). This is an opportunity for the Department of Justice to identify and address what needs to be improved and work with stakeholders to ensure that the necessary improvements can be made. The law also requires an Open Data Plan that details how each agency plans to make its data open to the public.

    Investment in crime and policing data has been modest, preventing meaningful updates to data collection and modernization. Additional funding for criminal justice data collection and reporting is recommended. Increasing local law enforcement training on correctly classifying and reporting all crimes to the FBI can help increase accuracy and reliability. In addition, increased collaboration between local law enforcement agencies and the FBI can provide more accurate aggregate data. With these common-sense improvements, crime data can be more effective in supporting evidence-based policymaking.

  • May 13, 2021 9:00 AM | Data Coalition Team (Administrator)

    The emerging need to securely share, link, and use information collected by different government agencies and entities is hampered today by administrative, legal, and operational hurdles. The National Secure Data Service Act (H.R. 3133), sponsored by Rep. Don Beyer (D-VA), seeks to implement a demonstration project for a data service that could rapidly address policy questions and reduce unintended burdens for data sharing, while aligning with design principles and concepts presented in recommendations from data and privacy experts. The proposal specifically cites an effort to support full implementation of recommendations made by the bipartisan U.S. Commission on Evidence-Based Policymaking for data linkage and access infrastructure.

    Why is the National Secure Data Service Act necessary? 

    The federal government’s data infrastructure is largely decentralized. Individual agencies and programs may collect data without sharing or using information already collected by other parts of government. This imposes undue burdens on the American public and businesses through repeated reporting of information the government already has. Creating a capacity to securely share information while protecting confidentiality and deploying other privacy safeguards offers tremendous potential for developing new insights and knowledge to support statistical analysis and summary-level information relevant for evidence-based policymaking and practice. 

    The National Secure Data Service Act builds on the bipartisan and unanimous recommendations from the U.S. Commission on Evidence-Based Policymaking from 2017, a consensus proposal from the National Academies of Sciences, Engineering and Medicine in 2017, and a suggested roadmap published by the Data Foundation in 2020. The proposed legislation creates an expectation for the National Science Foundation to make rapid progress in launching a data service and transparently supporting government-wide evidence-building activities. 

    How will a National Secure Data Service protect privacy?

    Under the proposed legislation, the data service at NSF must adhere to federal privacy laws, including the Confidential Information Protection and Statistical Efficiency Act of 2018 (CIPSEA). This law was reauthorized by Congress with bipartisan approval in 2018, establishing one of the strongest government privacy laws in the world, including strong criminal and civil penalties for misuse. The proposed data service can only operate using the CIPSEA authority and in compliance with the Privacy Act of 1974. The data service will also provide information to Congress about specific policies and practices deployed for protecting data. 

    Will the American public have knowledge about projects conducted at the National Secure Data Service?

    Yes. Consistent with principles about transparency specified by experts from the Evidence Commission, National Academies panel, and the Data Foundation, the proposed legislation specifically directs NSF to publish information about activities that are underway. In addition, Congress will receive a report on all projects, expected to include information about the costs and benefits of each. 

    How does the proposed legislation relate to the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act)?

    The National Secure Data Service builds on existing capabilities and authorities established in the Evidence Act, while also providing a resource for federal agencies, researchers, and data analysts to responsibly produce insights that can address questions in agency evidence-building plans (learning agendas). When Congress approved the Evidence Act, the creation of an advisory committee was intended to signal Congress’ continued interest in establishing a data service and provide relevant information to support implementation of next steps within two years of enactment. Now, more than two years after enactment of the Evidence Act, the advisory committee continues to meet and consider technical implementation details. The proposed legislation sets up the formal authorization of a data service to continue this momentum. 

    Does the National Secure Data Service Act supersede advice expected in 2021 and 2022 from the Federal Advisory Committee on Data for Evidence Building?

    No. The Federal Advisory Committee on Data for Evidence Building is a collection of nearly 30 experts considering a range of topics related to data linkage and use. Nothing in the proposed legislation restricts the ability of the advisory committee to offer OMB recommendations, as required by its charge in the Evidence Act. Instead, the legislation specifically encourages NSF to consider practices and recommendations from the advisory committee as part of its administrative implementation efforts. The advisory committee’s role is also likely to become increasingly influential in supporting tangible implementation of activities at NSF under the proposed legislation. 

    Will a National Secure Data Service displace existing data linkage activities in the Federal Statistical System?

    No. The data service is designed to supplement rather than displace any existing, successful, and sufficiently secure data linkage arrangements. Statistical agencies engaged in production-level data collection, sharing, and publication for the development of federal statistical indicators will receive additional capabilities from the National Secure Data Service but could retain existing practices. 

    Is the National Science Foundation the right agency to operate a data service?

    In 2020, the Data Foundation published a white paper establishing a framework for considering where to operate a data service in government that can meet broad needs government-wide and from across the research and evaluation communities. After exploring the range of potential options, the authors recommended the National Science Foundation given its ability to deploy the strong privacy authorities under CIPSEA, existing expertise in social sciences and computer science, the presence of one of the existing federal statistical agencies with expertise in confidentiality protections and data linkage, and NSF’s close connections and existing relationships with the research community.  

    The text of the National Secure Data Service Act provides NSF flexibility to determine how to implement a data service, including the possibility of issuing a contract through a Federally-Funded Research and Development Center, as recommended in the Data Foundation white paper. This recommendation was presented to the Federal Advisory Committee on Data for Evidence Building in April 2021 and to the NSF Social, Behavioral and Economic Sciences Advisory Committee in May 2021, receiving favorable perspectives and comments from respective committee members. 

    How much will implementation of a National Secure Data Service cost?

    Precise implementation costs will vary based on the level of services and activities applied at a data service. In 2021, the Data Coalition recommended that a National Secure Data Service receive an initial annual appropriation of $50 million to support development and launch of core linkage capabilities, privacy protective features, and necessary disclosure avoidance protocols, among other features. 

    Has the Data Coalition taken a position on the National Secure Data Service Act?

    In 2020, the Data Coalition called on Congress to authorize a data service to support pandemic response activities, then later reiterated its support following publication of the Data Foundation white paper, endorsing the white paper’s recommendations. 

    Representing the broad membership and interests from the data community, the Data Coalition endorsed the National Secure Data Service Act filed on May 13, 2021. The Data Coalition has also encouraged administrative actions to make progress on the establishment and launch of a data service, including NSF’s recent activities on America’s DataHub. 

    Does NSF support the legislative proposal? 

    The Administration has not formally weighed in on the proposal with a Statement of Administration Policy; however, NSF did provide technical feedback on a draft of the legislative text. 

    Last edited May 13, 2021

  • May 11, 2021 9:00 AM | Data Coalition Team (Administrator)

    This month, our RegTech 2021 series continued by examining government uses of artificial intelligence (AI). Just last year, Congress passed legislation encouraging the government to move from pilots and demonstration projects to scaled-up, applied solutions. The discussion featured two fireside chats with government leaders: Henry Kautz, Division Director, Information & Intelligent Systems, National Science Foundation (NSF) and Mike Willis, Associate Director in the Division of Economic and Risk Analysis, Securities and Exchange Commission (SEC). 

    First, Director Kautz discussed NSF’s work in AI, as the agency seeks to fund larger, more interdisciplinary projects. Lately, the agency has been focused on establishing AI Institutes, virtual centers organized around themes, connecting colleges and universities to partners in the private and public sector. Themes include AI-Augmented Learning, AI-Driven Innovation in Agriculture and the Food System, and Advanced Cyberinfrastructure. Director Kautz emphasized the importance of NSF’s role in supporting foundational, pre-competitive research and development in these private-public partnerships. 

    Turning to the challenges the government is facing, he recommended that agencies improve coordination among themselves on how best to make use of AI internally. He pointed to the success of coordinating bodies like the Joint Artificial Intelligence Center at the Department of Defense, but encouraged the government to think more broadly about the big questions it faces. His additional suggestions for scaling up AI included building up AI expertise within the government, especially at the managerial level; being sensitive to and aware of AI skepticism; and rethinking traditional procurement practices. He also emphasized the need for explainability and transparency in ensuring ethical uses of AI, and the value of conceptualizing data as infrastructure. 

    In the next fireside chat, Preethy Prakash, Vice President of Business Development at eBrevia, spoke with Mike Willis from the SEC. Willis, speaking for himself, described the SEC’s steps to make registrant disclosures more accessible and usable, after noticing that well over 90% of EDGAR visitors are machines. 

    Even though these data sets are in high demand from the public and for outside uses of AI, the role of AI within the SEC today is largely focused on making staff analytical procedures more effective, including those related to risk assessments, identifying potential areas for further investigation of activities like insider trading, comment letter analysis, and entity mappings.  

    When asked how to think about creating quality data that is interoperable, Willis pointed directly to the Evidence Act, which defines the term “open government data asset” based upon an underlying open standard. “Leveraging industry and market standards, I think, are a very useful way to drive down compliance costs, while streamlining the validation and analysis of the data, including for AI and ML purposes,” Willis stated. He went on to note that these open standards are a great example of the public-private partnerships discussed earlier.

    As the SEC continues to implement AI, Willis outlined some change management considerations. His recommendations were to ensure that you have talented, qualified professionals; help people understand the problems and processes that AI can help supplement; provide use cases and examples; and ensure that your AI solution stays within its scope. Finally, he echoed Director Kautz’s call to consider data as infrastructure, meaning it should be standardized and structured. 

    The whole conversation is available here. To learn more about our RegTech Series, sponsored by Donnelley Financial Solutions (DFIN), visit our webpage.

  • May 04, 2021 9:00 AM | Data Coalition Team (Administrator)

    Adopting data standards is especially important in the United States, where the regulatory structure consists of separate regulatory agencies that have different missions and mandates. Each of these agencies has its own data and collection systems designed around distinct supervisory responsibilities. A significant step toward rationalizing data needs and establishing standards across financial regulatory agencies is the Financial Transparency Act (FTA), H.R. 2989.  The FTA, introduced by Representatives Carolyn Maloney (D-NY) and Patrick McHenry (R-NC), would require Financial Regulatory Authorities [1] to adopt standards that: allow searchability; establish consistent formats and protocols; and provide transparency into data definitions and requirements.  The overarching goal of the Act is:

    “…to further enable the development of RegTech and Artificial Intelligence applications, to put the United States on a path towards building a comprehensive Standard Business Reporting program to ultimately harmonize and reduce the private sector’s regulatory compliance burden, while enhancing transparency and accountability…”

    To accomplish this goal, the FTA would require the use of common identifiers for transactions, legal entities, and products. While efforts have begun on legal entity and unique product identifiers, implementation and adoption have been a long process. Requiring the use of identifiers through the FTA should hasten adoption. To make these identifiers available throughout the private sector, the FTA would require that they be non-proprietary and available through open sources. The FTA requirement that data be searchable will make data more useful for the public and private sectors by making data discovery easier and by simplifying the integration of these data into analytical tools through the use of industry and technology best practices. The FTA will further enhance data transparency by requiring metadata to be available, helping data providers and data users to have a clear understanding of the data definitions and context.  

    Regulated entities in the financial services industry have long made clear the need for consistent data definitions and protocols within and between agencies.  Just as important has been the need for standard cross-jurisdictional definitions. By requiring data standardization, including the use of common identifiers, the FTA has the potential to significantly reduce compliance costs for regulated entities, improve data accuracy, and improve the private sector’s access to regulatory data, all of which further the transparency of financial markets.  

    High-quality regulatory data, especially at large financial institutions, is a continuing challenge.  An underlying cause of these data quality issues is the lack of uniform data standards across data sets and agencies, which requires institutions to transform data to meet different regulators’ needs. This creates risk of misstatement, often from misinterpretations of the requirements and the use of manually intensive processes. It also creates the need for intensive quality assurance processes and enhanced internal controls to ensure the proper level of data quality, adding significant effort and risk at financial firms and regulatory bodies.  

    Change management is another area where regulators and firms can benefit from the FTA.  Inefficient communication channels for reporting and data requirements create the risk that those requirements will be misinterpreted. For the most part, these requirements are communicated in regulations, reporting instructions, and statistical standards using plain English. The current model makes it difficult for firms, especially large complex firms, to communicate requirements throughout the organization (from report preparers to data owners).  This increases the risk of misinterpreting reporting requirements, which could ultimately result in non-compliance with regulations or formal supervisory actions. As regulatory requirements continue to grow and increase in complexity, the FTA requirement to provide machine-readable metadata, and with it the automation of requirement elaboration, is a critical step to improving data quality and change management processes. 

    Data comparability is needed for financial regulators to fulfill their missions. Without data standards, obtaining comparable data to gain insights and apply advanced analytics is a difficult task.  While there is conceptual agreement on the need to adopt data standards, the number of financial regulatory agencies and their differing missions can be a significant obstacle to that goal.  This need was recognized when the Office of Financial Research within the Department of the Treasury was established and given a mandate to establish standards. While many regulators understand the need for and benefit of data standardization, adoption has been slow. These efforts are often slowed by shifting data needs, the need to maintain legacy data sets, and the need to modernize technology capabilities. For example, only now is the Legal Entity Identifier becoming a requirement across financial regulators. This is why the FTA would be an important step forward. Requiring compliance with data standardization through legislation will not only hasten the pace toward standard data definitions; it should also result in greater collaboration among regulators and between regulators and the private sector.

    Now, more than ever, there is a need for the FTA. The complexity and interconnectedness of financial firms have made data core to financial supervision. The need for standardization grows more crucial as the volume and velocity of data requirements increase. These data are crucial to understanding the activities and risk at financial institutions and markets. This has proven to be even more so in recent market stresses. Increasingly, regulatory data are needed in near real time. Without standardization, the effectiveness of processes and data quality will not meet regulatory expectations or needs. The FTA will be an important step in helping regulators and the private sector improve overall data capabilities of the financial services industry and its regulators.


    [1] Department of the Treasury, Securities and Exchange Commission, Federal Deposit Insurance Corporation, Office of the Comptroller of the Currency, Bureau of Consumer Financial Protection, Federal Reserve System, Commodity Futures Trading Commission, National Credit Union Administration, and Federal Housing Finance Agency.

    Kenneth Lamar is the Principal Partner of Lamar Associates LLC.  He is an Independent Senior Advisor to AxiomSL.  Previously, Mr. Lamar was a senior official at the Federal Reserve Bank of New York, serving as an Advisor to the Director of Research and leading the Data and Statistics Function.

  • April 28, 2021 9:00 AM | Data Coalition Team (Administrator)

    On April 28th, Data Foundation President Nick Hart sent a letter to Treasury Secretary Janet Yellen strongly encouraging the Financial Stability Oversight Council (FSOC) to advance uniform data standards for the information currently collected by regulatory entities, as a means to promote data sharing and high-quality information for regulators and investors alike.

    The need for better standards in financial markets has been a long-standing priority of the Data Coalition and its members. The letter is available here as a pdf.


    Department of the Treasury

    1500 Pennsylvania Avenue NW

    Washington, DC 20220

    Delivered by Electronic Mail

    RE: Advancing Data Quality and Transparency in FSOC Agencies


    Secretary Yellen –

    Congratulations on your confirmation as the Secretary of the Treasury. Your role as Chair of the Financial Stability Oversight Council (FSOC) established by the Dodd-Frank Wall Street Reform and Consumer Protection Act is critical. On behalf of the members of the Data Coalition, I write to encourage that in 2021 FSOC prioritize actions that drastically improve the quality of data available for regulatory oversight while also reducing reporting burdens on regulated entities. 

    The Data Coalition is an initiative that aligns interests across organizations in the national data community, advocating for responsible policies to make government data high-quality, accessible, and useful. As part of the non-profit Data Foundation, the Data Coalition specifically works to unite communities that focus on data science, management, evaluation, statistics, and technology, in industry, non-profits, and universities.

    In 2017, a Treasury Department report identified opportunities to make the current reporting regime for regulated entities less complicated and more consistent.[1] The Securities and Exchange Commission’s Investor Advocate has stressed the need for standards to make access to information easier and less costly for investors.[2] The Data Coalition strongly encourages FSOC to advance uniform data standards for the information currently collected from regulated entities as a means to promote data sharing and higher-quality information for regulators and investors alike. We know that data standards can enable better information processing and data reconciliation. The country needs machine-readable data as well as machine-readable data standards.

    One aspect of the standards that the Data Coalition specifically encourages is the adoption of a common legal entity identifier. Across 36 federal agencies there are more than 50 distinct entity identifiers in use.[3] The Federal Reserve, among others, already recognizes the benefits of applying such an identifier in the U.S. regulatory context.[4] One possible standard to fill this gap is the G-20-backed Legal Entity Identifier (LEI), which is non-proprietary, verified, and part of a globally-managed system that the U.S. has already contributed to developing.[5]

    Applying common-sense and much-needed data standards for the United States financial regulatory ecosystem can be promoted and achieved administratively under the leadership of FSOC. The Data Coalition recommends that FSOC take rapid action to improve the quality of data reported in our financial system.

    The Data Coalition members look forward to supporting FSOC member-agencies in continuing to ensure the American people have access to information they need to make good financial decisions. We welcome the opportunity to provide technical assistance to any FSOC member-agencies in order to advance coherent, sound data policy.



    Nick Hart, Ph.D.

    President, Data Foundation


    [1] U.S. Department of the Treasury. A Financial System That Creates Economic Opportunities: Banks and Credit Unions. Washington, D.C., 2017. Available at: https://www.treasury.gov/press-center/press-releases/Documents/A%20Financial%20System.pdf

    [2] Referenced in: R.A. Fleming and A.M. Ledbetter. “More Data, More Problems.” The Regulatory Review, 2021. Available at: https://www.theregreview.org/2021/04/26/fleming-ledbetter-more-data-more-problems/

    [3] M. Rumsey. Envisioning Comprehensive Entity Identification for the U.S. Federal Government. Washington, D.C.: Data Foundation, 2018. Available at: https://www.datafoundation.org/envisioning-comprehensive-entity-identification

    [4] J.A. Bottega and L.F. Powell. Creating a Linchpin for Financial Data: Toward a Universal Legal Entity Identifier. Finance and Economics Discussion Series. Washington, D.C.: Federal Reserve Board Divisions of Research & Statistics and Monetary Affairs, 2011. Available at: https://www.federalreserve.gov/pubs/feds/2011/201107/201107pap.pdf.

    [5] Global Legal Entity Identifier Foundation, 2021. Available at: https://www.gleif.org/.

  • April 12, 2021 9:00 AM | Data Coalition Team (Administrator)

    When the United States government issued its Federal Data Strategy in 2019, it recognized the role of data as a component of infrastructure. This point must not be lost as the country proceeds in considering the President’s American Jobs Plan, the infrastructure proposal presented by the White House in March 2021. 

    The reason data is infrastructure is simple – no aspect of traditional or social infrastructure, spending, or implementation of any government service or activity can be designed, implemented, and monitored without a coherent, sustained data capacity. Data infrastructure involves the people, processes, systems, and resources that enable the entire ecosystem for data-driven decision-making from the collection of information through the analysis and presentation. It involves the systems on which information is stored, but also much more to ensure the information stored is high-quality and usable. 

    Data infrastructure requires investment, just like building and repairing roads or creating a greener economy. For too long, our country has underinvested in its national data infrastructure and we have an opportunity to correct this problem. Congress and the White House can leverage the American Jobs Plan to ensure the country has critical information and capabilities for using data to support accountability, learning, and transparency for the American public. 

    Here are three suggestions of data infrastructure components that meet longstanding needs and support national priorities:

    1. Build a National Secure Data Service. The federal government lacks a comprehensive, secure capability to share and link sensitive information for analytics. A data service can bring together information from federal, state, local, and private collections to support answering priority questions about employment, earnings, and equity. Establishing a data service as a new resource for infrastructure in government promotes the use of data, while addressing longstanding barriers to sharing information that can unnecessarily impede our ability to learn about what strategies and policies work best. This could be included in the $180 billion investment for R&D.  
    2. Prioritize Needed Workforce Development Information. As part of the effort to build national workforce development capacity, the country needs to ensure appropriate data are collected through the relevant programs and activities. This includes the ability to connect information about education with information about earnings and occupations. Setting aside up to 1 percent of the planned investment for data collection, management, and evaluation activities, including investments in state longitudinal data systems, would be a prudent way to support effective implementation. This could be adopted as part of the $100 billion workforce investment.  
    3. Promote R&D on Data-Privacy Technologies. Emerging privacy-preserving technologies offer opportunities for the country to rapidly expand analytical and research capabilities, yet many of these technologies will need additional R&D prior to widespread deployment. With a major planned investment in R&D capabilities at the National Science Foundation, a portion of the $50 billion technology investment should include a focus on data protection.  

    If the country is serious about building an infrastructure for the challenges of the 21st century, we must also have the data to ensure the policies and investments achieve intended goals.


