

  • March 31, 2021 9:00 AM | Data Coalition Team (Administrator)

    On March 30th, the Data Coalition sent the following letter to Office of Management and Budget Deputy Director Shalanda Young. This letter outlines ways to improve the use of data in decision-making. The suggestions fall into six key areas: leadership, statutory compliance, information sharing, administrative processes, and workforce capacity and resources.  A .pdf version is available here.

    March 30, 2021


    Eisenhower Executive Office Building

    1650 Pennsylvania Avenue NW

    Washington DC, 20502

    Delivered by Electronic Mail

    RE: Strategies for Encouraging Use of Data to Support Biden-Harris Administration Priorities 

    Deputy Director Young –

    Recognizing the enormity of the challenges the Biden-Harris Administration is responding to in 2021 and the many issues your team at OMB is working diligently to address, the Data Coalition members encourage OMB to put data at the center of decision-making and planning activities. As the country looks toward better days ahead, we must know what works and in what contexts, and enable more evidence-based policymaking. This requires investing in the data and evidence infrastructure.

    The Data Coalition members applaud the leadership expressed in President Biden’s memorandum to agency heads about scientific integrity and evidence that elevates the role of data in decision-making activities. Building on President Biden’s memorandum and other recent executive actions, this letter provides actionable suggestions for the Administration and OMB to consider, including several apolitical efforts underway in recent years that we hope can continue or be reinvigorated this year. We encourage that these activities be integrated into the Fiscal Year 2022 President’s Budget, the President’s Management Agenda, revisions to OMB Circular A-11, regulatory review requirements, implementation of the American Rescue Plan, and any forthcoming national infrastructure plan to the extent possible.

    The Data Coalition is an initiative that aligns interests across organizations in the national data community, advocating for responsible policies to make government data high-quality, accessible, and useful. As part of the non-profit Data Foundation, the Data Coalition specifically works to unite communities that focus on data science, management, evaluation, statistics, and technology, in industry, non-profits, and universities.

    Recent laws like the Foundations for Evidence-Based Policymaking Act (Evidence Act) – including the OPEN Government Data Act and the Confidential Information Protection and Statistical Efficiency Act (CIPSEA) – are broad in focus, implicating a whole-of-government approach to better using data. Similarly, because of the Data Coalition’s broad membership and interests, our expertise and suggestions are not isolated to a single policy domain. Our suggestions fall along six key areas: leadership, statutory compliance, information sharing, administrative processes, and workforce capacity and resources. Each is discussed in greater detail below.

    Leadership and Transparency

    The Data Coalition appreciates the efforts from the Biden-Harris Administration to date in elevating and encouraging the central role of data and evidence in decision-making. Following through on promises and bold statements will require sustained engagement and leadership from all levels of the Administration. Specifically, we hope the Biden-Harris Administration and OMB will consider the following:

    • Issue a National Data Strategy Focused on Presidential Priorities. The Executive Branch developed a reasonable Federal Data Strategy with extensive input from federal agencies, industry, non-profits, and academia. The strategy, with its annual action plan, provided a clear sense of direction and organization for the vast number of needed improvements for data infrastructure in government. OMB should re-issue and reinforce a coherent national data strategy and corresponding action plan as part of the President’s Management Agenda, organized around the key priorities of the administration.
    • Improve Coordination Within OMB. The division of labor on data and information management activities at OMB is understandably segregated. However, this approach has recognized limitations for effectively coordinating data matters government-wide. Improving collaboration across the Office of Information and Regulatory Affairs, Economic Policy, Performance and Personnel Management, the Office of Electronic Government, and other divisions of OMB should be prioritized by OMB leadership. The Data Coalition members encourage a deliberate focus on sustained, coordinated leadership within OMB for delivering on data priorities and commitments. Additional legal authority is not likely needed to accomplish this coordination, but rather support from the Director, Deputy Director, and other senior leaders in the agency is instrumental for encouraging divisions to effectively and routinely coordinate information management activities. We hope that OMB will also leverage improved internal coordination to support agency leaders in collaborating with each other to support data and evidence priorities, including through the President’s Management Council, Chief Data Officers Council, Evaluation Officers Council, Interagency Council on Statistical Policy, Chief Information Officers Council, and other interagency bodies.
    • Task the Advisory Committee on Data for Evidence Building (ACDEB) with Priority Questions. The ACDEB, or Evidence Act Advisory Committee, can be a resource for OMB and the White House in studying and addressing core data issues across government with expertise from across government, industry, and academia. Despite the recent delegation of the chair role from OMB to the National Science Foundation, Congress intended for OMB to play a central role in steering the direction of the committee and intended the Federal Chief Statistician to chair the committee. ACDEB has not yet issued recommendations publicly. OMB leadership should provide clear direction and suggestions to the advisory committee about areas of focus and need to support response to Biden-Harris Administration priorities, which may include the role of data linkage and sharing for addressing economic mobility, social inequities, climate change, or pandemic response.
    • Direct Agencies to Make Evidence Act Planning Documents Publicly Accessible. Agencies have made some progress in developing evidence-building plans, evidence assessments, and evaluation plans as required by the Evidence Act, yet much of this progress has not been shared publicly or conducted with meaningful stakeholder engagement. The Data Coalition encourages OMB, when issuing updated guidance to agencies consistent with Section 5 of President Biden’s scientific integrity memo, to prioritize transparency in process and final publication of learning agendas and other data and evidence planning documents. Specifically, the Data Coalition strongly encourages OMB to direct agencies to publish machine-readable plans and for OMB to provide a central portal for the data and evidence community to access such information. This requirement need not wait for publication of quadrennial strategic plans and should be implemented as soon as possible.

    Statutory Compliance with New Data Laws

    The advent of new laws like the Evidence Act, the Digital Accountability and Transparency (DATA) Act, and others suggests a clear intent from Congress to prioritize open data and the organization of information for accountability and decision-making purposes. Passage of these laws must be matched with effective, timely implementation efforts within the Executive Branch to achieve their purpose. The Data Coalition encourages OMB to prioritize implementation in several key areas from recent data laws, such as:

    • Issue Required CIPSEA Regulations. Title III of the Evidence Act (Sec. 303) placed responsibilities with OMB to issue regulations related to Statistical Policy Directive #1 and public trust in data, the designation of CIPSEA units, and to authorize the use of the presumption of accessibility authority under CIPSEA. This third provision and new authority, based on a recommendation from the U.S. Commission on Evidence-Based Policymaking, allows the statistical system to request access to and use of administrative records for statistical purposes, generating relevant insights for decision-makers within a strong privacy framework. The unanimity of the Evidence Commission’s recommendations and the near-unanimous movement of the legislation through Congress suggest widespread support for this provision being quickly instituted by OMB. The Data Coalition encourages OMB to advance all of the delayed CIPSEA regulatory actions, including publishing the presumption of accessibility as an Interim Final Rule. Such an approach, permissible under the Administrative Procedure Act, allows for rapid action while not precluding further revisions based on valuable public comment and input.
    • Provide Guidance on Making Data Open by Default. Title II of the Evidence Act – the OPEN Government Data Act – outlines expectations for federal agencies to publish machine-readable open data as a default, subject to guidance issued by the OMB Director. More than two years after enactment of this law, OMB has not issued open data guidance that addresses risks of re-identification, security considerations, costs and benefits of converting data assets, and procedures for developing prioritized open data plans. The Data Coalition calls on OMB, in collaboration with the Chief Data Officers Council and other relevant interagency bodies, to publish this long-overdue guidance on open data.
    • Prioritize GREAT Act Implementation. Now in the second year of implementation of the Grant Reform Efficiency and Agreements Transparency (GREAT) Act, OMB is tasked with partnering with the Department of Health and Human Services (HHS) to issue data standards within 2 years of the law’s enactment. Ensuring these standards are developed with a collaborative and consultative process must be a priority to achieve this law’s intent: higher-quality data and lower burden on grantees for reporting. The Data Coalition members encourage OMB to consider the use of non-proprietary identifiers in developing standards and for the report required by Section 7 of the GREAT Act.
    • Accelerate Implementation of New AI Laws. In 2020, Congress approved the bipartisan Artificial Intelligence (AI) Initiative Act and the AI in Government Act. These new laws provide an opportunity for the White House and government to lean in on long-overdue activities to accelerate government adoption and use of AI. The Data Coalition members look forward to collaborating with OMB and agencies in implementing these important authorities as the Executive Branch shifts from researching AI to ethically using AI at a production level to improve our society.


    Responsible, Secure Data Sharing

    As the Biden-Harris Administration and OMB continue to respond to the global pandemic, address economic uncertainties, and design policies aimed at reducing social inequities, our country must ensure the necessary data can be shared and used across agencies and in collaboration with experts in the data and research community. The Data Coalition encourages OMB to fund strategies that enable responsible, secure data sharing as part of the Fiscal Year 2022 Budget Request and any forthcoming supplemental appropriations requests.

    • Establish a National Secure Data Service. In 2017 the U.S. Commission on Evidence-Based Policymaking recommended a National Secure Data Service to help address the challenges created by the decentralized nature of federal government data collection. In 2020, building on the Evidence Commission’s proposal, a suggestion to immediately launch a data service as a Federally-Funded Research and Development Center at the National Science Foundation emerged to rapidly accelerate progress on this deficit in government’s infrastructure. The Data Coalition members encourage OMB to include start-up funding for a data service in the FY 2022 Budget Request, and to proceed rapidly with other administrative and operational steps to launch a data service. Of note, a data service could substantially improve existing capabilities for analyzing disparities and inequities by race, ethnicity, and gender in program implementation without introducing new risks or potential harms from adding sensitive data elements to existing data collections for government services and benefits. The Evidence Commission’s unanimous recommendations offered a compelling case for the data service; hopefully the Biden-Harris Administration will advance this much-needed resource for the data community to address policymakers’ priority questions.
    • Expand Access to Income and Earnings Information. Multiple administrations have proposed expanding access to certain data assets with income and earnings information to support research and evaluation activities. The Data Coalition members encourage the Biden-Harris Administration to use the FY 2022 President’s Budget to acknowledge support for such proposals and to transmit proposed legislative language to Congress — such as limited expansions to access the National Directory of New Hires and certain tax information — that can enable improved analysis about eligibility for benefit programs and impacts on employment and training programs at the national, state, and local levels.
    • Pilot Test Innovative Approaches for Secure Data Sharing. As the country and world explore strategies for more efficiently using existing data assets while deploying privacy and confidentiality protections, the US has an opportunity to provide global leadership in the deployment and use of privacy-preserving technologies. The Data Coalition encourages OMB and federal agencies to consider applications and tests of multi-party computation and other approaches over the next year, focusing on high-value data assets where there are currently disincentives for sharing information across jurisdictions or entities. OMB should allocate resources from existing appropriated funds and identify new resource needs to support pilot tests and demonstration projects of privacy-preserving technologies in domestic, non-security agencies.

    Data Standards and Administrative Processes

    There are longstanding needs for improved data standards and technical specifications as well as simple improvements to administrative processes that could benefit evidence-building and data use in our society. The broad existing authority in the Paperwork Reduction Act for the Chief Statistician to issue standards and the ability of OMB’s Office of Information and Regulatory Affairs to align certain processes have long been under-recognized for their role in making data use more efficient. The Data Coalition appeals to OMB to not lose sight of the small changes that can have major implications and efficiencies for reducing burden and increasing value with data in our society, including: 

    • Review and Modify Inefficiencies in the Paperwork Reduction Act Implementation. The Paperwork Reduction Act has long been described as both a support and an impediment to government data collection and use. Government must do more to remove the inefficient, unnecessary barriers that inhibit generating high-quality insights needed for addressing and monitoring public policy matters. The Paperwork Reduction Act should not be about just burden reduction, but a means to improve and enhance the value of government data to benefit society. Reimagining how this law is implemented was a focus of a Data Coalition working group in 2020, which offered a series of small process modifications that could drastically improve the usefulness of this law while reducing a well-recognized pain point in government that distracts from real data management and analysis activities. The Data Coalition encourages OMB to accelerate the implementation of automated tools for information collection requests and to clarify its guidance on pain points for agencies’ PRA implementation.
    • Apply Long-Overdue Data Standards for National Collections in Public Health and Economic Response. Setting data standards is a basic process by which stakeholders and the data community agree on a common meaning for particular data elements. Having agreed-upon and applied data standards supports innovation, sharing, and identifying meaning in data analysis. Existing authority could be used to deploy better standards for public health, to support addressing the coronavirus pandemic, and for economic response and recovery, with improved entity identification in financial regulations and other entity reporting. Data standards will by no means solve all data issues facing government and society today, but data standards offer a common starting point to build upon. The Data Coalition strongly encourages OMB and relevant federal agencies to pursue improved standards in public health reporting and in financial reporting that can strengthen the quality of information used in our country’s response to current crises.
    • Update OMB Data Standard on Race and Ethnicity. OMB Statistical Policy Directive No. 15 provides for a common standard for race and ethnicity, which is widely recognized to be outdated and in need of revision. Over the past four years, despite an effort to consider revisions, OMB failed to propose and finalize an updated standard that could be used to improve analysis about inequities in government services and policies. The Data Coalition encourages OMB to update the federal race and ethnicity data standard as soon as possible based on public consultation and expert judgment about necessary revisions.
    • Refocus on DATA Act Implementation and Budget Transparency. The federal budget formulation and execution processes are among the most routine in government, notwithstanding challenges in timely appropriations actions. As the federal government implements budgetary obligations, prioritizing transparency and clarity for the American public can support efforts to build and restore public trust in government operations and activities. While many agencies made tremendous progress in implementing the DATA Act, too many agencies are struggling to achieve implementation requirements and expectations that support use of the data by OMB just as much as by the American people, industry, and researchers. OMB can support improvements in data quality by demonstrating to federal agencies how the data can be useful, such as adopting DATA Act standards and approaches for producing federal budget crosscuts (e.g., climate change, disaster), monitoring forthcoming earmarks, and even publishing connections to agency budget justifications. The Data Coalition encourages OMB to approach transparency in government spending data in a more holistic manner and to lead among agencies in using the available data assets in published OMB analyses and databases.

    Workforce Capacity and Resources

    None of the goals for Executive Branch agencies can be reasonably attained without dedicated attention to workforce capacity and resources. When it comes to data policy and implementing the suggestions of this letter or other aspects of cohesive data management in agencies, the Data Coalition encourages OMB to work in collaboration with agencies and Congress to enable dedicated attention to capacity and resources. 

    • Strengthen and Diversify the Federal Data Workforce. OMB today can use existing survey mechanisms and data collections facilitated by the Office of Personnel Management (e.g., Federal Employee Viewpoint Survey) to analyze a broad range of expertise, diversity and inclusion attributes across every unit of government. By identifying gaps rapidly using real data, OPM and OMB can also determine where the workforce needs are greatest. In conjunction with agency assessments for evidence-building capacity, required by the Evidence Act and submitted to OMB in September 2020, OMB should use every available means to prioritize ensuring agencies have the people to meet emerging data, evidence, and innovation needs. OMB should encourage agencies to use existing authorities – such as the Intergovernmental Personnel Act – and prioritize hiring to fill critical gaps in data science, statistics, and evaluation.
    • Provide Adequate Resources for Chief Data Officers. Federal agencies need the capacity to pursue open data, data transparency, data governance, and data analysis activities; every agency needs adequate resources to truly recognize data as a strategic asset. Providing at least $50 million in implementation funding for FY 2022 for Chief Data Officers will directly support efforts to improve accountability and transparency of government policies and programs by better managing and using data.
    • Establish Practices for Flexible Data and Evidence Funding. While many agencies need additional resources to support evidence-building and data management activities, some agencies will also benefit from recognized funding flexibilities such as Evidence Incentive Funds, set-asides of discretionary appropriations, and waiver authorities in mandatory programs. The Data Coalition encourages OMB to include funding flexibilities in the FY 2022 Budget Request for every agency to support Evidence Act implementation.

    Our country needs good data to support useful evidence for decision-makers. OMB has a central role in fostering a cohesive data and evidence ecosystem. If this is successful, and data are a forethought in our government’s operations, our society will benefit from realistic solutions, accelerated policy coordination, and real innovation. The Data Coalition members look forward to supporting OMB and federal agencies in continuing to build a stronger national data infrastructure and welcome the opportunity to provide further expertise and perspective on the suggestions in this letter. We welcome the opportunity to provide technical assistance to any OMB officials or career staff in order to advance coherent, sound data policy in the United States.



    Nick Hart, Ph.D.

    President, Data Foundation

  • March 22, 2021 9:00 AM | Data Coalition Team (Administrator)

    RegTech data consists of information collected and used in the financial regulatory system. But the system for reporting this information in the U.S. is reliant on document-based reports. Having information in static documents limits the ability of regulators and private sector entities to deploy and benefit from emerging technologies, like machine learning and artificial intelligence (AI). 

    AI is already in use across the commercial, healthcare, and defense industries, but it has not yet been widely applied in the U.S. federal government. But we’re starting to see that change. Federal lawmakers are considering promising strategies to deploy technologies to effectively leverage the government’s data assets in order to support technologies like AI. Congress is taking steps towards encouraging more government adoption of AI solutions. Executive Branch agencies are considering how to apply AI to achieve their missions, but there are still foundational policy issues that must be addressed to ensure the platform for applying innovative technologies exists within government, including improvements to data quality and the adoption of open data standards, both of which AI relies on heavily to function properly.

    Proposals such as the Financial Transparency Act, which would require the adoption of a non-proprietary legal entity identifier, and laws like the OPEN Government Data Act, which requires standardized, machine-readable data, are helping to set the stage for modernizing federal data and the financial regulatory reporting system. This, in turn, fosters an environment for technologies like AI to improve the efficiency of the financial sector and regulatory reporting. Requiring U.S. financial regulators to adopt consistent data fields and formats for information collected from the industry can help bring the full benefits of AI to the RegTech sector.

    Better standardization of regulatory reporting requirements across the agencies would significantly improve the ability of the U.S. public sector to understand and identify the buildup of risk across financial products, institutions, and processes. Having good quality, standardized data is an important steppingstone to reaping the benefits of the ongoing digitization of financial assets, digitization of markets, and growing use of new, cutting-edge technologies, such as artificial intelligence.

    Panelists in last week’s webinar, which kicked off the RegTech 2021 Series presented by Donnelley Financial Solutions, offered the government advice from private sector experts, emphasizing the value of AI across a wide variety of everyday uses. These experts set expectations for what AI can be used for and discussed challenges in scaling up pilot projects, as well as the government’s unique position to push for standardized, structured content.

    RegTech 2021 Series: Move.AI - Advancing Regulatory Technology with AI

    Learn more about the RegTech Series here and join us for the remainder of the series by registering for our next event, where we will hear from government experts. 

  • December 17, 2020 9:00 AM | Data Coalition Team (Administrator)

    It’s been said and written about many times, but 2020 presented new and staggering challenges across the world and in this country.  Data policy was no exception. 

    At the beginning of the year, the President’s 2021 budget included positive signals for data and evidence priorities. Many agencies made progress in prioritizing Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act) implementation and offered considered assessments of resource needs. Additionally, secure data sharing and access became a substantial part of agency planning, and artificial intelligence funding offered the potential to benefit core data infrastructure. 

    As the pandemic hit, it became increasingly clear that high quality, accessible and useable data would be necessary for an informed, evidence-based response. Here are just a few of the examples of what has been accomplished in the past 12 months: 

    Data Priorities in COVID Response

    The Data Coalition provided an open letter to Congress that outlined key recommendations for the pandemic response that aligned with our policy priorities, including improving data standards, expanding access to essential data, and implementing transparency and oversight for relief and stimulus funding. 

    Coronavirus Aid, Relief, and Economic Security (CARES) Act 

    Oversight on Government Spending

    The CARES Act created vital transparency and reporting requirements that demand intense coordination across the federal enterprise in order to manage the high volume of information required for effective oversight. The Data Coalition’s Budget Transparency Taskforce strongly urged the Pandemic Response Accountability Committee to use existing infrastructure and data analysis standards in order to quickly establish meaningful transparency for emergency spending associated with the country’s response to the pandemic.

    Public Health Data

    The Centers for Disease Control and Prevention (CDC) received $500 million as a part of the CARES Act for improving our existing public health surveillance and analytics infrastructure, in order to provide more timely and accurate health data to inform pandemic response.

    Pulse Surveys

    The Data Coalition first encouraged Congress to support and fund the development of a large-scale household survey on COVID-19 impacts in March. The Census Bureau is now collecting new data on households and small businesses through its new pulse surveys. These government collections are done on a large scale and provide official data for researchers, policymakers, and others. These government efforts are essential and work together with private philanthropic efforts like the COVID Impact Survey. The Data Coalition will continue to advocate for efforts to promote this type of valuable data collection, including a pulse survey for assessing the impacts of COVID-19 on the nation's education infrastructure.

    Progress on National Secure Data Service 

    Three years ago, the experts on the U.S. Commission on Evidence-Based Policymaking unanimously recommended that a national secure data service be established. Throughout this year, the Data Coalition has been meeting with Congressional offices, highlighting a data service as one reasonable, low-cost strategy for supporting the use of government-collected data to respond to the pandemic. Recognizing recent changes to federal law and the contemporaneous environment, the recommendation is now to establish the data service as a new Federally Funded Research and Development Center at the National Science Foundation.

    Legislative Victories and Priorities

    The Taxpayers Right to Know Act in the National Defense Authorization Act (H.R. 6395)

    The Taxpayers Right-to-Know Act (TPRTK) was a part of the Data Coalition’s legislative agenda and was included in the final version of this year’s NDAA, which passed both chambers with a veto-proof majority. TPRTK uses existing government-wide financial data standards to make information about federal expenditures more readily available and transparent to American taxpayers. This will increase the understanding of how to improve the productivity and impact of federal programs delivering valuable services to the American public. 

    Open the Courts Act of 2020

    This bill passed the House of Representatives this year, with provisions that will require the electronic court records system to comply with data accessibility standards. Though this bill will need more work in the upcoming Congress, we were glad to see this part of our legislative agenda progress as well. 

    Health STATISTICS Act

    The Data Coalition provided technical assistance to Rep. Scott Peters (D-CA), who along with Reps. Lucy McBath (D-GA), Anna G. Eshoo (D-CA), and Brian Fitzpatrick (R-PA) introduced the bipartisan Health Standards to Advance Transparency, Integrity, Science, Technology Infrastructure, and Confidential Statistics Act of 2020 (Health STATISTICS Act). This legislation builds on the Evidence Act by addressing the weaknesses of our public health surveillance system, ensuring the CDC and the public have access to the timely, accurate, and actionable data critical to pandemic response.

    AI Advancements

    The Data Coalition endorsed the principles for a national AI strategy outlined in a House Resolution sponsored by Reps. Will Hurd (R-TX) and Robin Kelly (D-IL). With existing laws, such as the Foundations for Evidence-Based Policymaking Act and the OPEN Government Data Act, the adoption of AI envisioned in the Hurd-Kelly resolution can be achieved by transparently and equitably promoting the use of high-quality data across government.

    Federal Data Strategy Forum: Year 2

    The Data Coalition and the Data Foundation co-hosted an open forum on the Federal Data Strategy, which is entering its second year. Speakers reinforced the need for a federal data strategy to meet social needs and expressed interest and enthusiasm for helping policymakers implement new legal and regulatory frameworks.

    Looking forward to 2021

    In 2021, as we work to control the pandemic and towards economic recovery, data will play an important role. Prioritizing data and evidence-building will help inform effective and efficient policies to address existing and emerging challenges. In addition to our recommendations for the incoming Administration, we will continue to support the Chief Data Officers Council and other advisory bodies, advocate for the continued implementation of the Evidence Act, and support the continuation of a Federal Data Strategy. The Data Coalition is looking forward to continuing our advocacy and building a strong stakeholder community.

  • December 09, 2020 9:00 AM | Data Coalition Team (Administrator)

    What is Version Control?

    Every writer has files somewhere named something like “GrantProposalDraft1-finaledited-reallyfinal,” which implies a whole sequence of versions that may or may not be saved elsewhere. Version control systems automate this process when working with computer code and keep a history of all work. 

    Version control is invaluable for any complex software project involving team collaboration. Consider the development of a large program like macOS. Hundreds of Apple employees in different departments, different buildings, and even different time zones simultaneously make updates, rewrite segments of code, and fix bugs. Without version control there is no way to tell who changed what and when. Changes slip through the cracks and inconsistencies compound in the code.

    To a great extent, the private sector has solved this problem. If you work anywhere near software development you have likely heard of git, a free and open-source distributed version control system, and GitHub, the hosting service that lets you manage repositories.

    Git was developed in 2005, and other automated version control systems pre-date it by a number of decades. But governments and legal systems have maintained their own ‘version control’ with labor-intensive processes to amend and update law.

    GitHub for Government?

    When it comes to version control for legislative documents, the main questions to address are relatively straightforward. What was the law? What is the law? But it might surprise you to know that for many laws, particularly those that were recently changed, there is no current official version and no way to see a precise history of amendments over time. 

    This is a big problem. In the Canadian House of Commons a lack of version control has led to embarrassing headlines: Senate debates wrong version of government bill for the second time in less than three months.

    Even apart from such snafus, all legislative bodies struggle with the need to quickly understand how a proposed bill would impact existing laws, or how an amendment would change a proposed bill. Even for experienced legislators, their staff, and policy lawyers, comparing versions of bills and laws is an arduous, manual, expensive, time-consuming process.

    So, why can’t the U.S. Congress, and other legislatures, just use git? Unfortunately, there are a number of processes that make standard version control for computer code not applicable to legal documents. 

    • Amendments, not versions. Amendments to laws are not made as versions. Instead, you often get a single sentence with textual language like “strike,” “insert,” “remove,” or “repeal” that must be interpreted. Amendments typically explain how they would change existing laws using prose, not redlines.
    • Acts, not repositories. Each law Congress passes is a new Act. New Acts are changed at many hierarchical levels by subsequent Acts. Congress does try to codify all Acts into the United States Code by passing “codification bills,” but this project is decades behind, which means many Acts have not been incorporated into the Code yet, and there is no comprehensive repository of U.S. federal law. To a programmer, that would be as if each new commit could change any number of repositories in unlimited ways.
    • Standard Diff doesn’t work. Unlike showing the changes between two versions of the same file, matching “the same” section of an amended law requires semantic judgement, creating difficulty grouping changes.
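
    The third point can be made concrete with a minimal sketch. It assumes two hypothetical versions of a made-up Act, in which a new section is inserted and a later section is renumbered and amended. The section texts, the render and best_match helpers, and the use of a simple text-similarity score as a stand-in for semantic judgement are all illustrative assumptions, not a description of any production system.

```python
import difflib

# Hypothetical section texts, keyed by section number (illustration only).
old = {
    "1": "Definitions apply throughout this Act.",
    "2": "The fee shall be 10 dollars per filing.",
}
new = {
    "1": "Definitions apply throughout this Act.",
    "2": "This Act may be cited as the Example Act.",
    "3": "The fee shall be 15 dollars per filing.",
}


def render(sections):
    """Flatten sections into numbered lines, as a plain-text file would store them."""
    return [f"Sec. {num}. {text}" for num, text in sections.items()]


# A line-based diff pairs lines by position, so old Sec. 2 is reported as
# deleted and replaced by an unrelated new Sec. 2.
line_diff = list(difflib.unified_diff(render(old), render(new), lineterm="", n=0))


def best_match(text, candidates):
    """Return (section number, similarity) of the candidate closest to text.

    Character-level similarity is only a crude proxy for the semantic
    judgement a drafter applies when deciding which sections correspond.
    """
    scored = [
        (difflib.SequenceMatcher(None, text, body).ratio(), num)
        for num, body in candidates.items()
    ]
    ratio, num = max(scored)
    return num, ratio


matched_num, similarity = best_match(old["2"], new)

if __name__ == "__main__":
    print("\n".join(line_diff))
    print(f"Old Sec. 2 corresponds to new Sec. {matched_num} "
          f"(similarity {similarity:.2f})")
```

    Matching by content rather than by position pairs the old fee provision with its renumbered successor, so the only change that needs review is the dollar amount. Real legislative text would need far more than string similarity to do this reliably.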

    The difficulty of applying version control systems to law is compounded when you consider the different types of changes and materials attached to legislative documents, including amendments, hearings, floor speeches, testimony, votes, conference committees, effective dates, regulatory implications, and countless other details that need to be tracked.

    Bridging the divide between modern software practice (version control, git, and coding) and the esoteric traditions and processes of law in the United States requires experts with broad and deep competence in both fields. Those experts, who could be referenced in shorthand as “lawyers with GitHub accounts,” are part of a small but growing community.

    Modernization in the House

    In June 2020, the Clerk of the U.S. House of Representatives issued an initial report on a rule change enacted at the start of the 115th Congress, commonly called the Comparative Print Project. Despite the unfortunate use of “print” in its working title, the report states that “the project will result in a robust, scalable, and secure web application.”

    The scope of the Comparative Print Project calls for two distinct types of comparison at various points in the legislative process.

    Clause 12(a) calls for a document that illustrates changes and differences made by a legislative proposal to current law. How does H.R. 123 change the Social Security Act (non-codified law) and 38 USC 321 (positive or codified law)?

    Clause 12(b) calls for a document-to-document comparison between different versions of bill language. How does the Rules Committee Print differ from the bill reported by the committee?

    Legal and Tech Experts with Proven Solutions

    In August 2018, a contract was awarded to Xcential Legislative Technologies to build document comparison software for the U.S. House that can track changes in law and will ultimately be able to show what the law was at any point in time.

    Xcential got its start by building the system that the California Legislature now uses to write, update, and amend laws. Today, Xcential’s largest project is in the U.S. Congress, working on the House Modernization Project, which touches many different aspects of the legislative workflow. Xcential designed an open standard XML for legislation called United States Legislative Markup (USLM) and converted the entire U.S. Code into USLM, paving the way for version control for law.

    Solving Version Control for Law

    Xcential addressed the challenges of version control for the law with three central solutions: machine-readable amendments, machine-readable legal citations, and the creation of a legally-relevant diff. 

    The House Clerk’s report explains that Xcential’s team “compiled a current law dataset stored in a custom repository solution and developed natural language processors [NLP] to do the work of recognizing, interpreting, retrieving, and executing the amendatory language contained in the legislative proposal.”

    In order to do this, Sela Mador-Haim, an NLP expert at Xcential, took hundreds of thousands of amendatory phrases, deciphered the grammar and semantics of those phrases, and put them into a machine-readable format. That effort enables the translation of legal documents produced by the U.S. Congress into a format that is machine-readable.

    The same machine-readable translation process was then completed for legal citations. Once citations can be machine-processed, they work like a query language: Xcential can go into the database and resolve each reference in a legislative document to a precise address in the law. Combining machine-readable amendatory phrases and legal citations gives us an address where the change is to be made and a language that describes the change, resulting in machine-executable instructions.
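
    As a rough illustration of that idea, the sketch below turns one amendatory sentence into a structured instruction (an address plus a change) and executes it against a tiny in-memory "law." Everything here is a simplifying assumption: the single regular expression, the Instruction fields, and the sample section text are hypothetical, and real amendatory language is far more varied than one pattern can capture. It is not the NLP approach described in the House Clerk’s report.

```python
import re
from dataclasses import dataclass


@dataclass
class Instruction:
    """A machine-executable amendment: where to change, and how."""
    action: str   # "strike_insert" or "strike"
    section: str  # address of the provision being amended
    old: str      # text to strike
    new: str      # text to insert (empty for a pure strike)


# Hypothetical toy grammar covering one phrasing of "strike ... and insert ...".
PATTERN = re.compile(
    r'Section (?P<section>\w+) is amended by striking "(?P<old>[^"]+)"'
    r'(?: and inserting "(?P<new>[^"]+)")?'
)


def parse_amendment(text):
    """Translate amendatory prose into an Instruction, or raise if unrecognized."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"Unrecognized amendatory language: {text!r}")
    new = match.group("new")
    return Instruction(
        action="strike_insert" if new is not None else "strike",
        section=match.group("section"),
        old=match.group("old"),
        new=new or "",
    )


def apply_instruction(law, ins):
    """Return a new copy of the law with the instruction executed."""
    if ins.section not in law:
        raise KeyError(f"No such section: {ins.section}")
    current = law[ins.section]
    if ins.old not in current:
        raise ValueError(f"Text to strike not found in section {ins.section}")
    updated = dict(law)  # keep the prior version intact for history
    updated[ins.section] = current.replace(ins.old, ins.new, 1)
    return updated


# Hypothetical current law and amendment, for illustration only.
law = {"101": "The fee shall be 10 dollars per filing."}
amendment = 'Section 101 is amended by striking "10 dollars" and inserting "15 dollars".'

ins = parse_amendment(amendment)
amended = apply_instruction(law, ins)

if __name__ == "__main__":
    print(amended["101"])  # The fee shall be 15 dollars per filing.
```

    Because apply_instruction returns a new copy rather than editing in place, both the prior and the amended text remain available, which is the property a point-in-time view of the law depends on.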

    The goal is to create a legally-relevant diff. The challenge inherent in doing so is not simply identifying revised portions of legislative text, but understanding what is legally relevant to the drafter. 

    In particular, the goal of version control for the law, set forth in the Comparative Print Project by the House Clerk and Legislative Counsel, is to illustrate changes between the following:

    • Two versions of a bill, resolution, or amendment (document to document comparisons).
    • Current law and current law as proposed to be changed by amendments contained in a bill, resolution, or amendment to current law (codified and non-codified law).
    • A bill or resolution and the bill or resolution as proposed to be modified by amendments (amendment impact).

    According to the House Clerk’s report, Xcential’s NLP tool is currently “performing very well and with a high degree of accuracy.” The report offered the following figure indicating the solution’s success.


    Version control for law is neither as simple as coders imagine, nor as complex as lawyers would make it. While the focus of the project described above was on federal law in the United States, it contains lessons that can be applied from your local city council up to national jurisdictions around the world.

    This article was originally published by HData and Xcential Legislative Technologies in Data First, a monthly newsletter covering the modernization of legislation and regulation. Subscribe to Data First.

  • November 19, 2020 9:00 AM | Data Coalition Team (Administrator)

    On November 18th, the Data Coalition sent the following letter to Congress urging them to establish a school pulse survey for assessing the impacts of COVID-19 on the Nation’s education infrastructure.

    The .pdf version is available here.

    November 18, 2020

    RE: An Open Letter on Establishing a School Pulse Survey for COVID-19 Mitigation Behaviors

    Delivered by Electronic Mail


    Members of Congress – 

    As COVID-19 caseloads surge around the United States to alarming levels, the effects on schools, parents, students, and educators will certainly be profound. Unfortunately, today there is little systematically-collected data available for research and analysis that can help policymakers and administrators understand real-time mitigation activities in schools across the country, and importantly, the impacts on students and educators. 

    The COVID-19 pandemic is highlighting many gaps in our country’s data infrastructure, including for our entire educational system. The Data Coalition recently highlighted the challenges for our federal data collection and data management efforts that should be addressed to effectively respond to the global pandemic. Admittedly, we failed to identify one important issue: as our country’s educational institutions respond to the pandemic with varied approaches there is no comprehensive, timely mechanism for assessing the public health and other impacts of these efforts.  

    When it comes to educating our nation’s children – our government institutions and educational community have an obligation to ensure relevant information is collected and analyzed to effectively and equitably enable learning for those who will be responsible for our country’s future.  

    The U.S. Department of Education’s Institute of Education Sciences (IES) is one of the most qualified and respected institutions for presenting information about the state of education in the United States. But IES is unable to capture and present real-time information about schools across the country in responding to the pandemic, assessing learning losses, and gauging the adequacy of mitigation policies and behaviors. 

    Despite the gaps at the Education Department in collecting information, we know anecdotally schools across the country are innovating, adjusting, and piloting new approaches that could improve learning for the next generation of students. How will our country, our parents, our school administrators, and our policymakers know what works best for responding to the pandemic?

    In short, our country needs better evidence about the impacts of the pandemic on our educational institutions. The United States needs a School Pulse Survey, conducted monthly over the next year.

    There are some existing and admirable efforts to gauge the impacts of COVID-19 on schools, students, and teachers around the country. For example, the Census Bureau’s Household Pulse Survey provides some weekly insights at the household level, including insights based on questions about learning. The insights are unfortunately not useful for gauging school- or district-level impacts. Philanthropic and academic institutions have also stepped in to fill some gaps, though the scope and scale of existing projects continue to present challenges for developing comprehensive insights in real time.

    Additionally, IES’s National Center for Education Statistics (NCES) provides a wealth of information about educational quality and learning across the U.S. The current data collection and analytical activities of NCES, however, take years to implement. The country cannot wait five years – or even one – to better understand the impacts of COVID-19 on our students.

    A School Pulse Survey can do both: gauge school and district level impacts in real time. This will rapidly fill in existing gaps in knowledge about activities in the country while existing data collection mechanisms continue unimpeded. We encourage Congress to appropriate or direct the Department of Education to allocate up to $5 million and at least 1 FTE to develop, manage, execute, and report an Education Pulse Survey with monthly waves during the coming school year.

    A School Pulse Survey for school administrators should specifically gauge the mitigation behaviors applied in responding to the global COVID-19 pandemic, and also capture information relevant for assessing learning loss and other key outcomes or indicators of educational quality in the coming year. Such a survey could be designed to minimize burden on school administrators while also ensuring policymakers and key decision-makers have the information they need to ensure our educational institutions provide the quality of learning that our children deserve.

    Everyone in our country has suffered or been adversely impacted during the global pandemic. We need to build a rapid-cycle evidence base capable of ensuring our children – and our future – benefit from innovations that are happening in the country. A national Education Pulse Survey must be part of the solution.

    It is one thing to say our children are our future. It is another to ensure we have the evidence to make that future both a possibility and a reality.  Our country’s students need us to make evidence-based policy choices informed by timely, accurate and relevant data.  The Data Coalition strongly urges the Congress and the Department of Education to act to rapidly build this evidence base. 

    Thank you for your consideration of how to meaningfully deploy data in our country for effectively responding to the pandemic. We know you have many critical choices in coming weeks but hope prioritization of a new School Pulse Survey will be among those priorities. For immediate assistance, please contact me or the Data Coalition’s policy manager, Corinna Turbes (corinna.turbes@datacoalition.org). 



    –Nick Hart, Ph.D.

    CEO, Data Coalition

  • October 20, 2020 9:00 AM | Data Coalition Team (Administrator)

    When the next Administration is sworn in to lead the Federal Government’s Executive Branch in January 2021, there will be a large number of pressing priorities for fulfilling campaign promises and addressing the many challenges facing the country. During this unprecedented time in our country, the need for valid, reliable data is clear. As America’s premier voice on data policy, the Data Coalition and its members strongly encourage the administration in 2021 to prioritize using data to determine how to most effectively and efficiently address the country’s emerging challenges. To do so, the President, political appointees, and career civil servants can all help ensure agencies are collecting the data and developing the evidence necessary to understand critical priorities and challenges, while also planning for the country’s future policy needs. The following 10 recommendations are common-sense steps that will improve our nation’s data infrastructure and support evidence-based policymaking. The Data Coalition encourages prioritization of these recommendations during the transition planning for 2021.

    #1: Reissue, revitalize, and refocus the Federal Data Strategy to support evidence-based decision-making. The Federal Data Strategy is a 10-year plan developed with widespread civil servant and civil society input, outlining principles and practices every agency should implement over the next decade. Refocusing a national data strategy on core priorities, such as pandemic response, economic recovery, social equity, and financial oversight, will provide agencies much-needed direction about how to most efficiently implement data governance and open data strategies that meet policymaker expectations. An action plan and strategy in 2021 should explicitly complement rapid implementation of the Foundations for Evidence-Based Policymaking Act, the OPEN Government Data Act, the re-authorized Confidential Information Protection and Statistical Efficiency Act, and the Grant Reporting Efficiency and Agreements Transparency Act. Specific actions include issuing implementation guidance to agencies on the OPEN Government Data Act’s data inventory and open data expectations, as well as ensuring the legal framework for new data sharing authorities can be accessed by agencies with final regulations expected under the Evidence Act.

    #2: Provide adequate resources for Chief Data Officers and other agency data leaders to implement core data priorities. Federal agencies need the capacity to pursue open data, data transparency, data governance, and data analysis activities; every agency needs adequate resources to truly recognize data as a strategic asset. Providing at least $50 million in new, immediate implementation funding for Chief Data Officers will directly support efforts to improve accountability and transparency of government policies and programs by better managing and using data.

    #3: Launch new capabilities for secure, responsible data sharing, including establishing a National Secure Data Service. Development of a National Secure Data Service within the Executive Branch is long overdue, and the administration should prioritize using existing legal authorities as appropriate to provide new data analytic capabilities. This service was unanimously suggested by the U.S. Commission on Evidence-Based Policymaking to fill a substantial gap in existing capabilities to securely, confidentially combine datasets for research purposes. In addition, the administration must explore other areas where increased data sharing and linkage may be necessary, such as addressing improper payments and enforcement actions across government.  

    #4: Apply reasonable, open, and consensus data standards for financial services reporting. As the country plans for economic recovery, improved data quality is critically needed for financial regulatory agencies, financial markets, and investors. Basic improvements in data quality and consistency can be achieved by pursuing the implementation of common business identifiers and other aspects of proposed legislation, such as the Financial Transparency Act.

    #5: Expand access to certain income and earnings data for research and evidence-building activities. Core indicators for measuring economic mobility and stability in the country require access to certain income and earnings data, often already collected by the government. The administration should pursue improvements in data quality to systems like the National Directory of New Hires, and propose any adjustments to federal law to expand access for research activities and to support the production of relevant open data. Similar proposals for restricted access to certain tax data for improving critical national economic indicators, such as the Gross Domestic Product, and economic statistics should be prioritized.  

    #6: Improve the system for compiling national COVID-19 data and relevant health information by prioritizing public health data standards. The country lacks basic public health data standards. As part of the effort to respond to the pandemic, the next administration should prioritize the adoption of basic standards to support aggregation of local and state-level data for national analyses.

    #7: Ensure the American public has access to reliable federal spending data. While vast improvements to government spending data were made over the past decade, far too much information is still low-quality or difficult to access. The next administration should make agency congressional budget requests available as structured data in a centralized, publicly available database, giving the American public insight into the budget formulation process. In addition, current spending data could benefit from improved capabilities and application of existing government-wide financial data standards to make information about federal expenditures more readily available and transparent to taxpayers.

    #8: Ethically and responsibly implement emerging data analytics capabilities like artificial intelligence and machine learning in government, beyond exploratory research. The next administration should take proactive steps to address the potential for bias in AI applications by improving the underlying data, thoughtfully designing algorithms, and addressing human bias. Applications should include clear evaluation metrics for any AI pilot programs and experiments, including a focus on how these projects can be scaled.

    #9: Strengthen and diversify the federal data workforce, including by establishing a data science occupational series. The administration should direct the Office of Personnel Management to rapidly establish a new occupational series for data science, encourage agencies to use the series for new hires, and subsequently promote strategies for improving diversity in the field for women and persons of color. 

    #10: Modernize implementation of the Paperwork Reduction Act to ensure timely, valid, and reliable data collection. The next administration should take steps to consider strategies for administratively improving the implementation of one of the country’s most important data laws, and propose any necessary modifications to Congress. Modernization should aim to improve alignment with recent legal authorities and current analytical capabilities, to ensure the quality and value of government information is maximized, and minimize the burden on the American public in providing information to the government.

    Recognizing that good decision-making needs good data, the Data Coalition calls on the next administration to prioritize efforts to improve the quality, accessibility, and usability of our country’s data. In doing so, the administration will support ongoing efforts to transform society’s capabilities to generate insights that can be used to promote transparency and accountability of our government in parallel with efforts to devise strategies for improving the effectiveness and efficiency of government operations.   

    PDF Version available here

  • September 23, 2020 9:00 AM | Data Coalition Team (Administrator)

    The Paperwork Reduction Act of 1995 (PRA) is perhaps one of the least well-known data collection, management, and sharing authorities in the Federal government. In its own right, the PRA upholds important principles for government data through accountability and transparency mechanisms. The PRA is most widely recognized as the law that requires federal agencies to develop Information Collection Requests (ICRs). ICRs apply to information collected from 10 or more respondents and are a central component for PRA’s governing of the government’s data collection processes. 

    The PRA is intended to enable agencies to meet the government’s legitimate need for information without unduly burdening those who have and can supply that information. While the PRA is a powerful and useful instrument, there are real frustrations about its implementation for government employees, public stakeholders, and government contractors alike. Clearance processes can be burdensome and time-consuming, resulting in delays in data collections necessary for agencies to fulfill their missions. At a minimum, the normal process takes four months to satisfy existing statutory and administrative requirements, though six to nine months is considered more realistic for new, non-emergency requests.

    This year, a group of experts convened by the Data Coalition considered potential reforms to address identified challenges and barriers to effective implementation of the PRA’s purpose and intent. Building on the recommendation from the Commission on Evidence-Based Policymaking—suggesting some modifications may be needed to the PRA to support evidence-building activities as well as other expertise—the following proposals are intended to advance a system that enhances the value of government-collected information, eases the burden and costs imposed on agencies, and prioritizes data as an asset, all while encouraging transparency and public trust in government data.

    The Working Group recommends the following: 

    Recommendation #1: Congress should review and propose modifications to the existing public comment procedures for the PRA.

    Recommendation #2: Congress should establish a more streamlined process for ICR review and adjust the scope of the PRA applicability.

    Recommendation #3: Congress must clarify the expectation for the Chief Data Officers to coordinate ICRs under the data governance process within individual agencies.

    Recommendation #4: OMB should accelerate the implementation of a government-wide automated tool for ICRs that support agency data inventories. 

    Recommendation #5: OMB should issue clarifying guidance to agencies on pain points in implementing the PRA.

    Read the full recommendations here.

    The Data Coalition calls on Congress and the Executive Branch to prioritize modest improvements to the Paperwork Reduction Act to improve the efficiency of government data collection and management. Modernizing the Paperwork Reduction Act is long overdue. Small changes to the existing legal framework can offer substantial improvements, reducing the burden on the American public and increasing the value of government-collected data.

  • July 29, 2020 9:00 AM | Data Coalition Team (Administrator)

    The Senate Homeland Security and Governmental Affairs Committee reported the bipartisan CFO Vision Act of 2020 (S. 3287) favorably to the Senate floor last Wednesday, July 22. Now the bill awaits a vote by the entire body. 

    This bill would standardize and clarify the roles of agency Chief Financial Officers (CFO) across the government, which were first created by the 1990 CFO Act. It also would require CFOs to coordinate with other senior personnel such as the Chief Data Officer, Chief Evaluation Officer, and Chief Information Officer. It would also require agencies to submit performance-based financial management metrics to the Government Accountability Office, the Office of Management and Budget, and Congress.  

    This is a significant step forward in improving the government’s financial management. Open and transparent data about how agencies allocate resources is a foundational part of accountable government. Congress has taken meaningful steps forward in modernizing the way spending information is collected, reported, and published. Legislation like the DATA Act and GREAT Act aims to strengthen federal agencies’ oversight and management of spending data. The CFO Vision Act furthers these goals by modernizing and clarifying the responsibilities of the CFO and thereby delivering accountability to the public. This effort is an important step forward in using data to improve the government’s financial performance and accountability. The need for these improvements has never been more clear, as CFOs will have an important role in the oversight and implementation of COVID-relief spending. 

    The Data Coalition is pleased to see this bill attract strong bipartisan support from both chambers of Congress. We urge Congress to continue to work to improve financial data and government accountability.

  • July 17, 2020 9:00 AM | Data Coalition Team (Administrator)

    The Data Coalition sent the following letter to Health and Human Services Secretary Alex Azar, along with key officials at the Office of Management and Budget (OMB) regarding the updated guidance issued on July 10th for hospitals reporting COVID-19-related administrative records.

    The letter outlines the Data Coalition’s concerns regarding data quality and transparency with the new reporting process. Our letter also asked for clarification on the updated guidance and how it will align with expectations for agencies communicated in 2019 from OMB in the principles of the Federal Data Strategy.

    The Data Coalition supports efforts to increase the capabilities of federal agencies to produce high-quality, accessible, and usable data, but in a way that also builds transparency and public trust.

    The full text of the letter follows.

    July 17, 2020 

    Secretary Azar, 

    On July 10, the Department of Health and Human Services (HHS) issued updated guidance for hospitals reporting COVID-19-related administrative records. The guidance directs hospitals to report information about testing, capacity, and patient flows directly to HHS using a new contractor, circumventing the historic practice of conducting such data collection within the Centers for Disease Control and Prevention (CDC). For many stakeholders, the HHS guidance raises questions about data quality and transparency in the new reporting process, which could have implications for accessibility and use of the data. In addition, the publication of the guidance appears to not fully align with expectations for agencies communicated in 2019 from the Director of the White House Office of Management and Budget (OMB) in the principles for the Federal Data Strategy.

    HHS’s approach for daily data collection and systems management during an ongoing pandemic must ensure relevant data will continue to be accessible for researchers and organizations supporting the response, while also remaining transparent to the American people with appropriate open data. Consistent with the Federal Data Strategy’s principle of transparency, the Data Coalition calls on HHS to provide additional details to the American public about the intent, role, and purpose of the modified approach for data collection and publication of critical COVID-19 hospital data. In particular, as Practice #30 of the strategy encourages, HHS should promote public trust with transparency in communicating how data will be used. HHS should also articulate how the Department applied the principle of responsiveness for gathering and incorporating stakeholder feedback on this shift in reporting. 

    HHS’s limitations in data sharing capabilities are not new, are widely documented, and pose practical limitations for ensuring researchers have access to needed, relevant information to support COVID-19 responses. HHS’s Chief Data Officer, the Director of the National Center for Health Statistics, and other data leaders across the agency must effectively collaborate to support realistic data governance for the agency’s data, as expected in the bipartisan Foundations for Evidence-Based Policymaking Act of 2018, directed under the Federal Data Strategy, and specified in HHS’s Data Strategy.

    On behalf of the Data Coalition’s members, we look forward to supporting HHS in continually strengthening the Department’s capabilities for producing high-quality, accessible, and useful data. But in this work, transparency and public trust are essential; we strongly encourage HHS to take deliberate steps to maximize the application of these principles moving forward and to address current concerns for the new guidance. 


    Nick Hart, Ph.D.

    CEO, Data Coalition



    Russell Vought, Acting OMB Director

    Paul Ray, OMB/OIRA Administrator

    Eric Hargan, HHS Deputy Secretary

  • June 10, 2020 9:00 AM | Data Coalition Team (Administrator)

    The CARES Act established the Pandemic Response Accountability Committee (PRAC), along with several other oversight mechanisms. This body has an $80 million budget to help it oversee $2.4 trillion in economic relief to individuals, businesses, and health care providers in response to the coronavirus pandemic. The PRAC was modeled after the Recovery Accountability and Transparency Board, formed to oversee the funds associated with the 2008 financial crisis. Fortunately, the government’s ability to gather and publish spending data has greatly improved as a result of the 2008 crisis, due to the passage and implementation of laws like the DATA Act, the Evidence Act, and the GREAT Act. The Data Coalition sent the following letter urging the PRAC to continue this momentum and to leverage existing data standards and practices to its own advantage and society’s.

    Dear Mr. Horowitz, 

    The Data Coalition, America’s premier voice on data policy, works with Congress and the Executive Branch to advance responsible policies that make data open and accessible, in order to promote transparency and public trust. Open and transparent information about how agencies allocate resources is a pillar that supports accountable government, which in turn promotes public trust in our institutions.

    The Coronavirus Aid, Relief, and Economic Security (CARES) Act created vital transparency and reporting requirements that will mean intense coordination across the federal enterprise in order to manage the high volume of information required for effective oversight. The Data Coalition members strongly urge the Pandemic Response Accountability Committee to use existing infrastructure and data analysis standards in order to quickly establish meaningful transparency for emergency spending associated with the country’s response to the pandemic. 

    An important part of this process will be to use existing federal data standards for the data fields agencies expect to collect, analyze, and publish in support of the PRAC’s statutory goals. This should include integrating agency reporting requirements established by the Federal Funding Accountability and Transparency Act of 2006, the Digital Accountability and Transparency (DATA) Act of 2014, and the recently enacted Grant Reporting Efficiency and Agreements Transparency (GREAT) Act.

    We also encourage the PRAC, in partnership with agencies, to invest in federal transparency platforms and data systems to meet the requirements in the CARES Act. As state governments and congressional leaders have also recommended, the PRAC should follow the past model of the Recovery Accountability and Transparency Board, which established a digital recipient reporting system that used standardized data to deliver accountability and transparency without imposing an undue burden on recipients of federal funding.

    By leveraging data resources thoughtfully and appropriately and building on existing transparency efforts, the PRAC will be able to improve oversight and the impact of the recovery funds. This serves the PRAC’s mission, bolsters transparency efforts already underway, and strengthens the ability of the American people to hold their government accountable. 


    Thank you for your consideration. 



    Nicholas R. Hart, PhD

    CEO, Data Coalition


