  • November 16, 2021 9:00 AM | Data Coalition Team (Administrator)

    Earlier this year, the Chief Data Officers Council requested public feedback on how they can continue to improve the government’s management, use, protection, dissemination and generation of data in government’s decision-making and operations.

    In response, the Data Coalition hosted a virtual public forum to create an opportunity for the data community to offer feedback, recommendations, and advice to the federal CDO Council. As a result of that forum, as well as research informed by the Data Foundation's CDO Insights Survey, we offered the following 12 recommendations to the CDO Council.  Our full comments are available here.

    • Recommendation 1: CDOs should work with their agency CFO and OMB to increase CDO funding flexibilities and direct resources. Most CDOs do not have adequate resources to fulfill their statutory responsibilities and support agency missions. CDOs need sustained, predictable, and adequate resources to implement data priorities. Congress should authorize CDOs to use additional funding flexibilities and set-aside authorities, as well as provide increased direct appropriations for CDOs to succeed. This longer-term resourcing plan aligns with the congressional intent in establishing the CDO role through the Evidence Act, which created the position indefinitely rather than for a short-term period.
    • Recommendation 2: CDOs should work with OMB to clarify responsibilities and expectations. While CDOs are operating with their peer community of practice and under the general framework of the Evidence Act and the Federal Data Strategy, additional guidance from OMB can help align emerging priorities from the administration with the activities implemented by CDOs. In addition, CDOs will benefit from clearer expectations on reporting requirements, including how to address required due dates and expectations about what should be reported to OMB, Congress, and the American people. Additional guidance could also include more tactical direction about what steps CDOs should take, how to prioritize the steps, and areas for interagency cooperation and collaboration.
    • Recommendation 3: Congress should remove the statutory sunset of the CDO Council. Currently the CDO Council is scheduled by law to sunset in 2025. It has proven itself to be a valuable coordinating body that provides vital technical assistance and a community of practice for CDOs to convene and share knowledge. Since most CDOs and their offices are relatively new to the role and responsibilities, additional support from the CDO Council and peers in the form of technical assistance, resources for strategic planning, and other planning processes can support the entire CDO community, including CDOs operating with limited staff and capacity. With the expectation that CDO roles continue indefinitely, the coordination of the CDO Council should as well. The CDO Council or OMB should include this request in an appropriate forum, such as the package of FY 2023 President’s Budget legislative proposals.

    • Recommendation 4: The CDO Council should work to create an ecosystem of data-literate and data-fluent workers. The need for more staff capacity was the top request articulated by CDOs in the Data Foundation’s CDO Insights Survey. CDOs did not just request FTEs, but rather to add specific highly-skilled data scientists, data architects, and data engineers required to successfully carry out data governance and management activities. One cross-agency effort that was viewed positively by participants was the January 2021 joint hiring initiative coordinated by the Office of Personnel Management. Ten agencies joined together to put out the call to hire 50 senior data scientists. In addition to getting high-level expertise into CDO offices, it is important that there are base levels of data literacy throughout the workforce in order to support a culture of data. CDOs should create a shared framework for data skills needed to support their agencies as well as definitions for various roles throughout their agencies and the types of skills required.
    • Recommendation 5: CDOs should emphasize their role as designated leaders to promote training and data fluency among staff of departments. Commitment from agency leadership to establish a strong data culture in agencies is critical for a coordinated training and retention strategy. This should include identifying current gaps in skills, capitalizing on existing training programs and models, and developing training programs when necessary. Specific limitations of privacy frameworks in which agencies are operating should also be addressed as part of training.
    • Recommendation 6: The CDO Council should work with OMB to ensure that forthcoming implementation guidance to agencies on data inventories prioritizes machine readability and interoperability. Implementing and updating the metadata necessary for data inventories across federal agencies can be an intense process, representing a significant workload. In order to deploy automation technologies that reduce workload, as well as improve the quality of data inventories and of the services that aggregate them, the metadata standards associated with these inventories should be machine readable and interoperable. Machine-readable, interoperable metadata supports easier discovery and use of data, especially as the number of data sets within data inventories continues to grow. In addition to ease of discovery, machine readability allows for the automation of several processes that can help reduce burden on custodians of data inventories.
    • Recommendation 7: CDOs should focus on data sharing standards that facilitate interoperability, data linkage, and privacy. Standardization and creation of data standards are emphasized in the recently released Advisory Committee on Data for Evidence Building Year 1 report. The report provides a review of the state of data for evidence building in the federal government, particularly on opportunities for secure data sharing. There are a number of examples of entities that securely aggregate, integrate, and share information. We strongly urge the CDO Council to align its work with the efforts undertaken by the Advisory Committee. Additionally, data sharing approaches must prioritize application of robust confidentiality and privacy safeguards. Various tiers of access and pilot projects testing the use of privacy-enhancing data linkage and multiparty computation offered limited preliminary evidence about the potential promise of such approaches; however, further investment in privacy-enhancing technologies is needed in government prior to operating at scale.
    • Recommendation 8: Publicly accessible data must be prioritized. Ensuring that government data are easily accessible and usable aligns with efforts to promote transparency and accountability for government. Accessibility also facilitates collaboration with researchers, the private sector, and other levels of government, which can lead to more efficiency and innovation in public service.
    • Recommendation 9: CDOs can improve communication about how they demonstrate the value of using data. Where possible, in coordination with the Evaluation Officer and the Statistical Official appointed under the Evidence Act, CDOs should engage in deliberate steps to provide metrics, summaries, and, when possible, evaluations that highlight the impact and cost savings of their efforts. To gain support within their organizations, CDOs need to show their leadership strategically, including valuable accomplishments that improve the ability of other staff to better perform their roles. In so doing, CDOs may help build a more compelling case about the need for resources to create and grow the staff and gain leadership buy-in within their agency. If the CDO is able to show programmatic savings in time and/or dollars caused by their activities, they establish a base for justifying the use of existing resources and requesting greater resources in the future. Even small wins are vital to building support. CDOs also benefit in helping to manage organizational change, encourage data literacy, and increase the influence of evidence-informed decision making.
    • Recommendation 10: CDOs should conduct regular maturity assessments to accurately gauge existing data capacity and needs. Maturity assessments like the one required by the Evidence Act should be a continuous process rather than simply for compliance. Understanding the day-to-day operational needs for data and data skills will allow CDOs to effectively direct resources and training to areas within their agencies that may be in most need of support. By measuring levels of data literacy, use of data, and other aspects at the project level, CDOs can ensure that they are facilitating the growth of a strong data culture within their agency.
    • Recommendation 11: The CDO Council should create a permanent data ethics working group to ensure the Data Ethics framework continuously meets emerging needs, to provide resources and guidance to agencies, and to partner with relevant professional associations for ongoing education and training on data ethics. There is a need for clear, unified guidance from the CDO Council in regard to ethics and equity standards for data. Existing frameworks, such as the Federal Data Strategy ethics framework, provide guidance for developing a single standard going forward, but the CDO Council should collaborate with ethics-focused organizations outside of government to encourage application of best practices and continuous improvement to those practices.
    • Recommendation 12: The CDO Council should work with CIOs to facilitate the adoption of appropriate modernized technology. Data collection, management, and analysis present unique challenges and needs for technology, such as automation to optimize data collection and tools that can streamline data collection, analysis, and storage. When adopting new technology to help support data functions, we encourage the CDO Council to partner with CIOs and other relevant stakeholders to leverage existing technologies, where possible, in order to avoid “reinventing the wheel.”

    The success of CDOs in the federal government hinges on their ability to perform expected and critical tasks. If they are successful, government data can be an asset, creating a robust data infrastructure that will serve a variety of purposes, including improving operational decision-making and evidence-based policymaking capabilities. While there are challenges, the progress of CDOs over the past year is commendable. We hope to continue a productive working relationship and dialogue with the Council going forward and are happy to respond to any questions you may have regarding these recommendations.

  • November 05, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author: Amanda Hejna, Data Foundation Fellow, and Senior Associate with Grant Thornton Public Sector

    The Advisory Committee on Data for Evidence Building (ACDEB) was formed over a year ago to provide recommendations to the White House Office of Management and Budget (OMB) on how agencies can better use data to build evidence and improve decision making across the federal government and beyond. Composed of data experts from all levels of government and the public sector, the Committee was charged with forming a foundational understanding of the current state of and future needs for the use of data for evidence building and in doing so fulfill the spirit and vision of the Evidence Act. 

    Throughout the first year, the Committee focused particularly on developing a vision and framework for a National Secure Data Service that would connect data users at all levels of government and the public and establish a unified evidence-building system across the federal government. At the culmination of Year 1, the Committee presented seven high-priority recommendations to the Director of OMB. These actionable and timely items will contribute directly to ongoing implementation of the Evidence Act and the establishment of a successful National Secure Data Service:

    • Evidence Act Regulations: Provide additional guidance and regulations under the Evidence Act related to the operations and responsibilities of statistical agencies and implementation of the OPEN Government Data Act. 
    • Chief Statistician of the United States: Designate a full-time Chief Statistician of the United States within OMB.
    • Standard-Setting Procedures: Establish clear procedures for stakeholder engagement on future data standards for data sets government-wide. The importance of data standardization is a multi-faceted topic that includes considerations such as data quality, data definitions, legal frameworks, and reporting requirements, among others.
    • Appropriations Requests: Increase funding requests to support implementation of the Evidence Act and the Federal Data Strategy in the President’s Budget request to Congress in fiscal year 2023.
    • Value-Driven Pilot Program: Establish a pilot program including projects from federal agencies, states, and localities, to demonstrate the value of increased coordination and data sharing across government.
    • Privacy-Preserving Technologies Case Studies: Publish case studies where federal, state, and local governments used privacy-preserving technologies to encourage future, widespread use of these methodologies. This recommendation falls under the purview of the U.S. Chief Statistician in collaboration with the Interagency Council on Statistical Policy.
    • Communication: Develop a communication and education strategy to facilitate the success of a National Secure Data Service. This strategy should be developed by the U.S. Chief Statistician and should consider a wide range of stakeholders including the public, data providers, researchers, and policymakers at all levels of government.

    A number of subcommittees drilled down into specific focus areas and presented additional recommendations to the broader ACDEB. Focus areas included Legislation and Regulations; Governance, Transparency, and Accountability; Technical Infrastructure; Government Data for Evidence Building; and Other Services and Capacity-Building Opportunities. These preliminary recommendations will be integrated into the Committee’s Year 2 agenda as it looks to define the steps needed to fully operationalize the National Secure Data Service. In the next year, the Committee will continue to expand on its success to advance the use of data for evidence building and ultimately produce better results for the American people.

  • October 26, 2021 9:00 AM | Data Coalition Team (Administrator)

    Last week the White House Office of Management and Budget (OMB) released the Federal Data Strategy (FDS) 2021 Action Plan, an interagency effort meant to coordinate and leverage data as a strategic asset across the Federal government. Building upon the FDS 2020 and stakeholder engagement, the newly released strategy places emphasis on workforce development and data leadership within agencies.

    Part of the Executive Branch’s management agenda, the FDS is a 10-year plan to establish best practices for ethical data governance, management, and use. The FDS is an iterative process, with each Action Plan intended to incorporate lessons learned from agencies the prior year, public comments, and takeaways from conversations with data professionals from both government and non-government stakeholders –– such as the forum hosted last year by Data Coalition. 

    OMB identified major successes from Year 1 regarding the formation of agencies’ planning, governance, and data infrastructure foundation. For example, OMB praised the establishment of the interagency Federal Chief Data Officer (CDO) Council, the creation of a data upskilling pilot, and improvements to data inventories within

    Learning from Year 1’s successes and identified challenges (such as the need for more statutory requirements, published guidance on timelines, and additional interagency working groups), the 2021 FDS lays out 11 action categories of 40 practices for agencies to implement going forward. Year 2 seeks to offer agencies more flexibility in achieving the Action Plan milestones, in the hope of meeting agencies where they are in their foundational activities from FDS 2020.

    Five out of 11 actions require specific interagency councils to identify pilot projects or government-wide services, highlighting the necessity of collaboration among data leadership. Some Year 2 practices include making public non-classified AI use case inventories, improved linkage and governance of wildfire fuel data, and creation of a data-skills training playbook. The 2021 FDS also reiterates goals from 2020, such as continued assessment of data to answer agency questions as well as maturation of data governance and infrastructure.

    Although Data Coalition members appreciate the Year 2 strategy’s focus on workforce development and the role of data leadership within agencies, there are still many barriers to the next steps of implementing improved data practices across the Federal government. On November 9, Data Coalition will be hosting a public forum to discuss key takeaways from the Action Plan, seek feedback on the Federal CDO Council’s recent Request for Information, and gather additional information on how best to assist in a collaborative effort to realize the full benefits of evidence-informed policy in practice.

  • September 14, 2021 9:00 AM | Data Coalition Team (Administrator)

    Each year, federal agencies provide Congress with funding requests that explain the resources needed to run programs and achieve their missions. These publicly available requests, called congressional budget justifications, are not collected into a structured central repository which makes locating particular budget justifications challenging for congressional offices, federal agencies, White House staff, and the American taxpayer. 

    The Congressional Budget Justification Transparency Act seeks to provide open and transparent data about how agencies allocate resources, a pillar of accountable government. It will make it possible for Congress and the American public to better understand what their government is allocating resources to and to provide capabilities to analyze how budget proposals, appropriations, and budget execution have changed over time. Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost less than $1 million per year to implement.


    The Congressional Budget Justification Transparency Act (P.L. 117-40) directs federal agencies to publish more information online about federal spending. Specifically, the bill would require:

    • Information on any funds made available to or expended by a federal agency be posted publicly.
    • Agencies to post their annual congressional budget justifications in a structured data format and in a manner that enables users to download the reports in bulk. 
    • The White House Office of Management and Budget (OMB) to coordinate a publicly-available website with a list of each justification by agency and fiscal year.


    Congressional budget justifications (CJs) are documents submitted by Executive Branch agencies to support the annual President’s Budget Request, typically in February. The justifications are intended to be plain-language explanations of how agencies propose to spend the funding they request from congressional appropriators, along with their core priorities, performance goals, and a summary of past performance.


    Agency budget justifications contain a wealth of information about agency performance and priorities but are published as large, unwieldy documents. Currently, agencies are only required to produce a machine-readable summary table for the budget submission, meaning many data elements and core features of the justification are not captured. 

    The absence of consistent, machine-readable data means the American public, congressional offices, third-party intermediaries, and even OMB staff must manually review and transpose information in the budgets for relevant analysis. Moreover, the lack of a structured database limits the accessibility of detailed budget proposals to those who know how to find them, which in turn limits transparency for the American public and clear opportunities for accountability and oversight. 
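
    To illustrate why a structured format matters, the following is a minimal sketch, assuming justification data were published as a simple CSV. The column names (`agency`, `fiscal_year`, `program`, `requested_usd`) and every figure are hypothetical and are not drawn from any real budget justification or from the bill text. With data in this shape, a year-over-year comparison that would otherwise require manually transposing numbers from documents takes a few lines:

```python
import csv
import io

# Hypothetical structured extract of budget justifications.
# Column names and all figures are invented for illustration only.
SAMPLE_CSV = """agency,fiscal_year,program,requested_usd
Agency A,2021,Program X,1000000
Agency A,2022,Program X,1250000
Agency B,2021,Program Y,500000
Agency B,2022,Program Y,450000
"""


def requests_by_year(csv_text):
    """Return {(agency, program): {fiscal_year: requested_usd}}."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["agency"], row["program"])
        table.setdefault(key, {})[int(row["fiscal_year"])] = int(row["requested_usd"])
    return table


def year_over_year_change(table, start, end):
    """Return the change in requested funding between two fiscal years."""
    return {
        key: years[end] - years[start]
        for key, years in table.items()
        if start in years and end in years
    }


changes = year_over_year_change(requests_by_year(SAMPLE_CSV), 2021, 2022)
for (agency, program), delta in sorted(changes.items()):
    print(f"{agency} / {program}: {delta:+,} USD")
```

    The same comparison against PDF-only justifications would require locating each document and copying the figures by hand, which is the manual review burden described above.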


    There is no publicly-available, comprehensive list of agencies that must publish CJs. However, according to a 2019 survey conducted by Demand Progress of 456 agencies, over 20% did not publish any CJs publicly. Only 13 agencies of those surveyed (3%) published their CJs online in both FYs 2018 and 2019. While all 24 Chief Financial Officers Act agencies (i.e., large agencies) were among those who did publish their CJs online, the CJs of independent agencies were found to be especially difficult to locate, according to the survey. Demand Progress noted in their survey methodology that they found more than 40 alternative document titles. This lack of standards creates confusion, inhibits transparency, and causes roadblocks to those who need access to budget information to support decisions about resource allocation or to fulfill transparency and accountability goals.


    Open and transparent data about how agencies allocate resources are a pillar that supports an accountable government. This bill will make it possible for Congress and the American public to better understand what their government is allocating resources to and to provide capabilities to analyze how budget proposals, appropriations, and budget execution have changed over time.   Relative to the federal government’s $4 trillion budget, the proposed legislation is a low-cost activity, estimated by the Congressional Budget Office to cost $500,000 per year to implement.


    Staff across federal agencies, congressional offices, and even the White House budget office spend countless hours searching, collating, and repurposing content for budget formulation activities each year. Part of this exercise often requires agency staff to review old congressional justification materials to identify historical funding trends. By simply adjusting how information is published, staff supporting budget formulation and execution across agencies and branches of government will be able to more efficiently and accurately portray budgetary information to support decision-making on resource allocations. The same is true for reviewing and applying agency performance measures to promote effective performance management in the budget formulation and execution processes. 


    OMB coordinates the federal budget formulation and execution processes. After annual budgets are developed and proposed funding levels agreed to within the Executive Branch, agencies are required to submit congressional justification materials for review and clearance by OMB staff. This requirement, established in OMB Circular A-11, dictates that agency justification materials align with the formal President’s Budget Request published annually by OMB. 

    OMB also requires agencies to publish justifications at a vanity URL following transmittal to Congress, unless exempted for national security purposes. However, while OMB publishes top-line budgetary information in the President’s Budget Request volumes, OMB does not provide a consolidated database or repository for agency justifications. OMB already publishes many other budget documents on a central website, and adding the CJs to that site would be a useful resource for Congress, agency staff, journalists, watchdogs, and the general public.


    S. 272 passed the Senate in June 2021 and the House in August 2021. It is expected to be signed by the president in the coming days.


    Both the House and Senate versions have a bipartisan set of sponsors: U.S. Representatives Mike Quigley (D-IL) and James Comer (R-KY) in the House, and Sens. Thomas Carper (D-DE) and Rob Portman (R-OH) in the Senate.


    Campaign for Accountability

    Data Coalition 

    Demand Progress 


    Government Information Watch 

    National Taxpayers Union

    Open The Government 

    Protect Democracy 

    R Street Institute

    Senior Executives Association

    Society of Professional Journalists

    Taxpayers for Common Sense

    Union of Concerned Scientists

  • September 07, 2021 9:00 AM | Data Coalition Team (Administrator)

    It’s no secret that the government collects a trove of data from the American people – estimated to cost $140 billion each year. But the value of that information is much higher, if it can be successfully and securely applied to make decisions about policies that improve lives and the economy. 

    Four years ago, the 15-member U.S. Commission on Evidence-Based Policymaking issued its final report to Congress and the President. Since then, while the world has changed drastically, the vision from the Evidence Commission is more relevant than ever: to enable the use of data in our society to solve real problems for the American people.  

    The Evidence Commission accomplished its mission with just 18 months to learn about the nature of the country’s data challenges, study the contours of potential solutions, and reach a bipartisan agreement on salient, timely recommendations. It is already a major success story to be emulated by future government commissions, and the impact is still ongoing.

    During a press conference on Sept. 7, 2017 releasing the final Evidence Commission recommendations, then-Speaker Paul Ryan and Senator Patty Murray stood side-by-side to applaud the realistic, practical solutions offered by the commission members. Speaker Ryan said: “it’s time to agree where we agree.” And in that spirit, days later, Speaker Ryan and Sen. Murray jointly filed the monumental Foundations for Evidence-Based Policymaking Act (Evidence Act). 

    Enacted 16 months after the commission’s report with overwhelmingly bipartisan support in Congress, the Evidence Act was the most significant government-wide reform to the national data infrastructure in a generation. The Evidence Act created chief data officers and evaluation officers in federal agencies, established processes for planning data priorities and research needs, required government data to be open by default, and enabled new data sharing capabilities within one of the world’s strongest privacy-protective frameworks. In short, the legal authority of the Evidence Act was a game changer for how our government responsibly manages and uses data. The work to implement that law is now ongoing across federal agencies.

    The Evidence Act also has tremendous implications for state and local governments, federal grantees, researchers, and even allies on the international stage. The law positions the United States as a clear leader in the dialogue about producing useful evidence for decision-making, while also shifting the discourse about the role of data infrastructure in supporting basic program administration. 

    What’s possible today that was not four years ago? A lot. Take for example the recent efforts to improve talent in the federal government by aligning roles under the chief data officers and evaluation community. Agencies like the US Department of Agriculture are launching new enterprise data capabilities to understand what data they have and use it. Coordination across new data leaders is producing new innovations for the government, like the use of natural language processing to accelerate the review of comments on federal rules. Real dialogue is now underway to break down the barriers and silos of data within agencies, and promote more public access. A new portal for researchers to have a one-stop-shop for applying to access restricted data is under development. New pilot projects of privacy-preserving technologies are underway as public-private partnerships. All of these activities will lead to greater capacity to use data and, therefore, better information to solve the government’s most wicked problems. 

    While real progress is being made, there are other areas ripe for attention from leaders at the White House where implementation of the Evidence Act has lagged. Here are two examples:

    • Presumption of Accessibility Regulation – A key recommendation from the Evidence Commission included in the new law was to assume that data are sharable unless prohibited by law or regulation. This presumption of accessibility requires the White House Office of Management and Budget (OMB) to first take a regulatory action, which has disappointingly not yet been published even in draft form for public feedback.
    • Guidance on New Open Data Requirements – The Evidence Act’s requirement that agencies make more data accessible and open is also paired with new transparency requirements about agencies inventorying data and publishing information about key contents of datasets. These nuanced activities require OMB to also issue guidance to agencies to facilitate consistency across federal agencies and to prioritize which high-value data should be made available first.

    The Evidence Act was a starting point, but there is still more work underway to implement the Evidence Commission’s recommendations. Earlier this year, Rep. Don Beyer filed the National Secure Data Service Act to create a pathway for implementing many of the commission’s remaining recommendations for a new infrastructure capable of securely combining data. That bill quickly passed the U.S. House with strong bipartisan support and is now awaiting further action in the Senate. In parallel, the new Advisory Committee on Data for Evidence Building continues to study the challenges identified by the commission and is devising recommendations that will also further address the Evidence Commission’s work.

    While much progress has been made based on the commission’s advice, there is still a long path ahead in the United States to implement the Evidence Act effectively and ensure the remaining recommendations come to fruition. Importantly, the Evidence Commission is itself an example of how to develop and use evidence in policy making. Fortunately, because of the commission members’ diligent service to the country and the leadership from Speaker Ryan, Sen. Murray, Rep. Beyer and others, the country is well on its way to realizing the promise of evidence-based policymaking.

  • August 24, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author: Austin Hepburn, Research and Policy Intern, Data Foundation

    On the first day of their Administration, the Biden-Harris team issued an Executive Order on Advancing Racial Equity and Support for Underserved Communities Through the Federal Government (Executive Order 13985). The executive order was issued to promote and protect equitable policies and data in the Federal Government. These efforts supported the inclusion of marginalized groups in Federal research and analysis, the improvement of equitable policies, and the opportunity for each person to reach their full potential.

    In order to ensure the implementation of the program, the White House Domestic Policy Council (DPC) is “directed to coordinate the efforts to embed equity principles, policies, and approaches across the Federal Government.” This includes efforts to remove systemic barriers, develop policies to advance equity, and encourage communication between the National Security Council and the National Economic Council. As noted in the EO, it is the responsibility of the Office of Management and Budget (OMB) to analyze and “assess whether agency policies create or exacerbate barriers to full and equal participation by all eligible individuals.” This responsibility is key to identifying and quantifying the challenges toward equity. 

    The Executive Order recognized the important role of disaggregated data, or data that has been broken down by detailed sub-categories such as race, ethnicity, gender, disability, income, veteran status, and other key demographic variables, by creating the Equitable Data Working Group. The Working Group has been tasked with “identifying inadequacies in existing Federal data collection infrastructure and laying out a strategy for improving equitable data practices in the Federal government.” This is accomplished by collecting new data or by combining multiple data sources to fill the data gaps that make assessments of equity difficult. Filling those gaps in turn supports evidence-based policies within the Federal government and, through vertical policy diffusion, within state and local governments. “By exploring key policy questions dependent upon underutilized, inaccessible, or missing data, the Equitable Data Working Group explores ways to leverage government data in order to measure and promote equity.”

    Despite the overwhelming benefits of exposing data gaps, the Group recognizes there are possible unintended consequences when considering privacy and the vulnerability of underserved populations. With this in mind, aggregating data into summary data can help reveal broad trends within these communities without disseminating personal data. For example, the National Crime Victimization Survey (NCVS) collects data on self-reported accounts of criminal victimization. The NCVS produces reports that break down victimization data by race, ethnicity, gender, age, marital status, and income. Once the data can be separated by race, researchers and analysts can provide summary statistics and better insights into disparities without exposing personal identifiers. This protects the privacy of those who have been surveyed while still leveraging the data collected to help answer important policy questions about crime.

    The Data Coalition Initiative will be watching how the Working Group approaches these issues when its first report is provided to Ambassador Susan Rice, Assistant to the President for Domestic Policy, this fall. The report will identify and discuss the barriers and gaps in equitable data identified through case studies, along with recommendations on how to address these problems.

    The Working Group report will also include a plan to foster new partnerships among Federal agencies, academic and research partners, state, local, and tribal governments, community and advocacy groups, and other stakeholders, in order to leverage Federal data for new insights on the effects of structurally biased policies, and to advance capacity for multilayered, intersectional analysis of Federal datasets. The Data Coalition is looking forward to the chance to engage with the Working Group on its efforts, and will continue to provide updates as their important work progresses. 

  • July 28, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author Austin Hepburn, Research and Policy Intern, Data Foundation

    The nation is preparing to send its children back to school this fall, but there will be many questions about the ongoing impacts the pandemic has on our children, both in the short and long term. While there are a great many strengths of our country’s educational infrastructure, the data infrastructure applied to improving learning and the workforce continues to face substantial gaps. In order to understand, adapt to, and mitigate the impact of the pandemic, we must ensure that there is a robust data infrastructure. One way to ensure there is timely, useful data about our learners and workers is to provide significant and sustained funding for the Statewide Longitudinal Data Systems (SLDS).

    SLDS is a Federal government program that provides access to historical data on public-school enrolled students and teachers starting from the 2006-2007 school year. The SLDS system was designed to improve data-driven decisions impacting student learning and education. It focuses on the connection among PreK, K-12, postsecondary, and workforce education data. School districts, public schools, and teachers can access the data system via their district’s Student Information Systems (SIS). It is accessible through a free application that is available to eligible state grant recipients, such as school districts, schools, and teachers. This data includes assessment scores, daily attendance, enrollment, courses, and grades. At its best, it enables grantees to link individual-level data from Pre-K to the labor market.

    The SLDS plays a significant role in creating data-driven policies. While the information is collected and stored, the grant program also provides more accessible data in order to get a better understanding of a policy’s impact on student learning. Moreover, it encourages policy efficiency and equity by quantifying educational measurements over time. Data-driven systems such as SLDS provide transparency about which policies affect students and the significance of their impact.

    The SLDS has meaningful benefits, although there are also challenges when implementing a data-driven program. States have been able to put this data to work to better support students on pathways to the workforce. Currently, every state, the District of Columbia and Puerto Rico have an SLDS that connects data between some data systems, but few can connect early education, K-12, postsecondary, and workforce data. This makes it challenging to study and evaluate programs intended to improve outcomes in college and the workforce. As states and federal programs strive to boost education attainment and close the skill gaps in the workforce, it is vital that our country has the ability to produce rigorous analyses based on high-quality data.

    New, sustained investment in SLDS data can provide the important information needed to answer the critical questions of policymakers, educators, parents, and students. This will require a significant, multi-year investment of $1 billion. This funding should focus on modernizing SLDS data systems to build more interoperable and accessible data platforms with privacy-preserving technology, as well as building capacity to use SLDS data through state research-practice partnerships that bring together both real-time learning and longitudinal data and diversify representation of practitioners. Finally, funding should be directed to ensuring robust governance and accountability structures are put into place so that these systems transparently address the real priorities, needs and community expectations.

    Not only is this funding necessary to improve the data infrastructure to meet the needs of learners and workers, it is necessary to make this a sustained funding level, so that these systems have the resources to evolve to meet ever-changing research needs and privacy protection.

    Sustained and continued financial investment in the SLDS program would help ensure data-driven success and proper use of the data. An increase in funding will help provide the much-needed update to the data infrastructure necessary to advance evidence-based policymaking and modernize privacy protection. Providing this funding for SLDS is a smart investment that ensures we will have the evidence and data to provide the best outcomes for our students.

  • July 07, 2021 9:00 AM | Data Coalition Team (Administrator)

    Author Austin Hepburn, Research and Policy Intern, Data Foundation 

    Crime data – which includes data on types of crime, demographics of victims and perpetrators, corrections, recidivism and reentry and court information – is crucial evidence that is used to inform policy decisions in all jurisdictions. In the United States, national crime data is aggregated at the federal level by the Department of Justice’s Bureau of Justice Statistics (BJS) and Federal Bureau of Investigation (FBI). Reliable and up-to-date criminal justice statistics are imperative in order for policymakers to make evidence-based decisions. However, as questions around policing and criminal justice become ever more pressing, it is worth exploring the challenges and limitations of crime data so that we may identify opportunities to improve both the data and the policy decisions informed by the data.

    The National Crime Victimization Survey (NCVS) is the nation’s leading source for statistical information on victims of crime. Victim, offender and crime characteristics are reported along with reasons for reporting or not reporting the crime. However, there are serious limitations to the NCVS. The survey relies on victims’ self-reports; it is not a record made at the time a crime is committed. Sampled households in the United States remain in the survey for 3.5 years and are interviewed 7 times within that span. Only 71% of households sampled responded to the survey in 2019. Since the survey is only a sample, it does not capture variation in victimization patterns at the local or state level. Therefore, understanding victimization patterns at the city level would require additional research. Additionally, data on the effectiveness of policing practices when addressing crime and victimization is lacking from NCVS reports.

    An example of this type of additional research is the Data Foundation’s Policing in America Project, a multi-pronged, open data effort to systematically improve evidence about how the American people view the criminal justice system and police forces. The project focuses on the value of building data capabilities to enable a more robust understanding of the relationship between perceptions of law enforcement agencies and the conditions in select cities, including disparate perceptions by sub-populations.

    In addition to the NCVS, which relies on traditional surveys of victims, a good deal of crime data is reported to the Department of Justice by local law enforcement. The FBI’s Uniform Crime Report (UCR) has been used to provide crime statistics since 1930. The BJS, the primary statistical agency of the Department of Justice, uses UCR data in its publications and datasets. BJS has long been trusted to publish up-to-date and accurate information, utilized by academia and professionals for criminal justice reports and open access data. The data from BJS and the UCR includes local, state, and national level data on corrections, courts, crime, the Federal justice system, forensic science, law enforcement, recidivism and reentry, tribal crime, and victims of crime. The data is reported by local law enforcement agencies to form a national database of criminal justice statistics. In practice, this reliance on local reporting has led to incomplete and non-standard reporting to the FBI. Local jurisdictions may have different definitions of crime that can make uniform crime reporting difficult. There may be lags in reporting for local agencies, as well as incomplete data.

    One challenge comes from inconsistent reporting by local law enforcement agencies (LEAs), which can make arrest rates difficult to calculate. Reporting data is voluntary, so LEAs may not always report the same data every year, yet the UCR only uses data from these voluntary reports. The procedure to calculate the aggregated national, county, and state arrest rates does not take into account the population covered by the UCR. Because the population covered by the UCR varies each year, this has a significant effect on the arrest rates. This poses serious problems for analyzing national trends over time. Perhaps the main limitation of UCR data, however, is the difference between actual and reported crime.
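
    The effect of variable coverage on arrest rates can be illustrated with a small sketch. The figures below are entirely hypothetical and are not drawn from the UCR; the names `YearReport` and `rate_per_100k` are introduced here only for illustration. The sketch assumes the underlying arrest rate is identical in both years and only the share of the population served by reporting agencies changes.

```python
from dataclasses import dataclass


@dataclass
class YearReport:
    year: int
    arrests_reported: int    # arrests from agencies that chose to report
    covered_population: int  # residents served by those reporting agencies
    total_population: int    # all residents of the jurisdiction


def rate_per_100k(count: int, population: int) -> float:
    """Arrests per 100,000 residents for the given population base."""
    return count * 100_000 / population


# Hypothetical data: the rate among covered residents is the same in both
# years (500 per 100,000); only the reporting coverage drops.
reports = [
    YearReport(2015, arrests_reported=4_500, covered_population=900_000,
               total_population=1_000_000),
    YearReport(2016, arrests_reported=3_000, covered_population=600_000,
               total_population=1_000_000),
]

# Dividing by total population ignores which residents were actually covered.
naive = {r.year: rate_per_100k(r.arrests_reported, r.total_population)
         for r in reports}

# Dividing by covered population accounts for the change in coverage.
adjusted = {r.year: rate_per_100k(r.arrests_reported, r.covered_population)
            for r in reports}

for r in reports:
    print(r.year, "naive:", naive[r.year], "adjusted:", adjusted[r.year])
```

    In this made-up case, the rate computed against total population appears to fall by a third between the two years, while the rate computed against the covered population is unchanged. The apparent trend comes entirely from fewer agencies reporting.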

    In addition to inconsistent reporting, the data that is reported is not standardized. Some states may have differing definitions of crime, as well as wholly different crimes on the books. For example, some states, such as Minnesota, have a third-degree murder charge, whereas other states would classify the same offense as manslaughter. Similar challenges exist with hate crime statutes, which may be vastly different, include different demographic information, or may not be a part of the criminal code.

    Timeliness of data is also incredibly important for informing policy, but there are significant lags in crime data. As of June 2021, aggregated arrest data was last reported for 2016, half a decade earlier. Despite the availability of raw arrest data on the FBI’s Crime Data Explorer up to 2019, the Bureau of Justice Statistics has not reported the arrest figures. This means that any policy decisions based on crime data are missing the most timely insights.

    And finally, data needs to be usable by the public, academic researchers, and policymakers. This means that it needs to be published in an accessible format. Crime data has some significant challenges in this respect. But there are tools to help. The BJS Arrest Data Analysis Tool allows researchers to find national estimates and/or agency-level counts of crime. This data is sourced from the FBI’s Uniform Crime Report. The tool is significant in that viewers are easily able to generate arrest figures at the national and local level without needing the data science background required for raw data processing. While some of these challenges are unique to crime data, many of them exist in the data infrastructure throughout the country. Many initiatives are being undertaken to help address these problems and optimize data for evidence-based policymaking.

    The Foundations for Evidence-Based Policymaking Act of 2018 became law on January 14th, 2019. The law requires agencies to make available data accessible and to produce reports that use statistical evidence to support policymaking. Annually, agencies must craft a learning agenda for the Office of Management and Budget (OMB) that addresses policy concerns. This is an opportunity for the Department of Justice to identify and address what needs to be improved and work with stakeholders to ensure that the necessary improvements can be made. This includes an Open Data Plan that must detail how each respective agency plans to make its data open to the public.

    Investment in crime and policing data has been modest, preventing meaningful updates to data collection and modernization. Additional funding for criminal justice data collection and reporting is recommended. Increasing local law enforcement training on reporting and correctly classifying all crimes to the FBI can help increase accuracy and reliability. In addition, increased efforts for interagency collaboration between local law enforcement agencies and the FBI can provide more accurate aggregate data. With these common-sense improvements, crime data can be more effective in helping craft evidence-based policy.

  • May 13, 2021 9:00 AM | Data Coalition Team (Administrator)

    The emerging need to securely share, link, and use information collected by different government agencies and entities is hampered today by administrative, legal, and operational hurdles. The National Secure Data Service Act (H.R. 3133), sponsored by Rep. Don Beyer (D-VA), seeks to implement a demonstration project for a data service that could rapidly address policy questions and reduce unintended burdens for data sharing, while aligning with design principles and concepts presented in recommendations from data and privacy experts. The proposal specifically cites an effort to support full implementation of recommendations made by the bipartisan U.S. Commission on Evidence-Based Policymaking for data linkage and access infrastructure.

    Why is the National Secure Data Service Act necessary? 

    The federal government’s data infrastructure is largely decentralized. Individual agencies and programs may collect data without sharing or using information already collected by other parts of government. This imposes undue burdens on the American public and businesses through repeated reporting of information the government already has. Creating a capacity to securely share information while protecting confidentiality and deploying other privacy safeguards offers tremendous potential for developing new insights and knowledge to support statistical analysis and summary-level information relevant for evidence-based policymaking and practice. 

    The National Secure Data Service Act builds on the bipartisan and unanimous recommendations from the U.S. Commission on Evidence-Based Policymaking from 2017, a consensus proposal from the National Academies of Sciences, Engineering and Medicine in 2017, and a suggested roadmap published by the Data Foundation in 2020. The proposed legislation creates an expectation for the National Science Foundation to make rapid progress in launching a data service and transparently supporting government-wide evidence-building activities. 

    How will a National Secure Data Service protect privacy?

    Under the proposed legislation, the data service at NSF must adhere to federal privacy laws, including the Confidential Information Protection and Statistical Efficiency Act of 2018 (CIPSEA). This law was reauthorized by Congress with bipartisan approval in 2018, establishing one of the strongest government privacy laws in the world, including strong criminal and civil penalties for misuse. The proposed data service can only operate using the CIPSEA authority and in compliance with the Privacy Act of 1974. The data service will also provide information to Congress about specific policies and practices deployed for protecting data. 

    Will the American public have knowledge about projects conducted at the National Secure Data Service?

    Yes. Consistent with principles about transparency specified by experts from the Evidence Commission, National Academies panel, and the Data Foundation, the proposed legislation specifically directs NSF to publish information about activities that are underway. In addition, Congress will receive a report on all projects, expected to include information about the costs and benefits of each. 

    How does the proposed legislation relate to the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act)?

    The National Secure Data Service builds on existing capabilities and authorities established in the Evidence Act, while also providing a resource for federal agencies, researchers, and data analysts to responsibly produce insights that can address questions in agency evidence-building plans (learning agendas). When Congress approved the Evidence Act, the creation of an advisory committee was intended to signal Congress’ continued interest in establishing a data service and provide relevant information to support implementation of next steps within two years of enactment. Now, more than two years after enactment of the Evidence Act, the advisory committee continues to meet and consider technical implementation details. The proposed legislation sets up the formal authorization of a data service to continue this momentum.

    Does the National Secure Data Service Act supersede advice expected in 2021 and 2022 from the Federal Advisory Committee on Data for Evidence Building?

    No. The Federal Advisory Committee on Data for Evidence Building is a collection of nearly 30 experts considering a range of topics related to data linkage and use. Nothing in the proposed legislation restricts the ability of the advisory committee to offer OMB recommendations, as required by its charge in the Evidence Act. Instead, the legislation specifically encourages NSF to consider practices and recommendations from the advisory committee as part of its administrative implementation efforts. The advisory committee is also likely to be increasingly influential in supporting tangible implementation of activities at NSF under the proposed legislation.

    Will a National Secure Data Service displace existing data linkage activities in the Federal Statistical System?

    No. The data service is designed to supplement rather than displace any existing, successful, and sufficiently secure data linkage arrangements. Statistical agencies engaged in production-level data collection, sharing, and publication for the development of federal statistical indicators will receive additional capabilities from the National Secure Data Service but could retain existing practices. 

    Is the National Science Foundation the right agency to operate a data service?

    In 2020, the Data Foundation published a white paper establishing a framework for considering where to operate a data service in government that can meet broad needs government-wide and from across the research and evaluation communities. After exploring the range of potential options, the authors recommended the National Science Foundation given its ability to deploy the strong privacy authorities under CIPSEA, existing expertise in social sciences and computer science, the presence of one of the existing federal statistical agencies with expertise in confidentiality protections and data linkage, and NSF’s close connections and existing relationships with the research community.  

    The text of the National Secure Data Service Act provides NSF flexibility to determine how to implement a data service, including the possibility of issuing a contract through a Federally-Funded Research and Development Center, as recommended in the Data Foundation white paper. This recommendation was presented to the Federal Advisory Committee on Data for Evidence Building in April 2021 and to the NSF Social, Behavioral and Economic Sciences Advisory Committee in May 2021, receiving favorable perspectives and comments from respective committee members.

    How much will implementation of a National Secure Data Service cost?

    Precise implementation costs will vary based on the level of services and activities applied at a data service. In 2021, the Data Coalition recommended that a National Secure Data Service receive an initial annual appropriation of $50 million to support development and launch of core linkage capabilities, privacy protective features, and necessary disclosure avoidance protocols, among other features. 

    Has the Data Coalition taken a position on the National Secure Data Service Act?

    In 2020, the Data Coalition called on Congress to authorize a data service to support pandemic response activities, then later reiterated its support following publication of the Data Foundation white paper, endorsing the white paper’s recommendations.

    Representing the broad membership and interests from the data community, the Data Coalition endorsed the National Secure Data Service Act filed on May 13, 2021. The Data Coalition has also encouraged administrative actions to make progress on the establishment and launch of a data service, including NSF’s recent activities on America’s DataHub. 

    Does NSF support the legislative proposal? 

    The Administration has not formally weighed in on the proposal with a Statement of Administration Policy; however, NSF did provide technical feedback on a draft of the legislative text.

    Last edited May 13, 2021

  • May 11, 2021 9:00 AM | Data Coalition Team (Administrator)

    This month, our RegTech 2021 series continued by examining government uses of artificial intelligence (AI). Just last year, Congress passed legislation encouraging the government to move from pilots and demonstration projects to scaled-up, applied solutions. The discussion featured two fireside chats with government leaders: Henry Kautz, Division Director, Information & Intelligent Systems, National Science Foundation (NSF) and Mike Willis, Associate Director in the Division of Economic and Risk Analysis, Securities and Exchange Commission (SEC).

    First, Director Kautz discussed the work in AI at NSF, as the agency seeks to fund larger, more interdisciplinary projects. Lately, the agency has been focused on establishing AI Institutes, virtual centers organized around themes, connecting colleges and universities to partners in the private and public sector. Themes include AI-Augmented Learning, AI-Driven Innovation in Agriculture and the Food System, and Advanced Cyberinfrastructure. Director Kautz emphasized the importance of NSF’s role in supporting foundational, pre-competitive research and development in these private-public partnerships.

    When thinking about what challenges the government is facing, he recommended that agencies consider improving coordination among themselves on how to best make use of AI internally. He pointed out the success of coordinating bodies like the Joint Artificial Intelligence Center at the Department of Defense, but encouraged the government to think more broadly about the big questions it faces. Additional suggestions to scale up AI include building up AI expertise within the government, especially at the managerial level, being sensitive to and aware of AI skepticism, and rethinking traditional procurement practices. He also emphasized the need for explainability and transparency in ensuring ethical uses of AI and conceptualizing data as infrastructure.

    In the next fireside chat, Preethy Prakash, Vice President of Business Development at eBrevia, spoke with Mike Willis from the SEC. Willis, speaking for himself, spoke of the SEC’s steps to make registrant disclosures more accessible and usable, after noticing well over 90% of EDGAR visitors are machines. 

    Even though these data sets are highly desired by the public and for outside uses of AI, the role of AI within the SEC today is largely focused on enhancing the effectiveness of staff analytical procedures, including those related to risk assessments, identifying potential areas for further investigation for activities like insider trading, comment letter analysis, and entity mappings.

    When asked how to think about creating quality data that is interoperable, Willis pointed directly to the Evidence Act which defines the term “open government data asset” based upon an underlying open standard. “Leveraging industry and market standards, I think, are a very useful way to drive down compliance costs, while streamlining the validation and analysis of the data, including for AI and ML purposes,” Willis stated. He went on to note how these open standards are a great example of public-private partnerships discussed previously.

    As the SEC continues to implement AI, Willis outlined some change management considerations. His recommendations were to ensure that you have talented, qualified professionals, help people understand the problems and processes that AI can help supplement, provide use cases and examples, and ensure that your AI solution stays within its scope. Finally, he echoed Director Kautz’s call to consider data as infrastructure, meaning it should be standardized and structured.

    The whole conversation is available here. To learn more about our RegTech Series, sponsored by Donnelley Financial Solutions (DFIN), visit our webpage.


