Jupyter4NFDI Survey Summary

Author

Julian Kohne

Published

March 12, 2025

Jupyter4NFDI Survey

The Jupyter4NFDI Survey was designed and conducted to gauge the current state of Jupyter usage within the NFDI consortia and to gather feedback on the Jupyter4NFDI project. In total, 75 people from 53 German research institutions participated in the user survey between Nov. 28th 2024 and Jan. 6th 2025. This report provides an overview of the survey results with respect to the participants, their current usage of Jupyter, and their expectations of the Jupyter4NFDI project. The results contribute to developing the Jupyter4NFDI infrastructure in accordance with the requirements and expectations of the NFDI community. They will also enable the Jupyter4NFDI consortium to consolidate efforts and resources within the NFDI and, where applicable, with outside collaborators.

For any questions, feedback, or comments, please contact the Jupyter4NFDI consortium here

To check out the detailed results, you can just click on the Binder button below. This will open an interactive Jupyter Notebook where you can explore the data in more detail. We have provided the code for you to run the descriptive analysis but you can also explore the data yourself!

Run Notebook


Questionnaire

The diagram below shows the questions asked in the survey and how the survey branched based on participants' responses.

flowchart TD
    A1[Name of your organisation?] --> A2[Which NFDI consortium are you connected to?]
    A2[Which NFDI consortium are you connected to?] --> A3[Describe your role in the NFDI consortia]
    A3[Describe your role in the NFDI consortia] --> A4[Do you already know Jupyter4NFDI?]
    A4 --> A6[What do you expect from Jupyter4NFDI?]
    A6 --> A7[Where, and how you are using Jupyter?]
    A7 --> A8[Select the appropriate branch that describes your activity.]
    A8 --> B4[Other]
    B4 --> F1
    A8 --> B1[User]
    A8 --> B2[Representative/Manager]
    A8 --> B3[Resource Provider]
    B3 --> D1[Are you responsible for the operation/management of an infrastructure and/or service within the NFDI consortia?]
    D1 --> D2[Does your institute/center have its own IaaS cloud/cluster?]
    D2 -->|Yes| D3[On which basis do you run your cloud/cluster or shared resources?]
    D2 -->|No| D4[Do you already operate a JupyterHub or similar service?]
    D3 --> D4
    D4 -->|JupyterHub| D6[Can you provide a link to this service?]
    D4 -->|Similar Service| D5[Which similar service do you use instead?]
    D6 --> D7[Which spawner do you use?]
    D4 -->|No| D8[Are you willing to connect shared resources to the central JupyterHub?]
    D5 --> D8
    D7 --> D8
    D8 --> |Yes| D11[How many shared resources would you offer - CPUs, RAM, GPUs, storage?]
    D8 --> |No| G1
    D8 --> |Only specified Users| D9[Which users would be eligible to use your resources?]
    D9 --> D10[Can you specify these user group within the NFDI?]
    D10 --> D12[Are there any specific policies attached to the usage? Please provide links, if available.]
    D12 --> D13[What benefits do you expect from connecting your resources?]
    D11 --> D12
    B2 --> E1[Do you know about Jupyter?]
    E1 --> E2[How large is the group which you represent? Please indicate the number of people.]
    E2 --> E3[What benefits do you expect from a centralised NFDI Jupyter service for the group that you are representing?]
    E3 --> E4[Do you have dedicated partners offering Jupyter services?]
    E4 -->|Yes|E5[Can you say which institution and/or person is responsible for the operation?]
    E4 -->|No|G1
    E4 -->|I don't know|G1
    B1 --> F1[For what purpose do you use Jupyter?]
    F1 --> F2[How are you currently using Jupyter services?]
    F2 -->|Yes| F3[Who's the provider of the JupyterHub?]
    F2 --> F4[What are your resource requirements?]
    F3 --> F4
    F4 --> F5[What are your environment requirements?]
    F5 --> F5b[Do you have any other resource or environment requirements?]
    F5b --> F6[Do you require access to data outside of the notebook?]
    F6 -->|Yes| F7[Do you need write access to shared data?]
    F6 -->|No| F9
    F6 -->|I don't know| F9
    F7 --> F8[Which other external data sources do you need?]
    F8 --> F9[Would you like to offer software or services through the platform?]
    F9 -->|Yes| F10[Can you briefly elaborate which software or services you would offer via the platform?]
    F9 -->|No| F11[Do you know about Binder?]
    F10 --> F11
    F11 --> F12[Do you need reproducibility from Git or data repo - binder-like functionality / FAIR digital objects]
    F12 --> F13[Do you know about JupyterLite?]
    F13 --> F14[Do you know about Google Colab?]
    F14 --> F15[What advantages would you expect from using the Jupyter4NFDI service compared to the services mentioned before - Binder, JupyterLite, and Google Colab?]
    F15 --> F16[What do you think might be missing in the Jupyter4NFDI service?]
    F16 --> F17[One can run various backends behind a JupyterHub proxy. What other services would you be interested in?]
    F17 --> G1[Can we contact you in the future if we have further questions or would like to send you more information?]
    E5 --> G1
    D13 --> G1
    G1 -->|Yes|G2[What is your first name?]
    G2 --> G3[What is your last name?]
    G3 --> G4[What is your email address?]
    G4--> G5[Would you be willing to participate in a user study?]
    G5--> G6[Would you like to provide additional relevant information to the Jupyter4NFDI service team that was not asked in the survey?]
    
%% Styling Nodes with fill colors

%% Nodes in #cce7f9: B1, B4, and F1 to F17
style B1 fill:#cce7f9,stroke:#333,stroke-width:1px
style B4 fill:#cce7f9,stroke:#333,stroke-width:1px
style F1 fill:#cce7f9,stroke:#333,stroke-width:1px
style F2 fill:#cce7f9,stroke:#333,stroke-width:1px
style F3 fill:#cce7f9,stroke:#333,stroke-width:1px
style F4 fill:#cce7f9,stroke:#333,stroke-width:1px
style F5 fill:#cce7f9,stroke:#333,stroke-width:1px
style F5b fill:#cce7f9,stroke:#333,stroke-width:1px
style F6 fill:#cce7f9,stroke:#333,stroke-width:1px
style F7 fill:#cce7f9,stroke:#333,stroke-width:1px
style F8 fill:#cce7f9,stroke:#333,stroke-width:1px
style F9 fill:#cce7f9,stroke:#333,stroke-width:1px
style F10 fill:#cce7f9,stroke:#333,stroke-width:1px
style F11 fill:#cce7f9,stroke:#333,stroke-width:1px
style F12 fill:#cce7f9,stroke:#333,stroke-width:1px
style F13 fill:#cce7f9,stroke:#333,stroke-width:1px
style F14 fill:#cce7f9,stroke:#333,stroke-width:1px
style F15 fill:#cce7f9,stroke:#333,stroke-width:1px
style F16 fill:#cce7f9,stroke:#333,stroke-width:1px
style F17 fill:#cce7f9,stroke:#333,stroke-width:1px

%% Nodes in #ffe8c4: B2, E1, E2, E3, E4, E5
style B2 fill:#ffe8c4,stroke:#333,stroke-width:1px
style E1 fill:#ffe8c4,stroke:#333,stroke-width:1px
style E2 fill:#ffe8c4,stroke:#333,stroke-width:1px
style E3 fill:#ffe8c4,stroke:#333,stroke-width:1px
style E4 fill:#ffe8c4,stroke:#333,stroke-width:1px
style E5 fill:#ffe8c4,stroke:#333,stroke-width:1px

%% Nodes in #ccecd8: B3, D1 to D13
style B3 fill:#ccecd8,stroke:#333,stroke-width:1px
style D1 fill:#ccecd8,stroke:#333,stroke-width:1px
style D2 fill:#ccecd8,stroke:#333,stroke-width:1px
style D3 fill:#ccecd8,stroke:#333,stroke-width:1px
style D4 fill:#ccecd8,stroke:#333,stroke-width:1px
style D5 fill:#ccecd8,stroke:#333,stroke-width:1px
style D6 fill:#ccecd8,stroke:#333,stroke-width:1px
style D7 fill:#ccecd8,stroke:#333,stroke-width:1px
style D8 fill:#ccecd8,stroke:#333,stroke-width:1px
style D9 fill:#ccecd8,stroke:#333,stroke-width:1px
style D10 fill:#ccecd8,stroke:#333,stroke-width:1px
style D11 fill:#ccecd8,stroke:#333,stroke-width:1px
style D12 fill:#ccecd8,stroke:#333,stroke-width:1px
style D13 fill:#ccecd8,stroke:#333,stroke-width:1px
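
As a rough illustration of how the branching in the diagram works, the Representative/Manager branch can be written as a small lookup table and walked programmatically. This is a minimal sketch in Python, not the survey tool's actual logic: the names `FLOW` and `walk` are hypothetical, and only the node ids and answer labels are taken from the diagram.

```python
# Representative/Manager branch, transcribed from the flowchart.
# Keys are node ids; each value maps an answer label to the next node.
# The key None stands for "any answer" (an unconditional arrow).
FLOW = {
    "B2": {None: "E1"},
    "E1": {None: "E2"},
    "E2": {None: "E3"},
    "E3": {None: "E4"},
    "E4": {"Yes": "E5", "No": "G1", "I don't know": "G1"},
    "E5": {None: "G1"},
}


def walk(start, answers, end="G1"):
    """Return the list of nodes visited from `start` until `end` is reached.

    `answers` maps a node id to the answer given at that node; it is only
    consulted for nodes whose outgoing arrows are conditional.
    """
    path = [start]
    node = start
    while node != end:
        options = FLOW[node]
        if None in options:
            node = options[None]            # unconditional arrow
        else:
            node = options[answers[node]]   # arrow chosen by the answer
        path.append(node)
    return path


print(walk("B2", {"E4": "Yes"}))
print(walk("B2", {"E4": "No"}))
```

Under these assumptions, a representative who answers "Yes" at E4 is shown E5 before reaching the contact question G1, while "No" or "I don't know" leads to G1 directly.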

Participants

In total, 75 people from 53 German research institutions participated in the user survey between Nov. 28th 2024 and Jan. 6th 2025. Participants' institutions were mostly universities and research institutions all across Germany.


In terms of NFDI consortia, survey participants came from 27 different consortia but mostly from consortia with a focus on natural sciences and computational methods.

Within their respective consortia, participants fulfill various roles: mostly management and leadership roles, followed by IT and infrastructure development, and roles as researchers and scientists.

Interestingly, only about half of all participants already know Jupyter4NFDI, despite their backgrounds in quantitative research and computational methods and their membership in other NFDI consortia.


In terms of expectations, participants were mostly interested in easy access, a single entry point and training.

Expectation                    Value
expect_easy_access             53
expect_single_entrypoint       40
expect_training                28
expect_persistant_storage      27
expect_easy_FAIR_objects       26
expect_tech_support            24
expect_X_consortium_collab     22
expect_dont_know               9
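
To put these counts in proportion, they can be converted into shares of all participants. The following is a minimal sketch in Python. It assumes that the question allowed multiple selections (the counts add up to more than 75) and that all 75 participants serve as the denominator; neither assumption is stated in the report. The helper name `share_of_participants` is hypothetical.

```python
# Counts copied from the table above; variable names are kept as they appear.
TOTAL_PARTICIPANTS = 75  # assumption: every participant saw this question

expectation_counts = {
    "expect_easy_access": 53,
    "expect_single_entrypoint": 40,
    "expect_training": 28,
    "expect_persistant_storage": 27,
    "expect_easy_FAIR_objects": 26,
    "expect_tech_support": 24,
    "expect_X_consortium_collab": 22,
    "expect_dont_know": 9,
}


def share_of_participants(count, total=TOTAL_PARTICIPANTS):
    """Percentage of participants who selected an option, rounded to 0.1."""
    return round(100 * count / total, 1)


shares = {name: share_of_participants(n) for name, n in expectation_counts.items()}

for name, pct in shares.items():
    print(f"{name:<28} {pct:5.1f} %")
```

Under these assumptions, easy access was selected by roughly 71% of participants and a single entry point by roughly 53%.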


With respect to Jupyter usage, most participants classified themselves as users of Jupyter, followed by the role of resource provider and representative/manager.

Branch                    Freq
Other                     3
Representative/Manager    10
Resource Provider         17
User                      38
NA                        7
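
Because seven participants have no recorded branch (NA), the share of each branch depends on the denominator. The sketch below, in Python, computes both the share of all participants and the share of those with a recorded branch. It assumes that NA means no branch was selected, which the report does not state explicitly.

```python
# Frequencies copied from the table above.
branch_counts = {
    "Other": 3,
    "Representative/Manager": 10,
    "Resource Provider": 17,
    "User": 38,
    "NA": 7,  # assumption: participants without a recorded branch
}

total = sum(branch_counts.values())      # all participants
answered = total - branch_counts["NA"]   # participants with a recorded branch

for branch, n in branch_counts.items():
    pct_all = 100 * n / total
    if branch == "NA":
        print(f"{branch:<24} {n:>3}  {pct_all:5.1f} % of all")
    else:
        pct_valid = 100 * n / answered
        print(f"{branch:<24} {n:>3}  {pct_all:5.1f} % of all  {pct_valid:5.1f} % of answered")
```

Under these assumptions, users make up roughly 51% of all participants and roughly 56% of those with a recorded branch.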



Branch Results

Important

The survey was branched by role: participants answered different questions depending on the role in which they use Jupyter. The following tabsets therefore describe only the subset of participants who were shown the questions corresponding to their role.


Survey participants in the user role were predominantly using Jupyter whenever possible or specifically for workshops and training.




Most of the participants in the user role were using Jupyter through JupyterLite or through Google Colab.




Participants in the user role who used Jupyter through external providers were using a variety of different providers.


For most participants in the user role, all of the resource requirements asked about were important:

  • Concurrent Session Average
  • Concurrent Session Peak
  • User Number
  • GPUs per Session
  • CPUs per Session
  • RAM per Session
  • Persistent Storage per Session





Similarly, all of the environment requirements asked about were important for most users:

  • Lab extensions
  • Software
  • Licenses
  • Custom Images





Most participants in the user role indicated that they need read and write access to external data sources in their Jupyter notebooks.






Among the users, the majority are not yet sure whether they want to offer software and services through Jupyter in the future. Among those who do know, about half plan to do so and about half do not.


Among Jupyter users in our sample, most have never heard of Binder.



The majority of the sampled users indicated that they need reproducibility from Git or similar systems in their notebooks. Some users are still unsure, and only very few say outright that they do not need these features.




The majority of users in the sample indicated that they have never heard of JupyterLite but already use Google Colab.





With respect to other backends running behind a JupyterHub proxy, most surveyed users are interested in neither RStudio Server nor VS Code.





Participants in the resource provider role were mostly not personally responsible for infrastructures and services.




About 50% of participants in the resource provider role had their own IaaS cluster at their institutions.



Of those participants who do have their own IaaS cloud, most run it in a bare-metal configuration or on Kubernetes.



Most participants in the resource provider role already operate a JupyterHub, followed by those who operate neither a JupyterHub nor a similar service. Only two participants indicated that they operate a similar service that is not JupyterHub.




Among those participants who do operate a JupyterHub, only a few services are publicly available; most are in development or for internal use only.


Among those participants who do operate a JupyterHub, most use KubeSpawner.


Among participants in the resource provider role, none are willing to connect shared resources to the central JupyterHub for an unspecified group of users, and only a few are willing to share resources with a specific group of users. Most are not willing to connect shared resources to the central JupyterHub at all.



Among those participants who are willing to connect shared resources to the central JupyterHub for a specific group of users, all are willing to do so for local users, but only half are willing to do so for NFDI users and for external users, respectively.





For those participants willing to connect shared resources to the central JupyterHub for a specific group of users, only 50% at most would expect any benefits from doing so.





Of the participants in the role of representatives or managers, all already know Jupyter.



The groups represented by participants in the role of representative or manager are relatively large (Mean = 44.71, Median = 30).




Of the participants in the role of representative or manager, half already have a dedicated partner offering Jupyter services.




Further Contact

Most survey participants were interested in being contacted with further questions and in participating in a user study.


Summary

Participants

  • In total, 75 people from 53 German research institutions participated in the user survey

  • Survey participants came from 23 different consortia, but mostly from consortia with a focus on natural sciences and computational methods.

  • Participants' institutions were mostly universities and research institutions all across Germany.

  • Within their respective consortia, participants fulfill various roles: mostly management and leadership roles, followed by IT and infrastructure development, and roles as researchers and scientists.

  • Interestingly, only about half of all participants already know Jupyter4NFDI, despite their backgrounds in quantitative research and computational methods and their membership in other NFDI consortia.

  • In terms of expectations, participants were mostly interested in easy access, a single entry point and training.

  • With respect to Jupyter usage, most participants classified themselves as users of Jupyter, followed by the role of resource provider and representative/manager.

Branch Results

  • Survey participants in the user role were predominantly using Jupyter whenever possible or specifically for workshops and training.

  • Most of the participants in the user role were using Jupyter through JupyterLite or through Google Colab.

  • Participants in the user role who used Jupyter through external providers were using a variety of different providers.

  • For most participants in the user role, all of the resource requirements asked about were important: Concurrent Session Average, Concurrent Session Peak, User Number, GPUs per Session, CPUs per Session, RAM per Session, Persistent Storage per Session.

  • Similarly, all of the environment requirements asked about were important for most users: Lab extensions, Software, Licenses, Custom Images.

  • Most participants in the user role indicated that they need read and write access to external data sources in their Jupyter notebooks.

  • Among Jupyter users in our sample, most have never heard of Binder.

  • The majority of the sampled users indicated that they need reproducibility from Git or similar systems in their notebooks. Some users are still unsure, and only very few say outright that they do not need these features.

  • The majority of users in the sample indicated that they have never heard of JupyterLite but already use Google Colab.

  • With respect to other backends running behind a JupyterHub proxy, most surveyed users are interested in neither RStudio Server nor VS Code.

  • Participants in the resource provider role were mostly not personally responsible for infrastructures and services

  • About 50% of participants in the resource provider role had their own IaaS cluster at their institutions.

  • Of those participants who do have their own IaaS cloud, most run it in a bare-metal configuration or on Kubernetes.

  • Most participants in the resource provider role already operate a JupyterHub, followed by those who operate neither a JupyterHub nor a similar service. Only two participants indicated that they operate a similar service that is not JupyterHub.

  • Among those participants who do operate a JupyterHub, only a few services are publicly available; most are in development or for internal use only.

  • Among those participants who do operate a JupyterHub, most use KubeSpawner.

  • Among participants in the resource provider role, none are willing to connect shared resources to the central JupyterHub for an unspecified group of users, and only a few are willing to share resources with a specific group of users. Most are not willing to connect shared resources to the central JupyterHub at all.

  • Among those participants who are willing to connect shared resources to the central JupyterHub for a specific group of users, all are willing to do so for local users, but only half are willing to do so for NFDI users and for external users, respectively.

  • For those participants willing to connect shared resources to the central JupyterHub for a specific group of users, only 50% at most would expect any benefits from doing so.

  • Of the participants in the role of representatives or managers, all already know Jupyter.

  • The groups represented by participants in the role of representative or manager are relatively large (Mean = 44.71, Median = 30).

  • Of the participants in the role of representative or manager, half already have a dedicated partner offering Jupyter services.

