Talk:Module 9-c: Digital library evaluation, user studies

From DL Curriculum Project



Instructions

1. Click an 'edit' link to type in your comments. Your evaluation may cover general issues concerning the module or a section of it, or you may make more fine-grained comments (e.g., on a particular section by referring to the section number or on a particular point by referring to the page and line number).
2. Add four tildes ('~~~~', without quotation marks) at the end of each comment. Your ID, date and time will show up later.
3. Click the 'Save page' button on the bottom of the 'edit' page.

Module Objectives: Are the objectives appropriate for the topic?

Are the objectives observable? Will students be able to achieve the objectives, given the content in the body of knowledge?

These are good objectives, but it isn’t entirely clear to me how they are met by the curriculum. The first one is addressed, but it wasn’t clear to me where students learn and evaluate the strengths and weaknesses of different approaches or where they learn to select and apply an “appropriate” evaluation method. Appropriate implies that students will learn to compare/contrast different methods with respect to what they want to know and the particular DL, but it wasn’t clear where they learned and/or demonstrated that they could do this. Kellyd 10:26, 5 January 2008 (EST)

The Scope says that the module is concerned with different methods including “cost/benefit analysis,” but I didn’t see any mention of this. Kellyd 10:26, 5 January 2008 (EST)

Check the last sentence in Section 4: “in more systematically consider the potential” ? Kellyd 10:26, 5 January 2008 (EST)

All of this will be covered in 2-2.5 hours?! This doesn’t seem reasonable to me (I'll say more about this later). Kellyd 10:03, 5 January 2008 (EST)

The objectives work well, with three reservations. The first is with the philosophical scope of the module, which seems to include more experimental than naturalistic evaluation. I think that this is okay within the context and aims of the module if these aims are fully explained - and, conversely, for instance, I would not necessarily introduce more quantitative methods into a qualitative methods course - however I think that the objectives and early content should be modified to explain this differential coverage of methods.

Second, it also seems to me that, despite exercise B under Section 12 'Exercises,' the students will be learning more about evaluation than doing evaluation work (e.g. on some kind of test-bed). I know that they will be developing an evaluation plan, but I think that ways should also be explored to see if they could then implement some of this plan, at least on a limited scale. They would learn more about evaluation this way, and the third objective of _applying_ an evaluation method to a DL would be more fully met. Having some serious hands-on evaluation work would help the students, as apprentice evaluation practitioners, to begin to engage with the questions that Diane notes regarding selecting and applying various methods. However, I guess that there would also be some serious time considerations if they were to do this. Maybe this could be a two-lesson module?

Third, it does seem like a lot of material for 2.5 hours. It's almost a mini-syllabus in itself. Khoom 18:02, 13 January 2008 (EST)

The objectives are appropriate for a DL evaluation class, but I have some reservations.

First, there is a lot of material to cover in 2 ½ hours. The learning activities are listed as optional, but they are essential for achieving the objectives, especially the third one. If these exercises are incorporated into the class, there is simply not enough time to achieve all the objectives unless the instructor is extremely organized and effective; and even if the instructor can cover all the topics, I wonder whether students can absorb them in one class. One possibility is to use two class meetings to cover this module. The first class could focus on the roles of DL evaluation in DL design, implementation and management, and on the range of evaluation methods available and their strengths and limitations. Students could then go off and do their projects. In the second class meeting you could use the group presentations to refer back to methods of evaluation and help students appreciate the importance and challenges of DL evaluation.

Second, the scope of the course is to evaluate “the outcomes, impacts, or benefits of a digital library, including cost/benefit analyses,” but the coverage of assessing outcomes, impacts, etc. seems limited, and few of the articles actually address outcomes, benefits, etc. in depth. It is stated that the course will also include “methods that are useful for general user studies” as an effort to understand more fully people’s interactions with DLs. It would be helpful if you could clarify what these “general user studies” are. For example, do you mean that the evaluation methods will be useful for understanding user demographics, usage trends, factors affecting usage, and reasons for satisfaction with DLs? A couple of examples might make the distinction between these user studies and usability testing clearer for students.

Third, this module will follow other modules on DLs and is in the final portion of the module sequence. Because DL creators are so focused on producing a working system, evaluation often comes as an afterthought. But that should not be the case. We need to remind students that evaluation should be closely tied to the objectives and purposes of a DL and should be part of DL planning and design from the very beginning. Formative evaluations can be conducted during the development of DLs, summative evaluations can be done after a DL is established, and periodic evaluations can be conducted to monitor system usage and performance. We also need to emphasize the iterative nature of evaluation and how it can be used for strategic planning. It would be helpful for students to understand that evaluation is not a one-shot deal. Ingrid Hsiehyeei 10:53, 18 January 2008 (EST)

As for prerequisite knowledge, I think that depends on whether this module ends up being provided in one class or two. If it is covered in one class, I would recommend that students should have taken a research methods course. That way, they will come in with some knowledge of the range of research methods available and you can draw on that background to discuss how the methods can be used in the DL context and talk about special challenges related to DL evaluation. If the module is covered in two classes, it probably will be fine to have no prerequisite. Ingrid Hsiehyeei 10:56, 18 January 2008 (EST)

Body of Knowledge: Does the module address all areas of the topic that need to be addressed?

Will the body of knowledge enable students to achieve the objectives? Are there any topics that you think are critical to add to the body of knowledge? Are there any topics on that you would remove from the body of knowledge?

It seems to me that the module is focused more on usability-type studies. I see that this is covered in another section, but I'm not sure that the material in this module will give students the ability to conduct a more sophisticated kind of study. It also felt to me as if quantitative/lab-based methods were emphasized more. You don't mention or use these words (quantitative/qualitative), so perhaps I 'read into this' with my own bias. Kellyd 10:13, 5 January 2008 (EST)

Is one implied objective that students will be able to conduct an evaluation that might possibly be published at JCDL? This might be worth thinking about -- does this module equip/prepare students to do this? Kellyd 10:13, 5 January 2008 (EST)

One extremely important topic that is missing is the selection of tasks. For most evaluations the systems will need to be exercised in some way, whether by users or by machine. This is especially critical when the evaluation involves humans. Technically speaking, tasks, just like people, are sampled. One of the biggest mistakes researchers make is to select tasks that ensure system success or that are just not representative of what people would actually do with the system. This also opens up another can of worms: rotation of tasks. (I also note that there isn't a discussion of counter-balancing if more than one system/interface, etc. is evaluated using a within-subjects design.) Kellyd 11:04, 5 January 2008 (EST)

With respect to the sampling section (Page 7 Line 20ish): Why not put convenience sampling first since this is what most people use? And of course, you haven't addressed the age old question: How many do I need? :O) Kellyd 11:04, 5 January 2008 (EST)

Analysis and Interpretation (Page 7 Line 29): Does this module assume that students know how to analyze data? This section is very thin in that regard. I like what you've said here with respect to separating results and interpretation. I also like the emphasis on contextualizing findings. Very nice. Kellyd 11:04, 5 January 2008 (EST)

Objects of evaluation (p.2 l.35 on) should also include web metrics: measurements of traffic to and through the site. There are metrics aspects to some of the objects already mentioned, such as navigation, but it would be useful to have a web metrics section that discusses the hows, whys and whats in more detail; for instance, analyzing site referrers and referrals to see where your visitors are coming from, or looking at growth trends over time. Khoom 11:47, 13 January 2008 (EST)

Section 9, Body of Knowledge, Questions that may be asked during an evaluation user study - p.3 l.29. You should add something here about assessing what is 'doable' in the circumstances. Principal constraints include time, money, staff, expertise, external dependencies (e.g. access to server logs for web metrics) and so on. So while it would be nice to be able to do everything you want, evaluation-wise, often it's not possible. It might be useful here to work on a 'resource audit', appended to the descriptions of the various evaluation methods listed on p.4 l.21 on, that lists the resources required and the expected data gains. Khoom 16:48, 13 January 2008 (EST)

I would have a specific mention in Analysis and Interpretation methods (p.7 l.29 on) that addresses data triangulation (what it is, why it's useful, how to do it, and so on). Khoom 16:53, 13 January 2008 (EST)

I would have a specific section in Analysis and Interpretation methods (p.7 l.29 on) that addresses inductive versus deductive data gathering and interpretation. I think Diane mentions having a section on the pros and cons of naturalistic and experimental studies. I think that this is a very good idea. Khoom 16:53, 13 January 2008 (EST)

“Questions that may be asked” should be integrated into “Evaluation and user studies” to provide a general overview of DL evaluation. You may also want to talk about the benefits and drawbacks of one-time studies and longitudinal studies. Then you can move on to talk about specific objects for evaluation.

Regarding the OBJECT section, while the framework provided by Saracevic (2005) is fine, I would encourage you to pay close attention to COLLECTION USE. Most of us tend to focus on how to organize information, design interfaces, etc., but what really matters to users is the collection. If we are to understand the impacts and benefits of DLs, we need to know how people are using DLs (not the search features, but the materials in DLs) and whether the use of these materials helps them in any meaningful way. In preparing the NISO Framework for Creating Good Digital Collections, one of our struggles was to identify good studies on DL evaluation. So many studies address process and services, but very few honestly discuss whether DLs make a difference and how.

It also will be useful for students if you make a connection between the OBJECTS of evaluation to the scope/intent of the module to focus on outcomes, benefits, impacts, etc.

“Stages/steps in the evaluation/research process”: We should remind students of the need to think about how the findings will be used. Reporting results to stakeholders is one thing, but thinking realistically about how findings will be applied will be very useful. Evaluators should start with specific questions they want to address, and then identify the best methods to obtain data/answers. They will want to know how external factors such as resources, time, and politics will affect the evaluation effort. They also need to know that they have some control over the integrity and rigor of their study, and that they need to draw conclusions based on solid data. Validity and reliability are critical issues in research and evaluation. You may want to include some readings on these topics.

Data collection methods section: If the module focuses on methods for evaluating outcomes and impacts, I am not sure experimental methodologies and observation methods are as appropriate as interviews, focus groups, diaries, and questionnaires. It seems that experiments and observation methods are more suitable for assessing user interaction with DL interface and services.

The measures listed under the methods seem to be more related to system and much less to users. Again, if the module’s focus is on outcomes, benefits, and impact, then these measures will have to be revised.

Pilot testing evaluation methods is important. I hope it will be mentioned somewhere in the module.

For questionnaires, it would be useful to talk about options for online surveys and their pros and cons.

“Study sample” is covered extensively, and that is appropriate especially for surveys, focus groups, and interviews. I would like to see stratified sampling be included. It would also be useful to show students how to identify potential subject pools and how to recruit to avoid bias. Ingrid Hsiehyeei 10:55, 18 January 2008 (EST)

Readings: Are the readings the best and most appropriate for the topic?

Are there any readings that you think are critical to add to the list? Are there any readings on the list that you would remove?

The readings seem fine to me, but I would add some general book about social science-ish research methods and other data collection and analysis techniques. Perhaps some reference to an interactive IR evaluation would be good? There are some good models out there. Kellyd 10:06, 5 January 2008 (EST)

Re. methods, are there any methods courses that would be pre-reqs for this? That would be the ideal situation for me. It would seem that this course is 'applied methods.' If they have no methods background, it might be hard to explain methods quickly, I think. And as the course is already pretty full of material, passing off the methods load onto another course would free up more space for DLs. Khoom 18:26, 13 January 2008 (EST)

How used to reading are the students? Only two articles seems quite a small amount to me. Here are some notes:

- the first part of the Reeves et al. guidebook is a good intro, especially for planning eval

- the Marchionini et al piece on multifaceted evaluation is very useful, and could be added

- the Intro to the Bishop, Van House and Buttenfield book on DL users is a good explanation of DLs as sociotechnical systems, and is not too long

I'll read the Scott piece and let you know, but it seems oriented to library info systems, and not DLs (although there is of course overlap). Khoom 18:20, 13 January 2008 (EST)

Readings: More items on evaluation and outcomes assessment would be useful. NISO's Framework for Creating Good Digital Collections (3rd edition, 2007) and Peter Hernon's papers on outcomes assessment would help, I think. Ingrid Hsiehyeei

Learning Activities: Are the activities appropriate for the topic?

Will students be able to accomplish the activities, given the content in the body of knowledge? Will the activities enable students to achieve the objectives? Can you think of any other class activities appropriate for this module?

It would be a good idea to elaborate Point 11 -- Concept Map. I know what this is, but it isn't clear to me what will happen and for what purpose. Kellyd 10:25, 5 January 2008 (EST)

Exercise A is nice, and in my experience this is very helpful to students. I think that Exercise B should be described in more detail -- the sketch of the different parts of the proposal seems incomplete. For instance, students should also talk about how they will 'exercise' the system -- if users are involved, what will they do? I also don't recall an explicit discussion of "procedures," so I'm not sure if students will understand what they are supposed to do here. Asking students to say something about data analysis is also usually a good thing. One more comment about Exercise B: I don't think that some of the example DLs are actual DLs -- an OPAC is not a DL, it is an OPAC. I have a hard time understanding how Flickr and MySpace are DLs, too. I like Exercise C a lot: it gives students an opportunity to practice interviewing, and the content will also be informative to them. Kellyd 10:25, 5 January 2008 (EST)

I think Exercise A is a good one. Maybe the instructor could provide some contextualization, for instance outlining some of the underlying quantitative or qualitative theoretical models. While the syllabus tends to assume that there is a fairly well-defined typology of DL evaluation activities (which is fair enough; it is an introductory course), actual evaluation can be messier, with evaluators mixing, matching and adapting a range of approaches. Some hand-holding for the students might be useful here. Khoom 16:14, 13 January 2008 (EST)

Example B sounds more complex. I guess the definition(s) of DLs advanced earlier in the course will determine whether or not Flickr and other apps are counted as DLs or not. If they are, then the danger is of broadening the definition of DLs until it is no longer useful. I'd tend to agree with Diane: OPACs are not DLs (they're catalogues); Flickr is - what? - a personal archive that is semi-structured through tagging?; and I've heard a lot of claims for MySpace, but DL is not one of them. This matters, because many of the DL evaluation resources that are cited were intended by their author(s) to apply to narrower definitions of DLs, and students might get confused when applying them to objects that are not obviously DLs.

Something else that should be included here to make the exercise more realistic is some way to include the role of the client: some background, and also some idea of the client's needs. A further realistic dimension would be to specify budgets and timeframes, as these could affect the choice of evaluation approach. In fact, these latter issues (time, funding) should also be addressed earlier on, in Section 9. Khoom 16:42, 13 January 2008 (EST)

With respect to Part 13: For Exercise A -- do students learn about Saracevic's five points for meta-analysis (I note that this isn't really a meta-analysis in the strictest sense of the word :O))? In an earlier section you emphasize the difference between data analysis and interpretation -- it might be useful to ask students to identify these parts of the articles and indicate whether the author has done a good job of separating these things. Finally, it might be useful to also ask students to evaluate the limitations of the studies. Kellyd 10:25, 5 January 2008 (EST)

Exercise C: This is a good exercise, and I note that you note that it depends on access to a robust DL project. My main observations are (a) that Directors might not always be concerned with or knowledgeable about evaluation (crazy but true), (b) that the evaluator(s) might also be good to talk to, if a project has one, and (c) that some evaluation or quasi-evaluation work might be carried out by others in the project, e.g. web admins might be running transaction log analyzers. A further useful question to ask here could be, "What % of your budget do you spend on evaluation?" Khoom 16:59, 13 January 2008 (EST)

Need to say more about how the “concept map” will be used.

Exercise A. Analyzing an evaluation report can be a useful class project. You can form sub-groups and have all of them evaluate the same report and report their findings to the entire class. Having a common target to analyze and discuss improvement will be very helpful.

Exercise B. An OPAC is not a DL; Flickr has collections, but it is not a DL as defined by most people involved in DL creation. I also would not classify MySpace as a digital library. These entities could, in fact, be useful for clarifying the definition of a DL.

Developing an evaluation plan is highly desirable, but I am not sure how realistic it is. 4-5 hours of preparation may not be enough and presentation of 7-10 minutes seems a bit too short.

Exercise C. If students are skillful, they can learn a lot by interviewing the personnel involved in DL evaluation. I think the key is preparation. The instructor will need to help students brainstorm and identify questions they want to ask and help them to be good interviewers. Ingrid Hsiehyeei

Level of Effort and Prerequisites: Is it feasible to teach the module as it is currently constructed?

Is the level of effort required in class appropriate to the scope of the body of knowledge? Is the level of effort required prior to class appropriate? Is the prerequisite knowledge required sufficient for students to comprehend the body of knowledge?

Doing this evaluation was difficult since I can only ‘see’ the outline of the topics that will be discussed. I think the goodness of the module will depend a lot on who is teaching the material (but perhaps that can be said of all modules). Excellent knowledge of the topic and the ability to present the material in an extremely concise manner will be necessary for any instructor. Even though I’ve had a lot of experience teaching this type of material, I don’t know if I could do it in 2-2.5 hours. I also have some questions about the level of depth that students will gain – conducting studies with human subjects is complicated and I am uncomfortable sending the message to people that it can be mastered in 2-2.5 hours. Kellyd 10:15, 5 January 2008 (EST)

Ditto on the observation that it seems like a lot of work. During my reading of the materials earlier, and stimulated by too much caffeine, at one point I started imagining this module as an actual course syllabus ;-)

Seriously though, there are some fairly complex concepts in this module related to methods and experimental design, iterative development and so on. It should be possible to deal with each of these reasonably succinctly; however, I wonder whether the cumulative effect of looking at all of these areas in 2 hours would be constructive. Khoom 19:49, 13 January 2008 (EST)

As I stated before, if you are to offer this module as is, students should be required to have some background in research methods. I think that will enable you to use the DL context to fully explore issues related to DL evaluation. Because of the amount of materials included in the module, I would recommend that you use two classes to cover them. Ingrid Hsiehyeei 11:04, 18 January 2008 (EST)

Overall Structure of the Module: Is the current module well structured?

Can the topics and their corresponding resources be easily divided? Is there a clear mapping between the objectives and the content of the body of knowledge section? If not, how could the objectives be mapped to the body of knowledge more clearly?

Overall, the structure seems fine -- it follows the steps that one would execute to do a study. I recommend switching the order of the second and third sections (put the discussion of questions before the discussion of objects). Kellyd 10:49, 5 January 2008 (EST)

I agree with Diane here. Presumably students should be quite familiar with the objects listed in Sec. 2 by the time they get to this module. Sec. 3 makes some fundamental points about evaluation purpose and practice that should be introduced earlier. Particularly, the distinction between formative and summative evaluation, and which is appropriate to use in which context, needs to be discussed more. I'd suggest including this on p.2, l.14 on. Khoom 11:53, 13 January 2008 (EST)

The section about objects (starts on page 3 line 35) was difficult to parse because there seem to be at least three things mixed together: processes, criteria and methods. The items don't seem symmetrical (if that is the right word) or exhaustive. For instance, why is usability only listed for the first process? User satisfaction is a dimension of usability, so why are these things listed separately? Why is user satisfaction listed alongside Failures? These things seem orthogonal. I think it would be useful to list processes by themselves and discuss methods later (and maybe revisit processes by using a table or matrix of processes X methods/measures). Kellyd 10:49, 5 January 2008 (EST)

The passing mention of 'impacts' of DLs on p.3 l.35 on is too brief, I think. From my point of view, 'impact' can be a problematic way to frame DL use. We can of course see changes in particular instances, but it's hard to generalize from there to wider societal impacts. There are actually huge controversies going on regarding the 'impact' of ed techs in general: whether they have any impact at all and, if they do, what the nature of that impact is. You'd also have to be careful to avoid promoting technological determinist models here. I'd like to see Rob Kling's work on social informatics at least mentioned, if I were teaching this. Khoom 12:06, 13 January 2008 (EST)

With respect to the "Questions" section -- I like your definitions in the glossary of formative and summative evaluations. It seems like you'd want to put them here. It would be particularly useful to emphasize here that one is used during the development cycle and the other is used at the end. This would address the objective of trying to help students pick an appropriate approach. Kellyd 10:49, 5 January 2008 (EST)

"Evaluation sub-section" (Page 4 Line 21): It would be useful to provide more explicit discussion of the advantages and disadvantages of naturalistic and experimental studies. Again, this is a good place to address the objectives. I am not sure I believe that students will be able to grasp the independent/dependent variable idea so quickly. Kellyd 10:49, 5 January 2008 (EST)

Ditto re. the students not necessarily being able to grasp dependent and independent variables quickly. Khoom 19:54, 13 January 2008 (EST)

"Data Collection ..." (Page 5 Line 17): I had a hard time understanding the structure of the information under observations. Why not divide this material into Direct and Indirect methods? It might also be worth indicating (or structuring) it along the lines of observation of overt behaviors vs. understanding the behaviors (with regard to understanding why something is done). I feel strongly that stimulated recall should be included here. I believe it is a better alternative to think-aloud in most cases. It seems to me that Diaries are also a good way to understand processes and a good way to tap into affective and emotive dimensions. They also facilitate participant reflection. Interviews are also good ways to learn about/develop evaluation measures. Finally, it might be worth making the open/closed question distinction with respect to surveys. Kellyd 10:58, 5 January 2008 (EST)

Again some good points from Diane here. In interviews and focus groups, I would emphasize the importance of the facilitator(s) working from a set script in each case. Khoom 19:54, 13 January 2008 (EST)

The way you organize the module is fine. My only concerns are: 1) the intent to focus on outcomes, impacts, benefits, etc. does not seem to be followed; some of the evaluation methods and most of the measures are for assessing users' use of DL services and interfaces; 2) there is not enough time for students to absorb the materials and have the hands-on practice; 3) interviewing people involved in evaluation can shed light on the process and the challenges of evaluation, but we probably should not assume students automatically know how to interview people. Ex. C is a worthy effort, and preparing students well for this assignment would be helpful. Ingrid Hsiehyeei 11:15, 18 January 2008 (EST)

Additional Comments

Overall, I think that most of the important parts are in the module and I like the exercises. Most of my suggestions above regard rearranging the order in which things are presented. The other main comment is related to the objectives – more discussion of the strengths and weaknesses of the different approaches should appear within the main content of the module. Kellyd 10:36, 5 January 2008 (EST)

Yes, it covers most of the issues. My concerns include:

- not enough discussion of the distinction between qualitative and quantitative methods

- the different eval methods need some kind of annotation that explains how expensive they are in terms of different kinds of resources (time, funding, expertise, etc.)

- it does seem like a lot of ground to cover; maybe some of this work could be off-loaded to well written and structured handouts Khoom 19:57, 13 January 2008 (EST)

This is an important module. I would encourage designers of the DL program to make sure students understand evaluation should be an integral part of DL and it can take place before, during, and after a DL is set up. Because of the importance of this topic, I would recommend that you allocate two class periods to it. We also need to remember to assess the usefulness/value of the materials provided by DL, not just the interface, the indexing, the power of the search engines, etc., whose importance we all appreciate. Ingrid Hsiehyeei 11:19, 18 January 2008 (EST)
