Box 7914
NC State University
Raleigh, NC 27695-7914
 
In the News:
Rada Chirkova
Information integration and interoperability among information sources are related problems that have received significant attention since the early days of computer information processing. Recently, the advent of XML as a standard for online data interchange holds much promise toward promoting interoperability and data integration. The more recent development of tools and techniques that are loosely referred to as Web 2.0 also emphasizes information integration and introduces the concept of mashups. The focus has shifted to one of providing integration and interoperability among a large number of independent and autonomous information sources.

This research proposal builds upon recent advances in the Semantic Web and information integration. The ultimate goal is to produce the formalism and lay the groundwork for a broad spectrum of information integration, including mashups, that facilitates rapid development of efficient and effective integration systems. In this proposal, we concentrate on information integration from XML sources. We introduce the Semantic Model approach, which uses simple models to represent the information in information sources. A user query can be specified in terms of the semantic model, or in terms of the schema of any source. It is then translated and executed on the original data at each source. The processing will be largely on XML data, which is either the native model of the source or can be obtained using the wrapper approach to represent sources' data in XML. We consider, in particular, the XML pipeline approach as the preferred architecture for the implementation of an extensible system for integration/mashup.

Networking issues: A network creates the illusion of location independence, but in reality the different data sources are separated by geographical distance. The optimal query plan depends on the properties of information sources (such as volume of data, representation schema, or integrity constraints) as well as on the interconnection-network parameters (such as distance, load, or speed). A robust query-optimization system can benefit from monitoring the network and adjusting the query plan accordingly. Our research will address the impact of networking aspects on the optimization of queries.
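As a rough illustration of how network parameters can change the preferred plan, the sketch below estimates the time to fetch a result from each replica of a source and picks the cheapest one. The `Source` fields, the cost formula (latency plus volume divided by bandwidth), and the numbers in the usage note are assumptions made for this example; they are not the cost model of the proposed research.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Source:
    """Hypothetical description of one information source and its link."""
    name: str
    result_mb: float       # estimated volume of data returned (megabytes)
    bandwidth_mbps: float  # monitored link speed (megabits per second)
    latency_s: float       # monitored round-trip latency (seconds)


def transfer_time(src: Source) -> float:
    """Estimated seconds to ship the source's result over the network."""
    return src.latency_s + (src.result_mb * 8.0) / src.bandwidth_mbps


def cheapest_source(replicas: Sequence[Source]) -> Source:
    """Pick the replica with the lowest estimated transfer time."""
    return min(replicas, key=transfer_time)
```

For example, with two replicas holding the same 100 MB result, the one on the faster link is chosen; if monitoring later reports that its bandwidth has dropped sharply, calling `cheapest_source` again selects the other replica, which is the kind of plan adjustment the paragraph above describes.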
Laurie Williams
CERT reports an overall 740% increase in the number of vulnerabilities reported between 1999 and 2006. (A vulnerability is an instance of a [fault] in the specification, development, or configuration of software such that its execution can violate the [implicit or explicit] security policy [15].) Despite increased focus on new techniques for computer system security, 2006 had 35% more vulnerabilities reported than 2005. Rather than gaining a foothold on the security problem through our increased focus, we continue to lose ground. We must build better software. The work outlined in this proposal focuses on the use of security metrics to prioritize risk-based software engineering for security. Specifically, the objective of this proposal is to build a predictive model based upon security metrics obtained from code artifacts, inspections, and testing to highlight vulnerability-prone and attack-prone components for the risk-based prioritization of re-design, inspection, and testing efforts. The model will be built and validated through detailed analysis of industrial code and data of CACC members: automated static analysis alerts, other static metrics to be determined, inspection records, testing records, and customer-reported problems.
Christopher G. Healey
This proposal describes a follow-on one-year research project to apply techniques from scientific visualization to the problem of displaying, monitoring, and analyzing network-based data. Previous work was conducted primarily with Cisco Systems, with presentations to MCNC. Visualization is an area of computer graphics dedicated to developing ways to convert collections of strings and numbers into images that allow viewers to explore, discover, and analyze within their data. Interest in flexible methods to visualize network environments has grown in recent years, particularly with the recent emphasis on network and data security and reliability. We will partner with one or more CACC members to identify network-related problems within their company that would benefit from an advanced visual representation.
George Rouskas
21st century science and education are increasingly relying on high-capacity guaranteed network services for high-end applications, data-set migration, visualization, and HD-classrooms. Governments recognize the importance of dynamic end-to-end lightpaths for mission-critical e-science, as evidenced by recent funding for dynamic optical networking (e.g., DARPA's CORONET, the EC's PHOSPHORUS). With the advent of optical national and international networks (NLR, GLIF, GEANT2, etc.) and standardization efforts on control planes, there is a critical need to develop and deploy middleware that can establish end-to-end paths over this infrastructure, crossing multiple network administrative domains having multiple network technologies with multiple control mechanisms. Current research practice tends towards collaborations among multiple groups and institutions that require high bandwidth connections to use multiple network-connected resources. This has produced a very difficult problem due to a fundamental lack of scheduling capabilities for these network resources as well as the lack of infrastructure capabilities to provide the automatic, scheduled and rapid establishment of lambdas across administrative domains. Today, use of guaranteed high-performance network resources is achieved by communicating directly with the multiple providers of this infrastructure along the necessary paths, to manually schedule and establish the connections. This process involves manual intervention by network engineers in all domains along the path, increasing the cost, difficulty and delay factors for using these resources.

Simply put, lambdas should be scheduled - come up and go down - based on end-user requirements and not engineering needs. To satisfy the needs of researchers, this has to include the transparent establishment of paths crossing multiple network administrative domains. Currently, scientists have no basic automated mechanism for requesting the scheduling of lambdas and full paths to support their work. The inordinate burden of administrative coordination and the long timescales for establishing lambdas result in the de facto standard practice of establishing lambdas and holding them in place for a year or more, even though they are typically used only periodically, which limits the availability of those resources.

This situation means that use of such network resources and capabilities is restricted to the set of researchers willing to invest the significant time and other resources necessary to discover the network resources, identify the providers, negotiate use of the networks in multiple administrative domains, and manually co-schedule the availability of those network links in order to establish the necessary path for accomplishing some work. This is in addition to securing the use of the network-attached resources during the same window of time. Of course, this is inherently not scalable, and it makes the barriers to using these networks far too high for all but a very small set of researchers who are both willing and able to overcome them.

The effort we describe in this proposal is the first step towards a long-term vision of a distributed scheduling infrastructure that will allow researchers to reserve high-performance network paths via straightforward interfaces to support their research endeavors. Our work will support the development of sophisticated scheduling algorithms and capabilities, and complicated policy implementation and enforcement techniques, and their incorporation into single-domain schedulers; in future research, we will extend and incorporate these capabilities into multi-domain schedulers. Simulation results of algorithms will be used for prototype implementation and experimentation, and then results from experimentation will be fed back into the simulation environment for continuous enhancement of the algorithms.
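To make the idea of a single-domain scheduler concrete, here is a minimal advance-reservation sketch. It is an illustration under stated assumptions, not the algorithm proposed in this project: the class name `LambdaScheduler`, the link names, and the admission rule (accept a request only if every link on the path has a free wavelength for the whole time window) are hypothetical, and wavelength continuity and policy enforcement are ignored.

```python
from collections import defaultdict


class LambdaScheduler:
    """Toy single-domain scheduler for time-windowed lambda reservations."""

    def __init__(self, wavelengths_per_link):
        # link name -> number of wavelengths (lambdas) available on that link
        self.capacity = dict(wavelengths_per_link)
        # link name -> list of booked (start, end) windows, end exclusive
        self.booked = defaultdict(list)

    def _peak_usage(self, link, start, end):
        """Largest number of bookings simultaneously active in [start, end)."""
        events = []
        for s, e in self.booked[link]:
            lo, hi = max(s, start), min(e, end)
            if lo < hi:
                events.append((lo, 1))
                events.append((hi, -1))
        # At equal times a release (-1) sorts before a start (+1),
        # so back-to-back windows do not count as overlapping.
        events.sort()
        peak = current = 0
        for _, delta in events:
            current += delta
            peak = max(peak, current)
        return peak

    def reserve(self, path, start, end):
        """Book one lambda on every link of `path`; return True on success."""
        if start >= end:
            raise ValueError("start must be earlier than end")
        if any(self._peak_usage(l, start, end) >= self.capacity[l] for l in path):
            return False
        for link in path:
            self.booked[link].append((start, end))
        return True
```

A request is either granted on the whole path or not at all, so a rejected request leaves no partial bookings behind. A multi-domain scheduler would additionally have to coordinate such decisions across administrative domains, which this sketch does not attempt.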
Michael Devetsikiotis and Yannis Viniotis
With the increasing demand for high speed, ubiquitous, virtualized services through remote applications, E-commerce, digital media, and portable devices, provisioning for "on-demand" service systems remains a very challenging and current issue.

The VCL Project is an emerging effort to provide on-demand and reservation-based remote access to NC State's extensive library of Engineering, Design, and Scientific software applications to address the increasing needs of both local and distance students and faculty for 24x7 access to advanced computing laboratory facilities. The heart of VCL is a web-based service for scheduling and provisioning of remote access to a set of high-end computational resources. These resources are loaded on demand with a choice of operating system images and predefined application set geared to instructional computing.

We propose to study, analyze and improve algorithms such as those used by the VCL to allocate resources to incoming requests, based on historical data, monitoring of the system (resource) "state" and predictive scheduling. An additional area of crucial interest is the scalability of the design and resource layout, in order to allow expansion of the VCL operation across larger geographical areas and a large number of locations and campuses.
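One simple form of predictive scheduling is to forecast demand for each image from past request counts and pre-load machines accordingly. The sketch below uses an exponentially weighted moving average as the forecaster; this choice, the function names, and the image names in the usage note are assumptions for illustration and do not describe the VCL's actual algorithms.

```python
import math


def ewma_forecast(history, alpha=0.5):
    """Exponentially weighted moving average of past request counts.

    `history` is ordered oldest to newest; a larger `alpha` weights
    recent observations more heavily.
    """
    if not history:
        raise ValueError("history must not be empty")
    forecast = float(history[0])
    for observed in history[1:]:
        forecast = alpha * observed + (1.0 - alpha) * forecast
    return forecast


def machines_to_preload(history_by_image, free_machines, alpha=0.5):
    """Assign free machines to images, highest forecast demand first."""
    forecasts = {
        image: ewma_forecast(history, alpha)
        for image, history in history_by_image.items()
    }
    plan = {}
    remaining = free_machines
    for image, demand in sorted(forecasts.items(), key=lambda kv: kv[1], reverse=True):
        count = min(remaining, math.ceil(demand))
        plan[image] = count
        remaining -= count
    return plan
```

With hypothetical histories `{"matlab": [4, 8], "cad": [2, 2, 2]}` and seven free machines, the forecasts are 6 and 2 requests, so six machines are loaded with the first image and the remaining one with the second.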

We plan to accomplish this as a natural continuation of our CACC-funded research in "on demand performance". Our on-demand performance test-bed, established last year under the previous CACC award, will allow detailed and realistic experimentation at NC State. We have already initiated a flexible laboratory testing environment that we envision as a platform for evaluating, testing and improving VCL scheduling and provisioning functions. We have also worked on embedding monitoring agents that can collect data from the VCL, for example by use of the ITM agent installed under the previous CACC award.
Eric Rotenberg
Next-generation computing/communication devices, such as cell phones, PDAs, and sensor-network nodes, will require deeper storage capacity as functionality and feature sets increase. Engineers have corroborated this sentiment, citing embedded memory cost and power as key constraints on their embedded software. DRAM is the clear successor to SRAM due to its lower cost per bit, thus meeting software demands. However, the entire DRAM is refreshed once every 64 ms (that is, about 16 times every second) to preserve stored information, a substantial energy drain in power-optimized embedded devices, not to mention a reduction in useful bandwidth. We show that most DRAM cells actually retain information on the order of tens of seconds (the 64 ms refresh period is for handling worst-case cells).

The key lies with exploiting dramatic variations in retention times among different DRAM pages. We recently proposed Retention-Aware Placement in DRAM (RAPID), novel software approaches that can exploit off-the-shelf DRAMs to reduce refresh power to vanishingly small levels approaching non-volatile memory. The key idea is to favor longer-retention pages over shorter-retention pages when allocating DRAM pages. This allows selecting a single refresh period that depends on the shortest-retention page among populated pages, instead of the shortest-retention page overall. We explore three versions of RAPID and observe refresh energy savings of 83%, 93%, and 95%, relative to conventional temperature-compensated refresh. RAPID with off-the-shelf DRAM also approaches the energy levels of idealized techniques that require custom DRAM support. This ultimately yields a software implementation of quasi-non-volatile DRAM.
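The key idea above can be sketched in a few lines: allocate the longest-retention pages first, then set a single refresh period from the weakest page that is actually populated. This is a simplified illustration, not the RAPID implementation; the function name `place_pages` and the retention values in the usage note are hypothetical, and a real system would add a safety margin and account for temperature.

```python
def place_pages(retention_s, pages_needed):
    """Choose pages by descending retention time and derive a refresh period.

    `retention_s` maps a page id to its measured retention time in seconds.
    Returns (chosen page ids, refresh period in seconds), where the refresh
    period is the shortest retention time among the chosen pages only.
    """
    if pages_needed < 1 or pages_needed > len(retention_s):
        raise ValueError("pages_needed must be between 1 and the number of pages")
    # Favor longer-retention pages over shorter-retention pages.
    ranked = sorted(retention_s, key=retention_s.get, reverse=True)
    chosen = ranked[:pages_needed]
    # Only populated pages constrain how often the DRAM must be refreshed.
    refresh_period = min(retention_s[page] for page in chosen)
    return chosen, refresh_period
```

With hypothetical retention times `{0: 0.064, 1: 12.0, 2: 30.0, 3: 5.0}`, a program needing two pages gets pages 2 and 1 and a 12-second refresh period. Only when all four pages are populated does the period fall back to the worst-case 64 ms.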

In addition to providing real value for highly-functional, energy-constrained, and cost-constrained computing/communication devices, we believe RAPID is inexpensively deployable because it is based solely on software and commodity off-the-shelf DRAM. The next step in this research is to integrate RAPID into one or more real system prototypes of interest to CACC members, including a cell phone and/or a sensor node.

The objective of the proposed project is to deploy, experiment with, and evaluate RAPID in real systems. This will enable us to productize RAPID and iron out implementation details, including on-line DRAM retention-time testing and integration of RAPID allocation routines into the operating system's virtual memory system (or equivalent for devices without virtual memory). We will demonstrate extended battery life, increased DRAM bandwidth, reduced DRAM latency variability for QoS, increased functionality and enhanced features through deeper storage capacity, conveniences of instant-on/instant-off computing, enhanced resilience to power outages, and other benefits traditionally afforded by non-volatile memory.
Harry Perros and Yannis Viniotis
The Next Generation Network (NGN) will allow network service providers to offer new services so as to hold on to their customer base and at the same time increase market share. Triple and quadruple play is an example of an NGN service, whereby a bundled set of services is offered to subscribers for a fee. These services include: Internet access (e.g., web surfing, peer-to-peer, email, on-line games), voice (e.g., mobile/fixed phone, VoIP), and digital video services (e.g., TV, IPTV). Other services are also envisioned to be offered in NGN, such as multi-party multi-media teleconferences and mobile telemedicine.

This proposal is the continuation of a project entitled "IP Triple and Quadruple Play Services: Modeling and Design", currently being funded during this academic year 2005/2006. The project deals with the dimensioning of an access network. Specifically, the goal is to determine the size of the upstream and downstream links as a function of the number of ADSL/cable modems supported by the access network. Alternatively, given the size of the upstream and downstream links, the goal is to determine how many ADSL/cable modems can be supported.
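The second form of the dimensioning question (how many modems a given link can support) can be illustrated with a deliberately simple model. The sketch assumes each modem is independently active with a fixed probability and transmits at a fixed peak rate when active, and it limits the probability that aggregate demand exceeds the link rate. This on/off binomial model, the function names, and the parameters are assumptions for illustration; the project itself relies on simulation and on SLA constraints such as percentile delay and packet loss.

```python
from math import comb, floor


def overflow_probability(n_modems, p_active, peak_mbps, link_mbps):
    """P(aggregate demand > link rate) for independent on/off modems."""
    # Largest number of simultaneously active modems the link can carry.
    k_max = floor(link_mbps / peak_mbps)
    if k_max >= n_modems:
        return 0.0
    carried = sum(
        comb(n_modems, k) * p_active**k * (1.0 - p_active) ** (n_modems - k)
        for k in range(k_max + 1)
    )
    return max(0.0, 1.0 - carried)


def max_modems(p_active, peak_mbps, link_mbps, target):
    """Largest modem count whose overflow probability stays within `target`."""
    if not (0.0 < p_active <= 1.0) or not (0.0 <= target < 1.0) or peak_mbps <= 0:
        raise ValueError("need 0 < p_active <= 1, 0 <= target < 1, peak_mbps > 0")
    n = 0
    # Overflow probability grows with the number of modems, so stop at the
    # first count that violates the target.
    while overflow_probability(n + 1, p_active, peak_mbps, link_mbps) <= target:
        n += 1
    return n
```

Under these assumptions, a link sized for ten simultaneous peak-rate users can serve more than ten modems when each is active only a fraction of the time; that difference is the statistical multiplexing gain that dimensioning tries to exploit.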

So far, we have obtained results based on simulation. For the following academic year 2007/2008, we propose to develop an optimization algorithm that will permit us to dimension an access network subject to SLA constraints such as percentile end-to-end delay and packet loss. In addition, the same model will permit us to dimension connections through a WAN, such as pseudowires, that are necessary to connect an access network to a distant content provider.

Dimensioning networking gear has a significant impact on the competitive pricing of bundled services. The models we will develop in this project are essential tools for providing proper dimensioning guidelines.
06-02: IP Triple and Quadruple Play Services: Modeling and Design
Harry Perros and Yannis Viniotis
The "Triple play" and "Quadruple play" are part of the cable and telecom industry's strategy to offer new networking services, which will permit providers to hold on to their customer base and increase market share. Both plays offer an integrated bundle of Internet access (e.g., web surfing, peer-to-peer, email, gaming), voice (e.g., mobile/fixed phone, VoIP), and digital video services (e.g., cable TV, IPTV) over a packet network. The subscribers connect to the service via a hierarchy of devices (e.g., access concentrators, head-ends, switches and routers) that aggregate traffic.

The dimensioning of such devices (i.e., figuring out how many users are connected to an access concentrator, how many devices from a hierarchy level to aggregate to the next level) has a significant impact on the competitive pricing of the services.

The general objective of this project is to develop traffic models for evaluating the statistical multiplexing gains possible in the presence of the traffic mix dictated by such triple and quadruple plays. Such models can aid network engineers in properly dimensioning networking gear.
06-03: On Demand Testbed: Monitoring for Capacity Planning and Performance Optimization
Michael Devetsikiotis and Yannis Viniotis
With the increasing demand for high speed, ubiquitous, networking services through applications in E-commerce, digital media, and portable devices, provisioning for "on-demand" networking services is a very challenging and current issue. Performance monitoring is crucial for maintaining adequate quality of service and customer satisfaction, while maximizing resource utilization. Ensuring performance in an efficient manner creates a need for advanced techniques and tools in all aspects of monitoring, data analysis, simulation, modeling, optimization and control.

Our on-demand testbed, currently under establishment, will allow detailed and realistic experimentation at NC State, in parallel with activities at IBM where the students are also spending time as interns. We have already initiated a flexible laboratory testing environment that we envision as a platform for evaluating, testing and improving aspects of software and service management tools (e.g., Tivoli TBSM, ITM etc.).

The award will fund research activities aiming to make significant contributions to capacity planning and automation for monitoring tools through modeling, simulation, testbed emulation and on-line optimization. Specific goals include:

  • The study of models, response surfaces and advanced simulation methods, and
  • The creation of an automated paradigm for on-line optimization for capacity tuning.
Mihail L. Sichitiu
During the past year, there has been significant effort and progress in IETF's workgroups toward improving OSPF's performance in mobile ad-hoc networks (MANETs). However, despite the progress in IETF and the previous CACC project, the current solutions are far from optimal. In this project we propose two improvements aimed at increasing OSPF's efficiency in MANETs. In particular we propose an adjacency formation algorithm and an area management scheme, expected to significantly increase OSPF's efficiency in MANETs. We will also consider the effects of multi-topology routing on the adjacency and area management algorithms.
Munindar Singh
Virtual organizations (VOs) are organizations of entities (their members) such as people, institutions, businesses, and their computational resources that collaborate to address collective and individual goals. VOs are virtual in that their emphasis is on the sharing of virtual resources. However, they are no less real than any other organizations. VOs are grounded in the business processes of their members; their behaviors can have financial and legal import. The main difference between VOs and traditional organizations is that the life cycle of VOs operates at much wider time scales. A VO might come together, operate, and disband dynamically within a matter of minutes or hours; yet a VO may continue to exist as long as any human institution. The following characteristics of VOs distinguish them from traditional IT architectures, and are important for our present purposes.

The objective of this project is to address two major, related challenges: (1) how to ensure that the agents interact correctly within and across VOs under different circumstances, and (2) how to specify agents and VOs in a perspicuous policy-based manner that engenders confidence in the functioning of the VOs involved.
06-09: On Expediting Software Engineer AWAREness of Anomalous Code - continuation of project #05-15
Laurie Williams and Tao Xie
The objective of our research project is to continue CACC-supported development of the Automated Warning Application for Reliability Engineering (AWARE) tool. AWARE will continuously provide the programmer with prioritized and trained information on faults revealed via compilation, static analysis, and dynamic testing. We are extending the functionality of the tool from last year's grant in the following ways: (1) more sophisticated prioritization of alerts; (2) learning of how often to initiate alert runs and how to display the alerts; (3) enhanced automated test case generation; and (4) redundant test case reduction. AWARE will provide the programmer with better diagnosis information, and ultimately, we believe will improve programmer productivity and product quality. We will assess the efficacy of providing the programmer with this stream of information by working with CACC members.
Tao Xie and Jun Xu
Unit testing is an important activity in assuring high quality of software programs. Although there exist object-oriented test-generation tools for Java programs, two important issues pose barriers to their wide adoption. First, the existing unit-test generation tools usually generate many false warnings that could overwhelm and frustrate developers. Second, the existing unit-test generation tools generate a large number of random input values for the method under test; however, these generated test inputs may not exercise the meaningful, important situations where this method is actually used in the system under test. To address these two important issues in automated unit-test generation, we propose to integrate static and dynamic analysis to improve automatic test generation by considering usage contexts of the unit in the system. We use static and dynamic analysis to collect usage context information for the unit under test, and then use this context information to guide test generation. Developers can use the resulting improved Java test-generation tools to augment their manually written tests to better assure high software quality.
05-07: Data Prefetching to Improve Throughput in Data-Intensive Applications
Rada Chirkova
Modern information system architectures place applications in an application server and persistent objects in a relational database. In this setting, we consider the problem of improving application throughput; our proposed solution uses data prefetching to minimize the total data-access time of an application, in a manner that affects neither the application code nor the backend DBMS. Our methodology is based on analyzing and automatically merging SQL queries to produce query sequences with low total response time, in ways that exploit the application's data-access patterns.
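To show the kind of saving that query merging aims for, the sketch below contrasts issuing one lookup per object with a single merged query that prefetches all of them in one round trip. It is a hand-written illustration using Python's built-in `sqlite3` module; the `customer` table, its columns, and the sample rows are hypothetical, and the project's methodology derives such merges automatically from observed data-access patterns rather than by hand.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a small, hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
    )
    return conn


def fetch_one_by_one(conn, ids):
    """Unmerged access pattern: one query (one round trip) per id."""
    rows = {}
    for cid in ids:
        row = conn.execute(
            "SELECT id, name FROM customer WHERE id = ?", (cid,)
        ).fetchone()
        if row is not None:
            rows[row[0]] = row[1]
    return rows


def fetch_merged(conn, ids):
    """Merged access pattern: a single query prefetches every requested row."""
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    query = f"SELECT id, name FROM customer WHERE id IN ({placeholders})"
    return dict(conn.execute(query, ids).fetchall())
```

Both functions return the same mapping from id to name, but the merged version pays the per-query overhead once instead of once per object, which matters most when the application server and the database are separated by a network.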
Mihail Sichitiu
OSPF has emerged as the de facto standard for intra-AS routing in the Internet. Recently, mobile ad hoc networks (MANETs) have evolved as an irreplaceable networking technology for situations with sparse or nonexistent infrastructure and highly mobile participants. We propose to extend OSPF's capabilities so that it can scale to large MANETs.
05-11: Sangam+Facetop=World's Best: Distributed Extreme Programming Environment
Ed Gehringer
We propose to create a state-of-the-art environment for distributed Extreme Programming by marrying the Sangam editor, developed at NCSU, with the Facetop user interface, developed at UNC-Chapel Hill. Sangam is a plug-in for the widely used Eclipse development environment that facilitates distributed Extreme Programming by sending events back and forth between driver and navigator. The Facetop is a novel user interface concept that uses semi-transparent, full-screen video overlays to support close pair collaborations. The Facetop allows a distributed pair to recapture some of the face-to-face communications that are lost in no-video distributed pairing sessions. It also allows members of a distributed pair to point conveniently, quickly, and naturally to their shared work, in the same manner (manually) that they do when seated side by side. Combining Sangam and the Facetop will produce an integrated tool that will be a quantum leap forward for distributed Extreme Programming and distributed agile development.
05-15: Continuous Checking of Static Analysis and Automated Unit Tests for Java Programs
Laurie Williams and Jun Xu
Both static analysis and dynamic testing are important for finding defects in software applications. We propose to develop the Automated Warning Application for Reliability Engineering (AWARE) tool that would use the computer's available CPU cycles and continuously provide feedback to software developers on compilation, static analysis, and testing defects. The user can train the tool to reduce false positives reported from static analysis. Performing development in the presence of a fault (logic or semantic) lengthens the time to correct the fault as new code builds upon the fault. The longer the developer is unaware of the fault, the worse its effects will be. AWARE provides the developer with information on compilation, static analysis, and testing faults while the new code is fresh in the developer's mind. We will implement AWARE as a plug-in for the open source Eclipse development environment and will validate the effectiveness of the tool via empirical studies of open source programs and by working closely with CACC members.
05-19: Visualizing Network Data and Environments
Christopher Healey
This proposal describes a one-year research project to apply techniques from scientific visualization to the problem of displaying, monitoring, and analyzing network-based data. Visualization is an area of computer graphics dedicated to developing ways to convert collections of strings and numbers into images that allow viewers to explore, discover, and analyze within their data. Interest in flexible methods to visualize network configurations and traffic patterns has grown in recent years, particularly with the recent emphasis on network and data security and reliability. We will partner with a CACC member to identify network-related problems within their company that would benefit from some form of visual representation.
05-22: Supporting Evidence-Based Software Engineering
Laurie Williams
To inform their decision making, industry professionals are most influenced by compelling evidence on the effectiveness of a technique in live situations in an environment such as their own. However, few software engineering research studies involve industrial organizations. Consequently, practitioners have little evidence grounded in research results to inform their decision making. With a lack of industry-based results, practitioners may too often base their technology choices on intuition rather than evidence. The field of evidence-based software engineering (EBSE) is emerging to address these challenges. With CACC member support, we have established an exemplar for industry/research collaboration focused on the examination of Extreme Programming and agile practices. We have structured and repeatedly performed the type of industrial case study research that is often lacking. In this proposal, we seek to continue and further this collaboration. Our existing research framework will serve as a basis for our studies, and we wish to extend its applicability and broaden its use beyond Extreme Programming and agile practices to any technology or process of interest to members. In particular, we are interested in applying our research framework to provide industry-based research results of agile software development, software process transition, process customization, reliability and quality prediction, requirements prioritization, pair programming vs. inspection, and distributed software development. As part of this research, we will also adapt an Eclipse plug-in to enable detailed causal analysis of field failures and to empirically examine the defect-removal efficacy of V&V techniques, such as inspections and unit testing, to build up evidence about these practices.
Khaled Harfoush
The desire for ubiquitous network connectivity - anywhere, anytime - to the Internet and to private or corporate networks, whether to communicate with co-workers, friends, and family or, increasingly, to enjoy online entertainment, is driving the rapid growth of wireless technology and of roaming technology between wireless networks. Wireless Local-Area Networks (WLANs) based on standards such as 802.11x have quickly become the fastest growing type of consumer networking device. This is due in large part to the mobility and convenience that they offer to users. WLANs are not expensive to build and maintain, and provide shared gross data rates from 10 to 50 Mbit/s as opposed to the limited 10-100 kbit/s offered by cellular wide-area networks such as GSM, GPRS, and UMTS. It is not hard to envision that, in the near future, it will be possible to construct large scale wide-area wireless IP networks interconnecting neighboring wireless islands to each other and to the Internet, offering high bandwidth and extended coverage similar to those currently used to offer cellular phone service. Users will be able to use this network not only in the comfort of their homes, in parks, or in coffee shops, but also while riding trains or even while driving their cars. Offering secure and continuous connectivity to ensure reasonable performance for real-time applications such as video streaming and Voice over IP (VoIP) will be the key to the commercial success of such networks. Our focus in this proposal is on (1) providing seamless and secure roaming capabilities between WLAN islands, and (2) providing scheduling techniques to reconcile the competition between applications with different QoS requirements in WLANs.