Thursday, February 13, 2020

Oracle Linux: Defragmenting an XFS File System

One of the first things we used to do, quite a few years back, was defragment the hard drives of our computers.

Some of us relied on "specialized" software such as Norton Disk Doctor and others.

Wikipedia, the popular knowledge repository, defines defragmentation as follows:
"La desfragmentación es el proceso conveniente mediante el cual se acomodan los archivos en un disco para que no se aprecien fragmentos de cada uno de ellos, de tal manera que quede contiguo el archivo y sin espacios dentro del mismo. Al irse escribiendo y borrando archivos continuamente en el disco duro, los fragmentos tienden a no quedar en áreas continuas, así, un archivo puede quedar "partido" en muchos pedazos a lo largo del disco, se dice entonces que el archivo está "fragmentado".
Al tener fragmentos de incluso un archivo esparcidos por el disco, se vuelve ineficiente el acceso a los archivos.
Los fragmentos de uno o varios archivos es lo que hace factible la desfragmentación.
El problema de almacenamiento no contiguo de los archivos se denomina fragmentación, es conveniente desfragmentar el almacenamiento de los archivos en dispositivos de almacenamiento electromecánicos por el uso del computador. (Los SSD no son mecánicos)."
On Oracle Linux, we can use the xfs_fsr command to defragment entire XFS file systems or individual files within an XFS file system.

Because XFS is an extent-based file system, it is generally unnecessary to defragment an entire file system, and doing so is not recommended.

If you run the xfs_fsr command without any options, it defragments all writable, currently mounted XFS file systems listed in /etc/mtab.

Over a period of two hours, the command passes through each file system in turn, trying to defragment the top ten percent of the files that have the largest number of extents. After two hours, the command records its progress in the file /var/tmp/.fsrlast_xfs and resumes from that point if you run it again.
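For finer control, xfs_fsr accepts a handful of options; a minimal sketch (the -t and -v flags are standard xfs_fsr options, but verify them against your xfsprogs version; the file path is hypothetical):

# Defragment a single mounted XFS file system verbosely,
# limiting the run to 30 minutes instead of the two-hour default
[root@Lab1BD]# xfs_fsr -v -t 1800 /dev/mapper/ol_lab1bd-home

# Defragment a single file rather than a whole file system
[root@Lab1BD]# xfs_fsr -v /home/oracle/big_file.dat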

With the command xfs_db -r -c "frag -f" you can evaluate the file system's fragmentation level and measure the effect of the changes made.

Note: There are multiple bugs registered in MOS associated with running this command on file systems that hold the data files backing the TABLESPACES of an Oracle database. My recommendation is to evaluate the fragmentation level and, if it is significant enough to warrant this maintenance task, schedule a maintenance window, shut down the database, and then run the task.
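A minimal sketch of such a window, assuming a single instance whose data files live on the /home file system shown below (users and paths are illustrative; adapt them to your own system):

# 1) Shut down the database cleanly
[oracle@Lab1BD]$ sqlplus / as sysdba
SQL> shutdown immediate;
SQL> exit

# 2) Run the defragmentation with the database down
[root@Lab1BD]# xfs_fsr -v /dev/mapper/ol_lab1bd-home

# 3) Bring the database back up
[oracle@Lab1BD]$ sqlplus / as sysdba
SQL> startup;

The lab session below shows the fragmentation assessment and the defragmentation run itself.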

[root@Lab1BD]# df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                    2.8G     0  2.8G   0% /dev
tmpfs                       2.8G   36K  2.8G   1% /dev/shm
tmpfs                       2.8G   57M  2.8G   2% /run
tmpfs                       2.8G     0  2.8G   0% /sys/fs/cgroup
/dev/mapper/ol_lab1bd-root   50G   12G   39G  23% /
/dev/sda1                  1014M  169M  846M  17% /boot
/dev/mapper/ol_lab1bd-home  2.1T  639G  1.5T  31% /home
tmpfs                       571M     0  571M   0% /run/user/0
tmpfs                       571M     0  571M   0% /run/user/54321

[root@Lab1BD]# xfs_db -r -c "frag -f" /dev/mapper/ol_lab1bd-home
actual 71329, ideal 4587, fragmentation factor 93.57%
Note, this number is largely meaningless.
Files on this filesystem average 15.55 extents per file

[root@Lab1BD]# xfs_fsr -v /dev/mapper/ol_lab1bd-home
/home start inode=0
ino=174
extents before:11889 after:1 DONE ino=174
ino=186
extents before:17 after:2      ino=186
ino=187
extents before:6 after:1 DONE ino=187
ino=172
extents before:5 after:1 DONE ino=172
ino=188
extents before:5 after:1 DONE ino=188
ino=189
extents before:5 after:1 DONE ino=189
ino=182
extents before:3 after:1 DONE ino=182
ino=144
extents before:2 after:1 DONE ino=144
ino=149
extents before:2 after:1 DONE ino=149
ino=4810
extents before:624 after:1 DONE ino=4810
ino=4789
extents before:40 after:3      ino=4789
ino=4797
extents before:38 after:3      ino=4797
ino=4793
extents before:37 after:2      ino=4793
ino=4796
extents before:37 after:1 DONE ino=4796
ino=4799
extents before:38 after:2      ino=4799
ino=4801
extents before:36 after:2      ino=4801
ino=4802
extents before:36 after:2      ino=4802
ino=4792
extents before:35 after:1 DONE ino=4792
ino=4794
extents before:34 after:2      ino=4794
ino=4795

[root@Lab1BD]# xfs_db -r -c "frag -f" /dev/mapper/ol_lab1bd-home
actual 58883, ideal 4588, fragmentation factor 92.21%
Note, this number is largely meaningless.
Files on this filesystem average 12.83 extents per file

[root@Lab1BD]#

Oracle Announces Oracle Cloud Data Science Platform


Press Release

New service makes it quick and easy for data science teams to collaboratively build and deploy powerful machine learning models

REDWOOD SHORES, Calif.—Feb 12, 2020
________________________________________

Oracle today announced the availability of the Oracle Cloud Data Science Platform. At the core is Oracle Cloud Infrastructure Data Science, helping enterprises to collaboratively build, train, manage and deploy machine learning models to increase the success of data science projects. Unlike other data science products that focus on individual data scientists, Oracle Cloud Infrastructure Data Science helps improve the effectiveness of data science teams with capabilities like shared projects, model catalogs, team security policies, reproducibility and auditability. Oracle Cloud Infrastructure Data Science automatically selects the optimal training datasets through AutoML algorithm selection and tuning, model evaluation and model explanation.

Today, organizations realize only a fraction of the enormous transformational potential of data because data science teams don’t have easy access to the right data and tools to build and deploy effective machine learning models. The net result is that models take too long to develop, don’t always meet enterprise requirements for accuracy and robustness and too frequently never make it into production.

“Effective machine learning models are the foundation of successful data science projects, but the volume and variety of data facing enterprises can stall these initiatives before they ever get off the ground,” said Greg Pavlik, senior vice president of product development, Oracle Data and AI Services. “With Oracle Cloud Infrastructure Data Science, we’re improving the productivity of individual data scientists by automating their entire workflow and adding strong team support for collaboration to help ensure that data science projects deliver real value to businesses.”

Designed for Data Science Teams and Scientists

Oracle Cloud Infrastructure Data Science includes automated data science workflow, saving time and reducing errors with the following capabilities:
• AutoML automated algorithm selection and tuning automates the process of running tests against multiple algorithms and hyperparameter configurations. It checks results for accuracy and confirms that the optimal model and configuration are selected for use. This saves significant time for data scientists and, more importantly, is designed to allow every data scientist to achieve the same results as the most experienced practitioners.
• Automated predictive feature selection simplifies feature engineering by automatically identifying key predictive features from larger datasets.
• Model evaluation generates a comprehensive suite of evaluation metrics and suitable visualizations to measure model performance against new data and can rank models over time to enable optimal behavior in production. Model evaluation goes beyond raw performance to take into account expected baseline behavior and uses a cost model so that the different impacts of false positives and false negatives can be fully incorporated.
• Model explanation: Oracle Cloud Infrastructure Data Science provides automated explanation of the relative weighting and importance of the factors that go into generating a prediction. Oracle Cloud Infrastructure Data Science offers the first commercial implementation of model-agnostic explanation. With a fraud detection model, for example, a data scientist can explain which factors are the biggest drivers of fraud so the business can modify processes or implement safeguards.
Getting effective machine learning models successfully into production needs more than just dedicated individuals. It requires teams of data scientists working together collaboratively. Oracle Cloud Infrastructure Data Science delivers powerful team capabilities including:
• Shared projects help users organize, enable version control and reliably share a team’s work including data and notebook sessions.
• Model catalogs enable team members to reliably share already-built models and the artifacts necessary to modify and deploy them.
• Team-based security policies allow users to control access to models, code and data, which are fully integrated with Oracle Cloud Infrastructure Identity and Access Management.
• Reproducibility and auditability functionalities enable the enterprise to keep track of all relevant assets, so that all models can be reproduced and audited, even if team members leave.
With Oracle Cloud Infrastructure Data Science, organizations can accelerate successful model deployment and produce enterprise-grade results and performance for predictive analytics to drive positive business outcomes.

Comprehensive Data and Machine Learning Services

The Oracle Cloud Data Science Platform includes seven new services that deliver a comprehensive end-to-end experience designed to accelerate and improve data science results:
• Oracle Cloud Infrastructure Data Science: Enables users to build, train and manage new machine learning models on Oracle Cloud using Python and other open-source tools and libraries including TensorFlow, Keras and Jupyter.
• Powerful New Machine Learning Capabilities in Oracle Autonomous Database: Machine learning algorithms are tightly integrated in Oracle Autonomous Database with new support for Python and automated machine learning. Upcoming integration with Oracle Cloud Infrastructure Data Science will enable data scientists to develop models using both open source and scalable in-database algorithms. Uniquely, bringing algorithms to the data in Oracle Database speeds time to results by reducing data preparation and movement.
• Oracle Cloud Infrastructure Data Catalog: Allows users to discover, find, organize, enrich and trace data assets on Oracle Cloud. Oracle Cloud Infrastructure Data Catalog has a built-in business glossary making it easy to curate and discover the right, trusted data.
• Oracle Big Data Service: Offers a full Cloudera Hadoop implementation, with dramatically simpler management than other Hadoop offerings, including just one click to make a cluster highly available and to implement security. Oracle Big Data Service also includes machine learning for Spark allowing organizations to run Spark machine learning in memory with one product and with minimal data movement.
• Oracle Cloud SQL: Enables SQL queries on data in HDFS, Hive, Kafka, NoSQL and Object Storage. Only Cloud SQL enables any user, application or analytics tool that can talk to Oracle databases to transparently work with data in other data stores, with the benefit of push-down, scale-out processing to minimize data movement.
• Oracle Cloud Infrastructure Data Flow: A fully-managed Big Data service that allows users to run Apache Spark applications with no infrastructure to deploy or manage. It enables enterprises to deliver Big Data and AI applications faster. Unlike competing Hadoop and Spark services, Oracle Cloud Infrastructure Data Flow includes a single window to track all Spark jobs making it simple to identify expensive tasks or troubleshoot problems.
• Oracle Cloud Infrastructure Virtual Machines for Data Science: Preconfigured GPU-based environments with common IDEs, notebooks and frameworks that can be up and running in under 15 minutes, for $30 a day.

What Customers Are Saying

AgroScout is dedicated to detecting early-stage crop diseases to improve crop yields, reduce pesticide use and increase profits. “Our vision is to make modern agronomy economically accessible to the 1 billion farmers working on 500 million farms worldwide, constituting 30 percent of the global workforce. We plan to achieve this by offering cloud-based, AI-driven sustainable agronomy, relying purely on input from low-cost drones, mobile phones and manual inputs by growers,” said Simcha Shore, Founder and CEO, AgroScout. “Success of this vision relies on the ability to manage a continuous and increasing flow of input data and our own AI-based solution to transform that data into precision and decision agriculture, at scale. The speed, scale and agility of Oracle Cloud have helped us realize our dream. Now, new horizons have opened up with the recent addition of Oracle Cloud Infrastructure Data Science that improves our data scientists’ ability to collaboratively build, train and deploy machine learning models. This addition has reduced costs, increased efficiency and has helped us increase our global footprint faster.”

IDenTV provides advanced video analytics based on AI capabilities powered by computer vision, automated speech recognition and textual semantic classifiers. “With Oracle Cloud Infrastructure Data Science, we are able to scale our data science efforts to deliver business value faster than ever before. Our data science teams can now seamlessly access data without worrying about the complexities of data locations or access mechanisms. While using open-source capabilities like TensorFlow, Keras, and Jupyter notebooks embedded within the environment, we can streamline our model training and deployment tasks resulting in tremendous cost savings and faster results,” said Amro Shihadah, Founder and COO, IDenTV. “We feel that Oracle Cloud Infrastructure Data Science in conjunction with the benefits of Autonomous Database will give us the edge we need to be competitive and unique in the market.”
________________________________________

Contact Info
Nicole Maloney
Oracle
+1.650.506.0806
nicole.maloney@oracle.com
Victoria Brown
Oracle
+1.650.850.2009
victoria.brown@oracle.com

________________________________________


Oracle News: New Database Innovations Deliver a Single Database that Supports all Data


Blog 
Jenny Tsai-Smith, vice president, product management—Feb 12, 2020
________________________________________

Today during his keynote at Oracle OpenWorld London, Oracle Executive Vice President Juan Loaiza announced the latest innovations which further strengthen Oracle’s strategy of providing a single converged database engine able to meet all the needs of a business. The new database features enable customers to take advantage of new technology trends—such as employing blockchain for fraud prevention, leveraging the flexibility of JSON documents, or training and evaluating machine learning algorithms inside the database.

The future is data-driven, and effective use of data will increasingly determine a company’s competitiveness. Unlocking the full value of an enterprise’s data requires a new generation of data-driven apps. Oracle makes it easy to create modern data-driven apps utilizing a single database engine which supports the most suitable data model, process type, and development paradigm for a wide variety of business requirements. We enable our customers to easily run many kinds of workloads against the same data. In contrast, other cloud providers require dozens of different specialized databases to handle different data types. Deploying multiple single-purpose databases leads to additional challenges: implementing multiple different database engines increases complexity, risk, and cost, because each database introduces its own security model, its own set of procedures for implementing high availability, its own scalability capabilities, and requires separate skillsets to operate.

Much in the way a single smartphone is now a camera, a calendar, a platform for entertainment, and a messaging system, the same idea applies to Oracle’s converged database engine. With Oracle Database, enterprises are no longer forced into purchasing multiple individual single-purpose databases, when all they need is one converged database engine that handles everything.

Today, Oracle is announcing several new features which extend the converged capabilities in Oracle Database. These include: 

• Oracle Machine Learning for Python (OML4Py): Oracle Machine Learning (OML) inside Oracle Database accelerates predictive insights by embedding advanced ML algorithms which can be applied directly to the data. Because the ML algorithms are already collocated with the data, there is no need to move the data out of the database. Data scientists can also use Python to extend the in-database ML algorithms.
• OML4Py AutoML: With OML4Py AutoML, even non-experts can take advantage of machine learning. AutoML will recommend best-fit algorithms, automate feature selection, and tune hyperparameters to significantly improve model accuracy.
• Native Persistent Memory Store: Database data and redo can now be stored in local Persistent Memory (PMEM). SQL can run directly on data stored in the mapped PMEM file system, eliminating the IO code path and reducing the need for large buffer caches. This allows enterprises to accelerate data access for workloads that demand lower latency, including high-frequency trading and mobile communication.
• Automatic In-Memory Management: Oracle Database In-Memory optimizes both analytics and mixed workload online transaction processing, delivering optimized performance for transactions while simultaneously supporting real-time analytics and reporting. Automatic In-Memory Management greatly simplifies the use of In-Memory by automatically evaluating data usage patterns, and determining, without any human intervention, which tables would most benefit from being placed in the In-Memory Column Store.
• Native Blockchain Tables: Oracle makes it easy to use Blockchain technology to help identify and prevent fraud. Oracle native blockchain tables look like standard tables. They allow SQL inserts, and inserted rows are cryptographically chained. Optionally, row data can be signed to ensure identity fraud protection. Oracle blockchain tables are simple to integrate into apps. They are able to participate in transactions and queries with other tables. Additionally, they support very high insert rates compared to a decentralized blockchain because commits do not require consensus (see the sketch after this list).
• JSON Binary Data Type: Storing JSON documents in a binary format in Oracle Database enables updates up to 4X faster and scans up to 10X faster.
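Picking up the Native Blockchain Tables item above, here is a minimal sketch of what such a table could look like. The syntax shown matches the CREATE BLOCKCHAIN TABLE clause as it later shipped in Oracle Database; the table, columns and retention values are hypothetical:

SQL> CREATE BLOCKCHAIN TABLE bank_ledger
  2  ( bank VARCHAR2(128), deposit_date DATE, deposit_amount NUMBER )
  3  NO DROP UNTIL 31 DAYS IDLE
  4  NO DELETE LOCKED
  5  HASHING USING "SHA2_512" VERSION "v1";

Table created.

SQL> -- rows go in with plain SQL; each committed row is cryptographically chained to the previous one
SQL> INSERT INTO bank_ledger VALUES ('Bank A', SYSDATE, 100);

1 row created.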
Oracle continues to lead the industry in delivering the world’s most comprehensive data management solutions, including the industry’s first and only self-driving database, Oracle Autonomous Database. The company was recently named the leader in “The Forrester Wave™: Translytical Data Platforms, Q4 2019” report, which cites that, “unlike other vendors, Oracle uses a dual-format database (row and columns for the same table) to deliver optimal translytical performance,” and that “customers like Oracle’s capability to support many workloads including OLTP, IoT, microservices, multi-model, data science, AI/ML, spatial, graph, and analytics.” 

Explore new Always Free services on Oracle Cloud!
________________________________________

Future Product Disclaimer
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
______________________________________

Sunday, February 9, 2020

Oracle Database 19c DBMS_AUTO_INDEX.drop_secondary_indexes

Oracle Database 19c Enterprise Edition for Oracle Engineered Systems introduces the automatic indexing feature, which lets the database engine make some decisions about index management in the database.

The automatic indexing feature does the following:

  • Identifies potential automatic indexes based on table column usage. The documentation calls these "candidate indexes".
  • Creates automatic indexes as invisible indexes, so they are not used in execution plans. Index names include the "SYS_AI" prefix.
  • Tests the invisible automatic indexes against SQL statements to make sure they provide improved performance.
  • If the created indexes result in improved performance, they are made visible. If performance does not improve, the relevant automatic index is marked unusable and then dropped.
  • SQL statements tested against failed automatic indexes are blacklisted, so they will not be considered for automatic indexing in the future.
  • The optimizer does not consider automatic indexes the first time the SQL runs in the database.
  • Drops unused indexes.
As I indicated at the beginning, this is only possible on Oracle Engineered Systems, that is, Exadata; however, in a test environment we can use the parameter "_exadata_feature_on"=true to simulate this special condition.
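For a test environment only (never on a supported production system, since "_exadata_feature_on" is an unsupported hidden parameter), a minimal sketch of simulating that condition and then turning the feature on with the documented DBMS_AUTO_INDEX.CONFIGURE procedure:

SQL> -- requires an instance restart, since the hidden parameter is not dynamic
SQL> alter system set "_exadata_feature_on"=true scope=spfile;
SQL> shutdown immediate
SQL> startup
SQL> exec DBMS_AUTO_INDEX.configure('AUTO_INDEX_MODE','IMPLEMENT');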

What makes the DBMS_AUTO_INDEX package interesting is one of its procedures: drop_secondary_indexes.

In my talks, on various occasions when I was asked for recommendations on migrating from legacy database versions to more current ones -especially when moving from 10g or lower to 11g or 12c- I liked to recommend dropping all indexes not involved in PK and FK constraints. This was mainly because of the query optimizer, which as of version 11g only implements cost-based optimization, and superfluous indexes were a big headache.

This procedure of the DBMS_AUTO_INDEX package removes from the database all indexes that do not meet the condition described above, so that the automatic index creation feature can efficiently determine which indexes are required by the queries executed in the database and are not already satisfied by the indexes created alongside the PK and FK constraints.

Let's look at this with a small example.

We have a table created in a 19c instance, under a perfectly ordinary user.


SQL> connect user_test/oracle@source
Connected.

SQL> desc datos
 Name                         Null?    Type
 ---------------------------- -------- ---------------
 EMPLOYEE_ID                           NUMBER(6)
 FIRST_NAME                            VARCHAR2(20)
 LAST_NAME                    NOT NULL VARCHAR2(25)
 EMAIL                        NOT NULL VARCHAR2(25)
 PHONE_NUMBER                          VARCHAR2(20)
 HIRE_DATE                    NOT NULL DATE
 JOB_ID                       NOT NULL VARCHAR2(10)
 SALARY                                NUMBER(8,2)
 COMMISSION_PCT                        NUMBER(2,2)
 MANAGER_ID                            NUMBER(6)
 DEPARTMENT_ID                         NUMBER(4)

This table has no indexes at the moment. Let's create an index on the "employee_id" column.

SQL> create index idx01_prueba_borrado on datos ( employee_id );

Index created.

If we query the database's data dictionary, the index created above now appears associated with the "DATOS" table in the "USER_TEST" schema.

SQL> connect system/oracle@source
Connected.

SQL> col owner format a20
SQL> col index_name format a40

SQL> select owner, index_name from dba_indexes
  2  where table_name='DATOS' and owner='USER_TEST';

OWNER                INDEX_NAME
-------------------- ----------------------------------------
USER_TEST            IDX01_PRUEBA_BORRADO

To get everything configured to use the database's automatic index creation feature, you must first run the following step:

SQL> EXEC DBMS_AUTO_INDEX.drop_secondary_indexes('USER_TEST');

PL/SQL procedure successfully completed.

Running the query on the existing indexes of the "DATOS" table in that schema again, the index no longer appears.

SQL> select owner, index_name from dba_indexes
  2  where table_name='DATOS' and owner='USER_TEST';

no rows selected

SQL>

Once again, time proves us right. Everything falls under its own weight.

Saturday, February 8, 2020

The LRU Algorithm and Full Table Scans – FULL TABLE SCAN

Did you know that ...
When a user process performs a full table scan (FTS), Oracle places the retrieved blocks at the LRU end of the buffer cache list (instead of the MRU -Most Recently Used- end).
This is because a fully scanned table is generally needed only once, so its blocks should be aged out quickly, keeping the most frequently used blocks available in the cache.
You can control this default behavior for the blocks involved in table scans in a database.
To specify that a table's blocks be placed at the MRU end of the list during a full table scan, use the CACHE clause when creating or altering the table.
You can specify this behavior for small lookup tables or for large, static historical tables, to avoid I/O on subsequent accesses to the table.
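A minimal sketch of the CACHE clause, assuming a small, hypothetical lookup table:

SQL> -- place the table's blocks at the MRU end of the LRU list during full table scans
SQL> create table country_codes ( code varchar2(2), name varchar2(40) ) cache;

Table created.

SQL> -- the same attribute can be toggled on an existing table
SQL> alter table country_codes cache;
SQL> -- and reverted to the default LRU-tail behavior
SQL> alter table country_codes nocache;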
